Overview

Auto-routing enables switchAILocal to automatically select the best available provider for your request. Omit the provider prefix from your model name to activate intelligent routing.

How It Works

Basic Auto-Routing

Simply use the model name without a provider prefix:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123"
)

# Auto-routing: no provider prefix
response = client.chat.completions.create(
    model="gemini-2.5-pro",  # Not "geminicli:gemini-2.5-pro"
    messages=[{"role": "user", "content": "Hello!"}]
)

Routing Algorithm

switchAILocal evaluates providers in this order:
  1. Provider Availability: Check if provider supports the model
  2. Provider Health: Skip unhealthy or quota-exceeded providers
  3. Priority Order: Follow configured priority preferences
  4. Cost Optimization: Prefer CLI and local providers (free)
  5. Success Rate: Favor providers with better historical performance
  6. Fallback: Try alternative providers if primary fails
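The selection order above can be sketched as a small Python function. This is an illustrative model, not switchAILocal's actual implementation; the `Provider` fields and sort keys are assumptions:

```python
# Simplified sketch of the routing algorithm described above.
# The Provider fields and ordering keys are illustrative assumptions,
# not switchAILocal's internals.
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    models: set          # models this provider supports
    healthy: bool        # heartbeat status
    quota_ok: bool       # quota not exceeded
    priority: int        # lower = preferred (configured order)
    is_free: bool        # CLI/local providers cost nothing
    success_rate: float  # historical success rate (0.0 - 1.0)

def select_providers(providers, model):
    # Steps 1-2: availability, health, and quota filters
    candidates = [p for p in providers
                  if model in p.models and p.healthy and p.quota_ok]
    # Steps 3-5: priority first, then cost, then success rate
    candidates.sort(key=lambda p: (p.priority, not p.is_free, -p.success_rate))
    # Step 6: the caller tries candidates in order, falling back on failure
    return candidates

providers = [
    Provider("gemini", {"gemini-2.5-pro"}, True, True, 4, False, 0.95),
    Provider("geminicli", {"gemini-2.5-pro"}, True, True, 1, True, 0.98),
    Provider("ollama", {"llama3.2"}, True, True, 2, True, 0.99),
]
order = [p.name for p in select_providers(providers, "gemini-2.5-pro")]
print(order)  # ['geminicli', 'gemini'] -- ollama lacks the model
```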

Routing Priority

Default Priority

By default, switchAILocal prioritizes in this order:
  1. CLI Providers (geminicli:, claudecli:, etc.) - Uses your paid subscriptions
  2. Local Providers (ollama:, lmstudio:) - Free and private
  3. switchAI (switchai:) - Unified gateway with auto-selection
  4. API Providers (gemini:, claude:, etc.) - Direct API access

Custom Priority

Override the default priority in config.yaml:
config.yaml
routing:
  priority:
    - ollama      # Try local models first
    - geminicli   # Then CLI providers
    - switchai    # Then switchAI
    - gemini      # Finally APIs

Intelligent Features

Health-Based Routing

switchAILocal monitors provider health and automatically routes away from failing providers:
config.yaml
heartbeat:
  enabled: true
  interval: 60  # Check every 60 seconds
  providers:
    - geminicli
    - ollama
    - switchai
Unhealthy providers are automatically skipped during routing.

Quota-Aware Routing

When a provider exceeds quota, switchAILocal automatically fails over:
# First request succeeds with geminicli
response1 = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}]
)
# Used: geminicli

# If geminicli hits quota, automatically switches
response2 = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}]
)
# Used: gemini (API fallback)

Success Rate Optimization

With Memory system enabled, switchAILocal learns which providers perform best:
config.yaml
memory:
  enabled: true
  provider_selection:
    enabled: true
    min_samples: 10  # Learn after 10 requests
Providers with higher success rates are preferred in future requests.
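A minimal sketch of how `min_samples`-gated success tracking could behave (illustrative only; the Memory system's internals and these names are assumptions):

```python
# Illustrative success-rate tracker mirroring the min_samples config
# above. Not the actual Memory implementation.
from collections import defaultdict

class SuccessTracker:
    def __init__(self, min_samples=10):
        self.min_samples = min_samples
        self.stats = defaultdict(lambda: {"ok": 0, "total": 0})

    def record(self, provider, success):
        s = self.stats[provider]
        s["total"] += 1
        s["ok"] += int(success)

    def score(self, provider):
        s = self.stats[provider]
        if s["total"] < self.min_samples:
            return None  # not enough data: fall back to priority order
        return s["ok"] / s["total"]

tracker = SuccessTracker(min_samples=10)
for _ in range(10):
    tracker.record("geminicli", True)
for _ in range(5):
    tracker.record("gemini", True)

print(tracker.score("geminicli"))  # 1.0 (min_samples reached)
print(tracker.score("gemini"))     # None (only 5 samples, still learning)
```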

Model Mapping

Automatically map unavailable models to alternatives:
config.yaml
ampcode:
  model_mappings:
    - from: "gpt-5"
      to: "gemini-2.5-pro"
      regex: false
    - from: "claude-opus-4"
      to: "claude-sonnet-4"
      regex: false
Requests for unavailable models are automatically redirected.
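Client-side, the mapping rules behave roughly like this (a hypothetical sketch; the gateway applies these rules server-side, and the regex rule here is added purely for illustration):

```python
# Hypothetical sketch of applying model_mappings rules like those above.
# The second rule uses regex: true to illustrate pattern matching.
import re

mappings = [
    {"from": "gpt-5", "to": "gemini-2.5-pro", "regex": False},
    {"from": r"claude-opus-.*", "to": "claude-sonnet-4", "regex": True},
]

def map_model(model, mappings):
    for rule in mappings:
        if rule["regex"]:
            if re.fullmatch(rule["from"], model):
                return rule["to"]
        elif model == rule["from"]:
            return rule["to"]
    return model  # no rule matched: pass through unchanged

print(map_model("gpt-5", mappings))          # gemini-2.5-pro
print(map_model("claude-opus-4", mappings))  # claude-sonnet-4
print(map_model("llama3.2", mappings))       # llama3.2
```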

Configuration

Enable Auto-Routing

config.yaml
routing:
  auto_routing: true  # Default: true
  fallback_enabled: true  # Try alternatives on failure
  max_retries: 3  # Retry attempts per provider
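`fallback_enabled` and `max_retries` combine roughly like the loop below, seen from the caller's perspective (a simplified sketch; the function names are illustrative, not switchAILocal's API):

```python
# Sketch of retry-then-fallback behavior: retry each provider up to
# max_retries times, then move to the next candidate if fallback is on.
def call_with_fallback(attempts, max_retries=3, fallback_enabled=True):
    # attempts: list of (name, callable) pairs in routing priority order
    for name, call in attempts:
        for _ in range(max_retries):
            try:
                return name, call()
            except RuntimeError:
                continue  # retry the same provider
        if not fallback_enabled:
            break  # do not try alternative providers
    raise RuntimeError("All providers failed")

def quota_exceeded():
    raise RuntimeError("quota exceeded")

used, result = call_with_fallback([
    ("geminicli", quota_exceeded),     # fails all retries
    ("gemini", lambda: "Hello!"),      # fallback succeeds
])
print(used, result)  # gemini Hello!
```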

Provider Weights

Assign weights to providers for load distribution:
config.yaml
routing:
  weights:
    geminicli: 0.5    # 50% of requests
    ollama: 0.3       # 30% of requests
    switchai: 0.2     # 20% of requests
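The weights describe a proportional random split, which Python's `random.choices` models directly. This sketches the distribution, not the gateway's actual scheduler:

```python
# Weighted provider selection matching the config above: over many
# requests the split converges to roughly 50% / 30% / 20%.
import random

weights = {"geminicli": 0.5, "ollama": 0.3, "switchai": 0.2}

def pick_provider(weights, rng=random):
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

random.seed(0)  # deterministic for the demo
counts = {n: 0 for n in weights}
for _ in range(10_000):
    counts[pick_provider(weights)] += 1
print(counts)  # roughly 5000 / 3000 / 2000
```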

Exclude Providers

Exclude specific providers from auto-routing:
config.yaml
routing:
  exclude:
    - expensive-provider
    - slow-provider

Examples

Cost-Optimized Routing

config.yaml
routing:
  priority:
    - ollama      # Free local models
    - geminicli   # Free CLI (with subscription)
    - switchai    # Paid unified gateway
# Automatically uses cheapest available provider
response = client.chat.completions.create(
    model="llama3.2",  # Available in Ollama
    messages=[{"role": "user", "content": "Hello!"}]
)
# Used: ollama (free)

Performance-Optimized Routing

config.yaml
routing:
  priority:
    - switchai    # Fast cloud API
    - gemini      # Fast Google API
    - geminicli   # Slower CLI
    - ollama      # Depends on hardware

Privacy-Optimized Routing

config.yaml
routing:
  priority:
    - ollama      # Fully local
    - geminicli   # Local CLI execution
  exclude:
    - switchai    # Cloud service
    - gemini      # Cloud service
    - claude      # Cloud service

Hybrid Strategy

Combine local and cloud for best of both:
def route_by_task(task_type):
    if task_type == "simple":
        # Use local for simple tasks
        return "llama3.2"  # Routed to Ollama
    elif task_type == "complex":
        # Use cloud for complex tasks
        return "gemini-2.5-pro"  # Routed to best Gemini provider
    elif task_type == "coding":
        # Use CLI for coding (supports attachments)
        return "geminicli:gemini-2.5-pro"
    # Unknown task type: let auto-routing decide
    return "gemini-2.5-pro"

response = client.chat.completions.create(
    model=route_by_task("complex"),
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

Routing Transparency

Response Headers

Check which provider was used via response headers:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123"
)

# with_raw_response exposes HTTP headers alongside the parsed body
raw = client.chat.completions.with_raw_response.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(raw.headers)  # Inspect routing headers (implementation-dependent)

response = raw.parse()
print(f"Provider used: {response.model}")  # May include provider prefix

Logs

View routing decisions in logs:
tail -f logs/main.log | grep routing
[INFO] Auto-routing: selected 'geminicli' for model 'gemini-2.5-pro'
[INFO] Provider 'geminicli' quota exceeded, trying fallback
[INFO] Auto-routing: selected 'gemini' for model 'gemini-2.5-pro'

Management API

Query routing decisions:
curl http://localhost:18080/v0/management/analytics \
  -H "X-Management-Key: your-secret-key"
{
  "routing": {
    "total_requests": 1000,
    "provider_usage": {
      "geminicli": 650,
      "gemini": 200,
      "switchai": 100,
      "ollama": 50
    },
    "fallback_rate": 0.15
  }
}
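A payload like the one above can be summarized with the standard library, for example:

```python
# Summarize per-provider usage shares from the analytics payload shown
# above (payload shape as documented; values are the example's).
import json

payload = json.loads("""{
  "routing": {
    "total_requests": 1000,
    "provider_usage": {"geminicli": 650, "gemini": 200,
                       "switchai": 100, "ollama": 50},
    "fallback_rate": 0.15
  }
}""")

routing = payload["routing"]
for name, count in sorted(routing["provider_usage"].items(),
                          key=lambda kv: -kv[1]):
    share = count / routing["total_requests"]
    print(f"{name:10s} {count:5d}  {share:.0%}")
print(f"fallback rate: {routing['fallback_rate']:.0%}")
```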

Advanced Patterns

Conditional Routing

Route based on request attributes:
def smart_route(messages, needs_tools=False, needs_vision=False):
    if needs_vision:
        return "geminicli:gemini-2.5-pro"  # Best vision
    elif needs_tools:
        return "switchai:auto"  # Best tools
    elif len(str(messages)) > 10000:
        return "gemini-2.5-pro"  # Large context (auto-route)
    else:
        return "ollama:llama3.2"  # Fast local

response = client.chat.completions.create(
    model=smart_route(messages, needs_vision=True),
    messages=messages
)

Time-Based Routing

Route differently based on time of day:
from datetime import datetime

def time_based_route(model):
    hour = datetime.now().hour
    
    if 9 <= hour <= 17:  # Business hours
        # Use paid APIs for better performance
        return f"switchai:{model}"
    else:  # Off-hours
        # Use free providers
        return model  # Auto-route to free providers

response = client.chat.completions.create(
    model=time_based_route("gemini-2.5-pro"),
    messages=[{"role": "user", "content": "Hello!"}]
)

Budget-Based Routing

class BudgetRouter:
    def __init__(self, daily_budget):
        self.daily_budget = daily_budget
        self.spent_today = 0
    
    def route(self, model):
        if self.spent_today >= self.daily_budget:
            # Budget exceeded, use free providers only
            return model  # Auto-route to geminicli, ollama
        else:
            # Budget available, allow paid APIs
            return f"switchai:{model}"
    
    def record_cost(self, cost):
        self.spent_today += cost

router = BudgetRouter(daily_budget=10.0)  # $10/day

response = client.chat.completions.create(
    model=router.route("gemini-2.5-pro"),
    messages=[{"role": "user", "content": "Hello!"}]
)

router.record_cost(0.01)  # Track usage

Monitoring

Usage Statistics

Track provider usage:
curl http://localhost:18080/v0/management/usage \
  -H "X-Management-Key: your-secret-key"
{
  "total_requests": 5000,
  "by_provider": {
    "geminicli": {
      "requests": 3000,
      "tokens": 1500000,
      "cost": 0
    },
    "gemini": {
      "requests": 1500,
      "tokens": 750000,
      "cost": 15.50
    },
    "ollama": {
      "requests": 500,
      "tokens": 250000,
      "cost": 0
    }
  }
}

Provider Health

Monitor provider availability:
curl http://localhost:18080/v0/management/heartbeat/status \
  -H "X-Management-Key: your-secret-key"
{
  "providers": [
    {
      "id": "geminicli",
      "status": "healthy",
      "last_check": "2026-03-09T10:30:00Z",
      "response_time_ms": 250,
      "success_rate": 0.98
    },
    {
      "id": "gemini",
      "status": "quota_exceeded",
      "last_check": "2026-03-09T10:30:00Z",
      "response_time_ms": 500,
      "success_rate": 0.75
    }
  ]
}

Troubleshooting

No Providers Available

Error: No providers available for model 'gemini-2.5-pro'

Solutions:
  1. Verify providers are configured and authenticated
  2. Check provider status: GET /v1/providers
  3. Try explicit routing: geminicli:gemini-2.5-pro
  4. Check logs for provider initialization errors

All Providers Failing

Error: All providers failed for model 'gemini-2.5-pro'

Solutions:
  1. Check provider health: GET /v0/management/heartbeat/status
  2. Verify API keys are valid
  3. Check quota limits
  4. Try different model: GET /v1/models

Unexpected Provider Used

Issue: Wrong provider selected during auto-routing

Solutions:
  1. Check routing priority: Review config.yaml
  2. Verify provider health: Unhealthy providers are skipped
  3. Use explicit routing: Add provider prefix
  4. Check logs: Review routing decisions

Best Practices

Start with auto-routing and only use explicit prefixes when needed:
# Good: Auto-routing
model = "gemini-2.5-pro"

# Use explicit only when necessary
if needs_cli_features:
    model = "geminicli:gemini-2.5-pro"
Set priorities based on your preferences:
routing:
  priority:
    - geminicli  # Free with subscription
    - ollama     # Free local
    - switchai   # Paid unified
Use Heartbeat for automatic failover:
heartbeat:
  enabled: true
  interval: 60
Track provider usage to optimize costs:
curl http://localhost:18080/v0/management/usage

Next Steps