Overview

Auto-routing enables switchAILocal to automatically select the best available provider for your request. Omit the provider prefix from your model name to activate intelligent routing.

How It Works

Basic Auto-Routing

Simply use the model name without a provider prefix:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123"
)

# Auto-routing: no provider prefix
response = client.chat.completions.create(
    model="gemini-2.5-pro",  # Not "geminicli:gemini-2.5-pro"
    messages=[{"role": "user", "content": "Hello!"}]
)

Routing Algorithm

switchAILocal evaluates providers in this order:
  1. Provider Availability: Check if provider supports the model
  2. Provider Health: Skip unhealthy or quota-exceeded providers
  3. Priority Order: Follow configured priority preferences
  4. Cost Optimization: Prefer CLI and local providers (free)
  5. Success Rate: Favor providers with better historical performance
  6. Fallback: Try alternative providers if primary fails
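The selection order above can be sketched as a small Python function. This is an illustrative model, not switchAILocal's actual implementation; the `Provider` fields and sort keys are assumptions:

```python
# Simplified sketch of the routing algorithm described above.
# The Provider fields and ordering keys are illustrative assumptions,
# not switchAILocal's internals.
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    models: set          # models this provider supports
    healthy: bool        # heartbeat status
    quota_ok: bool       # quota not exceeded
    priority: int        # lower = preferred (configured order)
    is_free: bool        # CLI/local providers cost nothing
    success_rate: float  # historical success rate (0.0 - 1.0)

def select_providers(providers, model):
    # Steps 1-2: availability, health, and quota filters
    candidates = [p for p in providers
                  if model in p.models and p.healthy and p.quota_ok]
    # Steps 3-5: priority first, then cost, then success rate
    candidates.sort(key=lambda p: (p.priority, not p.is_free, -p.success_rate))
    # Step 6: the caller tries candidates in order, falling back on failure
    return candidates

providers = [
    Provider("gemini", {"gemini-2.5-pro"}, True, True, 4, False, 0.95),
    Provider("geminicli", {"gemini-2.5-pro"}, True, True, 1, True, 0.98),
    Provider("ollama", {"llama3.2"}, True, True, 2, True, 0.99),
]
order = [p.name for p in select_providers(providers, "gemini-2.5-pro")]
print(order)  # ['geminicli', 'gemini'] -- ollama lacks the model
```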

Routing Priority

Default Priority

By default, switchAILocal prioritizes in this order:
  1. CLI Providers (geminicli:, claudecli:, etc.) - Uses your paid subscriptions
  2. Local Providers (ollama:, lmstudio:) - Free and private
  3. switchAI (switchai:) - Unified gateway with auto-selection
  4. API Providers (gemini:, claude:, etc.) - Direct API access

Custom Priority

Override the default priority in config.yaml:
config.yaml
routing:
  priority:
    - ollama      # Try local models first
    - geminicli   # Then CLI providers
    - switchai    # Then switchAI
    - gemini      # Finally APIs

Intelligent Features

Health-Based Routing

switchAILocal monitors provider health and automatically routes away from failing providers:
config.yaml
heartbeat:
  enabled: true
  interval: 60  # Check every 60 seconds
  providers:
    - geminicli
    - ollama
    - switchai
Unhealthy providers are automatically skipped during routing.

Quota-Aware Routing

When a provider exceeds quota, switchAILocal automatically fails over:
# First request succeeds with geminicli
response1 = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}]
)
# Used: geminicli

# If geminicli hits quota, automatically switches
response2 = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}]
)
# Used: gemini (API fallback)

Success Rate Optimization

With Memory system enabled, switchAILocal learns which providers perform best:
config.yaml
memory:
  enabled: true
  provider_selection:
    enabled: true
    min_samples: 10  # Learn after 10 requests
Providers with higher success rates are preferred in future requests.
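A minimal sketch of how `min_samples`-gated success tracking could behave (illustrative only; the Memory system's internals and these names are assumptions):

```python
# Illustrative success-rate tracker mirroring the min_samples config
# above. Not the actual Memory implementation.
from collections import defaultdict

class SuccessTracker:
    def __init__(self, min_samples=10):
        self.min_samples = min_samples
        self.stats = defaultdict(lambda: {"ok": 0, "total": 0})

    def record(self, provider, success):
        s = self.stats[provider]
        s["total"] += 1
        s["ok"] += int(success)

    def score(self, provider):
        s = self.stats[provider]
        if s["total"] < self.min_samples:
            return None  # not enough data: fall back to priority order
        return s["ok"] / s["total"]

tracker = SuccessTracker(min_samples=10)
for _ in range(10):
    tracker.record("geminicli", True)
for _ in range(5):
    tracker.record("gemini", True)

print(tracker.score("geminicli"))  # 1.0 (min_samples reached)
print(tracker.score("gemini"))     # None (only 5 samples, still learning)
```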

Model Mapping

Automatically map unavailable models to alternatives:
config.yaml
ampcode:
  model_mappings:
    - from: "gpt-5"
      to: "gemini-2.5-pro"
      regex: false
    - from: "claude-opus-4"
      to: "claude-sonnet-4"
      regex: false
Requests for unavailable models are automatically redirected.
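Client-side, the mapping rules behave roughly like this (a hypothetical sketch; the gateway applies these rules server-side, and the regex rule here is added purely for illustration):

```python
# Hypothetical sketch of applying model_mappings rules like those above.
# The second rule uses regex: true to illustrate pattern matching.
import re

mappings = [
    {"from": "gpt-5", "to": "gemini-2.5-pro", "regex": False},
    {"from": r"claude-opus-.*", "to": "claude-sonnet-4", "regex": True},
]

def map_model(model, mappings):
    for rule in mappings:
        if rule["regex"]:
            if re.fullmatch(rule["from"], model):
                return rule["to"]
        elif model == rule["from"]:
            return rule["to"]
    return model  # no rule matched: pass through unchanged

print(map_model("gpt-5", mappings))          # gemini-2.5-pro
print(map_model("claude-opus-4", mappings))  # claude-sonnet-4
print(map_model("llama3.2", mappings))       # llama3.2
```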

Configuration

Enable Auto-Routing

config.yaml
routing:
  auto_routing: true  # Default: true
  fallback_enabled: true  # Try alternatives on failure
  max_retries: 3  # Retry attempts per provider
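`fallback_enabled` and `max_retries` combine roughly like the loop below, seen from the caller's perspective (a simplified sketch; the function names are illustrative, not switchAILocal's API):

```python
# Sketch of retry-then-fallback behavior: retry each provider up to
# max_retries times, then move to the next candidate if fallback is on.
def call_with_fallback(attempts, max_retries=3, fallback_enabled=True):
    # attempts: list of (name, callable) pairs in routing priority order
    for name, call in attempts:
        for _ in range(max_retries):
            try:
                return name, call()
            except RuntimeError:
                continue  # retry the same provider
        if not fallback_enabled:
            break  # do not try alternative providers
    raise RuntimeError("All providers failed")

def quota_exceeded():
    raise RuntimeError("quota exceeded")

used, result = call_with_fallback([
    ("geminicli", quota_exceeded),     # fails all retries
    ("gemini", lambda: "Hello!"),      # fallback succeeds
])
print(used, result)  # gemini Hello!
```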

Provider Weights

Assign weights to providers for load distribution:
config.yaml
routing:
  weights:
    geminicli: 0.5    # 50% of requests
    ollama: 0.3       # 30% of requests
    switchai: 0.2     # 20% of requests
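The weights describe a proportional random split, which Python's `random.choices` models directly. This sketches the distribution, not the gateway's actual scheduler:

```python
# Weighted provider selection matching the config above: over many
# requests the split converges to roughly 50% / 30% / 20%.
import random

weights = {"geminicli": 0.5, "ollama": 0.3, "switchai": 0.2}

def pick_provider(weights, rng=random):
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

random.seed(0)  # deterministic for the demo
counts = {n: 0 for n in weights}
for _ in range(10_000):
    counts[pick_provider(weights)] += 1
print(counts)  # roughly 5000 / 3000 / 2000
```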

Exclude Providers

Exclude specific providers from auto-routing:
config.yaml
routing:
  exclude:
    - expensive-provider
    - slow-provider

Examples

Cost-Optimized Routing

config.yaml
routing:
  priority:
    - ollama      # Free local models
    - geminicli   # Free CLI (with subscription)
    - switchai    # Paid unified gateway
# Automatically uses cheapest available provider
response = client.chat.completions.create(
    model="llama3.2",  # Available in Ollama
    messages=[{"role": "user", "content": "Hello!"}]
)
# Used: ollama (free)

Performance-Optimized Routing

config.yaml
routing:
  priority:
    - switchai    # Fast cloud API
    - gemini      # Fast Google API
    - geminicli   # Slower CLI
    - ollama      # Depends on hardware

Privacy-Optimized Routing

config.yaml
routing:
  priority:
    - ollama      # Fully local
    - geminicli   # Local CLI execution
  exclude:
    - switchai    # Cloud service
    - gemini      # Cloud service
    - claude      # Cloud service

Hybrid Strategy

Combine local and cloud for best of both:
def route_by_task(task_type):
    if task_type == "simple":
        # Use local for simple tasks
        return "llama3.2"  # Routed to Ollama
    elif task_type == "complex":
        # Use cloud for complex tasks
        return "gemini-2.5-pro"  # Routed to best Gemini provider
    elif task_type == "coding":
        # Use CLI for coding (supports attachments)
        return "geminicli:gemini-2.5-pro"
    # Unknown task type: let auto-routing decide
    return "gemini-2.5-pro"

response = client.chat.completions.create(
    model=route_by_task("complex"),
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

Routing Transparency

Response Headers

Check which provider was used via response headers:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123"
)

# with_raw_response exposes HTTP headers alongside the parsed body
raw = client.chat.completions.with_raw_response.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(raw.headers)  # Inspect routing headers (implementation-dependent)

response = raw.parse()
print(f"Provider used: {response.model}")  # May include provider prefix

Logs

View routing decisions in logs:
tail -f logs/main.log | grep routing
[INFO] Auto-routing: selected 'geminicli' for model 'gemini-2.5-pro'
[INFO] Provider 'geminicli' quota exceeded, trying fallback
[INFO] Auto-routing: selected 'gemini' for model 'gemini-2.5-pro'

Management API

Query routing decisions:
curl http://localhost:18080/v0/management/analytics \
  -H "X-Management-Key: your-secret-key"
{
  "routing": {
    "total_requests": 1000,
    "provider_usage": {
      "geminicli": 650,
      "gemini": 200,
      "switchai": 100,
      "ollama": 50
    },
    "fallback_rate": 0.15
  }
}
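A payload like the one above can be summarized with the standard library, for example:

```python
# Summarize per-provider usage shares from the analytics payload shown
# above (payload shape as documented; values are the example's).
import json

payload = json.loads("""{
  "routing": {
    "total_requests": 1000,
    "provider_usage": {"geminicli": 650, "gemini": 200,
                       "switchai": 100, "ollama": 50},
    "fallback_rate": 0.15
  }
}""")

routing = payload["routing"]
for name, count in sorted(routing["provider_usage"].items(),
                          key=lambda kv: -kv[1]):
    share = count / routing["total_requests"]
    print(f"{name:10s} {count:5d}  {share:.0%}")
print(f"fallback rate: {routing['fallback_rate']:.0%}")
```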

Advanced Patterns

Conditional Routing

Route based on request attributes:
def smart_route(messages, needs_tools=False, needs_vision=False):
    if needs_vision:
        return "geminicli:gemini-2.5-pro"  # Best vision
    elif needs_tools:
        return "switchai:auto"  # Best tools
    elif len(str(messages)) > 10000:
        return "gemini-2.5-pro"  # Large context (auto-route)
    else:
        return "ollama:llama3.2"  # Fast local

response = client.chat.completions.create(
    model=smart_route(messages, needs_vision=True),
    messages=messages
)

Time-Based Routing

Route differently based on time of day:
from datetime import datetime

def time_based_route(model):
    hour = datetime.now().hour
    
    if 9 <= hour <= 17:  # Business hours
        # Use paid APIs for better performance
        return f"switchai:{model}"
    else:  # Off-hours
        # Use free providers
        return model  # Auto-route to free providers

response = client.chat.completions.create(
    model=time_based_route("gemini-2.5-pro"),
    messages=[{"role": "user", "content": "Hello!"}]
)

Budget-Based Routing

class BudgetRouter:
    def __init__(self, daily_budget):
        self.daily_budget = daily_budget
        self.spent_today = 0
    
    def route(self, model):
        if self.spent_today >= self.daily_budget:
            # Budget exceeded, use free providers only
            return model  # Auto-route to geminicli, ollama
        else:
            # Budget available, allow paid APIs
            return f"switchai:{model}"
    
    def record_cost(self, cost):
        self.spent_today += cost

router = BudgetRouter(daily_budget=10.0)  # $10/day

response = client.chat.completions.create(
    model=router.route("gemini-2.5-pro"),
    messages=[{"role": "user", "content": "Hello!"}]
)

router.record_cost(0.01)  # Track usage

Monitoring

Usage Statistics

Track provider usage:
curl http://localhost:18080/v0/management/usage \
  -H "X-Management-Key: your-secret-key"
{
  "total_requests": 5000,
  "by_provider": {
    "geminicli": {
      "requests": 3000,
      "tokens": 1500000,
      "cost": 0
    },
    "gemini": {
      "requests": 1500,
      "tokens": 750000,
      "cost": 15.50
    },
    "ollama": {
      "requests": 500,
      "tokens": 250000,
      "cost": 0
    }
  }
}

Provider Health

Monitor provider availability:
curl http://localhost:18080/v0/management/heartbeat/status \
  -H "X-Management-Key: your-secret-key"
{
  "providers": [
    {
      "id": "geminicli",
      "status": "healthy",
      "last_check": "2026-03-09T10:30:00Z",
      "response_time_ms": 250,
      "success_rate": 0.98
    },
    {
      "id": "gemini",
      "status": "quota_exceeded",
      "last_check": "2026-03-09T10:30:00Z",
      "response_time_ms": 500,
      "success_rate": 0.75
    }
  ]
}

Troubleshooting

No Providers Available

Error: No providers available for model 'gemini-2.5-pro'

Solutions:
  1. Verify providers are configured and authenticated
  2. Check provider status: GET /v1/providers
  3. Try explicit routing: geminicli:gemini-2.5-pro
  4. Check logs for provider initialization errors

All Providers Failing

Error: All providers failed for model 'gemini-2.5-pro'

Solutions:
  1. Check provider health: GET /v0/management/heartbeat/status
  2. Verify API keys are valid
  3. Check quota limits
  4. Try different model: GET /v1/models

Unexpected Provider Used

Issue: Wrong provider selected during auto-routing

Solutions:
  1. Check routing priority: Review config.yaml
  2. Verify provider health: Unhealthy providers are skipped
  3. Use explicit routing: Add provider prefix
  4. Check logs: Review routing decisions

Best Practices

Start with auto-routing and only use explicit prefixes when needed:
# Good: Auto-routing
model = "gemini-2.5-pro"

# Use explicit only when necessary
if needs_cli_features:
    model = "geminicli:gemini-2.5-pro"
Set priorities based on your preferences:
routing:
  priority:
    - geminicli  # Free with subscription
    - ollama     # Free local
    - switchai   # Paid unified
Use Heartbeat for automatic failover:
heartbeat:
  enabled: true
  interval: 60
Track provider usage to optimize costs:
curl http://localhost:18080/v0/management/usage

Next Steps