Overview
Routing in switchAILocal determines which credential and provider handle each incoming request. The system supports multiple routing strategies, intelligent fallback, and per-model quota management.
Routing Configuration
Configure routing behavior in config.yaml:
```yaml
routing:
  # Primary strategy: "round-robin" or "fill-first"
  strategy: "round-robin"

  # Optional: Priority list for auto model resolution
  # auto-model-priority:
  #   - "ollama:gpt-oss:120b-cloud"
  #   - "switchai-chat"
  #   - "gemini-2.5-flash"
```
Routing Strategies
The routing strategy determines how multiple credentials for the same provider are selected.
Round-Robin
Distributes requests evenly across all available credentials.
The `RoundRobinSelector` (`sdk/switchailocal/auth/selector.go`) maintains per-model cursors:

```go
type RoundRobinSelector struct {
	mu      sync.Mutex
	cursors map[string]int // "provider:model" -> cursor
}

func (s *RoundRobinSelector) Pick(..., auths []*Auth) (*Auth, error) {
	key := provider + ":" + model
	s.mu.Lock()
	index := s.cursors[key]
	s.cursors[key] = index + 1
	s.mu.Unlock()
	return available[index%len(available)], nil
}
```
Behavior:
1. First request to gpt-4 uses credential A
2. Second request to gpt-4 uses credential B
3. Third request to gpt-4 uses credential C
4. Fourth request to gpt-4 wraps back to credential A
Use case : Distribute load evenly, maximize quota utilization
Example :
```yaml
routing:
  strategy: "round-robin"

codex-api-key:
  - api-key: "sk-proj-A..."
  - api-key: "sk-proj-B..."
  - api-key: "sk-proj-C..."
```
Round-robin is tracked per model. Requests to gpt-4 and gpt-3.5-turbo maintain independent cursors.
Fill-First
Uses the first available credential until it’s exhausted or in cooldown, then moves to the next.
The `FillFirstSelector` always picks the first available credential:

```go
type FillFirstSelector struct{}

func (s *FillFirstSelector) Pick(..., auths []*Auth) (*Auth, error) {
	available, err := getAvailableAuths(auths, provider, model, now)
	if err != nil {
		return nil, err
	}
	// Always return first (auths are sorted by ID for consistency)
	return available[0], nil
}
```
Behavior:
1. All requests use credential A
2. When A hits quota → switch to credential B
3. When B hits quota → switch to credential C
4. When A recovers → switch back to A
Use case : Stagger subscription caps, optimize for rolling time windows
Example :
```yaml
routing:
  strategy: "fill-first"

claude-api-key:
  - api-key: "sk-ant-primary..."
  - api-key: "sk-ant-backup..."
```
Fill-first works well with providers that have daily/monthly quotas rather than per-minute rate limits.
Credential Selection Process
The Auth Manager follows a multi-step process to select credentials:
1. Provider Matching
```go
func (m *Manager) Execute(ctx, providers []string, req Request, opts) {
	// Normalize and rotate provider list
	normalized := m.normalizeProviders(providers)
	rotated := m.rotateProviders(req.Model, normalized)
}
```
Provider names are normalized (lowercased, deduplicated).
2. Model Support Filtering
```go
for _, candidate := range m.auths {
	if candidate.Provider != provider || candidate.Disabled {
		continue
	}
	// Check model registry
	if !registryRef.ClientSupportsModel(candidate.ID, modelKey) {
		continue
	}
	candidates = append(candidates, candidate)
}
```
Credentials are filtered based on model support from the registry.
3. Status Filtering
```go
func getAvailableAuths(auths, provider, model, now) ([]*Auth, error) {
	available, cooldownCount, earliest := collectAvailable(auths, model, now)
	if len(available) == 0 {
		if cooldownCount == len(auths) && !earliest.IsZero() {
			resetIn := earliest.Sub(now)
			return nil, newModelCooldownError(model, provider, resetIn)
		}
		return nil, &Error{Code: "auth_unavailable"}
	}
	return available, nil
}
```
Checks:
- Not disabled (`auth.Disabled == false`)
- Not unavailable (`auth.Unavailable == false`)
- Past retry time (`auth.NextRetryAfter < now`)
- Model-specific state (if tracked)
4. Strategy Application
The selected strategy picks one credential from the available pool:
```go
auth, err := m.selector.Pick(ctx, provider, model, opts, candidates)
```
Multi-Provider Routing
You can specify multiple providers for the same model:
```yaml
routing:
  strategy: "round-robin"

# Same model available from multiple providers
codex-api-key:
  - api-key: "sk-proj-openai..."
    models:
      - name: "gpt-4o"

openai-compatibility:
  - name: "openrouter"
    prefix: "or"
    api-key-entries:
      - api-key: "sk-or-v1..."
        # Also provides gpt-4o
```
When both providers support gpt-4o:
1. Manager tries the OpenAI provider first
2. If OpenAI is in cooldown → tries OpenRouter
3. Rotates the starting provider on the next request
Per-model provider rotation ensures even distribution when multiple providers offer the same model.
Intelligent Routing (Cortex Phase 2)
When Intelligence is enabled, routing becomes content-aware:
```yaml
intelligence:
  enabled: true
  router-model: "ollama:gpt-oss:20b-cloud"
  matrix:
    coding: "switchai-chat"
    reasoning: "switchai-reasoner"
    fast: "switchai-fast"
    secure: "ollama:llama3.2"
    vision: "ollama:qwen3-vl:235b-instruct-cloud"
  semantic-tier:
    enabled: true
    confidence-threshold: 0.85
```
Classification Flow
The Intelligence Service uses the router model to classify requests:

```go
type Classification struct {
	Intent     string  // "coding", "reasoning", "fast", etc.
	Confidence float64 // 0.0 to 1.0
	Model      string  // Resolved model from matrix
}

func (s *Service) Classify(ctx, req) (*Classification, error) {
	// 1. Check semantic cache
	// 2. Try semantic matching with embeddings
	// 3. Fall back to LLM classification
	// 4. Apply confidence thresholds
}
```
Intent mapping:
- coding: Code generation, debugging, refactoring
- reasoning: Complex problem-solving, math, logic
- fast: Simple queries, casual conversation
- secure: Privacy-sensitive, runs locally only
- vision: Image analysis, OCR, visual tasks
Quota Management
Quota tracking prevents retry storms when providers hit rate limits.
Quota States
Each credential tracks quota status:
```go
type QuotaState struct {
	Exceeded      bool      // Currently over quota
	Reason        string    // "quota", "rate_limit", etc.
	NextRecoverAt time.Time // When quota resets
	BackoffLevel  int       // Exponential backoff level
}
```
Backoff Schedule
Exponential backoff prevents hammering rate-limited providers:
```go
func nextQuotaCooldown(prevLevel int) (time.Duration, int) {
	cooldown := quotaBackoffBase * time.Duration(1<<prevLevel)
	// Level 0: 1 second
	// Level 1: 2 seconds
	// Level 2: 4 seconds
	// Level 3: 8 seconds
	// ...
	// Max: 30 minutes
	if cooldown >= quotaBackoffMax {
		return quotaBackoffMax, prevLevel
	}
	return cooldown, prevLevel + 1
}
```
Set `quota-exceeded.switch-project: true` to automatically switch to another credential when quota is hit.
Model-Level Quotas
Quotas are tracked per-model for fine-grained control:
```go
// An API key can exhaust its gpt-4 quota while gpt-3.5-turbo still works
type Auth struct {
	ModelStates map[string]*ModelState
}

type ModelState struct {
	Quota QuotaState // Per-model quota tracking
}
```
Behavior:
```text
Request 1: gpt-4 with key-A         → Success
Request 2: gpt-4 with key-A         → 429 Too Many Requests
Request 3: gpt-4 with key-A         → Skipped (in cooldown)
Request 4: gpt-4 with key-B         → Success (different key)
Request 5: gpt-3.5-turbo with key-A → Success (different model)
```
Retry Logic
Configurable retry behavior for transient failures:
```yaml
request-retry: 3        # Retry up to 3 times

streaming:
  bootstrap-retries: 2  # Retries before first byte
```
```go
func (m *Manager) shouldRetryAfterError(err error, attempt, maxAttempts int,
	providers []string, model string,
	maxWait time.Duration) (time.Duration, bool) {
	// No retry on last attempt
	if attempt >= maxAttempts-1 {
		return 0, false
	}
	// Check if any credential will recover soon
	wait, found := m.closestCooldownWait(providers, model)
	if !found || wait > maxWait {
		return 0, false
	}
	return wait, true
}
```
Retry conditions:
- Not the final attempt
- At least one credential will recover within `maxWait`
- Error is retryable (408, 429, 500, 502, 503, 504)
Fallback Chains
Automatic fallback when quota is exceeded:
```yaml
quota-exceeded:
  switch-project: true        # Try next credential
  switch-preview-model: true  # Try preview/alternative models
```
Fallback order:
1. Try next credential for same model
2. Try preview model with same credential
3. Try preview model with next credential
4. Return cooldown error
Load Balancing
Distribute requests across providers:
```yaml
openai-compatibility:
  # Multiple providers for same models
  - name: "groq"
    prefix: "groq"
    api-key-entries:
      - api-key: "gsk-A..."
      - api-key: "gsk-B..."
  - name: "openrouter"
    prefix: "or"
    api-key-entries:
      - api-key: "sk-or-v1-A..."
      - api-key: "sk-or-v1-B..."
```
With round-robin strategy:
- Request 1 → groq key-A
- Request 2 → groq key-B
- Request 3 → openrouter key-A
- Request 4 → openrouter key-B
- Request 5 → groq key-A (rotation)
Custom Selectors
Implement custom routing logic:
```go
type MyCustomSelector struct {
	// Your state
}

func (s *MyCustomSelector) Pick(ctx context.Context,
	provider, model string,
	opts executor.Options,
	auths []*Auth) (*Auth, error) {
	// Filter by metadata
	for _, auth := range auths {
		if region, ok := auth.Metadata["region"].(string); ok {
			if region == "us-west" {
				return auth, nil
			}
		}
	}
	// Fallback to first
	return auths[0], nil
}

// Register
service.CoreManager().SetSelector(&MyCustomSelector{})
```
Custom selectors receive only available credentials (already filtered by status and cooldown).
Next Steps