Overview
switchAILocal provides multiple layers of reliability:
- Credential Rotation: Round-robin or fill-first across multiple API keys
- Quota Management: Automatic project switching when quotas are exceeded
- Retry Logic: Configurable retries for transient failures
- Intelligent Failover: AI-powered routing to alternative providers (Superbrain)
Routing Strategies
Control how switchAILocal selects credentials when multiple are available.
Configuration
Round-Robin (Default)
Distributes requests evenly across all available credentials:
- Request 1 → key-1
- Request 2 → key-2
- Request 3 → key-3
- Request 4 → key-1 (cycles back)
Best for:
- Even load distribution across accounts
- Fair quota consumption
- Testing multiple credentials
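The cycling behaviour described above can be sketched in a few lines of Python. This is only an illustration of the round-robin idea, not switchAILocal's implementation; the class name `RoundRobinSelector` and its method are hypothetical.

```python
from itertools import cycle


class RoundRobinSelector:
    """Hypothetical helper: hands out credentials in a repeating cycle."""

    def __init__(self, credentials):
        if not credentials:
            raise ValueError("at least one credential is required")
        self._cycle = cycle(list(credentials))

    def next_credential(self):
        # Each call advances to the next key and wraps around at the end.
        return next(self._cycle)


selector = RoundRobinSelector(["key-1", "key-2", "key-3"])
picks = [selector.next_credential() for _ in range(4)]
print(picks)  # ['key-1', 'key-2', 'key-3', 'key-1']
```

The fourth request lands on key-1 again, matching the sequence listed above.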
Fill-First
Uses the first credential until it fails or hits quota, then moves to the next:
- All requests use key-1
- On failure/quota → switch to key-2
- On failure/quota → switch to key-3
- After recovery, stays on current key
Best for:
- Primary/backup credential hierarchy
- Minimize account switching
- Cost optimization (cheaper accounts first)
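As a rough sketch of the fill-first idea, the selector below keeps returning the same key until a failure or quota error is reported, then advances and stays there. The class and method names are hypothetical, and wrapping back to the first key after the last one fails is an assumption; the text above does not say what happens at that point.

```python
class FillFirstSelector:
    """Hypothetical helper: sticks with one credential until it is reported as failed."""

    def __init__(self, credentials):
        if not credentials:
            raise ValueError("at least one credential is required")
        self._credentials = list(credentials)
        self._index = 0

    def current(self):
        # Every request uses the same key until a failure is reported.
        return self._credentials[self._index]

    def report_failure(self):
        # Advance to the next key and stay there, even if the old key recovers.
        # Assumption: wrap around to the first key after the last one fails.
        self._index = (self._index + 1) % len(self._credentials)
        return self.current()


selector = FillFirstSelector(["key-1", "key-2", "key-3"])
before = [selector.current() for _ in range(3)]  # all requests use key-1
after = selector.report_failure()                # failure/quota: switch to key-2
```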
Changing Strategy at Runtime
You can change the routing strategy at runtime using the Management API. The change takes effect immediately, without a restart.
Quota Exceeded Handling
Automatic failover when provider quotas are exceeded.
Configuration
Project Switching
When a credential hits quota, switchAILocal automatically tries the next one.
Preview Model Fallback
Automatically downgrade to preview/experimental models when stable models hit quota:
- Request for gemini-2.5-pro hits quota
- Automatically retry with gemini-2.5-pro-preview
- Client receives response from preview model
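The fallback flow above can be sketched as a small wrapper. Everything here is illustrative: `QuotaExceeded`, `PREVIEW_FALLBACKS`, and `complete_with_preview_fallback` are hypothetical names, and the explicit stable-to-preview mapping is an assumption, since how switchAILocal resolves the preview model is not described here.

```python
class QuotaExceeded(Exception):
    """Hypothetical error raised when a provider reports an exhausted quota."""


# Assumed mapping from stable model to preview model (taken from the example above).
PREVIEW_FALLBACKS = {"gemini-2.5-pro": "gemini-2.5-pro-preview"}


def complete_with_preview_fallback(model, send):
    """Call send(model); on a quota error, retry once with the preview model."""
    try:
        return send(model)
    except QuotaExceeded:
        preview = PREVIEW_FALLBACKS.get(model)
        if preview is None:
            raise  # no preview model known, so surface the quota error
        return send(preview)


def fake_send(model):
    # Stand-in for a provider call: the stable model is out of quota.
    if model == "gemini-2.5-pro":
        raise QuotaExceeded(model)
    return f"response from {model}"


result = complete_with_preview_fallback("gemini-2.5-pro", fake_send)
print(result)  # response from gemini-2.5-pro-preview
```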
Retry Logic
Automatic retries for transient failures.
Configuration
Retryable Errors
Requests are automatically retried for these HTTP status codes:
- 403 Forbidden (temporary auth issues)
- 408 Request Timeout
- 500 Internal Server Error
- 502 Bad Gateway
- 503 Service Unavailable
- 504 Gateway Timeout
Behavior
Retry with Credential Rotation
When combined with multiple credentials, retries rotate through them:
- Request with key-1 → 503
- Retry 1 with key-2 → 503
- Retry 2 with key-1 → 200 OK ✓
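A minimal sketch of retrying with credential rotation, using the retryable status codes listed earlier. The function name and the `max_retries` parameter are hypothetical (the latter loosely mirrors the request-retry setting), and the sketch omits any delay between attempts.

```python
# Status codes this page lists as retryable.
RETRYABLE_STATUS = {403, 408, 500, 502, 503, 504}


def send_with_retries(credentials, send, max_retries=3):
    """Call send(credential), rotating credentials while the status is retryable.

    Returns the final status code and the list of (credential, status) attempts.
    """
    attempts = []
    status = None
    for attempt in range(max_retries + 1):
        # Rotate through the credentials on each attempt.
        credential = credentials[attempt % len(credentials)]
        status = send(credential)
        attempts.append((credential, status))
        if status not in RETRYABLE_STATUS:
            break  # success or a non-retryable error: stop here
    return status, attempts


# Simulated provider: two 503 responses followed by a success.
responses = iter([503, 503, 200])
status, attempts = send_with_retries(["key-1", "key-2"], lambda cred: next(responses))
print(attempts)  # [('key-1', 503), ('key-2', 503), ('key-1', 200)]
```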
Streaming Failover
Special handling for streaming requests.
Configuration
Bootstrap Retries
For streaming requests, switchAILocal can retry before sending any data to the client:
- Request starts streaming
- Provider fails before first chunk
- Retry with next credential
- Client never sees the failure
Why Bootstrap Retries?
Once streaming starts and the first byte is sent to the client, the HTTP response is committed. We can’t retry after that point without breaking the client connection.
Bootstrap retries let us retry transparently while the client is still waiting for the first chunk.
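The "retry only before the first chunk" rule can be illustrated with a generator. This is a sketch under assumptions, not the actual streaming code: `stream_with_bootstrap_retries` and `open_stream` are hypothetical names, and `bootstrap_retries` loosely mirrors the streaming.bootstrap-retries setting.

```python
def stream_with_bootstrap_retries(open_stream, credentials, bootstrap_retries=2):
    """Yield chunks, switching credentials only until the first chunk arrives."""
    last_error = None
    for attempt in range(bootstrap_retries + 1):
        credential = credentials[attempt % len(credentials)]
        try:
            chunks = iter(open_stream(credential))
            first = next(chunks)  # nothing has been sent to the client yet
        except StopIteration:
            return  # provider returned an empty stream
        except Exception as exc:
            last_error = exc  # failed before the first chunk: safe to retry
            continue
        yield first
        # The response is now committed, so later errors propagate to the caller.
        yield from chunks
        return
    raise last_error


def fake_open(credential):
    # Stand-in provider: key-1 fails before producing any data.
    if credential == "key-1":
        raise ConnectionError("provider failed before first chunk")
    return iter(["Hel", "lo"])


received = list(stream_with_bootstrap_retries(fake_open, ["key-1", "key-2"]))
print(received)  # ['Hel', 'lo']
```

The client only ever sees the chunks from key-2; the failed attempt with key-1 is invisible because no data had been sent when it failed.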
SSE Keepalive
Prevent connection timeouts during long streaming responses.
Intelligent Failover (Superbrain)
AI-powered provider selection based on capabilities and success rates.
Configuration
How It Works
When a provider fails, Superbrain:
- Analyzes the failure using AI diagnosis
- Selects an alternative provider based on:
  - Provider capabilities (context size, streaming, CLI support)
  - Current availability
  - Historical success rates
  - Request requirements
- Adapts the request for the new provider
- Routes and executes transparently
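To make the selection criteria more concrete, here is a deliberately simple rule-based filter-and-rank sketch. It is not how Superbrain works internally (Superbrain uses AI diagnosis, which this does not model); the `Provider` fields, the function name, and all numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """Hypothetical provider record; field names and values are illustrative only."""
    name: str
    context_size: int
    supports_streaming: bool
    available: bool
    success_rate: float  # historical success rate in the range 0.0 to 1.0


def select_alternative(providers, failed, needed_context, needs_streaming):
    """Pick the available provider that meets the request's needs and has the best success rate."""
    candidates = [
        p for p in providers
        if p.name != failed
        and p.available
        and p.context_size >= needed_context
        and (p.supports_streaming or not needs_streaming)
    ]
    # Returns None when no provider qualifies.
    return max(candidates, key=lambda p: p.success_rate, default=None)


providers = [
    Provider("gemini", 1_000_000, True, True, 0.99),
    Provider("claude", 200_000, True, True, 0.97),
    Provider("ollama", 8_192, True, True, 0.90),
]
choice = select_alternative(providers, failed="gemini", needed_context=50_000, needs_streaming=True)
print(choice.name)  # claude
```

With these made-up numbers, the local provider is skipped because its context window is too small for the request, so the request would be routed to the remaining candidate.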
Provider Selection Criteria
Example Failover Flow
Transparent Metadata
All failover actions are logged in response metadata.
Multi-Provider Configuration
Configure multiple providers for comprehensive failover:
- Requests round-robin across Gemini keys
- On quota → switch to backup Gemini key
- On failure → retry up to 3 times
- On persistent failure → Superbrain routes to Claude
- On Claude failure → route to local Ollama
Monitoring & Metrics
Management Dashboard
View real-time failover metrics:
- Open http://localhost:18080/dashboard
- Navigate to Provider Health
- View:
  - Success rates per provider
  - Quota status
  - Active credential
  - Failover events
API Endpoint
Query provider statistics programmatically.
Best Practices
1. Use Multiple Credentials Per Provider
2. Enable Quota Switching
3. Configure Reasonable Retries
4. Set Up Multi-Provider Failover
5. Monitor Success Rates
Regularly check provider health.
6. Test Failover Scenarios
Simulate failures to verify configuration.
Troubleshooting
Failover Not Working
Problem: Requests fail instead of failing over
Solutions:
- Verify multiple credentials are configured: curl http://localhost:18080/api/providers
- Check quota-exceeded.switch-project: true
- Enable debug logging: debug: true
- Verify credentials are valid: Test each manually
Excessive Retries
Problem: Requests take too long due to retries
Solutions:
- Reduce the request-retry value
- Decrease streaming.bootstrap-retries
- Check provider health: Remove failing providers
Wrong Credential Used
Problem: The expected credential is not being used
Solutions:
- Check routing strategy: round-robin vs fill-first
- View current selection: Dashboard → Provider Health
- Force a specific credential using prefix:model syntax
See Also
- Superbrain Intelligence - AI-powered failover
- Configuration Reference - Complete config options
- Provider Setup - Multi-provider configuration