Overview

Providers are the backbone of switchAILocal, enabling access to various AI services through a unified interface. Each provider type has unique characteristics and authentication requirements.

Provider Types

switchAILocal supports three distinct categories of providers:

CLI Providers

Local command-line tools running on your machine

Cloud Providers

Remote API services accessed via HTTP

OAuth Providers

Services requiring OAuth2 authentication flows

CLI Providers

CLI providers execute locally-installed AI tools and expose them through the proxy.

Ollama

Local model server for running open-source LLMs.
ollama:
  enabled: true
  base-url: "http://localhost:11434"
  auto-discover: true  # Automatically fetch available models
The OllamaExecutor (internal/runtime/executor/ollama_executor.go) translates OpenAI format to Ollama’s native API:
type OllamaExecutor struct {
    cfg     *config.Config
    baseURL string
    client  *http.Client
}
Key features:
  • Auto-discovery: Queries /api/tags for available models
  • Vision support: Handles base64-encoded images
  • Streaming: Real-time response chunks
  • No authentication: Direct HTTP access
Example models:
  • ollama:llama3.2
  • ollama:mistral
  • ollama:qwen3-vl:235b-instruct-cloud
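Auto-discovery amounts to parsing the `/api/tags` reply. A minimal, runnable sketch of that step (the response shape `{"models":[{"name":...}]}` follows Ollama's API; `modelNamesFromTags` is an illustrative helper, not the executor's actual function name):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tagsResponse mirrors the shape of Ollama's GET /api/tags reply.
type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// modelNamesFromTags extracts the model names an auto-discovery pass would register.
func modelNamesFromTags(body []byte) ([]string, error) {
	var tags tagsResponse
	if err := json.Unmarshal(body, &tags); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(tags.Models))
	for _, m := range tags.Models {
		names = append(names, m.Name)
	}
	return names, nil
}

func main() {
	sample := []byte(`{"models":[{"name":"llama3.2"},{"name":"mistral"}]}`)
	names, err := modelNamesFromTags(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [llama3.2 mistral]
}
```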

OpenCode

Local AI coding agent for development tasks.
opencode:
  enabled: true
  base-url: "http://localhost:4096"
  default-agent: "build"
OpenCode agents are specialized for different tasks: build, debug, test, refactor.

LM Studio

GUI application for running local models with OpenAI-compatible API.
lmstudio:
  enabled: true
  base-url: "http://localhost:1234/v1"
  auto-discover: true

Gemini CLI

Access Google’s Gemini via OAuth credentials without API keys.
# Use OAuth login instead of API keys
# Run: switchAILocal -login
The GeminiCLIExecutor uses OAuth2 with device flow:
type GeminiCLIExecutor struct {
    cfg *config.Config
}

func prepareGeminiCLITokenSource(ctx context.Context, cfg *config.Config, auth *Auth) (oauth2.TokenSource, error) {
    // Exchange the stored refresh token for an access token
    // Endpoint: https://cloudcode-pa.googleapis.com
    // ...
}
Scopes required:
  • https://www.googleapis.com/auth/cloud-platform
  • https://www.googleapis.com/auth/userinfo.email
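The refresh-token exchange itself is a standard OAuth2 form POST. A hedged sketch of building that request body with the standard library (`refreshForm` is illustrative; the real executor's credential handling lives in `prepareGeminiCLITokenSource`):

```go
package main

import (
	"fmt"
	"net/url"
)

// refreshForm builds the form body for an OAuth2 refresh-token exchange.
// clientID, clientSecret, and refreshToken come from the stored OAuth credentials.
func refreshForm(clientID, clientSecret, refreshToken string) url.Values {
	return url.Values{
		"grant_type":    {"refresh_token"},
		"refresh_token": {refreshToken},
		"client_id":     {clientID},
		"client_secret": {clientSecret},
	}
}

func main() {
	form := refreshForm("client-id", "client-secret", "stored-refresh-token")
	// The encoded body is POSTed to the provider's token endpoint as
	// Content-Type: application/x-www-form-urlencoded.
	fmt.Println(form.Get("grant_type")) // refresh_token
}
```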

Cloud Providers

Cloud providers access remote APIs using API keys or OAuth tokens.

Gemini API

Google’s Gemini models via REST API.
gemini-api-key:
  - api-key: "AIzaSy..."
    prefix: "google"
    base-url: "https://generativelanguage.googleapis.com"
Supported endpoints:
  • /v1beta/models/{model}:generateContent
  • /v1beta/models/{model}:streamGenerateContent
  • /v1beta/models/{model}:countTokens
Use the prefix to disambiguate when multiple Gemini providers are configured; with prefix: "google", models are accessed as google/gemini-2.0-flash.
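The endpoint paths above are assembled from the configured base-url and the model name. A small sketch of that URL construction (`generateContentURL` is an illustrative helper):

```go
package main

import "fmt"

// generateContentURL assembles the non-streaming Gemini endpoint for a model,
// following the /v1beta/models/{model}:generateContent path shape.
func generateContentURL(baseURL, model string) string {
	return fmt.Sprintf("%s/v1beta/models/%s:generateContent", baseURL, model)
}

func main() {
	fmt.Println(generateContentURL("https://generativelanguage.googleapis.com", "gemini-2.0-flash"))
	// https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent
}
```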

Claude (Anthropic)

Anthropic’s Claude models.
claude-api-key:
  - api-key: "sk-ant-..."
    models:
      - name: "claude-3-5-sonnet-20241022"
        alias: "sonnet"
      - name: "claude-3-opus-20240229"
        alias: "opus"
The ClaudeExecutor translates between OpenAI and Claude formats:
// OpenAI format
{
  "model": "gpt-4",
  "messages": [{"role": "user", "content": "Hello"}]
}

// Claude format (translated)
{
  "model": "claude-3-5-sonnet-20241022",
  "messages": [{"role": "user", "content": "Hello"}],
  "max_tokens": 4096
}
Note: Claude requires explicit max_tokens parameter.
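The translation above can be sketched as a small mapping function. This is a simplified model of what the ClaudeExecutor does (field shapes are reduced to the example's; the real executor also handles system prompts, tool calls, and multimodal content), with the 4096 default taken from the translated example:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type openAIRequest struct {
	Model     string    `json:"model"`
	Messages  []message `json:"messages"`
	MaxTokens int       `json:"max_tokens,omitempty"`
}

type claudeRequest struct {
	Model     string    `json:"model"`
	Messages  []message `json:"messages"`
	MaxTokens int       `json:"max_tokens"` // always emitted: Claude requires it
}

// toClaude maps an OpenAI-style request onto Claude's schema, substituting the
// resolved model name and filling in max_tokens when the caller omitted it.
func toClaude(in openAIRequest, resolvedModel string) claudeRequest {
	maxTokens := in.MaxTokens
	if maxTokens == 0 {
		maxTokens = 4096
	}
	return claudeRequest{Model: resolvedModel, Messages: in.Messages, MaxTokens: maxTokens}
}

func main() {
	in := openAIRequest{Model: "gpt-4", Messages: []message{{Role: "user", Content: "Hello"}}}
	out, _ := json.Marshal(toClaude(in, "claude-3-5-sonnet-20241022"))
	fmt.Println(string(out))
}
```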

OpenAI / Codex

OpenAI’s GPT models.
codex-api-key:
  - api-key: "sk-..."
    base-url: "https://api.openai.com/v1"
Available models:
  • GPT-4 series: gpt-4, gpt-4-turbo, gpt-4o
  • GPT-3.5 series: gpt-3.5-turbo
  • O-series: o1, o1-mini, o1-preview

SwitchAI Cloud

Unified access to 100+ cloud models through a single API key.
switchai-api-key:
  - api-key: "sk-lf-..."
    base-url: "https://switchai.traylinx.com/v1"
    models:
      - name: "openai/gpt-oss-120b"
        alias: "switchai-fast"
      - name: "deepseek-reasoner"
        alias: "switchai-reasoner"
SwitchAI provides access to models from OpenAI, Anthropic, Google, DeepSeek, and more through a single endpoint.

OpenAI-Compatible Providers

Many providers offer OpenAI-compatible endpoints that can be configured using openai-compatibility.
openai-compatibility:
  - name: "groq"
    prefix: "groq"
    base-url: "https://api.groq.com/openai/v1"
    api-key-entries:
      - api-key: "gsk_..."
  
  - name: "openrouter"
    prefix: "or"
    base-url: "https://openrouter.ai/api/v1"
    api-key-entries:
      - api-key: "sk-or-v1-..."
  
  - name: "together"
    prefix: "together"
    base-url: "https://api.together.xyz/v1"
    api-key-entries:
      - api-key: "..."
The OpenAICompatExecutor is a generic executor for OpenAI-compatible APIs:
type OpenAICompatExecutor struct {
    provider string  // e.g., "groq", "openrouter"
    cfg      *config.Config
}

func (e *OpenAICompatExecutor) Identifier() string {
    return e.provider
}
Supports:
  • Chat completions: /chat/completions
  • Image generation: /images/generations
  • Audio transcription: /audio/transcriptions
  • Audio speech: /audio/speech
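Because all compatible providers share these paths, request routing reduces to joining the configured base-url with the endpoint. A minimal sketch (`endpointURL` is illustrative; it also tolerates a trailing slash in the configured base-url):

```go
package main

import (
	"fmt"
	"strings"
)

// endpointURL joins a provider's base-url with one of the OpenAI-compatible
// endpoint paths, normalizing any trailing slash on the base-url.
func endpointURL(baseURL, path string) string {
	return strings.TrimRight(baseURL, "/") + path
}

func main() {
	fmt.Println(endpointURL("https://api.groq.com/openai/v1", "/chat/completions"))
	fmt.Println(endpointURL("https://openrouter.ai/api/v1/", "/chat/completions"))
}
```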

Supported Services

Groq

Ultra-fast inference with LPU technology.
Models: Llama, Mixtral, Gemma

OpenRouter

Access to 100+ models with unified billing.
Models: GPT-4, Claude, Gemini, and more

Together AI

Open-source models with fast inference.
Models: Llama, Mistral, Qwen

Anyscale

Serverless endpoints for OSS models.
Models: Llama, Mixtral, CodeLlama

Provider Lifecycle

Registration

Providers are registered during service initialization:
// Register executors in service builder
func (b *ServiceBuilder) Build() (*Service, error) {
    // Register Gemini CLI
    geminiCLIExec := executor.NewGeminiCLIExecutor(b.cfg)
    b.coreManager.RegisterExecutor(geminiCLIExec)
    
    // Register Ollama
    ollamaExec := executor.NewOllamaExecutor(b.cfg)
    b.coreManager.RegisterExecutor(ollamaExec)
    
    // Register OpenAI compat providers
    for _, compatCfg := range b.cfg.OpenAICompatibility {
        exec := executor.NewOpenAICompatExecutor(compatCfg.Name, b.cfg)
        b.coreManager.RegisterExecutor(exec)
    }
    // ... remaining executors registered similarly; service assembly elided
    return b.service, nil // field name illustrative
}

Discovery

Providers with auto-discover: true fetch available models:
// Ollama auto-discovery
func (e *OllamaExecutor) DiscoverModels() ([]string, error) {
    resp, err := e.client.Get(e.baseURL + "/api/tags")
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    // parseModelTags (helper elided) extracts names from the JSON body
    return parseModelTags(resp.Body)
}

Health Checking

Providers can implement health checks:
type ProviderHealthChecker interface {
    CheckHealth(ctx context.Context) error
}

Model Naming Conventions

Providers use prefixes to avoid naming conflicts:
# Without prefix (uses first matching provider)
model: gpt-4o

# With provider prefix
model: openrouter/gpt-4o
model: groq/llama-3.2-90b
model: ollama:mistral

# With custom alias
model: sonnet  # Maps to claude-3-5-sonnet-20241022
Use force-model-prefix: true in config to require explicit prefixes for all requests.
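The prefix rules above can be sketched as a small parser. `splitModelRef` is illustrative and deliberately simplified: some model names themselves contain slashes (e.g. openai/gpt-oss-120b), which the actual resolver must disambiguate against registered prefixes:

```go
package main

import (
	"fmt"
	"strings"
)

// splitModelRef separates an optional provider prefix from a model reference:
// "openrouter/gpt-4o" -> ("openrouter", "gpt-4o"),
// "ollama:mistral"    -> ("ollama", "mistral"),
// bare names return an empty provider (first matching provider is used).
func splitModelRef(ref string) (provider, model string) {
	if i := strings.Index(ref, "/"); i > 0 {
		return ref[:i], ref[i+1:]
	}
	if strings.HasPrefix(ref, "ollama:") {
		return "ollama", strings.TrimPrefix(ref, "ollama:")
	}
	return "", ref
}

func main() {
	for _, ref := range []string{"gpt-4o", "openrouter/gpt-4o", "ollama:mistral"} {
		p, m := splitModelRef(ref)
		fmt.Printf("%q -> provider=%q model=%q\n", ref, p, m)
	}
}
```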

Provider Metadata

Each provider auth includes metadata:
type Auth struct {
    Provider   string
    Metadata   map[string]any
    Attributes map[string]string
}

// Example metadata
metadata: {
    "source": "config_yaml",
    "base_url": "https://api.groq.com/openai/v1",
    "prefix": "groq",
    "auto_discover": true
}

Proxy Configuration

Providers can use HTTP proxies:
# Global proxy for all providers
proxy-url: "socks5://user:pass@192.168.1.1:1080/"

# Per-provider proxy
gemini-api-key:
  - api-key: "AIzaSy..."
    proxy-url: "http://proxy.example.com:8080"
Supported proxy protocols: http://, https://, socks5://

Error Handling

Providers implement standardized error responses:
type Error struct {
    Code       string  // "auth_failed", "quota_exceeded", etc.
    Message    string
    HTTPStatus int     // 401, 429, 500, etc.
    Retryable  bool
}
Common HTTP statuses and how they are handled:
  • 401: Invalid API key or expired token
  • 402/403: Payment required or forbidden
  • 404: Model not found
  • 429: Rate limit exceeded (triggers cooldown)
  • 500/502/503/504: Transient server errors
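The status table maps naturally onto the Error struct. A sketch of that classification (the code strings auth_failed and quota_exceeded come from the Error example above; model_not_found and upstream_error are illustrative placeholders):

```go
package main

import (
	"fmt"
	"net/http"
)

// Error mirrors the standardized provider error shape.
type Error struct {
	Code       string
	HTTPStatus int
	Retryable  bool
}

// classifyStatus maps an upstream HTTP status onto an error code and
// retryability, following the table of common statuses: only 429 and
// 5xx responses are worth retrying.
func classifyStatus(status int) Error {
	switch {
	case status == http.StatusUnauthorized:
		return Error{Code: "auth_failed", HTTPStatus: status, Retryable: false}
	case status == http.StatusTooManyRequests:
		return Error{Code: "quota_exceeded", HTTPStatus: status, Retryable: true}
	case status == http.StatusNotFound:
		return Error{Code: "model_not_found", HTTPStatus: status, Retryable: false}
	case status >= 500:
		return Error{Code: "upstream_error", HTTPStatus: status, Retryable: true}
	default:
		return Error{Code: "request_failed", HTTPStatus: status, Retryable: false}
	}
}

func main() {
	fmt.Println(classifyStatus(429).Retryable) // true
	fmt.Println(classifyStatus(401).Retryable) // false
}
```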

Provider Quotas

Quota management prevents excessive retry storms:
type QuotaState struct {
    Exceeded      bool
    Reason        string
    NextRecoverAt time.Time
    BackoffLevel  int  // Exponential backoff level
}
Backoff schedule:
  • Level 0: 1 second
  • Level 1: 2 seconds
  • Level 2: 4 seconds
  • Max: 30 minutes
When a provider returns 429 Too Many Requests:
  1. Mark the credential as quota-exceeded
  2. Set NextRecoverAt based on Retry-After header or exponential backoff
  3. Skip this credential during selection until recovery time
  4. Reset quota state on successful request
case 429:
    var next time.Time
    if result.RetryAfter != nil {
        next = now.Add(*result.RetryAfter)
    } else {
        cooldown, nextLevel := nextQuotaCooldown(backoffLevel)
        next = now.Add(cooldown)
        backoffLevel = nextLevel
    }
    state.Quota = QuotaState{
        Exceeded:      true,
        NextRecoverAt: next,
        BackoffLevel:  backoffLevel,
    }

Next Steps