Overview
switchAILocal is a unified AI proxy server that provides OpenAI-compatible API interfaces for multiple AI service providers. It acts as an intelligent gateway that manages authentication, routing, and request translation across CLI tools, cloud APIs, and local models.

Core Components
The architecture is built around several key subsystems that work together to provide a seamless proxy experience:

Service Manager
Orchestrates the complete lifecycle including authentication, file watching, HTTP server, and provider integrations.
Auth Manager
Handles credential management, OAuth flows, API keys, and automatic token refresh for all providers.
Executor Layer
Provider-specific executors that handle request translation and execution against upstream APIs.
Intelligence Service
Powers Cortex Router Phase 2 with semantic matching, intent classification, and dynamic model allocation.
System Flow
The architecture supports hot-reloading of configuration changes without requiring a server restart.
Request Lifecycle
1. Request Reception
The API server (cmd/server/main.go) receives OpenAI-compatible requests and normalizes them for routing.
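As a sketch of this first step, the proxy only needs to decode the routing-relevant subset of an OpenAI-style chat request before choosing a credential and executor. The type and function names below are illustrative, not the project's actual API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ChatMessage is one turn of an OpenAI-style conversation.
type ChatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ChatRequest mirrors the subset of the OpenAI chat-completions schema
// the proxy needs for routing; the field set here is illustrative.
type ChatRequest struct {
	Model    string        `json:"model"`
	Messages []ChatMessage `json:"messages"`
	Stream   bool          `json:"stream,omitempty"`
}

// parseChatRequest decodes the request body so the router can inspect
// the model field before selecting a credential and executor.
func parseChatRequest(body []byte) (ChatRequest, error) {
	var req ChatRequest
	err := json.Unmarshal(body, &req)
	return req, err
}

func main() {
	req, err := parseChatRequest([]byte(`{"model":"gpt-4","messages":[{"role":"user","content":"hi"}]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Model, len(req.Messages))
}
```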
2. Authentication
The Auth Manager (sdk/switchailocal/auth/conductor.go) orchestrates credential selection:
Auth Manager Responsibilities
- Credential Selection: Choose appropriate credentials using routing strategies
- State Management: Track auth status, quota limits, and cooldown periods
- Auto-Refresh: Automatically refresh OAuth tokens before expiration
- Failure Handling: Mark credentials unavailable and implement backoff strategies
3. Routing & Execution
The system uses a Selector to pick credentials based on your routing strategy:

- RoundRobinSelector: Distributes requests evenly across credentials
- FillFirstSelector: Uses the first credential until exhausted
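The two strategies above can be sketched behind a common interface; the type and method signatures here are illustrative, not the project's actual Selector API:

```go
package main

import "fmt"

// Selector picks the next credential from the available set.
type Selector interface {
	Next(creds []string) string
}

// RoundRobinSelector distributes requests evenly across credentials.
type RoundRobinSelector struct{ i int }

func (s *RoundRobinSelector) Next(creds []string) string {
	c := creds[s.i%len(creds)]
	s.i++
	return c
}

// FillFirstSelector always returns the first credential; in this sketch
// the caller is assumed to drop exhausted credentials from the slice.
type FillFirstSelector struct{}

func (FillFirstSelector) Next(creds []string) string { return creds[0] }

func main() {
	creds := []string{"key-a", "key-b"}
	rr := &RoundRobinSelector{}
	fmt.Println(rr.Next(creds), rr.Next(creds), rr.Next(creds)) // key-a key-b key-a
}
```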
4. Provider Execution
Each provider has a dedicated Executor:

CLI Executors
- GeminiCLIExecutor
- OllamaExecutor
- OpenCodeExecutor

Cloud Executors
- GeminiExecutor
- ClaudeExecutor
- CodexExecutor

Compat Executors
- OpenAICompatExecutor
- LMStudioExecutor
- AntigravityExecutor
Token Translation Pipeline
Requests flow through a translation pipeline (sdk/translator/pipeline.go) that converts between provider formats.
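One plausible shape for such a pipeline is a chain of composable stages; the real pipeline in sdk/translator/pipeline.go likely differs, and the stage shown (a model-name rename) is purely illustrative:

```go
package main

import "fmt"

// Stage is one step of the translation pipeline, transforming a request.
type Stage func(map[string]any) map[string]any

// Pipeline composes stages so they run in order over each request.
func Pipeline(stages ...Stage) Stage {
	return func(req map[string]any) map[string]any {
		for _, s := range stages {
			req = s(req)
		}
		return req
	}
}

// renameModel maps an OpenAI-style model name to a provider alias.
func renameModel(from, to string) Stage {
	return func(req map[string]any) map[string]any {
		if req["model"] == from {
			req["model"] = to
		}
		return req
	}
}

func main() {
	translate := Pipeline(renameModel("gpt-4", "gemini-1.5-pro"))
	out := translate(map[string]any{"model": "gpt-4"})
	fmt.Println(out["model"])
}
```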
State Management
Auth State
Each authentication credential tracks its own state.

Model-Level State
Fine-grained tracking per model per credential. Model-level state allows one API key to serve gpt-4 while gpt-3.5-turbo is in cooldown.

Concurrency Model
The system uses Go’s concurrency primitives for safe operation:

- Read locks for credential selection (high throughput)
- Write locks only for state updates
- Atomic operations for retry counters
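The three-part locking pattern above can be sketched on a credential pool; the type is illustrative, but the division of labor (RLock for reads, Lock for mutation, atomics for counters) is the one described:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// credPool illustrates the locking pattern: read locks for selection,
// write locks for state updates, atomics for hot counters.
type credPool struct {
	mu      sync.RWMutex
	creds   []string
	retries atomic.Int64
}

// Pick runs under a read lock so many requests can select concurrently.
func (p *credPool) Pick(i int) string {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.creds[i%len(p.creds)]
}

// Add takes the write lock only when the credential set changes.
func (p *credPool) Add(c string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.creds = append(p.creds, c)
}

// Retry bumps the counter without taking any lock at all.
func (p *credPool) Retry() { p.retries.Add(1) }

func main() {
	p := &credPool{}
	p.Add("key-a")
	p.Retry()
	fmt.Println(p.Pick(0), p.retries.Load())
}
```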
Storage Layer
Multiple storage backends support different deployment scenarios:

File Store
Local filesystem storage for single-node deployments
Postgres Store
Distributed storage for multi-node deployments
Git Store
Version-controlled storage with remote sync
Object Store
S3-compatible storage for cloud deployments
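Swapping among four backends suggests a common interface they all implement. The interface and the in-memory stand-in below are assumptions for illustration, not the project's actual storage API:

```go
package main

import (
	"errors"
	"fmt"
)

// Store is an illustrative interface the file, Postgres, Git, and
// object backends could share.
type Store interface {
	Load(key string) ([]byte, error)
	Save(key string, data []byte) error
}

// memStore stands in for a real backend in this sketch.
type memStore struct{ m map[string][]byte }

func newMemStore() *memStore { return &memStore{m: map[string][]byte{}} }

func (s *memStore) Load(key string) ([]byte, error) {
	b, ok := s.m[key]
	if !ok {
		return nil, errors.New("not found: " + key)
	}
	return b, nil
}

func (s *memStore) Save(key string, data []byte) error {
	s.m[key] = data
	return nil
}

func main() {
	var st Store = newMemStore()
	st.Save("auth/gemini.json", []byte(`{"token":"..."}`))
	b, _ := st.Load("auth/gemini.json")
	fmt.Println(string(b))
}
```

Because callers depend only on the interface, moving from single-node file storage to Postgres or S3 is a configuration change rather than a code change.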
Hot-Reload Mechanism
The Config Watcher (internal/watcher/watcher.go) monitors configuration changes:
- Parse new configuration
- Calculate diff from current state
- Dispatch targeted updates (add/remove/update)
- Reload executors without dropping connections
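The "calculate diff" step above is the heart of targeted dispatch: compare the current and new provider configs and emit add/remove/update sets. A minimal sketch, with maps standing in for parsed config:

```go
package main

import "fmt"

// diff computes the add/remove/update sets between the current and
// next provider configs, keyed by provider name.
func diff(current, next map[string]string) (added, removed, updated []string) {
	for k, v := range next {
		old, ok := current[k]
		if !ok {
			added = append(added, k)
		} else if old != v {
			updated = append(updated, k)
		}
	}
	for k := range current {
		if _, ok := next[k]; !ok {
			removed = append(removed, k)
		}
	}
	return
}

func main() {
	cur := map[string]string{"gemini": "v1", "ollama": "v1"}
	next := map[string]string{"gemini": "v2", "claude": "v1"}
	a, r, u := diff(cur, next)
	fmt.Println(a, r, u)
}
```

Only the providers in the diff sets are touched, which is why unrelated executors keep serving in-flight connections during a reload.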
WebSocket Gateway
A WebSocket gateway handles real-time streaming providers.

Intelligence Layer (Cortex Phase 2)
When enabled, the Intelligence Service adds:

- Semantic Matching: Embed requests and match against known patterns
- Intent Classification: Use LLM to classify request intent
- Dynamic Routing: Route based on content, not just model name
- Skill Augmentation: Inject context from skill definitions
Security Architecture
Security is enforced at multiple layers:

Path Validation
Error Sanitization
File Permissions
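As an example of the path-validation layer, a common Go idiom is to join a user-supplied path under a root directory and reject anything that resolves outside it. This is a plausible shape for the check, not the project's actual code:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin resolves a user-supplied relative path under root and
// rejects traversal (e.g. "../") that escapes the root.
func safeJoin(root, userPath string) (string, error) {
	p := filepath.Join(root, userPath) // Join cleans ".." segments
	if p != root && !strings.HasPrefix(p, root+string(filepath.Separator)) {
		return "", fmt.Errorf("path escapes root: %q", userPath)
	}
	return p, nil
}

func main() {
	ok, _ := safeJoin("/data", "configs/auth.json")
	fmt.Println(ok)
	_, err := safeJoin("/data", "../etc/passwd")
	fmt.Println(err != nil)
}
```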
Extension Points
The architecture supports custom extensions:

Custom Executors
Custom Selectors
Usage Plugins
Performance Considerations
Connection Pooling
HTTP clients use connection pooling for reduced latency
Streaming Support
SSE streaming with heartbeat keepalives
Retry Logic
Exponential backoff with configurable limits
Quota Management
Per-model cooldown to respect rate limits