Cortex Router transforms switchAILocal from a static router into an intelligent orchestrator that automatically selects the optimal model for each request using multi-tier classification and semantic matching.

Overview

Cortex Router uses a four-tier routing architecture to match requests with the best available model:
1. Reflex Tier

Pattern matching for instant routing (< 1ms)
  • PII detection → secure models
  • Code blocks → coding models
  • Image URLs → vision models
2. Semantic Tier

Embedding-based intent matching (< 20ms)
  • Uses local embedding models
  • Bypasses LLM for high-confidence matches
  • Matches against 21 pre-built skills
3. Cognitive Tier

LLM-powered classification (200-500ms)
  • Uses lightweight router model
  • Returns confidence scores
  • Falls back to semantic verification
4. Cascade Tier

Quality-based model escalation
  • Detects incomplete responses
  • Automatically retries with stronger models
  • Preserves context across attempts
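
The first three tiers above form a fall-through decision: each tier either returns an intent or defers to the next (the cascade tier runs after a response is produced, so it is omitted here). A minimal Python sketch, with placeholder regexes standing in for the real reflex patterns and stub hooks for the semantic and cognitive tiers:

```python
import re

# Hypothetical reflex patterns: cheap regex checks that route instantly.
# The real patterns are internal to Cortex Router; these are illustrative.
REFLEX_PATTERNS = {
    "coding": re.compile(r"```"),                        # fenced code block
    "vision": re.compile(r"https?://\S+\.(png|jpe?g)"),  # image URL
    "secure": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # SSN-like PII
}

def reflex_tier(prompt):
    for intent, pattern in REFLEX_PATTERNS.items():
        if pattern.search(prompt):
            return intent
    return None  # defer to the next tier

def route(prompt, semantic_tier=None, cognitive_tier=None):
    """Fall through reflex -> semantic -> cognitive, as in the tier list."""
    for tier in (reflex_tier, semantic_tier, cognitive_tier):
        if tier is not None:
            intent = tier(prompt)
            if intent is not None:
                return intent
    return "fast"  # default slot when no tier is confident
```

In practice the semantic and cognitive tiers would be embedding matching and an LLM call; here they are pluggable callables so the fall-through order is visible.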

Quick Start

Basic Configuration

Add the intelligence section to your config.yaml:
config.yaml
intelligence:
  enabled: true
  
  # Core routing models
  router-model: "ollama:qwen:0.5b"
  router-fallback: "openai:gpt-4o-mini"
  
  # Intent-to-model mapping
  matrix:
    coding: "switchai-chat"
    reasoning: "switchai-reasoner"
    creative: "switchai-chat"
    fast: "switchai-fast"
    secure: "ollama:llama3.2"  # Local for privacy
    vision: "switchai-chat"
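
Once an intent is classified, the matrix above is a straightforward lookup from intent to model, with router-fallback covering anything unmapped. A minimal sketch (the dict mirrors the config above; resolve_model is an illustrative name, not a real API):

```python
# Mirrors the intelligence.matrix section above.
MATRIX = {
    "coding": "switchai-chat",
    "reasoning": "switchai-reasoner",
    "creative": "switchai-chat",
    "fast": "switchai-fast",
    "secure": "ollama:llama3.2",  # local for privacy
    "vision": "switchai-chat",
}
ROUTER_FALLBACK = "openai:gpt-4o-mini"

def resolve_model(intent):
    """Map a classified intent to a model, falling back for unknown intents."""
    return MATRIX.get(intent, ROUTER_FALLBACK)
```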

Enable Phase 2 Features

config.yaml
intelligence:
  enabled: true
  
  # Automatic model discovery
  discovery:
    enabled: true
    refresh-interval: 3600  # seconds
    cache-dir: "~/.switchailocal/cache/discovery"
  
  # Local embedding for semantic matching
  embedding:
    enabled: true
    model: "all-MiniLM-L6-v2"
  
  # Semantic tier routing
  semantic-tier:
    enabled: true
    confidence-threshold: 0.85
  
  # Skill-based prompt augmentation
  skills:
    enabled: true
    directory: "plugins/cortex-router/skills"
  
  skill-matching:
    enabled: true
    confidence-threshold: 0.80
  
  # Semantic caching
  semantic-cache:
    enabled: true
    similarity-threshold: 0.95
    max-size: 10000
  
  # Confidence scoring
  confidence:
    enabled: true
  
  # Cross-verification
  verification:
    enabled: true
    confidence-threshold-low: 0.60
    confidence-threshold-high: 0.90
  
  # Automatic quality-based cascading
  cascade:
    enabled: true
    quality-threshold: 0.70
  
  # Feedback collection
  feedback:
    enabled: true
    retention-days: 90
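
The verification thresholds above split classifier confidence into three bands: accept outright at or above the high threshold, cross-verify in between, and fall back below the low threshold. A hedged sketch of that decision (the band names are illustrative):

```python
# confidence-threshold-low / -high from the verification config above.
LOW, HIGH = 0.60, 0.90

def verification_action(confidence):
    """Decide what to do with a cognitive-tier classification."""
    if confidence >= HIGH:
        return "accept"    # trust the router model's answer
    if confidence >= LOW:
        return "verify"    # cross-check with semantic verification
    return "fallback"      # reroute via router-fallback
```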

Download Embedding Model

Before using semantic features, download the embedding model:
./scripts/download-embedding-model.sh

Usage

Use model: "auto" or model: "cortex" to enable intelligent routing:
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "auto",
    "messages": [{
      "role": "user",
      "content": "Write a Python function to parse JSON"
    }]
  }'

Intent Classification

Cortex Router automatically detects request intent and routes to specialized models:
Intent | Description | Example Queries
coding | Code generation, debugging | "Write a Go function", "Fix this TypeScript error"
reasoning | Complex analysis, math | "Analyze these trends", "Solve this logic puzzle"
creative | Writing, brainstorming | "Write a blog post", "Generate product names"
fast | Quick factual questions | "What is the capital of France?", "Convert 100 USD to EUR"
secure | Sensitive data handling | "Analyze this medical record", "Review financial data"
vision | Image analysis | "Describe this image", "Extract text from screenshot"

Dynamic Matrix

Phase 2 introduces automatic model discovery that builds optimal routing tables based on available models:
config.yaml
auto-assign:
  enabled: true
  prefer-local: true       # Prefer local models for 'secure' slot
  cost-optimization: true  # Favor cheaper models when quality is similar
  overrides:
    secure: "ollama:llama3.2"  # Manual override

Capability Scoring

Models are scored and assigned to capability slots:
Slot | Priority Factors
coding | Coding capability, context window size, code quality
reasoning | Reasoning capability, accuracy, mathematical ability
creative | General capability, context window, creativity score
fast | Low latency, low cost, acceptable quality
secure | Local models preferred, privacy features
vision | Vision capability required, image understanding
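
Slot assignment can be pictured as scoring every candidate model per slot and taking the best, with prefer-local biasing the secure-style slots. A toy sketch with made-up scores (the model names, scores, and local bonus are illustrative, not values produced by discovery):

```python
# Hypothetical capability scores per model (0..1); real scores come from discovery.
MODELS = {
    "switchai-chat":   {"coding": 0.8, "fast": 0.5, "local": False},
    "switchai-fast":   {"coding": 0.4, "fast": 0.9, "local": False},
    "ollama:llama3.2": {"coding": 0.6, "fast": 0.6, "local": True},
}

def assign_slot(slot, prefer_local=False):
    """Pick the highest-scoring model for a slot, biasing toward local models if asked."""
    def score(item):
        name, caps = item
        bonus = 0.3 if (prefer_local and caps["local"]) else 0.0
        return caps.get(slot, 0.0) + bonus
    return max(MODELS.items(), key=score)[0]
```

With prefer-local set, the local model wins the slot even when a remote model scores slightly higher on raw capability.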

Pre-Built Skills

Cortex Router includes 21 domain-specific skills that augment prompts with expert instructions, including:
  • api-designer: REST API design, OpenAPI specifications
  • devops-expert: CI/CD, infrastructure as code, monitoring
  • docker-expert: Containerization, Dockerfile optimization
  • frontend-expert: React, TailwindCSS, modern frontend
  • go-expert: Go/Golang development for switchAILocal
  • k8s-expert: Kubernetes, Helm, cloud native
  • mcp-builder: Model Context Protocol server development
  • python-expert: Python with async, type hints, pytest
  • typescript-expert: TypeScript type system, advanced patterns
  • testing-expert: Testing methodologies, TDD, Vitest

Semantic Cache

The semantic cache stores routing decisions based on embedding similarity, enabling sub-millisecond routing for repeated queries:
config.yaml
semantic-cache:
  enabled: true
  similarity-threshold: 0.95  # Cache hit if similarity >= this
  max-size: 10000             # Maximum cache entries
Performance: Cache hits return in < 1ms vs 200-500ms for LLM classification.
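
A semantic cache lookup is an embedding similarity search against stored routing decisions: if the best match clears the similarity threshold, its route is reused. A minimal sketch using cosine similarity over toy vectors (real entries would hold all-MiniLM-L6-v2 embeddings, and eviction here is a naive FIFO, not the actual policy):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticCache:
    def __init__(self, similarity_threshold=0.95, max_size=10000):
        self.threshold = similarity_threshold
        self.max_size = max_size
        self.entries = []  # (embedding, route) pairs

    def lookup(self, embedding):
        """Return the cached route of the most similar entry, or None on a miss."""
        best = None
        best_sim = self.threshold  # cache hit if similarity >= threshold
        for vec, route in self.entries:
            sim = cosine(embedding, vec)
            if sim >= best_sim:
                best, best_sim = route, sim
        return best

    def store(self, embedding, route):
        if len(self.entries) >= self.max_size:
            self.entries.pop(0)  # naive FIFO eviction for the sketch
        self.entries.append((embedding, route))
```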

Quality-Based Cascading

Cortex automatically escalates to stronger models when response quality is insufficient:
config.yaml
cascade:
  enabled: true
  quality-threshold: 0.70  # Cascade if quality score < this

Cascade Flow

fast → standard → reasoning
  ↓        ↓          ↓
✗ Low   ✗ Low    ✓ Success
Quality signals detected:
  • Abrupt endings
  • Missing sections
  • Incomplete code blocks
  • Error patterns
  • Very short responses
Cascading increases cost and latency. Set quality-threshold carefully based on your requirements.
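
Each quality signal listed above can be approximated with a cheap heuristic, and the combined score compared against quality-threshold. A sketch of such a scorer (the weights and checks are illustrative, not the actual scoring logic):

```python
def quality_score(response):
    """Crude quality heuristic: start at 1.0 and subtract per detected signal."""
    score = 1.0
    text = response.rstrip()
    if len(text) < 40:
        score -= 0.4                        # very short response
    if text and text[-1] not in ".!?`)\"]}":
        score -= 0.2                        # abrupt ending (no closing punctuation)
    if text.count("```") % 2 == 1:
        score -= 0.3                        # unterminated code block
    if "error:" in text.lower():
        score -= 0.2                        # error pattern
    return max(score, 0.0)

def should_cascade(response, quality_threshold=0.70):
    """Escalate to a stronger model when the score falls below the threshold."""
    return quality_score(response) < quality_threshold
```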

Performance Tuning

config.yaml
intelligence:
  semantic-tier:
    confidence-threshold: 0.80  # Lower = more semantic routing
  semantic-cache:
    enabled: true
    max-size: 50000  # Larger cache
  cascade:
    enabled: false   # Disable for speed

Management API

Phase 2 adds management endpoints for monitoring and control:
Endpoint | Method | Description
/v0/management/skills | GET | List all loaded skills
/v0/management/feedback | GET | Get routing feedback statistics
/v0/management/feedback | POST | Submit explicit feedback
/v0/management/steering/reload | POST | Reload configuration without restart

Troubleshooting

Semantic routing not working

  1. Check the embedding model is downloaded:
    ls ~/.switchailocal/models/all-MiniLM-L6-v2/
  2. Verify embedding is enabled:
    embedding:
      enabled: true
  3. Check logs for initialization errors

Skills not matching

  1. Verify the skills directory exists:
    ls plugins/cortex-router/skills/
  2. Lower the confidence threshold:
    skill-matching:
      confidence-threshold: 0.70  # Lower from 0.80
  3. Check that skill descriptions are descriptive enough

Low cache hit rate

  1. Lower the similarity threshold for more hits:
    semantic-cache:
      similarity-threshold: 0.90  # Lower from 0.95
  2. Increase the cache size:
    semantic-cache:
      max-size: 50000  # Increase from 10000

Model discovery failing

  1. Check provider credentials are configured
  2. Verify network connectivity to providers
  3. Check the discovery cache directory is writable:
    mkdir -p ~/.switchailocal/cache/discovery
    chmod 0700 ~/.switchailocal/cache/discovery

Next Steps