Overview

This guide demonstrates the most common usage patterns for switchAILocal, from simple chat completions to multi-provider routing.

Simple Chat Completion

The most basic usage: send a message and get a response.
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Auto-Routing (No Provider Prefix)

Let switchAILocal automatically select the best available provider:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

# No prefix = auto-routing to any logged-in provider
completion = client.chat.completions.create(
    model="gemini-2.5-pro",  # switchAILocal picks: geminicli, gemini API, or switchAI
    messages=[{"role": "user", "content": "What is the meaning of life?"}]
)
Auto-routing prioritizes:
  1. CLI providers (if authenticated)
  2. API providers (if keys configured)
  3. Local providers (Ollama, LM Studio)
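As a rough illustration, the priority order above can be expressed as a sort over candidate providers. The candidate list, `type` labels, and `pick_provider` helper below are hypothetical examples for this sketch, not part of the switchAILocal API:

```python
# Illustrative sketch of the auto-routing priority order above.
# Lower rank = tried first: CLI, then API, then local providers.
PRIORITY = {"cli": 0, "api": 1, "local": 2}

def pick_provider(candidates):
    """Return the highest-priority candidate that is currently available."""
    available = [c for c in candidates if c["available"]]
    return min(available, key=lambda c: PRIORITY[c["type"]])

candidates = [
    {"name": "ollama", "type": "local", "available": True},
    {"name": "gemini-api", "type": "api", "available": True},
    {"name": "geminicli", "type": "cli", "available": True},
]
print(pick_provider(candidates)["name"])  # geminicli
```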

Explicit Provider Selection

Force routing to a specific provider using prefixes:
completion = client.chat.completions.create(
    model="geminicli:gemini-2.5-pro",  # Force Gemini CLI
    messages=[{"role": "user", "content": "Hello!"}]
)

List Available Models

Discover all models from all configured providers:
curl http://localhost:18080/v1/models \
  -H "Authorization: Bearer sk-test-123"
Example Output:
geminicli:gemini-2.5-pro (google)
ollama:llama3.2 (ollama)
switchai:switchai-fast (traylinx)
switchai:switchai-reasoner (traylinx)
claudecli:claude-sonnet-4 (anthropic)
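Because every model ID carries its provider prefix, you can group the list by provider on the client side. The snippet below does this for the sample output above (the IDs are copied from the example, not fetched live; with the OpenAI SDK you would get them from `client.models.list()`):

```python
# Group model IDs by their provider prefix (IDs from the example output above).
model_ids = [
    "geminicli:gemini-2.5-pro",
    "ollama:llama3.2",
    "switchai:switchai-fast",
    "switchai:switchai-reasoner",
    "claudecli:claude-sonnet-4",
]

by_provider = {}
for model_id in model_ids:
    prefix, _, name = model_id.partition(":")
    by_provider.setdefault(prefix, []).append(name)

print(by_provider["switchai"])  # ['switchai-fast', 'switchai-reasoner']
```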

Multi-turn Conversations

Maintain conversation context across multiple turns:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to calculate factorial"}
]

# First turn
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=messages
)

# Add assistant response to history
messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content
})

# Second turn
messages.append({
    "role": "user",
    "content": "Now add error handling"
})

response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=messages
)

print(response.choices[0].message.content)

Temperature Control

Adjust creativity and randomness:
# Low temperature (0.0-0.3) = Focused, deterministic
code_response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Write a sorting algorithm"}],
    temperature=0.2,  # Precise, consistent code
)

# High temperature (0.7-1.0) = Creative, varied
story_response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Write a short story"}],
    temperature=0.9,  # Creative, diverse outputs
)

Max Tokens Limit

Control response length:
completion = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    max_tokens=200,  # Limit to ~200 tokens (approx 150 words)
)

System Messages

Set the assistant’s behavior and personality:
completion = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {
            "role": "system",
            "content": "You are a senior Go developer. Always provide idiomatic Go code with error handling."
        },
        {
            "role": "user",
            "content": "Show me how to read a JSON file"
        }
    ]
)

Error Handling

Catch connection failures and API errors raised by the client:
from openai import OpenAI, APIError, APIConnectionError

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

try:
    completion = client.chat.completions.create(
        model="gemini-2.5-pro",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(completion.choices[0].message.content)
except APIConnectionError as e:
    print(f"Connection error: {e}")
except APIError as e:
    print(f"API error: {e.status_code} - {e.message}")
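When a request fails, one option is to retry with an explicit provider prefix instead of auto-routing. The `complete_with_fallback` helper below is a hypothetical sketch of that pattern, not part of switchAILocal or the OpenAI SDK:

```python
# Hypothetical fallback helper: try the auto-routed model first,
# then fall back to explicitly prefixed providers.
def complete_with_fallback(client, models, messages, errors=(Exception,)):
    """Try each model ID in order and return the first successful completion.

    With the OpenAI SDK, pass errors=(APIError,) so unrelated bugs still raise.
    """
    last_error = None
    for model in models:
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except errors as e:
            last_error = e
    raise last_error

# Usage sketch: auto-routed name first, then forced providers.
# completion = complete_with_fallback(
#     client,
#     ["gemini-2.5-pro", "geminicli:gemini-2.5-pro", "gemini:gemini-2.5-pro"],
#     [{"role": "user", "content": "Hello!"}],
# )
```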

Provider Prefix Reference

| Prefix | Provider | Type | Example |
|---|---|---|---|
| geminicli: | Google Gemini CLI | CLI Tool | geminicli:gemini-2.5-pro |
| claudecli: | Anthropic Claude CLI | CLI Tool | claudecli:claude-sonnet-4 |
| codex: | OpenAI Codex CLI | CLI Tool | codex:gpt-4 |
| vibe: | Mistral Vibe CLI | CLI Tool | vibe:mistral-large |
| ollama: | Ollama | Local | ollama:llama3.2 |
| lmstudio: | LM Studio | Local | lmstudio:mistral-7b |
| switchai: | Traylinx switchAI | Cloud API | switchai:switchai-fast |
| gemini: | Google AI Studio | Cloud API | gemini:gemini-2.5-pro |
| claude: | Anthropic API | Cloud API | claude:claude-3-5-sonnet |
| openai: | OpenAI API | Cloud API | openai:gpt-4 |
No prefix = auto-routing: switchAILocal selects the best available provider for you.

Next Steps