## Overview
This guide demonstrates the most common usage patterns for switchAILocal, from simple chat completions to multi-provider routing.
## Simple Chat Completion

The most basic usage is to send a message and get a response:
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
## Auto-Routing (No Provider Prefix)
Let switchAILocal automatically select the best available provider:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

# No prefix = auto-routing to any logged-in provider
completion = client.chat.completions.create(
    model="gemini-2.5-pro",  # switchAILocal picks: geminicli, gemini API, or switchAI
    messages=[{"role": "user", "content": "What is the meaning of life?"}]
)
```
Auto-routing prioritizes, in order:

1. CLI providers (if authenticated)
2. API providers (if keys configured)
3. Local providers (Ollama, LM Studio)
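To confirm which provider actually handled an auto-routed request, you can inspect the response's `model` field. Whether switchAILocal echoes the resolved, provider-prefixed model id there is an assumption of this sketch (reusing `completion` from the example above):

```python
# Assumption: switchAILocal echoes the resolved model id, including the provider prefix
print(completion.model)  # e.g. "geminicli:gemini-2.5-pro"
```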
## Explicit Provider Selection
Force routing to a specific provider using prefixes:
### Gemini CLI

```python
completion = client.chat.completions.create(
    model="geminicli:gemini-2.5-pro",  # Force Gemini CLI
    messages=[{"role": "user", "content": "Hello!"}]
)
```

### Ollama (Local)

```python
completion = client.chat.completions.create(
    model="ollama:llama3.2",  # Force Ollama local model
    messages=[{"role": "user", "content": "Hello!"}]
)
```

### switchAI Cloud

```python
completion = client.chat.completions.create(
    model="switchai:switchai-fast",  # Force switchAI cloud
    messages=[{"role": "user", "content": "Hello!"}]
)
```

### Claude CLI

```python
completion = client.chat.completions.create(
    model="claudecli:claude-sonnet-4",  # Force Claude CLI
    messages=[{"role": "user", "content": "Hello!"}]
)
```
## List Available Models
Discover all models from all configured providers:
```bash
curl http://localhost:18080/v1/models \
  -H "Authorization: Bearer sk-test-123"
```
Example Output:

```text
geminicli:gemini-2.5-pro     (google)
ollama:llama3.2              (ollama)
switchai:switchai-fast       (traylinx)
switchai:switchai-reasoner   (traylinx)
claudecli:claude-sonnet-4    (anthropic)
```
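The same catalog is available programmatically through the SDK's models endpoint. A minimal sketch, assuming the `owned_by` field carries the provider name shown in the output above:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:18080/v1", api_key="sk-test-123")

# Iterating the list yields one entry per model across all configured providers
for model in client.models.list():
    print(f"{model.id} ({model.owned_by})")
```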
## Multi-turn Conversations
Maintain conversation context across multiple turns:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to calculate factorial"},
]

# First turn
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=messages,
)

# Add assistant response to history
messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content,
})

# Second turn
messages.append({
    "role": "user",
    "content": "Now add error handling",
})

response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=messages,
)

print(response.choices[0].message.content)
```
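The same append-and-resend pattern extends to an open-ended chat loop. A minimal sketch, reusing the client from the example above:

```python
# REPL-style chat loop built on the same history pattern
messages = [{"role": "system", "content": "You are a helpful coding assistant."}]
while True:
    user_input = input("You: ")
    if user_input.lower() in ("quit", "exit"):
        break
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="gemini-2.5-pro",
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    print(f"Assistant: {reply}")
```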
## Temperature Control
Adjust creativity and randomness:
```python
# Low temperature (0.0-0.3) = focused, deterministic
code_response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Write a sorting algorithm"}],
    temperature=0.2,  # Precise, consistent code
)

# High temperature (0.7-1.0) = creative, varied
story_response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Write a short story"}],
    temperature=0.9,  # Creative, diverse outputs
)
```
## Max Tokens Limit
Control response length:
```python
completion = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    max_tokens=200,  # Limit to ~200 tokens (approx 150 words)
)
```
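When the limit cuts a response short, OpenAI-compatible APIs report it via `finish_reason`. A small sketch, assuming switchAILocal passes this field through from the upstream provider:

```python
choice = completion.choices[0]
if choice.finish_reason == "length":
    # The reply was truncated by max_tokens; consider raising the limit
    print("Warning: response was cut off at the token limit")
print(choice.message.content)
```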
## System Messages
Set the assistant’s behavior and personality:
```python
completion = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {
            "role": "system",
            "content": "You are a senior Go developer. Always provide idiomatic Go code with error handling."
        },
        {
            "role": "user",
            "content": "Show me how to read a JSON file"
        }
    ]
)
```
## Error Handling
```python
from openai import OpenAI, APIConnectionError, APIStatusError

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

try:
    completion = client.chat.completions.create(
        model="gemini-2.5-pro",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(completion.choices[0].message.content)
except APIConnectionError as e:
    # Raised when the request never reaches the server (e.g. switchAILocal is not running)
    print(f"Connection error: {e}")
except APIStatusError as e:
    # Raised on non-2xx responses; status_code and message are available here
    print(f"API error: {e.status_code} - {e.message}")
```
## Provider Prefix Reference
| Prefix | Provider | Type | Example |
|---|---|---|---|
| `geminicli:` | Google Gemini CLI | CLI Tool | `geminicli:gemini-2.5-pro` |
| `claudecli:` | Anthropic Claude CLI | CLI Tool | `claudecli:claude-sonnet-4` |
| `codex:` | OpenAI Codex CLI | CLI Tool | `codex:gpt-4` |
| `vibe:` | Mistral Vibe CLI | CLI Tool | `vibe:mistral-large` |
| `ollama:` | Ollama | Local | `ollama:llama3.2` |
| `lmstudio:` | LM Studio | Local | `lmstudio:mistral-7b` |
| `switchai:` | Traylinx switchAI | Cloud API | `switchai:switchai-fast` |
| `gemini:` | Google AI Studio | Cloud API | `gemini:gemini-2.5-pro` |
| `claude:` | Anthropic API | Cloud API | `claude:claude-3-5-sonnet` |
| `openai:` | OpenAI API | Cloud API | `openai:gpt-4` |
**No prefix = auto-routing**: switchAILocal will intelligently select the best available provider.
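Client-side, the prefix convention is easy to work with. A hypothetical helper (not part of switchAILocal) for splitting a model string into its provider and model parts:

```python
def split_model(model: str) -> tuple[str | None, str]:
    """Split 'provider:model' into (provider, model); provider is None when absent."""
    provider, sep, name = model.partition(":")
    return (provider, name) if sep else (None, model)

assert split_model("ollama:llama3.2") == ("ollama", "llama3.2")
assert split_model("gemini-2.5-pro") == (None, "gemini-2.5-pro")
```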
## Next Steps