Overview
Superbrain implements an Observer-Critic architecture in which every execution is monitored by a lightweight supervisor that can:
- Detect failures in real time (within seconds, not minutes)
- Diagnose issues using AI (permission prompts, auth errors, context limits)
- Take autonomous healing actions (stdin injection, process restart, intelligent failover)
- Report all actions transparently (healing metadata in responses)
Key Benefits
Real-Time Monitoring
Detects hung processes and silent failures before hard timeouts
AI-Powered Diagnosis
Uses lightweight models to analyze failures and prescribe fixes
Autonomous Healing
Automatically responds to prompts, restarts with flags, or routes to alternatives
Intelligent Failover
Routes failed requests to alternative providers based on capabilities
Context Optimization
Pre-analyzes large requests and optimizes content to fit model limits
Transparent Actions
Every autonomous action is logged and included in response metadata
Quick Start
Basic Configuration
Add the superbrain section to your config.yaml:
Operational Modes
Superbrain supports five operational modes for gradual rollout:
diagnose
Diagnose failures and log proposed actions without executing them. Validate diagnosis accuracy.
conservative
Heal safe patterns only (whitelisted prompts, known recoverable errors). Recommended for production.
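To make the difference between modes concrete, here is a minimal sketch of how a mode could gate whether a proposed healing action runs. The function name and the autopilot behavior are assumptions; the mode names are the ones used in this document, and only the diagnose and conservative semantics are stated above.

```python
def should_execute(mode, action_is_whitelisted):
    """Decide whether a proposed healing action is executed (hypothetical helper)."""
    if mode == "conservative":
        # Conservative mode heals safe, whitelisted patterns only.
        return action_is_whitelisted
    if mode == "autopilot":
        # Assumption: autopilot executes proposed actions; forbidden-pattern
        # checks would still have to be applied separately.
        return True
    # disabled, observe and diagnose never execute actions;
    # diagnose only logs what it would have done.
    return False
```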
Core Components
1. Overwatch Layer
Real-time monitoring of all CLI executions, tracking output streams and detecting silent hangs.
- Heartbeat detection (tracks stdout/stderr activity)
- Rolling log buffer for diagnosis
- Diagnostic snapshot capture on anomalies
- Configurable silence threshold
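The features above can be illustrated with a small sketch. It is not Superbrain's implementation: the class name, method names, and the injectable clock are hypothetical, and only the 30000 ms default comes from this document.

```python
import time
from collections import deque


class HeartbeatMonitor:
    """Tracks output activity and flags silence past a threshold (hypothetical)."""

    def __init__(self, silence_threshold_ms=30000, buffer_lines=100,
                 clock=time.monotonic):
        self.silence_threshold_ms = silence_threshold_ms
        self.log_buffer = deque(maxlen=buffer_lines)  # rolling log buffer
        self._clock = clock
        self._last_activity = clock()

    def record_output(self, line):
        """Call for every stdout/stderr line; resets the silence timer."""
        self.log_buffer.append(line)
        self._last_activity = self._clock()

    def silent_for_ms(self):
        """Milliseconds since the last observed output."""
        return (self._clock() - self._last_activity) * 1000.0

    def is_hung(self):
        """True once the process has been silent for the configured threshold."""
        return self.silent_for_ms() >= self.silence_threshold_ms

    def snapshot(self):
        """Diagnostic snapshot: recent output plus the current silence duration."""
        return {"silent_ms": self.silent_for_ms(),
                "recent_output": list(self.log_buffer)}
```

A reader thread would call record_output for each line it receives, while a timer polls is_hung and captures snapshot() when it first returns True.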
2. Internal Doctor
AI-powered failure analysis using a lightweight model to identify root causes and recommend fixes.
- Permission prompts (file access, tool execution)
- Authentication errors
- Context limit exceeded
- Rate limiting
- Network errors
- Process crashes
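The Internal Doctor is described as model-based. The sketch below deliberately substitutes simple keyword rules for the model, purely to illustrate how captured log output maps onto the failure categories listed above. The category names and every pattern are hypothetical.

```python
import re

# Hypothetical keyword rules standing in for the model-based diagnosis.
RULES = [
    ("permission_prompt", re.compile(r"\[y/n\]|allow .*\?", re.I)),
    ("auth_error", re.compile(r"unauthorized|invalid api key|\b401\b", re.I)),
    ("context_limit", re.compile(r"context (length|limit)|too many tokens", re.I)),
    ("rate_limit", re.compile(r"rate limit|\b429\b", re.I)),
    ("network_error", re.compile(r"connection (refused|reset)|timed out", re.I)),
    ("process_crash", re.compile(r"segmentation fault|panic|killed", re.I)),
]


def diagnose(log_text):
    """Return the first matching failure category, or 'unknown'."""
    for category, pattern in RULES:
        if pattern.search(log_text):
            return category
    return "unknown"
```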
3. Stdin Injector
Automatically responds to interactive CLI prompts to prevent processes from hanging. Injection behavior depends on the mode (Conservative Mode or Autopilot Mode).
Whitelisted prompts and their responses:
- claude_file_permission: “Allow read file? [y/n]” → “y”
- claude_tool_permission: “Allow tool execution? [y/n]” → “y”
- generic_continue: “Press enter to continue” → “\n”
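As a rough illustration, whitelist matching could look like the sketch below. The pattern names, prompts, and responses are the ones listed above; matching with regular expressions and the function name response_for are assumptions.

```python
import re

# Prompts and responses come from the whitelist above; regex matching is an assumption.
WHITELIST = {
    "claude_file_permission": (re.compile(r"Allow read file\? \[y/n\]"), "y"),
    "claude_tool_permission": (re.compile(r"Allow tool execution\? \[y/n\]"), "y"),
    "generic_continue": (re.compile(r"Press enter to continue"), "\n"),
}


def response_for(prompt_line):
    """Return (pattern_name, response) for a whitelisted prompt, else None."""
    for name, (pattern, response) in WHITELIST.items():
        if pattern.search(prompt_line):
            return name, response
    return None
```

Returning None for anything not on the list keeps the default safe: an unrecognized prompt is never answered automatically.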
4. Process Recovery
Automatically restarts failed processes with corrective flags based on diagnosis. Example flow:
- Diagnosis: “Permission prompt detected”
- Action: Restart with --dangerously-skip-permissions flag
- Result: Process completes successfully
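The example flow can be sketched as a lookup from diagnosis to corrective flags. Only the permission-prompt entry comes from the flow above; the diagnosis key, the function name, and the example-cli command used in the usage note are hypothetical.

```python
# Hypothetical mapping from diagnosis to corrective flags; only the
# permission-prompt entry is taken from the example flow.
CORRECTIVE_FLAGS = {
    "permission_prompt": ["--dangerously-skip-permissions"],
}


def build_restart_command(original_argv, diagnosis):
    """Return a new argv with corrective flags appended, or None if no fix is known."""
    flags = CORRECTIVE_FLAGS.get(diagnosis)
    if flags is None:
        return None
    # Avoid adding a flag twice if an earlier restart already applied it.
    extra = [flag for flag in flags if flag not in original_argv]
    return list(original_argv) + extra
```

For instance, build_restart_command(["example-cli", "run"], "permission_prompt") yields the original arguments followed by --dangerously-skip-permissions, while an unrecognized diagnosis yields None so the caller can fall back to another strategy.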
5. Fallback Router
Routes failed requests to alternative providers based on capabilities and success rates.
- Provider capabilities (context size, streaming support)
- Current availability
- Historical success rates
- Request requirements
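One way to combine the four factors above is to filter providers by hard requirements and then rank the survivors by historical success rate. This is a sketch under that assumption; the Provider fields and the pick_fallback function are hypothetical, not Superbrain's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    max_context_tokens: int
    supports_streaming: bool
    available: bool
    success_rate: float  # historical, 0.0 to 1.0


def pick_fallback(providers, failed_name, required_tokens, needs_streaming=False):
    """Pick the eligible provider with the best historical success rate, or None."""
    eligible = [
        p for p in providers
        if p.name != failed_name                      # never retry the failed provider
        and p.available                               # current availability
        and p.max_context_tokens >= required_tokens   # capability: context size
        and (p.supports_streaming or not needs_streaming)  # capability: streaming
    ]
    return max(eligible, key=lambda p: p.success_rate, default=None)
```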
6. Context Sculptor
Pre-flight analysis and optimization to fit requests within model context limits.
- Token estimation for file/folder references
- Intelligent file prioritization
- High-density map generation for excluded content
- Alternative model recommendations
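A minimal sketch of the pre-flight step, assuming a crude four-characters-per-token estimate and a greedy, priority-ordered selection. Both the ratio and the function names are assumptions; a real implementation would use the target model's tokenizer.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; the 4-characters-per-token ratio is an assumption."""
    return (len(text) + chars_per_token - 1) // chars_per_token


def fit_to_budget(files, token_budget):
    """Split files into (included, excluded) lists of names.

    `files` is a list of (name, text, priority) tuples; higher priority is
    considered first. Files that do not fit are excluded and would be
    represented by a compact map instead of being sent in full.
    """
    included, excluded = [], []
    remaining = token_budget
    for name, text, _priority in sorted(files, key=lambda f: f[2], reverse=True):
        cost = estimate_tokens(text)
        if cost <= remaining:
            included.append(name)
            remaining -= cost
        else:
            excluded.append(name)
    return included, excluded
```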
Healing Metadata
When Superbrain takes autonomous actions, responses include detailed healing metadata:
- Successful Healing
- Negotiated Failure
Phased Rollout
Deploy Superbrain incrementally to manage risk:
Phase 1: Observe Mode (Week 1)
- Metrics endpoint shows Superbrain data
- Audit log captures all detected issues
- No impact on existing functionality
Phase 2: Diagnose Mode (Week 2)
- Diagnosis correctly identifies failure types
- Proposed actions are appropriate
- No false positives in diagnosis
Phase 3: Conservative Mode (Week 3-4)
- Success rate improvement (measure before/after)
- No security incidents
- Healing metadata is accurate
Phase 4: Autopilot Mode (Week 5+)
- Consistent success rate improvement
- Reduced manual intervention
- Stable operation over time
Monitoring & Metrics
Metrics Endpoint
Superbrain exposes metrics via the management API.
Audit Log
All autonomous actions are logged in JSON format.
Security & Safety
Stdin Injection Whitelist
Only safe patterns are auto-approved by default:
- claude_file_permission: Allow read file?
- claude_tool_permission: Allow tool execution?
- generic_continue: Press enter to continue
Forbidden Patterns
Patterns that will NEVER be auto-approved:
Security Fail-Safe
If a security-sensitive operation is detected, Superbrain will:
- Abort autonomous remediation
- Return a safe failure response
- Log the incident to the audit log
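The three fail-safe steps can be sketched as a check that runs before any whitelist handling. The forbidden patterns, the response shape, and the audit entry fields below are all hypothetical placeholders, since the actual list and formats are not reproduced in this section.

```python
import json
import re

# Hypothetical forbidden patterns; the real list is defined in configuration.
FORBIDDEN = [re.compile(p, re.I) for p in (r"delete", r"sudo", r"password")]


def handle_prompt(prompt_line, audit_log):
    """Abort healing and record the incident when a prompt looks security-sensitive."""
    for pattern in FORBIDDEN:
        if pattern.search(prompt_line):
            # Log the incident, then return a safe failure instead of healing.
            audit_log.append(json.dumps({"event": "security_abort",
                                         "prompt": prompt_line}))
            return {"healed": False, "reason": "security_sensitive_prompt"}
    return None  # not sensitive; normal whitelist handling may proceed
```

Checking forbidden patterns first means a prompt that matches both lists is still refused.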
Emergency Procedures
Emergency Disable
If Superbrain causes issues, disable it immediately using one of:
- Configuration File
- Environment Variable
- Mode Change
Rollback Procedure
Troubleshooting
Superbrain not activating
Check:
- superbrain.enabled: true is set in config.yaml
- Mode is not disabled
- Provider supports CLI execution (Superbrain only monitors CLI providers)
Too many healing attempts
Solution (in config.yaml):
Unwanted stdin injections
Solution (in config.yaml):
Or use conservative mode with an explicit whitelist (in config.yaml):
Diagnosis taking too long
Solution (in config.yaml):
Best Practices
Use Conservative Mode in Production
Conservative mode provides the best balance of automation and safety.
Tune Silence Threshold
Adjust based on your typical request latency:
- Fast models (< 10s): 15000ms
- Medium models (10-30s): 30000ms (default)
- Slow models (> 30s): 60000ms
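The guidance above can be expressed as a small helper. The function name is hypothetical, and treating exactly 10 s and 30 s as "medium" is an assumption about the boundary cases.

```python
def recommended_silence_threshold_ms(typical_latency_s):
    """Map typical request latency (seconds) to the thresholds listed above."""
    if typical_latency_s < 10:
        return 15000   # fast models
    if typical_latency_s <= 30:
        return 30000   # medium models (default)
    return 60000       # slow models
```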