Model Selection Guide
Detailed guidance on choosing the right model for skills and agents.
Cost Comparison
| Model | Input (per MTok) | Output (per MTok) | vs Haiku |
|---|---|---|---|
| Haiku | $0.25 | $1.25 | Baseline |
| Sonnet | $3.00 | $15.00 | 12x more expensive |
| Opus | $15.00 | $75.00 | 60x more expensive |
Example cost for a typical skill call (2K input tokens, 1K output tokens):
- Haiku: $0.00175
- Sonnet: $0.021 (12x more)
- Opus: $0.105 (60x more)
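These per-call figures follow directly from the per-MTok prices in the table. A minimal sketch for estimating cost with your own token counts (the script and its output formatting are illustrative, not part of any existing tooling):

```bash
#!/usr/bin/env bash
# Estimate per-call cost from token counts and per-MTok prices.
input_tokens=2000
output_tokens=1000

estimate() {
  local name=$1 in_price=$2 out_price=$3
  # cost = (tokens / 1,000,000) * price-per-MTok, summed over input and output
  awk -v it="$input_tokens" -v ot="$output_tokens" \
      -v ip="$in_price" -v op="$out_price" -v n="$name" \
      'BEGIN { printf "%-7s $%.5f\n", n, (it/1e6)*ip + (ot/1e6)*op }'
}

estimate Haiku   0.25  1.25   # -> $0.00175
estimate Sonnet  3.00 15.00   # -> $0.02100
estimate Opus   15.00 75.00   # -> $0.10500
```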
Speed Comparison
| Model | Tokens/Second | vs Haiku |
|---|---|---|
| Haiku | ~100 | Baseline |
| Sonnet | ~40 | 2.5x slower |
| Opus | ~20 | 5x slower |
Decision Framework
```
Start with Haiku by default
        |
        v
Test on 3-5 representative tasks
        |
        +-- Success rate ≥80%? ---------> ✓ Use Haiku
        |                                   (12x cheaper, 2-5x faster)
        |
        +-- Success rate <80%? ---------> Try Sonnet
        |                                     |
        |                                     v
        |                             Test on same tasks
        |                                     |
        |                                     +-- Success ≥80%? --> Use Sonnet
        |                                     |
        |                                     +-- Still failing? --> Opus or redesign
        |
        v
Document why you chose the model
```
When Haiku Works Well
✓ Ideal for Haiku
Simple sequential workflows:
- /dashboard - Fetch and display
- /roadmap - List and format
- /commit - Generate message from diff
Workflows with scripts:
- Error-prone operations in scripts
- Skills just orchestrate script calls
- Validation is deterministic
Structured outputs:
- Tasks with clear templates
- Format is defined upfront
- No ambiguous formatting
Reference/knowledge skills:
- gitea - CLI reference
- issue-writing - Patterns and templates
- software-architecture - Best practices
Examples of Haiku Success
work-issue skill:
- Sequential steps (view → branch → plan → implement → PR)
- Each step has clear validation
- Scripts handle error-prone operations
- Success rate: ~90%
dashboard skill:
- Fetch data (tea commands)
- Format as table
- Clear, structured output
- Success rate: ~95%
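The script-backed pattern behind a skill like dashboard can be sketched as below: the skill only orchestrates and formats, while the script owns the failure-prone steps. The script name, file paths, and exact tea invocation are assumptions for illustration, not taken from the real skill:

```bash
#!/usr/bin/env bash
# scripts/fetch-dashboard-data.sh (hypothetical name)
# Owns the error-prone work so the Haiku skill only orchestrates and formats.
set -euo pipefail

# Fail fast with an actionable message instead of letting the model improvise.
command -v tea >/dev/null || { echo "ERROR: tea CLI is not installed" >&2; exit 1; }

# Fetch the data the dashboard needs; the exact tea subcommand and output
# handling are assumptions here - use whatever the real skill runs.
tea issues > /tmp/dashboard-issues.txt \
  || { echo "ERROR: could not fetch issues from Gitea" >&2; exit 1; }

echo "Wrote /tmp/dashboard-issues.txt"
```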
When to Use Sonnet
Use Sonnet When
Haiku fails 20%+ of the time:
- Test with Haiku first
- If success rate <80%, upgrade to Sonnet
Complex judgment required:
- Code review (quality assessment)
- Issue grooming (clarity evaluation)
- Architecture decisions
Nuanced reasoning:
- Understanding implicit requirements
- Making trade-off decisions
- Applying context-dependent rules
Examples of Sonnet Success
review-pr skill:
- Requires code understanding
- Judgment about quality/bugs
- Context-dependent feedback
- Originally tried Haiku: 65% success → Sonnet: 85%
issue-worker agent:
- Autonomous implementation
- Pattern matching
- Architectural decisions
- Originally tried Haiku: 70% success → Sonnet: 82%
When to Use Opus
Reserve Opus For
Deep architectural reasoning:
- software-architect agent - Pattern recognition across large codebases
- Identifying subtle anti-patterns
- Trade-off analysis
High-stakes decisions:
- Breaking changes analysis
- System-wide refactoring plans
- Security architecture review
Complex pattern recognition:
- Requires sophisticated understanding
- Multiple layers of abstraction
- Long-term implications
Examples of Opus Success
software-architect agent:
- Analyzes entire codebase
- Identifies 8 different anti-patterns
- Provides prioritized recommendations
- Sonnet: 68% success → Opus: 88%
arch-review-repo skill:
- Comprehensive architecture audit
- Cross-cutting concerns
- System-wide patterns
- Opus justified for depth
Making Haiku More Effective
If Haiku is struggling, try these improvements before upgrading to Sonnet:
1. Add Validation Steps
Instead of:
3. Implement changes and create PR
Try:
3. Implement changes
4. Validate: Run `./scripts/validate.sh` (tests pass, linter clean)
5. Create PR: `./scripts/create-pr.sh`
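The referenced `./scripts/validate.sh` can be as small as the sketch below; the point is that "tests pass, linter clean" becomes a single deterministic exit code instead of a judgment call. The test and lint commands are placeholders for whatever your project actually uses:

```bash
#!/usr/bin/env bash
# scripts/validate.sh (sketch): deterministic pass/fail for the skill to check.
set -euo pipefail

echo "Running tests..."
npm test          # placeholder: swap in your project's test command

echo "Running linter..."
npm run lint      # placeholder: swap in your project's lint command

echo "VALIDATION PASSED"
```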
2. Bundle Error-Prone Operations in Scripts
Instead of:
5. Create PR: `tea pulls create --title "..." --description "..."`
Try:
5. Create PR: `./scripts/create-pr.sh $issue "$title"`
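A sketch of what `./scripts/create-pr.sh` might wrap. The `tea pulls create --title --description` command comes from the manual step above; the argument handling, branch push, and description format are illustrative additions:

```bash
#!/usr/bin/env bash
# scripts/create-pr.sh (sketch): bundle the fiddly PR-creation details
# so the skill only has to pass an issue number and a title.
set -euo pipefail

issue="${1:?usage: create-pr.sh <issue-number> <title>}"
title="${2:?usage: create-pr.sh <issue-number> <title>}"

# Push the current branch before opening the PR.
git push -u origin "$(git branch --show-current)"

# Same command the skill used to run directly, now behind one call.
tea pulls create --title "$title" --description "Closes #${issue}"
```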
3. Add Structured Output Templates
Instead of:
Show the results
Try:
Format results as:
| Issue | Status | Link |
|-------|--------|------|
| ... | ... | ... |
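To make the template fully deterministic, a script can emit the table rows so the model only pastes them in. A minimal sketch, assuming the results are already available as JSON with `issue`, `status`, and `link` fields (hypothetical file and field names):

```bash
#!/usr/bin/env bash
# Emit markdown table rows from JSON so the skill never improvises formatting.
# results.json and its field names are assumptions about the upstream data.
set -euo pipefail

echo "| Issue | Status | Link |"
echo "|-------|--------|------|"
jq -r '.[] | "| \(.issue) | \(.status) | \(.link) |"' results.json
```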
4. Add Explicit Checklists
Instead of:
Review the code for quality
Try:
Check:
- [ ] Code quality (readability, naming)
- [ ] Bugs (edge cases, null checks)
- [ ] Tests (coverage, assertions)
5. Make Instructions More Concise
Instead of:
Git is a version control system. When you want to commit changes, you use the git commit command which saves your changes to the repository...
Try:
`git commit -m 'feat: add feature'`
Testing Methodology
Create Test Suite
For each skill, create 3-5 test cases:
Example: work-issue skill tests
- Simple bug fix issue
- New feature with acceptance criteria
- Issue missing acceptance criteria
- Issue with tests that fail
- Complex refactoring task
Test with Haiku
```yaml
# Set skill to Haiku
model: haiku

# Run all 5 tests
# Document success/failure for each
```
Measure Success Rate
Success rate = (Successful tests / Total tests) × 100
Decision:
- ≥80% → Keep Haiku
- <80% → Try Sonnet
- <50% → Likely need Opus or redesign
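These thresholds are easy to encode so every skill is evaluated the same way. A minimal sketch (the script name is arbitrary; pass/fail counts come from however you record your test runs):

```bash
#!/usr/bin/env bash
# Turn pass/fail counts into a model recommendation using the thresholds above.
set -euo pipefail

passed="${1:?usage: success-rate.sh <passed> <total>}"
total="${2:?usage: success-rate.sh <passed> <total>}"

rate=$(( passed * 100 / total ))
echo "Success rate: ${rate}%"

if   (( rate >= 80 )); then echo "Keep Haiku"
elif (( rate >= 50 )); then echo "Try Sonnet"
else                        echo "Likely need Opus or redesign"
fi
```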
Test with Sonnet (if needed)
```yaml
# Upgrade to Sonnet
model: sonnet

# Run same 5 tests
# Compare results
```
Document Decision
```yaml
---
name: work-issue
model: haiku  # Tested: 4/5 tests passed with Haiku (80%)
---
```
Or:
```yaml
---
name: review-pr
model: sonnet  # Tested: Haiku 3/5 (60%), Sonnet 4/5 (80%)
---
```
Common Patterns
Pattern: Start Haiku, Upgrade if Needed
Issue-worker agent evolution:
- V1 (Haiku): 70% success - struggled with pattern matching
- Analysis: Added more examples, still 72%
- V2 (Sonnet): 82% success - better code understanding
- Decision: Keep Sonnet, document why
Pattern: Haiku for Most, Sonnet for Complex
Review-pr skill:
- Static analysis steps: Haiku could handle
- Manual code review: Needs Sonnet judgment
- Decision: Use Sonnet for the whole skill (simplicity)
Pattern: Split Complex Skills
Instead of: One complex skill running entirely on Opus
Try: Split it into:
- A Haiku skill for orchestration
- A Sonnet agent for the complex subtask
This keeps most of the work on Haiku, which is where the cost savings come from (see the sketch below).
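One hedged sketch of how the delegation step can look on disk, assuming the complex subtask is handed off through a non-interactive `claude -p` call with a `--model` override; verify the exact flag names against `claude --help` for your installed version:

```bash
#!/usr/bin/env bash
# scripts/deep-review.sh (hypothetical): the Haiku skill calls this for the one
# subtask that needs stronger reasoning, so only this step pays Sonnet prices.
set -euo pipefail

diff_file="${1:?usage: deep-review.sh <diff-file>}"

# Non-interactive call with a model override; flag names are an assumption,
# check your CLI version before relying on them.
claude -p "Review this diff for bugs and design issues:
$(cat "$diff_file")" --model sonnet
```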
Model Selection Checklist
Before choosing a model:
- Tested with Haiku first
- Measured success rate on 3-5 test cases
- Tried improvements (scripts, validation, checklists)
- Documented why this model is needed
- Considered cost implications (12x/60x)
- Considered speed implications (2.5x/5x slower)
- Will re-test if Claude models improve
Future-Proofing
Models improve over time.
Periodically re-test Sonnet/Opus skills with Haiku:
- Haiku v2 might handle what Haiku v1 couldn't
- Cost savings compound over time
- Speed improvements are valuable
Set a reminder: Test Haiku again in 3-6 months.