# Model Selection Guide

Detailed guidance on choosing the right model for skills and agents.

## Cost Comparison

| Model | Input (per MTok) | Output (per MTok) | vs Haiku |
|-------|------------------|-------------------|----------|
| **Haiku** | $0.25 | $1.25 | Baseline |
| **Sonnet** | $3.00 | $15.00 | 12x more expensive |
| **Opus** | $15.00 | $75.00 | 60x more expensive |

**Example cost for typical skill call (2K input, 1K output):**
- Haiku: $0.00175
- Sonnet: $0.021 (12x more)
- Opus: $0.105 (60x more)

## Speed Comparison

| Model | Tokens/Second | vs Haiku |
|-------|---------------|----------|
| **Haiku** | ~100 | Baseline |
| **Sonnet** | ~40 | 2.5x slower |
| **Opus** | ~20 | 5x slower |

## Decision Framework

```
Start with Haiku by default
    |
    v
Test on 3-5 representative tasks
    |
    +-- Success rate ≥80%? ---------> ✓ Use Haiku
    |                                 (12-60x cheaper, 2.5-5x faster)
    |
    +-- Success rate <80%? --------> Try Sonnet
    |                                   |
    |                                   v
    |                                Test on same tasks
    |                                   |
    |                                   +-- Success ≥80%? --> Use Sonnet
    |                                   |
    |                                   +-- Still failing? --> Opus or redesign
    |
    v
Document why you chose the model
```

## When Haiku Works Well

### ✓ Ideal for Haiku

**Simple sequential workflows:**
- `/dashboard` - Fetch and display
- `/roadmap` - List and format
- `/commit` - Generate message from diff

**Workflows with scripts:**
- Error-prone operations live in scripts
- Skills just orchestrate script calls
- Validation is deterministic

**Structured outputs:**
- Tasks with clear templates
- Format is defined upfront
- No ambiguous formatting

**Reference/knowledge skills:**
- `gitea` - CLI reference
- `issue-writing` - Patterns and templates
- `software-architecture` - Best practices

### Examples of Haiku Success

**work-issue skill:**
- Sequential steps (view → branch → plan → implement → PR)
- Each step has clear validation
- Scripts handle error-prone operations
- Success rate: ~90%

**dashboard skill:**
- Fetch data (tea commands)
- Format as table
- Clear, structured output
- Success rate: ~95%

## When to Use Sonnet

### Use Sonnet When

**Haiku fails 20%+ of the time**
- Test with Haiku first
- If success rate <80%, upgrade to Sonnet

**Complex judgment required:**
- Code review (quality assessment)
- Issue grooming (clarity evaluation)
- Architecture decisions

**Nuanced reasoning:**
- Understanding implicit requirements
- Making trade-off decisions
- Applying context-dependent rules

### Examples of Sonnet Success

**review-pr skill:**
- Requires code understanding
- Judgment about quality/bugs
- Context-dependent feedback
- Originally tried Haiku: 65% success → Sonnet: 85%

**issue-worker agent:**
- Autonomous implementation
- Pattern matching
- Architectural decisions
- Originally tried Haiku: 70% success → Sonnet: 82%

## When to Use Opus

### Reserve Opus For

**Deep architectural reasoning:**
- `software-architect` agent
- Pattern recognition across large codebases
- Identifying subtle anti-patterns
- Trade-off analysis

**High-stakes decisions:**
- Breaking changes analysis
- System-wide refactoring plans
- Security architecture review

**Complex pattern recognition:**
- Requires sophisticated understanding
- Multiple layers of abstraction
- Long-term implications

### Examples of Opus Success

**software-architect agent:**
- Analyzes entire codebase
- Identifies 8 different anti-patterns
- Provides prioritized recommendations
- Sonnet: 68% success → Opus: 88%

**arch-review-repo skill:**
- Comprehensive architecture audit
- Cross-cutting concerns
- System-wide patterns
- Opus justified for depth

## Making Haiku More Effective

If Haiku is struggling, try these improvements **before** upgrading to Sonnet:

### 1. Add Validation Steps

**Instead of:**
```markdown
3. Implement changes and create PR
```

**Try:**
```markdown
3. Implement changes
4. Validate: Run `./scripts/validate.sh` (tests pass, linter clean)
5. Create PR: `./scripts/create-pr.sh`
```

### 2. Bundle Error-Prone Operations in Scripts

**Instead of:**
```markdown
5. Create PR: `tea pulls create --title "..." --description "..."`
```

**Try:**
```markdown
5. Create PR: `./scripts/create-pr.sh $issue "$title"`
```

### 3. Add Structured Output Templates

**Instead of:**
```markdown
Show the results
```

**Try:**
```markdown
Format results as:

| Issue | Status | Link |
|-------|--------|------|
| ... | ... | ... |
```

### 4. Add Explicit Checklists

**Instead of:**
```markdown
Review the code for quality
```

**Try:**
```markdown
Check:
- [ ] Code quality (readability, naming)
- [ ] Bugs (edge cases, null checks)
- [ ] Tests (coverage, assertions)
```

### 5. Make Instructions More Concise

**Instead of:**
```markdown
Git is a version control system. When you want to commit changes, you use the git commit command which saves your changes to the repository...
```

**Try:**
```markdown
`git commit -m 'feat: add feature'`
```

## Testing Methodology

### Create Test Suite

For each skill, create 3-5 test cases:

**Example: work-issue skill tests**
1. Simple bug fix issue
2. New feature with acceptance criteria
3. Issue missing acceptance criteria
4. Issue with tests that fail
5. Complex refactoring task

### Test with Haiku

```bash
# Set the skill to Haiku in its frontmatter: model: haiku
# Run all 5 tests
# Document success/failure for each
```

### Measure Success Rate

```
Success rate = (Successful tests / Total tests) × 100
```

**Decision:**
- ≥80% → Keep Haiku
- <80% → Try Sonnet
- <50% → Likely need Opus or redesign

### Test with Sonnet (if needed)

```bash
# Upgrade to Sonnet in the skill's frontmatter: model: sonnet
# Run same 5 tests
# Compare results
```

### Document Decision

```yaml
---
name: work-issue
model: haiku  # Tested: 4/5 tests passed with Haiku (80%)
---
```

Or:

```yaml
---
name: review-pr
model: sonnet  # Tested: Haiku 3/5 (60%), Sonnet 4/5 (80%)
---
```

## Common Patterns

### Pattern: Start Haiku, Upgrade if Needed

**Issue-worker agent evolution:**
1. **V1 (Haiku):** 70% success - struggled with pattern matching
2. **Analysis:** Added more examples, still 72%
3. **V2 (Sonnet):** 82% success - better code understanding
4. **Decision:** Keep Sonnet, document why

### Pattern: Haiku for Most, Sonnet for Complex

**Review-pr skill:**
- Static analysis steps: Haiku could handle
- Manual code review: Needs Sonnet judgment
- **Decision:** Use Sonnet for whole skill (simplicity)

### Pattern: Split Complex Skills

**Instead of:** One complex skill using Opus

**Try:** Split into:
- Haiku skill for orchestration
- Sonnet agent for complex subtask
- Saves cost (most work in Haiku)

## Model Selection Checklist

Before choosing a model:

- [ ] Tested with Haiku first
- [ ] Measured success rate on 3-5 test cases
- [ ] Tried improvements (scripts, validation, checklists)
- [ ] Documented why this model is needed
- [ ] Considered cost implications (12x/60x)
- [ ] Considered speed implications (2.5x/5x slower)
- [ ] Will re-test if Claude models improve

## Future-Proofing

**Models improve over time.** Periodically re-test Sonnet/Opus skills with Haiku:

- Haiku v2 might handle what Haiku v1 couldn't
- Cost savings compound over time
- Speed improvements are valuable

**Set a reminder:** Test Haiku again in 3-6 months.
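
The per-call cost arithmetic from the Cost Comparison section and the 80% threshold from the Decision Framework can be sketched together in a few lines of Python. This is only an illustration: the names `PRICES`, `call_cost`, `success_rate`, and `choose_model` are hypothetical helpers, not part of any skill or tool described in this guide. The prices and the threshold are copied from the tables above.

```python
from typing import Dict, Optional, Tuple

# USD per million tokens (input, output), copied from the Cost Comparison table.
PRICES: Dict[str, Tuple[float, float]] = {
    "haiku": (0.25, 1.25),
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}

# Minimum acceptable success rate (%) from the Decision Framework.
THRESHOLD = 80.0


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call for the given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def success_rate(passed: int, total: int) -> float:
    """Success rate = (successful tests / total tests) x 100."""
    return 100 * passed / total


def choose_model(results: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Pick the cheapest tested model that meets the threshold.

    `results` maps a model name to (passed, total) on the same test suite.
    Returns None when no tested model qualifies, i.e. redesign the skill.
    """
    for model in ("haiku", "sonnet", "opus"):  # cheapest first
        if model in results and success_rate(*results[model]) >= THRESHOLD:
            return model
    return None


if __name__ == "__main__":
    # Typical skill call from the guide: 2K input tokens, 1K output tokens.
    for name in PRICES:
        print(f"{name}: ${call_cost(name, 2000, 1000):.5f}")
    # The review-pr example: Haiku 3/5, Sonnet 4/5.
    print(choose_model({"haiku": (3, 5), "sonnet": (4, 5)}))
```

With the review-pr numbers (Haiku 3/5, Sonnet 4/5) the sketch selects Sonnet, matching the documented decision. Trying the cheapest model first mirrors the "start with Haiku" rule, and returning `None` corresponds to the "redesign" branch.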