Model Selection Guide

Detailed guidance on choosing the right model for skills and agents.

Cost Comparison

| Model | Input (per MTok) | Output (per MTok) | vs Haiku |
|--------|------------------|-------------------|--------------------|
| Haiku | $0.25 | $1.25 | Baseline |
| Sonnet | $3.00 | $15.00 | 12x more expensive |
| Opus | $15.00 | $75.00 | 60x more expensive |

Example cost for a typical skill call (2K input tokens, 1K output tokens):

  • Haiku: $0.00175
  • Sonnet: $0.021 (12x more)
  • Opus: $0.105 (60x more)
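
The bullet costs above follow directly from the table; a throwaway calculator (the `cost` helper is hypothetical, not a real tool) makes the arithmetic checkable:

```shell
# Hypothetical helper: per-call cost from token counts and per-MTok prices.
cost() {
  awk -v it="$1" -v ot="$2" -v ip="$3" -v op="$4" \
    'BEGIN { printf "%.5f\n", it / 1e6 * ip + ot / 1e6 * op }'
}

cost 2000 1000 0.25 1.25    # Haiku  -> 0.00175
cost 2000 1000 3.00 15.00   # Sonnet -> 0.02100
cost 2000 1000 15.00 75.00  # Opus   -> 0.10500
```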

Speed Comparison

| Model | Tokens/Second | vs Haiku |
|--------|---------------|-------------|
| Haiku | ~100 | Baseline |
| Sonnet | ~40 | 2.5x slower |
| Opus | ~20 | 5x slower |

Decision Framework

Start with Haiku by default
    |
    v
Test on 3-5 representative tasks
    |
    +-- Success rate ≥80%? ---------> ✓ Use Haiku
    |                                  (12x cheaper, 2.5-5x faster)
    |
    +-- Success rate <80%? --------> Try Sonnet
    |                                    |
    |                                    v
    |                              Test on same tasks
    |                                    |
    |                                    +-- Success ≥80%? --> Use Sonnet
    |                                    |
    |                                    +-- Still failing? --> Opus or redesign
    |
    v
Document why you chose the model
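
The flowchart reduces to a simple rule. A minimal sketch (the function name is illustrative; the 80% threshold comes from the diagram, the 50% cutoff from the testing methodology later in this guide):

```shell
# Illustrative decision rule: pass/fail counts in, model recommendation out.
pick_model() {
  local passed=$1 total=$2
  local rate=$(( passed * 100 / total ))
  if [ "$rate" -ge 80 ]; then
    echo "haiku"
  elif [ "$rate" -ge 50 ]; then
    echo "sonnet"
  else
    echo "opus-or-redesign"
  fi
}

pick_model 4 5   # 80% -> haiku
pick_model 3 5   # 60% -> sonnet
pick_model 2 5   # 40% -> opus-or-redesign
```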

When Haiku Works Well

✓ Ideal for Haiku

Simple sequential workflows:

  • /dashboard - Fetch and display
  • /roadmap - List and format
  • /commit - Generate message from diff

Workflows with scripts:

  • Error-prone operations in scripts
  • Skills just orchestrate script calls
  • Validation is deterministic

Structured outputs:

  • Tasks with clear templates
  • Format is defined upfront
  • No ambiguous formatting

Reference/knowledge skills:

  • gitea - CLI reference
  • issue-writing - Patterns and templates
  • software-architecture - Best practices

Examples of Haiku Success

work-issue skill:

  • Sequential steps (view → branch → plan → implement → PR)
  • Each step has clear validation
  • Scripts handle error-prone operations
  • Success rate: ~90%

dashboard skill:

  • Fetch data (tea commands)
  • Format as table
  • Clear, structured output
  • Success rate: ~95%

When to Use Sonnet

Use Sonnet When

Haiku fails 20%+ of the time:

  • Test with Haiku first
  • If success rate <80%, upgrade to Sonnet

Complex judgment required:

  • Code review (quality assessment)
  • Issue grooming (clarity evaluation)
  • Architecture decisions

Nuanced reasoning:

  • Understanding implicit requirements
  • Making trade-off decisions
  • Applying context-dependent rules

Examples of Sonnet Success

review-pr skill:

  • Requires code understanding
  • Judgment about quality/bugs
  • Context-dependent feedback
  • Originally tried Haiku: 65% success → Sonnet: 85%

issue-worker agent:

  • Autonomous implementation
  • Pattern matching
  • Architectural decisions
  • Originally tried Haiku: 70% success → Sonnet: 82%

When to Use Opus

Reserve Opus For

Deep architectural reasoning:

  • software-architect agent
  • Pattern recognition across large codebases
  • Identifying subtle anti-patterns
  • Trade-off analysis

High-stakes decisions:

  • Breaking changes analysis
  • System-wide refactoring plans
  • Security architecture review

Complex pattern recognition:

  • Requires sophisticated understanding
  • Multiple layers of abstraction
  • Long-term implications

Examples of Opus Success

software-architect agent:

  • Analyzes entire codebase
  • Identifies 8 different anti-patterns
  • Provides prioritized recommendations
  • Sonnet: 68% success → Opus: 88%

arch-review-repo skill:

  • Comprehensive architecture audit
  • Cross-cutting concerns
  • System-wide patterns
  • Opus justified for depth

Making Haiku More Effective

If Haiku is struggling, try these improvements before upgrading to Sonnet:

1. Add Validation Steps

Instead of:

3. Implement changes and create PR

Try:

3. Implement changes
4. Validate: Run `./scripts/validate.sh` (tests pass, linter clean)
5. Create PR: `./scripts/create-pr.sh`
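
A hypothetical `scripts/validate.sh` might look like the following; the `run_check` helper and the `true` placeholders stand in for a project's real test and lint commands:

```shell
#!/usr/bin/env bash
# Hypothetical validate.sh: run each check, fail fast with a clear
# pass/fail signal the model can act on.
set -u

run_check() {
  local name=$1; shift
  if "$@"; then
    echo "ok: $name"
  else
    echo "FAIL: $name" >&2
    exit 1
  fi
}

run_check "tests" true    # placeholder for your test command
run_check "linter" true   # placeholder for your lint command
echo "Validation passed."
```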

2. Bundle Error-Prone Operations in Scripts

Instead of:

5. Create PR: `tea pulls create --title "..." --description "..."`

Try:

5. Create PR: `./scripts/create-pr.sh $issue "$title"`
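
A sketch of what such a wrapper could contain; the `create_pr` function is hypothetical and echoes the `tea` command instead of running it, so the example is safe to execute:

```shell
#!/usr/bin/env bash
# Hypothetical create-pr.sh: bundle the fragile `tea` invocation
# behind two arguments so the skill cannot get the flags wrong.
set -u

create_pr() {
  local issue=${1:?usage: create-pr.sh <issue> <title>}
  local title=${2:?usage: create-pr.sh <issue> <title>}
  # Echoed for illustration; a real script would execute this command.
  echo "tea pulls create --title \"$title\" --description \"Closes #$issue\""
}

create_pr 42 "fix: handle empty diff"
```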

3. Add Structured Output Templates

Instead of:

Show the results

Try:

Format results as:

| Issue | Status | Link |
|-------|--------|------|
| ... | ... | ... |

4. Add Explicit Checklists

Instead of:

Review the code for quality

Try:

Check:
- [ ] Code quality (readability, naming)
- [ ] Bugs (edge cases, null checks)
- [ ] Tests (coverage, assertions)

5. Make Instructions More Concise

Instead of:

Git is a version control system. When you want to commit changes, you use the git commit command which saves your changes to the repository...

Try:

`git commit -m 'feat: add feature'`

Testing Methodology

Create Test Suite

For each skill, create 3-5 test cases:

Example: work-issue skill tests

  1. Simple bug fix issue
  2. New feature with acceptance criteria
  3. Issue missing acceptance criteria
  4. Issue with tests that fail
  5. Complex refactoring task

Test with Haiku

# Set skill to Haiku
model: haiku

# Run all 5 tests
# Document success/failure for each
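
The comments above can be sketched as a tiny runner where each test case is a script that exits 0 on success (the directory and inline cases are illustrative; real cases would live in your repo):

```shell
# Hypothetical runner: count passing case scripts, report the tally.
dir=$(mktemp -d)
printf 'exit 0\n' > "$dir/case-1.sh"   # fake passing case
printf 'exit 1\n' > "$dir/case-2.sh"   # fake failing case

passed=0; total=0
for t in "$dir"/case-*.sh; do
  total=$((total + 1))
  if bash "$t"; then
    passed=$((passed + 1))
  fi
done
echo "passed=$passed total=$total"   # prints passed=1 total=2
rm -rf "$dir"
```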

Measure Success Rate

Success rate = (Successful tests / Total tests) × 100

Decision:

  • ≥80% → Keep Haiku
  • <80% → Try Sonnet
  • <50% → Likely need Opus or redesign

Test with Sonnet (if needed)

# Upgrade to Sonnet
model: sonnet

# Run same 5 tests
# Compare results

Document Decision

---
name: work-issue
model: haiku  # Tested: 4/5 tests passed with Haiku (80%)
---

Or:

---
name: review-pr
model: sonnet  # Tested: Haiku 3/5 (60%), Sonnet 4/5 (80%)
---

Common Patterns

Pattern: Start Haiku, Upgrade if Needed

Issue-worker agent evolution:

  1. V1 (Haiku): 70% success - struggled with pattern matching
  2. Analysis: Added more examples, still 72%
  3. V2 (Sonnet): 82% success - better code understanding
  4. Decision: Keep Sonnet, document why

Pattern: Haiku for Most, Sonnet for Complex

Review-pr skill:

  • Static analysis steps: Haiku could handle
  • Manual code review: Needs Sonnet judgment
  • Decision: Use Sonnet for whole skill (simplicity)

Pattern: Split Complex Skills

Instead of: One complex skill using Opus

Try: Split into:

  • Haiku skill for orchestration
  • Sonnet agent for the complex subtask

This saves cost: most of the work runs on the cheaper Haiku model.
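
Frontmatter for such a split might look like this (both file and skill names are hypothetical):

```yaml
# skills/release-report.md (hypothetical) — cheap orchestration
---
name: release-report
model: haiku   # fetch data, delegate analysis, format output
---

# agents/release-analyzer.md (hypothetical) — the one hard subtask
---
name: release-analyzer
model: sonnet  # complex judgment lives here
---
```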

Model Selection Checklist

Before choosing a model:

  • Tested with Haiku first
  • Measured success rate on 3-5 test cases
  • Tried improvements (scripts, validation, checklists)
  • Documented why this model is needed
  • Considered cost implications (12x/60x)
  • Considered speed implications (2.5x/5x slower)
  • Will re-test if Claude models improve

Future-Proofing

Models improve over time.

Periodically re-test Sonnet/Opus skills with Haiku:

  • Haiku v2 might handle what Haiku v1 couldn't
  • Cost savings compound over time
  • Speed improvements are valuable

Set a reminder: Test Haiku again in 3-6 months.