Model Selection Guide

Detailed guidance on choosing the right model for skills and agents.

Cost Comparison

| Model | Input (per MTok) | Output (per MTok) | vs Haiku |
|--------|------------------|-------------------|--------------------|
| Haiku | $0.25 | $1.25 | Baseline |
| Sonnet | $3.00 | $15.00 | 12x more expensive |
| Opus | $15.00 | $75.00 | 60x more expensive |

Example cost for a typical skill call (2K input tokens, 1K output tokens):

  • Haiku: $0.00175
  • Sonnet: $0.021 (12x more)
  • Opus: $0.105 (60x more)
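
The bullet costs above follow directly from the table; a throwaway calculator (the `cost` helper is hypothetical, not a real tool) makes the arithmetic checkable:

```shell
# Hypothetical helper: per-call cost from token counts and per-MTok prices.
cost() {
  awk -v it="$1" -v ot="$2" -v ip="$3" -v op="$4" \
    'BEGIN { printf "%.5f\n", it / 1e6 * ip + ot / 1e6 * op }'
}

cost 2000 1000 0.25 1.25    # Haiku  -> 0.00175
cost 2000 1000 3.00 15.00   # Sonnet -> 0.02100
cost 2000 1000 15.00 75.00  # Opus   -> 0.10500
```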

Speed Comparison

| Model | Tokens/Second | vs Haiku |
|--------|---------------|-------------|
| Haiku | ~100 | Baseline |
| Sonnet | ~40 | 2.5x slower |
| Opus | ~20 | 5x slower |

Decision Framework

Start with Haiku by default
    |
    v
Test on 3-5 representative tasks
    |
    +-- Success rate ≥80%? ---------> ✓ Use Haiku
    |                                  (12x cheaper, 2.5-5x faster)
    |
    +-- Success rate <80%? --------> Try Sonnet
    |                                    |
    |                                    v
    |                              Test on same tasks
    |                                    |
    |                                    +-- Success ≥80%? --> Use Sonnet
    |                                    |
    |                                    +-- Still failing? --> Opus or redesign
    |
    v
Document why you chose the model
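
The flowchart reduces to a simple rule. A minimal sketch (the function name is illustrative; the 80% threshold comes from the diagram, the 50% cutoff from the testing methodology later in this guide):

```shell
# Illustrative decision rule: pass/fail counts in, model recommendation out.
pick_model() {
  local passed=$1 total=$2
  local rate=$(( passed * 100 / total ))
  if [ "$rate" -ge 80 ]; then
    echo "haiku"
  elif [ "$rate" -ge 50 ]; then
    echo "sonnet"
  else
    echo "opus-or-redesign"
  fi
}

pick_model 4 5   # 80% -> haiku
pick_model 3 5   # 60% -> sonnet
pick_model 2 5   # 40% -> opus-or-redesign
```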

When Haiku Works Well

✓ Ideal for Haiku

Simple sequential workflows:

  • /dashboard - Fetch and display
  • /roadmap - List and format
  • /commit - Generate message from diff

Workflows with scripts:

  • Error-prone operations in scripts
  • Skills just orchestrate script calls
  • Validation is deterministic

Structured outputs:

  • Tasks with clear templates
  • Format is defined upfront
  • No ambiguous formatting

Reference/knowledge skills:

  • gitea - CLI reference
  • issue-writing - Patterns and templates
  • software-architecture - Best practices

Examples of Haiku Success

work-issue skill:

  • Sequential steps (view → branch → plan → implement → PR)
  • Each step has clear validation
  • Scripts handle error-prone operations
  • Success rate: ~90%

dashboard skill:

  • Fetch data (tea commands)
  • Format as table
  • Clear, structured output
  • Success rate: ~95%

When to Use Sonnet

Use Sonnet When

Haiku fails 20%+ of the time:

  • Test with Haiku first
  • If success rate <80%, upgrade to Sonnet

Complex judgment required:

  • Code review (quality assessment)
  • Issue grooming (clarity evaluation)
  • Architecture decisions

Nuanced reasoning:

  • Understanding implicit requirements
  • Making trade-off decisions
  • Applying context-dependent rules

Examples of Sonnet Success

review-pr skill:

  • Requires code understanding
  • Judgment about quality/bugs
  • Context-dependent feedback
  • Originally tried Haiku: 65% success → Sonnet: 85%

issue-worker agent:

  • Autonomous implementation
  • Pattern matching
  • Architectural decisions
  • Originally tried Haiku: 70% success → Sonnet: 82%

When to Use Opus

Reserve Opus For

Deep architectural reasoning:

  • software-architect agent
  • Pattern recognition across large codebases
  • Identifying subtle anti-patterns
  • Trade-off analysis

High-stakes decisions:

  • Breaking changes analysis
  • System-wide refactoring plans
  • Security architecture review

Complex pattern recognition:

  • Requires sophisticated understanding
  • Multiple layers of abstraction
  • Long-term implications

Examples of Opus Success

software-architect agent:

  • Analyzes entire codebase
  • Identifies 8 different anti-patterns
  • Provides prioritized recommendations
  • Sonnet: 68% success → Opus: 88%

arch-review-repo skill:

  • Comprehensive architecture audit
  • Cross-cutting concerns
  • System-wide patterns
  • Opus justified for depth

Making Haiku More Effective

If Haiku is struggling, try these improvements before upgrading to Sonnet:

1. Add Validation Steps

Instead of:

3. Implement changes and create PR

Try:

3. Implement changes
4. Validate: Run `./scripts/validate.sh` (tests pass, linter clean)
5. Create PR: `./scripts/create-pr.sh`
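
A hypothetical `scripts/validate.sh` might look like the following; the `run_check` helper and the `true` placeholders stand in for a project's real test and lint commands:

```shell
#!/usr/bin/env bash
# Hypothetical validate.sh: run each check, fail fast with a clear
# pass/fail signal the model can act on.
set -u

run_check() {
  local name=$1; shift
  if "$@"; then
    echo "ok: $name"
  else
    echo "FAIL: $name" >&2
    exit 1
  fi
}

run_check "tests" true    # placeholder for your test command
run_check "linter" true   # placeholder for your lint command
echo "Validation passed."
```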

2. Bundle Error-Prone Operations in Scripts

Instead of:

5. Create PR: `tea pulls create --title "..." --description "..."`

Try:

5. Create PR: `./scripts/create-pr.sh $issue "$title"`
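
A sketch of what such a wrapper could contain; the `create_pr` function is hypothetical and echoes the `tea` command instead of running it, so the example is safe to execute:

```shell
#!/usr/bin/env bash
# Hypothetical create-pr.sh: bundle the fragile `tea` invocation
# behind two arguments so the skill cannot get the flags wrong.
set -u

create_pr() {
  local issue=${1:?usage: create-pr.sh <issue> <title>}
  local title=${2:?usage: create-pr.sh <issue> <title>}
  # Echoed for illustration; a real script would execute this command.
  echo "tea pulls create --title \"$title\" --description \"Closes #$issue\""
}

create_pr 42 "fix: handle empty diff"
```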

3. Add Structured Output Templates

Instead of:

Show the results

Try:

Format results as:

| Issue | Status | Link |
|-------|--------|------|
| ... | ... | ... |

4. Add Explicit Checklists

Instead of:

Review the code for quality

Try:

Check:
- [ ] Code quality (readability, naming)
- [ ] Bugs (edge cases, null checks)
- [ ] Tests (coverage, assertions)

5. Make Instructions More Concise

Instead of:

Git is a version control system. When you want to commit changes, you use the git commit command which saves your changes to the repository...

Try:

`git commit -m 'feat: add feature'`

Testing Methodology

Create Test Suite

For each skill, create 3-5 test cases:

Example: work-issue skill tests

  1. Simple bug fix issue
  2. New feature with acceptance criteria
  3. Issue missing acceptance criteria
  4. Issue with tests that fail
  5. Complex refactoring task

Test with Haiku

# Set skill to Haiku
model: haiku

# Run all 5 tests
# Document success/failure for each
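
The comments above can be sketched as a tiny runner where each test case is a script that exits 0 on success (the directory and inline cases are illustrative; real cases would live in your repo):

```shell
# Hypothetical runner: count passing case scripts, report the tally.
dir=$(mktemp -d)
printf 'exit 0\n' > "$dir/case-1.sh"   # fake passing case
printf 'exit 1\n' > "$dir/case-2.sh"   # fake failing case

passed=0; total=0
for t in "$dir"/case-*.sh; do
  total=$((total + 1))
  if bash "$t"; then
    passed=$((passed + 1))
  fi
done
echo "passed=$passed total=$total"   # prints passed=1 total=2
rm -rf "$dir"
```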

Measure Success Rate

Success rate = (Successful tests / Total tests) × 100

Decision:

  • ≥80% → Keep Haiku
  • <80% → Try Sonnet
  • <50% → Likely need Opus or redesign

Test with Sonnet (if needed)

# Upgrade to Sonnet
model: sonnet

# Run same 5 tests
# Compare results

Document Decision

---
name: work-issue
model: haiku  # Tested: 4/5 tests passed with Haiku (80%)
---

Or:

---
name: review-pr
model: sonnet  # Tested: Haiku 3/5 (60%), Sonnet 4/5 (80%)
---

Common Patterns

Pattern: Start Haiku, Upgrade if Needed

Issue-worker agent evolution:

  1. V1 (Haiku): 70% success - struggled with pattern matching
  2. Analysis: Added more examples, still 72%
  3. V2 (Sonnet): 82% success - better code understanding
  4. Decision: Keep Sonnet, document why

Pattern: Haiku for Most, Sonnet for Complex

Review-pr skill:

  • Static analysis steps: Haiku could handle
  • Manual code review: Needs Sonnet judgment
  • Decision: Use Sonnet for whole skill (simplicity)

Pattern: Split Complex Skills

Instead of: One complex skill using Opus

Try: Split into:

  • Haiku skill for orchestration
  • Sonnet agent for the complex subtask

This saves cost: most of the work runs on the cheaper Haiku model.
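
Frontmatter for such a split might look like this (both file and skill names are hypothetical):

```yaml
# skills/release-report.md (hypothetical) — cheap orchestration
---
name: release-report
model: haiku   # fetch data, delegate analysis, format output
---

# agents/release-analyzer.md (hypothetical) — the one hard subtask
---
name: release-analyzer
model: sonnet  # complex judgment lives here
---
```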

Model Selection Checklist

Before choosing a model:

  • Tested with Haiku first
  • Measured success rate on 3-5 test cases
  • Tried improvements (scripts, validation, checklists)
  • Documented why this model is needed
  • Considered cost implications (12x/60x)
  • Considered speed implications (2.5x/5x slower)
  • Will re-test if Claude models improve

Future-Proofing

Models improve over time.

Periodically re-test Sonnet/Opus skills with Haiku:

  • Haiku v2 might handle what Haiku v1 couldn't
  • Cost savings compound over time
  • Speed improvements are valuable

Set a reminder: Test Haiku again in 3-6 months.