Updates capability-writing skill with progressive disclosure structure based on Anthropic's January 2025 documentation. Implements Haiku-first approach (12x cheaper, 2-5x faster than Sonnet). Key changes: - Add 5 core principles: conciseness, progressive disclosure, script bundling, degrees of freedom, and Haiku-first model selection - Restructure with best-practices.md, templates/, examples/, and reference/ - Create 4 templates: user-invocable skill, background skill, agent, helper script - Add 3 examples: simple workflow, progressive disclosure, with scripts - Add 3 reference docs: frontmatter fields, model selection, anti-patterns - Update create-capability to analyze complexity and recommend structures - Default all new skills/agents to Haiku unless justified Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
337 lines
7.3 KiB
Markdown
337 lines
7.3 KiB
Markdown
# Model Selection Guide
|
||
|
||
Detailed guidance on choosing the right model for skills and agents.
|
||
|
||
## Cost Comparison
|
||
|
||
| Model | Input (per MTok) | Output (per MTok) | vs Haiku |
|
||
|-------|------------------|-------------------|----------|
|
||
| **Haiku** | $0.25 | $1.25 | Baseline |
|
||
| **Sonnet** | $3.00 | $15.00 | 12x more expensive |
|
||
| **Opus** | $15.00 | $75.00 | 60x more expensive |
|
||
|
||
**Example cost for typical skill call (2K input, 1K output):**
|
||
- Haiku: $0.00175
|
||
- Sonnet: $0.021 (12x more)
|
||
- Opus: $0.105 (60x more)
|
||
|
||
## Speed Comparison
|
||
|
||
| Model | Tokens/Second | vs Haiku |
|
||
|-------|---------------|----------|
|
||
| **Haiku** | ~100 | Baseline |
|
||
| **Sonnet** | ~40 | 2.5x slower |
|
||
| **Opus** | ~20 | 5x slower |
|
||
|
||
## Decision Framework
|
||
|
||
```
|
||
Start with Haiku by default
|
||
|
|
||
v
|
||
Test on 3-5 representative tasks
|
||
|
|
||
+-- Success rate ≥80%? ---------> ✓ Use Haiku
|
||
| (12x cheaper, 2-5x faster)
|
||
|
|
||
+-- Success rate <80%? --------> Try Sonnet
|
||
| |
|
||
| v
|
||
| Test on same tasks
|
||
| |
|
||
| +-- Success ≥80%? --> Use Sonnet
|
||
| |
|
||
| +-- Still failing? --> Opus or redesign
|
||
|
|
||
v
|
||
Document why you chose the model
|
||
```
|
||
|
||
## When Haiku Works Well
|
||
|
||
### ✓ Ideal for Haiku
|
||
|
||
**Simple sequential workflows:**
|
||
- `/dashboard` - Fetch and display
|
||
- `/roadmap` - List and format
|
||
- `/commit` - Generate message from diff
|
||
|
||
**Workflows with scripts:**
|
||
- Error-prone operations in scripts
|
||
- Skills just orchestrate script calls
|
||
- Validation is deterministic
|
||
|
||
**Structured outputs:**
|
||
- Tasks with clear templates
|
||
- Format is defined upfront
|
||
- No ambiguous formatting
|
||
|
||
**Reference/knowledge skills:**
|
||
- `gitea` - CLI reference
|
||
- `issue-writing` - Patterns and templates
|
||
- `software-architecture` - Best practices
|
||
|
||
### Examples of Haiku Success
|
||
|
||
**work-issue skill:**
|
||
- Sequential steps (view → branch → plan → implement → PR)
|
||
- Each step has clear validation
|
||
- Scripts handle error-prone operations
|
||
- Success rate: ~90%
|
||
|
||
**dashboard skill:**
|
||
- Fetch data (tea commands)
|
||
- Format as table
|
||
- Clear, structured output
|
||
- Success rate: ~95%
|
||
|
||
## When to Use Sonnet
|
||
|
||
### Use Sonnet When
|
||
|
||
**Haiku fails 20%+ of the time**
|
||
- Test with Haiku first
|
||
- If success rate <80%, upgrade to Sonnet
|
||
|
||
**Complex judgment required:**
|
||
- Code review (quality assessment)
|
||
- Issue grooming (clarity evaluation)
|
||
- Architecture decisions
|
||
|
||
**Nuanced reasoning:**
|
||
- Understanding implicit requirements
|
||
- Making trade-off decisions
|
||
- Applying context-dependent rules
|
||
|
||
### Examples of Sonnet Success
|
||
|
||
**review-pr skill:**
|
||
- Requires code understanding
|
||
- Judgment about quality/bugs
|
||
- Context-dependent feedback
|
||
- Originally tried Haiku: 65% success → Sonnet: 85%
|
||
|
||
**issue-worker agent:**
|
||
- Autonomous implementation
|
||
- Pattern matching
|
||
- Architectural decisions
|
||
- Originally tried Haiku: 70% success → Sonnet: 82%
|
||
|
||
## When to Use Opus
|
||
|
||
### Reserve Opus For
|
||
|
||
**Deep architectural reasoning:**
|
||
- `software-architect` agent
|
||
- Pattern recognition across large codebases
|
||
- Identifying subtle anti-patterns
|
||
- Trade-off analysis
|
||
|
||
**High-stakes decisions:**
|
||
- Breaking changes analysis
|
||
- System-wide refactoring plans
|
||
- Security architecture review
|
||
|
||
**Complex pattern recognition:**
|
||
- Requires sophisticated understanding
|
||
- Multiple layers of abstraction
|
||
- Long-term implications
|
||
|
||
### Examples of Opus Success
|
||
|
||
**software-architect agent:**
|
||
- Analyzes entire codebase
|
||
- Identifies 8 different anti-patterns
|
||
- Provides prioritized recommendations
|
||
- Sonnet: 68% success → Opus: 88%
|
||
|
||
**arch-review-repo skill:**
|
||
- Comprehensive architecture audit
|
||
- Cross-cutting concerns
|
||
- System-wide patterns
|
||
- Opus justified for depth
|
||
|
||
## Making Haiku More Effective
|
||
|
||
If Haiku is struggling, try these improvements **before** upgrading to Sonnet:
|
||
|
||
### 1. Add Validation Steps
|
||
|
||
**Instead of:**
|
||
```markdown
|
||
3. Implement changes and create PR
|
||
```
|
||
|
||
**Try:**
|
||
```markdown
|
||
3. Implement changes
|
||
4. Validate: Run `./scripts/validate.sh` (tests pass, linter clean)
|
||
5. Create PR: `./scripts/create-pr.sh`
|
||
```
|
||
|
||
### 2. Bundle Error-Prone Operations in Scripts
|
||
|
||
**Instead of:**
|
||
```markdown
|
||
5. Create PR: `tea pulls create --title "..." --description "..."`
|
||
```
|
||
|
||
**Try:**
|
||
```markdown
|
||
5. Create PR: `./scripts/create-pr.sh $issue "$title"`
|
||
```
|
||
|
||
### 3. Add Structured Output Templates
|
||
|
||
**Instead of:**
|
||
```markdown
|
||
Show the results
|
||
```
|
||
|
||
**Try:**
|
||
```markdown
|
||
Format results as:
|
||
|
||
| Issue | Status | Link |
|
||
|-------|--------|------|
|
||
| ... | ... | ... |
|
||
```
|
||
|
||
### 4. Add Explicit Checklists
|
||
|
||
**Instead of:**
|
||
```markdown
|
||
Review the code for quality
|
||
```
|
||
|
||
**Try:**
|
||
```markdown
|
||
Check:
|
||
- [ ] Code quality (readability, naming)
|
||
- [ ] Bugs (edge cases, null checks)
|
||
- [ ] Tests (coverage, assertions)
|
||
```
|
||
|
||
### 5. Make Instructions More Concise
|
||
|
||
**Instead of:**
|
||
```markdown
|
||
Git is a version control system. When you want to commit changes, you use the git commit command which saves your changes to the repository...
|
||
```
|
||
|
||
**Try:**
|
||
```markdown
|
||
`git commit -m 'feat: add feature'`
|
||
```
|
||
|
||
## Testing Methodology
|
||
|
||
### Create Test Suite
|
||
|
||
For each skill, create 3-5 test cases:
|
||
|
||
**Example: work-issue skill tests**
|
||
1. Simple bug fix issue
|
||
2. New feature with acceptance criteria
|
||
3. Issue missing acceptance criteria
|
||
4. Issue with tests that fail
|
||
5. Complex refactoring task
|
||
|
||
### Test with Haiku
|
||
|
||
```bash
|
||
# Set skill to Haiku
|
||
model: haiku
|
||
|
||
# Run all 5 tests
|
||
# Document success/failure for each
|
||
```
|
||
|
||
### Measure Success Rate
|
||
|
||
```
|
||
Success rate = (Successful tests / Total tests) × 100
|
||
```
|
||
|
||
**Decision:**
|
||
- ≥80% → Keep Haiku
|
||
- <80% → Try Sonnet
|
||
- <50% → Likely need Opus or redesign
|
||
|
||
### Test with Sonnet (if needed)
|
||
|
||
```bash
|
||
# Upgrade to Sonnet
|
||
model: sonnet
|
||
|
||
# Run same 5 tests
|
||
# Compare results
|
||
```
|
||
|
||
### Document Decision
|
||
|
||
```yaml
|
||
---
|
||
name: work-issue
|
||
model: haiku # Tested: 4/5 tests passed with Haiku (80%)
|
||
---
|
||
```
|
||
|
||
Or:
|
||
|
||
```yaml
|
||
---
|
||
name: review-pr
|
||
model: sonnet # Tested: Haiku 3/5 (60%), Sonnet 4/5 (80%)
|
||
---
|
||
```
|
||
|
||
## Common Patterns
|
||
|
||
### Pattern: Start Haiku, Upgrade if Needed
|
||
|
||
**Issue-worker agent evolution:**
|
||
1. **V1 (Haiku):** 70% success - struggled with pattern matching
|
||
2. **Analysis:** Added more examples, still 72%
|
||
3. **V2 (Sonnet):** 82% success - better code understanding
|
||
4. **Decision:** Keep Sonnet, document why
|
||
|
||
### Pattern: Haiku for Most, Sonnet for Complex
|
||
|
||
**Review-pr skill:**
|
||
- Static analysis steps: Haiku could handle
|
||
- Manual code review: Needs Sonnet judgment
|
||
- **Decision:** Use Sonnet for whole skill (simplicity)
|
||
|
||
### Pattern: Split Complex Skills
|
||
|
||
**Instead of:** One complex skill using Opus
|
||
|
||
**Try:** Split into:
|
||
- Haiku skill for orchestration
|
||
- Sonnet agent for complex subtask
|
||
- Saves cost (most work in Haiku)
|
||
|
||
## Model Selection Checklist
|
||
|
||
Before choosing a model:
|
||
|
||
- [ ] Tested with Haiku first
|
||
- [ ] Measured success rate on 3-5 test cases
|
||
- [ ] Tried improvements (scripts, validation, checklists)
|
||
- [ ] Documented why this model is needed
|
||
- [ ] Considered cost implications (12x/60x)
|
||
- [ ] Considered speed implications (2.5x/5x slower)
|
||
- [ ] Will re-test if Claude models improve
|
||
|
||
## Future-Proofing
|
||
|
||
**Models improve over time.**
|
||
|
||
Periodically re-test Sonnet/Opus skills with Haiku:
|
||
- Haiku v2 might handle what Haiku v1 couldn't
|
||
- Cost savings compound over time
|
||
- Speed improvements are valuable
|
||
|
||
**Set a reminder:** Test Haiku again in 3-6 months.
|