feat(skills): modernize capability-writing with Anthropic best practices

Updates capability-writing skill with progressive disclosure structure based on
Anthropic's January 2025 documentation. Implements a Haiku-first approach (12x
cheaper and roughly 2.5x faster than Sonnet).

Key changes:
- Add 5 core principles: conciseness, progressive disclosure, script bundling,
  degrees of freedom, and Haiku-first model selection
- Restructure with best-practices.md, templates/, examples/, and reference/
- Create 4 templates: user-invocable skill, background skill, agent, helper script
- Add 3 examples: simple workflow, progressive disclosure, with scripts
- Add 3 reference docs: frontmatter fields, model selection, anti-patterns
- Update create-capability to analyze complexity and recommend structures
- Default all new skills/agents to Haiku unless justified

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-11 18:10:53 +01:00
parent 7406517cd9
commit f424a7f992
13 changed files with 2612 additions and 229 deletions

@@ -0,0 +1,336 @@
# Model Selection Guide
Detailed guidance on choosing the right model for skills and agents.
## Cost Comparison
| Model | Input (per MTok) | Output (per MTok) | vs Haiku |
|-------|------------------|-------------------|----------|
| **Haiku** | $0.25 | $1.25 | Baseline |
| **Sonnet** | $3.00 | $15.00 | 12x more expensive |
| **Opus** | $15.00 | $75.00 | 60x more expensive |
**Example cost for a typical skill call (2K input tokens, 1K output tokens):**
- Haiku: $0.00175
- Sonnet: $0.021 (12x more)
- Opus: $0.105 (60x more)
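
The per-call figures above follow directly from the price table. A minimal sketch of the arithmetic, assuming the listed per-MTok prices (the `PRICES` dict and `call_cost` helper are illustrative names, not part of any skill):

```python
# Prices in USD per million tokens (MTok), copied from the table above:
# (input price, output price).
PRICES = {
    "haiku": (0.25, 1.25),
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call for the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# The "typical skill call" used above: 2K input tokens, 1K output tokens.
baseline = call_cost("haiku", 2_000, 1_000)
for model in PRICES:
    cost = call_cost(model, 2_000, 1_000)
    print(f"{model}: ${cost:.5f} ({cost / baseline:.0f}x Haiku)")
```

Swap in your own token counts to estimate a specific skill; the ratios stay at 12x and 60x only because input and output prices scale by the same factor in this table.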
## Speed Comparison
| Model | Tokens/Second | vs Haiku |
|-------|---------------|----------|
| **Haiku** | ~100 | Baseline |
| **Sonnet** | ~40 | 2.5x slower |
| **Opus** | ~20 | 5x slower |
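
To turn these throughput figures into wall-clock estimates, divide the expected output length by tokens per second. This is a rough sketch only: it assumes the approximate rates in the table and ignores network latency, input processing, and time to first token. `generation_seconds` is an illustrative helper name.

```python
# Approximate output throughput in tokens/second, from the table above.
TOKENS_PER_SECOND = {"haiku": 100, "sonnet": 40, "opus": 20}


def generation_seconds(model: str, output_tokens: int) -> float:
    """Estimate the time to generate output_tokens, ignoring latency overheads."""
    return output_tokens / TOKENS_PER_SECOND[model]


# Estimated generation time for a 1K-token response on each model.
for model in TOKENS_PER_SECOND:
    print(f"{model}: ~{generation_seconds(model, 1_000):.0f}s")
```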
## Decision Framework
```
Start with Haiku by default
        |
        v
Test on 3-5 representative tasks
        |
        +-- Success rate ≥80%? ---------> ✓ Use Haiku
        |                                 (12x cheaper, 2-5x faster)
        |
        +-- Success rate <80%? --------> Try Sonnet
        |                                    |
        |                                    v
        |                             Test on same tasks
        |                                    |
        |                                    +-- Success ≥80%? --> Use Sonnet
        |                                    |
        |                                    +-- Still failing? --> Opus or redesign
        |
        v
Document why you chose the model
```
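The framework above can be read as walking a ladder from the cheapest model upward. A minimal sketch, assuming you supply a function that runs your test suite and returns a success rate in percent; `choose_model` and `success_rate_for` are hypothetical names. Unlike the diagram, this sketch also holds Opus to the 80% bar before recommending it, which is an assumption.

```python
from typing import Callable, Optional

# Models ordered from cheapest to most expensive.
LADDER = ["haiku", "sonnet", "opus"]
THRESHOLD = 80.0  # minimum acceptable success rate, in percent


def choose_model(success_rate_for: Callable[[str], float]) -> Optional[str]:
    """Return the cheapest model meeting THRESHOLD, or None to signal a redesign.

    success_rate_for(model) is expected to run the same 3-5 representative
    tasks against the given model and return the success rate in percent.
    """
    for model in LADDER:
        if success_rate_for(model) >= THRESHOLD:
            return model
    return None  # nothing passed: redesign the skill instead


# Example with rates recorded by hand (hypothetical values).
measured = {"haiku": 65.0, "sonnet": 85.0, "opus": 90.0}
print(choose_model(lambda model: measured[model]))
```

Because the loop stops at the first passing model, more expensive models are only evaluated when the cheaper ones fall short.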
## When Haiku Works Well
### ✓ Ideal for Haiku
**Simple sequential workflows:**
- `/dashboard` - Fetch and display
- `/roadmap` - List and format
- `/commit` - Generate message from diff
**Workflows with scripts:**
- Error-prone operations in scripts
- Skills just orchestrate script calls
- Validation is deterministic
**Structured outputs:**
- Tasks with clear templates
- Format is defined upfront
- No ambiguous formatting
**Reference/knowledge skills:**
- `gitea` - CLI reference
- `issue-writing` - Patterns and templates
- `software-architecture` - Best practices
### Examples of Haiku Success
**work-issue skill:**
- Sequential steps (view → branch → plan → implement → PR)
- Each step has clear validation
- Scripts handle error-prone operations
- Success rate: ~90%
**dashboard skill:**
- Fetch data (tea commands)
- Format as table
- Clear, structured output
- Success rate: ~95%
## When to Use Sonnet
### Use Sonnet When
**Haiku fails more than 20% of the time:**
- Test with Haiku first
- If success rate <80%, upgrade to Sonnet
**Complex judgment required:**
- Code review (quality assessment)
- Issue grooming (clarity evaluation)
- Architecture decisions
**Nuanced reasoning:**
- Understanding implicit requirements
- Making trade-off decisions
- Applying context-dependent rules
### Examples of Sonnet Success
**review-pr skill:**
- Requires code understanding
- Judgment about quality/bugs
- Context-dependent feedback
- Originally tried Haiku: 65% success → Sonnet: 85%
**issue-worker agent:**
- Autonomous implementation
- Pattern matching
- Architectural decisions
- Originally tried Haiku: 70% success → Sonnet: 82%
## When to Use Opus
### Reserve Opus For
**Deep architectural reasoning:**
- `software-architect` agent
- Pattern recognition across large codebases
- Identifying subtle anti-patterns
- Trade-off analysis
**High-stakes decisions:**
- Breaking changes analysis
- System-wide refactoring plans
- Security architecture review
**Complex pattern recognition:**
- Requires sophisticated understanding
- Multiple layers of abstraction
- Long-term implications
### Examples of Opus Success
**software-architect agent:**
- Analyzes entire codebase
- Identifies 8 different anti-patterns
- Provides prioritized recommendations
- Sonnet: 68% success → Opus: 88%
**arch-review-repo skill:**
- Comprehensive architecture audit
- Cross-cutting concerns
- System-wide patterns
- Opus justified for depth
## Making Haiku More Effective
If Haiku is struggling, try these improvements **before** upgrading to Sonnet:
### 1. Add Validation Steps
**Instead of:**
```markdown
3. Implement changes and create PR
```
**Try:**
```markdown
3. Implement changes
4. Validate: Run `./scripts/validate.sh` (tests pass, linter clean)
5. Create PR: `./scripts/create-pr.sh`
```
### 2. Bundle Error-Prone Operations in Scripts
**Instead of:**
```markdown
5. Create PR: `tea pulls create --title "..." --description "..."`
```
**Try:**
```markdown
5. Create PR: `./scripts/create-pr.sh "$issue" "$title"`
```
### 3. Add Structured Output Templates
**Instead of:**
```markdown
Show the results
```
**Try:**
```markdown
Format results as:
| Issue | Status | Link |
|-------|--------|------|
| ... | ... | ... |
```
### 4. Add Explicit Checklists
**Instead of:**
```markdown
Review the code for quality
```
**Try:**
```markdown
Check:
- [ ] Code quality (readability, naming)
- [ ] Bugs (edge cases, null checks)
- [ ] Tests (coverage, assertions)
```
### 5. Make Instructions More Concise
**Instead of:**
```markdown
Git is a version control system. When you want to commit changes, you use the git commit command which saves your changes to the repository...
```
**Try:**
```markdown
`git commit -m 'feat: add feature'`
```
## Testing Methodology
### Create Test Suite
For each skill, create 3-5 test cases:
**Example: work-issue skill tests**
1. Simple bug fix issue
2. New feature with acceptance criteria
3. Issue missing acceptance criteria
4. Issue with tests that fail
5. Complex refactoring task
### Test with Haiku
```yaml
# In the skill's frontmatter, set the model to Haiku
model: haiku
# Then run all 5 tests
# and document success/failure for each
```
### Measure Success Rate
```
Success rate = (Successful tests / Total tests) × 100
```
**Decision:**
- ≥80% → Keep Haiku
- 50% to <80% → Try Sonnet
- <50% → Likely need Opus or a redesign
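
A small sketch of the measurement and decision rule above, assuming each test is recorded as a simple pass/fail boolean. The helper names are illustrative, and the code assumes the "<50%" rule takes precedence over "<80%" for very low scores.

```python
def success_rate(results: list[bool]) -> float:
    """Return the percentage of passing tests."""
    if not results:
        raise ValueError("need at least one test result")
    return 100 * sum(results) / len(results)


def recommend(haiku_rate: float) -> str:
    """Map a Haiku success rate (in percent) to the next step."""
    if haiku_rate >= 80:
        return "keep Haiku"
    if haiku_rate >= 50:
        return "try Sonnet"
    return "likely need Opus or a redesign"


# Example: 4 of 5 test cases passed with Haiku.
results = [True, True, True, True, False]
rate = success_rate(results)
print(f"{rate:.0f}% -> {recommend(rate)}")
```

With only 3-5 test cases, one result moves the rate by 20 to 33 percentage points, so treat borderline outcomes as a prompt to add test cases rather than as a firm verdict.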
### Test with Sonnet (if needed)
```yaml
# In the skill's frontmatter, upgrade the model to Sonnet
model: sonnet
# Then run the same 5 tests
# and compare results with the Haiku run
```
### Document Decision
```yaml
---
name: work-issue
model: haiku # Tested: 4/5 tests passed with Haiku (80%)
---
```
Or:
```yaml
---
name: review-pr
model: sonnet # Tested: Haiku 3/5 (60%), Sonnet 4/5 (80%)
---
```
## Common Patterns
### Pattern: Start Haiku, Upgrade if Needed
**Issue-worker agent evolution:**
1. **V1 (Haiku):** 70% success - struggled with pattern matching
2. **Analysis:** Added more examples, but success only reached 72%
3. **V2 (Sonnet):** 82% success - better code understanding
4. **Decision:** Keep Sonnet, document why
### Pattern: Haiku for Most, Sonnet for Complex
**Review-pr skill:**
- Static analysis steps: Haiku could handle
- Manual code review: Needs Sonnet judgment
- **Decision:** Use Sonnet for whole skill (simplicity)
### Pattern: Split Complex Skills
**Instead of:** One complex skill using Opus
**Try:** Split into:
- Haiku skill for orchestration
- Sonnet agent for complex subtask
- Saves cost (most work in Haiku)
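
To see why splitting can pay off, here is a back-of-the-envelope comparison. It assumes the per-MTok prices from the Cost Comparison table; the token counts and the 80/20 split between orchestration and the complex subtask are invented purely for illustration, so measure your own workload before relying on the ratio.

```python
# USD per million tokens (input, output), from the Cost Comparison table.
PRICES = {"haiku": (0.25, 1.25), "sonnet": (3.00, 15.00), "opus": (15.00, 75.00)}


def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of processing the given token counts on one model."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10K input and 5K output tokens in total.
single_opus = cost("opus", 10_000, 5_000)

# Hypothetical split: 80% of the tokens go to Haiku orchestration,
# 20% go to a Sonnet agent handling the complex subtask.
split = cost("haiku", 8_000, 4_000) + cost("sonnet", 2_000, 1_000)

print(f"One Opus skill: ${single_opus:.3f}")
print(f"Haiku + Sonnet: ${split:.3f}")
```

The split only helps if the Sonnet subtask still meets the success threshold; a cheaper pipeline that fails more often is not a saving.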
## Model Selection Checklist
Before choosing a model:
- [ ] Tested with Haiku first
- [ ] Measured success rate on 3-5 test cases
- [ ] Tried improvements (scripts, validation, checklists)
- [ ] Documented why this model is needed
- [ ] Considered cost implications (12x/60x)
- [ ] Considered speed implications (2.5x/5x slower)
- [ ] Will re-test if Claude models improve
## Future-Proofing
**Models improve over time.**
Periodically re-test Sonnet/Opus skills with Haiku:
- Haiku v2 might handle what Haiku v1 couldn't
- Cost savings compound over time
- Speed improvements are valuable
**Set a reminder:** Test Haiku again in 3-6 months.