feat(skills): modernize capability-writing with Anthropic best practices

Updates capability-writing skill with progressive disclosure structure based on Anthropic's January 2025 documentation. Implements Haiku-first approach (12x cheaper, 2-5x faster than Sonnet).

Key changes:
- Add 5 core principles: conciseness, progressive disclosure, script bundling, degrees of freedom, and Haiku-first model selection
- Restructure with best-practices.md, templates/, examples/, and reference/
- Create 4 templates: user-invocable skill, background skill, agent, helper script
- Add 3 examples: simple workflow, progressive disclosure, with scripts
- Add 3 reference docs: frontmatter fields, model selection, anti-patterns
- Update create-capability to analyze complexity and recommend structures
- Default all new skills/agents to Haiku unless justified

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-11 18:10:53 +01:00
parent 7406517cd9
commit f424a7f992
13 changed files with 2612 additions and 229 deletions
--- a/skills/capability-writing/reference/model-selection.md
+++ b/skills/capability-writing/reference/model-selection.md
@@ -0,0 +1,336 @@
+# Model Selection Guide
+
+Detailed guidance on choosing the right model for skills and agents.
+
+## Cost Comparison
+
+| Model | Input (per MTok) | Output (per MTok) | vs Haiku |
+|-------|------------------|-------------------|----------|
+| **Haiku** | $0.25 | $1.25 | Baseline |
+| **Sonnet** | $3.00 | $15.00 | 12x more expensive |
+| **Opus** | $15.00 | $75.00 | 60x more expensive |
+
+**Example cost for typical skill call (2K input, 1K output):**
+- Haiku: $0.00175
+- Sonnet: $0.021 (12x more)
+- Opus: $0.105 (60x more)
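The per-call figures above follow directly from the per-MTok prices in the table. A minimal Python sketch of that arithmetic, offered as an illustration rather than as part of the skill; `PRICES` and `call_cost` are hypothetical names, and the prices are simply copied from the table:

```python
# Prices in USD per million tokens (MTok), copied from the table above.
PRICES = {
    "haiku": {"input": 0.25, "output": 1.25},
    "sonnet": {"input": 3.00, "output": 15.00},
    "opus": {"input": 15.00, "output": 75.00},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one call for the given token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    baseline = call_cost("haiku", 2_000, 1_000)
    for model in PRICES:
        cost = call_cost(model, 2_000, 1_000)
        print(f"{model}: ${cost:.5f} ({cost / baseline:.0f}x Haiku)")
```

Because the input and output prices both scale by the same factor between models, the 12x and 60x ratios hold for any mix of input and output tokens.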
+
+## Speed Comparison
+
+| Model | Tokens/Second | vs Haiku |
+|-------|---------------|----------|
+| **Haiku** | ~100 | Baseline |
+| **Sonnet** | ~40 | 2.5x slower |
+| **Opus** | ~20 | 5x slower |
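To make the table concrete, a rough estimate of generation time is output tokens divided by tokens per second. The sketch below assumes the approximate speeds listed above and ignores network latency and input processing; `generation_seconds` is a hypothetical helper, not something the skill provides:

```python
# Approximate output speeds in tokens per second, from the table above.
TOKENS_PER_SECOND = {"haiku": 100, "sonnet": 40, "opus": 20}


def generation_seconds(model: str, output_tokens: int) -> float:
    """Estimate seconds spent generating output for one call."""
    return output_tokens / TOKENS_PER_SECOND[model]
```

For a 1K-token response this gives roughly 10 seconds for Haiku, 25 for Sonnet, and 50 for Opus.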
+
+## Decision Framework
+
+```
+Start with Haiku by default
+    |
+    v
+Test on 3-5 representative tasks
+    |
+    +-- Success rate ≥80%? ---------> ✓ Use Haiku
+    |                                  (12x cheaper, ~2.5x faster than Sonnet)
+    |
+    +-- Success rate <80%? --------> Try Sonnet
+    |                                    |
+    |                                    v
+    |                              Test on same tasks
+    |                                    |
+    |                                    +-- Success ≥80%? --> Use Sonnet
+    |                                    |
+    |                                    +-- Still failing? --> Opus or redesign
+    |
+    v
+Document why you chose the model
+```
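The flow above amounts to walking a ladder from cheapest to most expensive model and stopping at the first one that clears 80%. A minimal sketch under two assumptions: `measure` is a hypothetical callback that runs the representative tasks and returns the fraction that succeeded, and Opus is tested the same way as the other two (the diagram leaves that step open):

```python
from typing import Callable, Optional

LADDER = ["haiku", "sonnet", "opus"]  # cheapest first
THRESHOLD = 0.80  # minimum acceptable success rate


def choose_model(measure: Callable[[str], float]) -> Optional[str]:
    """Return the cheapest model whose success rate meets the threshold.

    Returns None when no model passes, which signals that the skill
    should be redesigned rather than upgraded.
    """
    for model in LADDER:
        if measure(model) >= THRESHOLD:
            return model
    return None
```

Whatever the function returns, the final step in the diagram still applies: record the measured rates alongside the chosen model.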
+
+## When Haiku Works Well
+
+### ✓ Ideal for Haiku
+
+**Simple sequential workflows:**
+- `/dashboard` - Fetch and display
+- `/roadmap` - List and format
+- `/commit` - Generate message from diff
+
+**Workflows with scripts:**
+- Error-prone operations in scripts
+- Skills just orchestrate script calls
+- Validation is deterministic
+
+**Structured outputs:**
+- Tasks with clear templates
+- Format is defined upfront
+- No ambiguous formatting
+
+**Reference/knowledge skills:**
+- `gitea` - CLI reference
+- `issue-writing` - Patterns and templates
+- `software-architecture` - Best practices
+
+### Examples of Haiku Success
+
+**work-issue skill:**
+- Sequential steps (view → branch → plan → implement → PR)
+- Each step has clear validation
+- Scripts handle error-prone operations
+- Success rate: ~90%
+
+**dashboard skill:**
+- Fetch data (tea commands)
+- Format as table
+- Clear, structured output
+- Success rate: ~95%
+
+## When to Use Sonnet
+
+### Use Sonnet When
+
+**Haiku fails more than 20% of the time:**
+- Test with Haiku first
+- If success rate <80%, upgrade to Sonnet
+
+**Complex judgment required:**
+- Code review (quality assessment)
+- Issue grooming (clarity evaluation)
+- Architecture decisions
+
+**Nuanced reasoning:**
+- Understanding implicit requirements
+- Making trade-off decisions
+- Applying context-dependent rules
+
+### Examples of Sonnet Success
+
+**review-pr skill:**
+- Requires code understanding
+- Judgment about quality/bugs
+- Context-dependent feedback
+- Originally tried Haiku: 65% success → Sonnet: 85%
+
+**issue-worker agent:**
+- Autonomous implementation
+- Pattern matching
+- Architectural decisions
+- Originally tried Haiku: 70% success → Sonnet: 82%
+
+## When to Use Opus
+
+### Reserve Opus For
+
+**Deep architectural reasoning:**
+- `software-architect` agent
+- Pattern recognition across large codebases
+- Identifying subtle anti-patterns
+- Trade-off analysis
+
+**High-stakes decisions:**
+- Breaking changes analysis
+- System-wide refactoring plans
+- Security architecture review
+
+**Complex pattern recognition:**
+- Requires sophisticated understanding
+- Multiple layers of abstraction
+- Long-term implications
+
+### Examples of Opus Success
+
+**software-architect agent:**
+- Analyzes entire codebase
+- Identifies 8 different anti-patterns
+- Provides prioritized recommendations
+- Sonnet: 68% success → Opus: 88%
+
+**arch-review-repo skill:**
+- Comprehensive architecture audit
+- Cross-cutting concerns
+- System-wide patterns
+- Opus justified for depth
+
+## Making Haiku More Effective
+
+If Haiku is struggling, try these improvements **before** upgrading to Sonnet:
+
+### 1. Add Validation Steps
+
+**Instead of:**
+```markdown
+3. Implement changes and create PR
+```
+
+**Try:**
+```markdown
+3. Implement changes
+4. Validate: Run `./scripts/validate.sh` (tests pass, linter clean)
+5. Create PR: `./scripts/create-pr.sh`
+```
+
+### 2. Bundle Error-Prone Operations in Scripts
+
+**Instead of:**
+```markdown
+5. Create PR: `tea pulls create --title "..." --description "..."`
+```
+
+**Try:**
+```markdown
+5. Create PR: `./scripts/create-pr.sh $issue "$title"`
+```
+
+### 3. Add Structured Output Templates
+
+**Instead of:**
+```markdown
+Show the results
+```
+
+**Try:**
+```markdown
+Format results as:
+
+| Issue | Status | Link |
+|-------|--------|------|
+| ... | ... | ... |
+```
+
+### 4. Add Explicit Checklists
+
+**Instead of:**
+```markdown
+Review the code for quality
+```
+
+**Try:**
+```markdown
+Check:
+- [ ] Code quality (readability, naming)
+- [ ] Bugs (edge cases, null checks)
+- [ ] Tests (coverage, assertions)
+```
+
+### 5. Make Instructions More Concise
+
+**Instead of:**
+```markdown
+Git is a version control system. When you want to commit changes, you use the git commit command which saves your changes to the repository...
+```
+
+**Try:**
+```markdown
+`git commit -m 'feat: add feature'`
+```
+
+## Testing Methodology
+
+### Create Test Suite
+
+For each skill, create 3-5 test cases:
+
+**Example: work-issue skill tests**
+1. Simple bug fix issue
+2. New feature with acceptance criteria
+3. Issue missing acceptance criteria
+4. Issue with tests that fail
+5. Complex refactoring task
+
+### Test with Haiku
+
+```yaml
+# In the skill's frontmatter, set the model to Haiku
+model: haiku
+
+# Then run all 5 tests
+# and document success/failure for each
+```
+
+### Measure Success Rate
+
+```
+Success rate = (Successful tests / Total tests) × 100
+```
+
+**Decision:**
+- ≥80% → Keep Haiku
+- 50-79% → Try Sonnet
+- <50% → Likely need Opus or redesign
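The formula and thresholds above can be sketched in a few lines of Python. This is an illustration, not part of the skill: `success_rate` and `next_step` are hypothetical names, and the sketch assumes that a rate below 50% takes precedence over the general "below 80%" rule:

```python
def success_rate(results: list[bool]) -> float:
    """Return the percentage of test cases that passed."""
    if not results:
        raise ValueError("need at least one test result")
    return 100 * sum(results) / len(results)


def next_step(rate: float) -> str:
    """Map a Haiku success rate (in percent) to the decision above."""
    if rate >= 80:
        return "keep Haiku"
    if rate < 50:
        return "likely need Opus or redesign"
    return "try Sonnet"
```

With five test cases the rate moves in steps of 20 points, so a single failure (4/5) sits exactly on the 80% threshold and a second failure drops the skill to "try Sonnet".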
+
+### Test with Sonnet (if needed)
+
+```yaml
+# In the skill's frontmatter, upgrade the model to Sonnet
+model: sonnet
+
+# Then run the same 5 tests
+# and compare results with the Haiku run
+```
+
+### Document Decision
+
+```yaml
+---
+name: work-issue
+model: haiku  # Tested: 4/5 tests passed with Haiku (80%)
+---
+```
+
+Or:
+
+```yaml
+---
+name: review-pr
+model: sonnet  # Tested: Haiku 3/5 (60%), Sonnet 4/5 (80%)
+---
+```
+
+## Common Patterns
+
+### Pattern: Start Haiku, Upgrade if Needed
+
+**Issue-worker agent evolution:**
+1. **V1 (Haiku):** 70% success - struggled with pattern matching
+2. **Analysis:** Added more examples, but success rate only reached 72%
+3. **V2 (Sonnet):** 82% success - better code understanding
+4. **Decision:** Keep Sonnet, document why
+
+### Pattern: Haiku for Most, Sonnet for Complex
+
+**Review-pr skill:**
+- Static analysis steps: Haiku could handle
+- Manual code review: Needs Sonnet judgment
+- **Decision:** Use Sonnet for the whole skill (simplicity)
+
+### Pattern: Split Complex Skills
+
+**Instead of:** One complex skill using Opus
+
+**Try:** Split into:
+- Haiku skill for orchestration
+- Sonnet agent for complex subtask
+- Saves cost (most work in Haiku)
+
+## Model Selection Checklist
+
+Before choosing a model:
+
+- [ ] Tested with Haiku first
+- [ ] Measured success rate on 3-5 test cases
+- [ ] Tried improvements (scripts, validation, checklists)
+- [ ] Documented why this model is needed
+- [ ] Considered cost implications (12x/60x)
+- [ ] Considered speed implications (2.5x/5x slower)
+- [ ] Will re-test if Claude models improve
+
+## Future-Proofing
+
+**Models improve over time.**
+
+Periodically re-test Sonnet/Opus skills with Haiku:
+- Haiku v2 might handle what Haiku v1 couldn't
+- Cost savings compound over time
+- Speed improvements are valuable
+
+**Set a reminder:** Test Haiku again in 3-6 months.