chore: move agents and skills to old2 folder
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
336
old2/skills/reference/model-selection.md
Normal file
@@ -0,0 +1,336 @@
# Model Selection Guide

Detailed guidance on choosing the right model for skills and agents.

## Cost Comparison

| Model | Input (per MTok) | Output (per MTok) | vs Haiku |
|-------|------------------|-------------------|----------|
| **Haiku** | $0.25 | $1.25 | Baseline |
| **Sonnet** | $3.00 | $15.00 | 12x more expensive |
| **Opus** | $15.00 | $75.00 | 60x more expensive |

**Example cost for a typical skill call (2K input tokens, 1K output tokens):**
- Haiku: $0.00175
- Sonnet: $0.021 (12x more)
- Opus: $0.105 (60x more)

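The example figures above follow directly from the per-MTok prices. A minimal sketch of that arithmetic, assuming the prices in the table are current (the `PRICES` dictionary and the `call_cost` helper are illustrative names, not part of any tooling in this repository):

```python
# Prices in USD per million tokens (MTok), copied from the table above.
PRICES = {
    "haiku": (0.25, 1.25),
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call for the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Reproduce the "2K input, 1K output" example for each model.
for model in PRICES:
    print(f"{model}: ${call_cost(model, 2_000, 1_000):.5f}")
```

Swapping in the token counts measured for a real skill gives a quick per-call estimate before committing to a more expensive model.
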
## Speed Comparison

| Model | Tokens/Second | vs Haiku |
|-------|---------------|----------|
| **Haiku** | ~100 | Baseline |
| **Sonnet** | ~40 | 2.5x slower |
| **Opus** | ~20 | 5x slower |

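To make the throughput figures concrete, the sketch below converts them into an approximate generation time for a response of a given length. It assumes the approximate tokens-per-second values from the table and ignores network latency and time to first token; `generation_seconds` is an illustrative helper name.

```python
# Approximate output throughput in tokens per second, from the table above.
TOKENS_PER_SECOND = {"haiku": 100, "sonnet": 40, "opus": 20}


def generation_seconds(model: str, output_tokens: int) -> float:
    """Estimate how long a model needs to emit `output_tokens` tokens."""
    return output_tokens / TOKENS_PER_SECOND[model]


# Rough time to generate a 1,000-token response with each model.
for model in TOKENS_PER_SECOND:
    print(f"{model}: ~{generation_seconds(model, 1_000):.0f}s")
```
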
## Decision Framework

```
Start with Haiku by default
        |
        v
Test on 3-5 representative tasks
        |
        +-- Success rate ≥80%? ---------> ✓ Use Haiku
        |                                  (12x cheaper, 2.5-5x faster)
        |
        +-- Success rate <80%? --------> Try Sonnet
        |                                     |
        |                                     v
        |                                Test on same tasks
        |                                     |
        |                                     +-- Success ≥80%? --> Use Sonnet
        |                                     |
        |                                     +-- Still failing? --> Opus or redesign
        |
        v
Document why you chose the model
```

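The flowchart can be read as a small function. This is a sketch under the assumption that success rates are expressed as fractions between 0 and 1; `choose_model` and its return strings are illustrative, not an existing API.

```python
from typing import Optional

THRESHOLD = 0.80  # success-rate bar used in the flowchart


def choose_model(haiku_rate: float, sonnet_rate: Optional[float] = None) -> str:
    """Walk the flowchart: Haiku first, then Sonnet, then Opus or a redesign."""
    if haiku_rate >= THRESHOLD:
        return "haiku"
    if sonnet_rate is None:
        # Haiku fell short and Sonnet has not been measured yet.
        return "test sonnet"
    if sonnet_rate >= THRESHOLD:
        return "sonnet"
    return "opus or redesign"


print(choose_model(0.90))        # Haiku clears the bar
print(choose_model(0.65, 0.85))  # Haiku fails, Sonnet clears the bar
```

Whatever the function returns, the last step of the flowchart still applies: record the measured rates next to the chosen model.
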
## When Haiku Works Well

### ✓ Ideal for Haiku

**Simple sequential workflows:**
- `/dashboard` - Fetch and display
- `/roadmap` - List and format
- `/commit` - Generate message from diff

**Workflows with scripts:**
- Error-prone operations in scripts
- Skills just orchestrate script calls
- Validation is deterministic

**Structured outputs:**
- Tasks with clear templates
- Format is defined upfront
- No ambiguous formatting

**Reference/knowledge skills:**
- `gitea` - CLI reference
- `issue-writing` - Patterns and templates
- `software-architecture` - Best practices

### Examples of Haiku Success

**work-issue skill:**
- Sequential steps (view → branch → plan → implement → PR)
- Each step has clear validation
- Scripts handle error-prone operations
- Success rate: ~90%

**dashboard skill:**
- Fetch data (tea commands)
- Format as table
- Clear, structured output
- Success rate: ~95%

## When to Use Sonnet

### Use Sonnet When

**Haiku fails more than 20% of the time:**
- Test with Haiku first
- If success rate <80%, upgrade to Sonnet

**Complex judgment required:**
- Code review (quality assessment)
- Issue grooming (clarity evaluation)
- Architecture decisions

**Nuanced reasoning:**
- Understanding implicit requirements
- Making trade-off decisions
- Applying context-dependent rules

### Examples of Sonnet Success

**review-pr skill:**
- Requires code understanding
- Judgment about quality/bugs
- Context-dependent feedback
- Originally tried Haiku: 65% success → Sonnet: 85%

**issue-worker agent:**
- Autonomous implementation
- Pattern matching
- Architectural decisions
- Originally tried Haiku: 70% success → Sonnet: 82%

## When to Use Opus

### Reserve Opus For

**Deep architectural reasoning:**
- `software-architect` agent
- Pattern recognition across large codebases
- Identifying subtle anti-patterns
- Trade-off analysis

**High-stakes decisions:**
- Breaking changes analysis
- System-wide refactoring plans
- Security architecture review

**Complex pattern recognition:**
- Requires sophisticated understanding
- Multiple layers of abstraction
- Long-term implications

### Examples of Opus Success

**software-architect agent:**
- Analyzes entire codebase
- Identifies 8 different anti-patterns
- Provides prioritized recommendations
- Sonnet: 68% success → Opus: 88%

**arch-review-repo skill:**
- Comprehensive architecture audit
- Cross-cutting concerns
- System-wide patterns
- Opus is justified by the depth of analysis required

## Making Haiku More Effective

If Haiku is struggling, try these improvements **before** upgrading to Sonnet:

### 1. Add Validation Steps

**Instead of:**
```markdown
3. Implement changes and create PR
```

**Try:**
```markdown
3. Implement changes
4. Validate: Run `./scripts/validate.sh` (tests pass, linter clean)
5. Create PR: `./scripts/create-pr.sh`
```

### 2. Bundle Error-Prone Operations in Scripts

**Instead of:**
```markdown
5. Create PR: `tea pulls create --title "..." --description "..."`
```

**Try:**
```markdown
5. Create PR: `./scripts/create-pr.sh $issue "$title"`
```

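The point of bundling is that argument validation and command assembly become deterministic code rather than something the model has to get right on every run. The contents of `create-pr.sh` are not shown here, so the following is only a hypothetical sketch of that idea: `build_pr_command` is an invented helper, the `Closes #<issue>` description is an assumed convention, and the only `tea` flags used are the two that appear in the example above.

```python
from typing import List


def build_pr_command(issue: int, title: str) -> List[str]:
    """Validate the inputs and assemble the PR command as an argument list."""
    if issue <= 0:
        raise ValueError("issue must be a positive integer")
    title = title.strip()
    if not title:
        raise ValueError("title must not be empty")
    # Assumed convention: link the PR to its issue in the description.
    description = f"Closes #{issue}"
    return ["tea", "pulls", "create", "--title", title, "--description", description]


# The list can be handed to subprocess.run(...) without any shell quoting.
print(build_pr_command(42, "Fix login redirect"))
```

Building an argument list rather than a shell string sidesteps quoting mistakes in titles that contain spaces or quotes, which is one of the error-prone details worth taking away from the model.
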
### 3. Add Structured Output Templates

**Instead of:**
```markdown
Show the results
```

**Try:**
```markdown
Format results as:

| Issue | Status | Link |
|-------|--------|------|
| ... | ... | ... |
```

### 4. Add Explicit Checklists

**Instead of:**
```markdown
Review the code for quality
```

**Try:**
```markdown
Check:
- [ ] Code quality (readability, naming)
- [ ] Bugs (edge cases, null checks)
- [ ] Tests (coverage, assertions)
```

### 5. Make Instructions More Concise

**Instead of:**
```markdown
Git is a version control system. When you want to commit changes, you use the git commit command which saves your changes to the repository...
```

**Try:**
```markdown
`git commit -m 'feat: add feature'`
```

## Testing Methodology

### Create Test Suite

For each skill, create 3-5 test cases:

**Example: work-issue skill tests**
1. Simple bug fix issue
2. New feature with acceptance criteria
3. Issue missing acceptance criteria
4. Issue with tests that fail
5. Complex refactoring task

### Test with Haiku

```yaml
# Set the skill's model to Haiku in its frontmatter
model: haiku

# Then run every test case in the suite
# and document success/failure for each
```

### Measure Success Rate

```
Success rate = (Successful tests / Total tests) × 100
```

**Decision:**
- ≥80% → Keep Haiku
- 50-79% → Try Sonnet
- <50% → Likely need Opus or redesign

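A small sketch of the formula and decision bands, assuming each test outcome is recorded as a boolean and that the "Try Sonnet" band covers rates from 50% up to (but not including) 80%. `success_rate` and `next_step` are illustrative names.

```python
from typing import Sequence


def success_rate(results: Sequence[bool]) -> float:
    """Percentage of passing tests, per the formula above."""
    if not results:
        raise ValueError("at least one test result is required")
    return 100 * sum(results) / len(results)


def next_step(rate: float) -> str:
    """Map a Haiku success rate (in percent) to the decision bands above."""
    if rate >= 80:
        return "keep Haiku"
    if rate >= 50:
        return "try Sonnet"
    return "likely need Opus or redesign"


# Five recorded outcomes for one skill: four passes and one failure.
outcomes = [True, True, True, True, False]
rate = success_rate(outcomes)
print(f"{rate:.0f}% -> {next_step(rate)}")
```

With only 3-5 tests, a single failure moves the rate by 20 points or more, so borderline results are worth re-running before switching models.
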
### Test with Sonnet (if needed)

```yaml
# Upgrade the skill's model to Sonnet in its frontmatter
model: sonnet

# Then run the same test cases
# and compare the results with the Haiku run
```

### Document Decision

```yaml
---
name: work-issue
model: haiku # Tested: 4/5 tests passed with Haiku (80%)
---
```

Or:

```yaml
---
name: review-pr
model: sonnet # Tested: Haiku 3/5 (60%), Sonnet 4/5 (80%)
---
```

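If test results are already recorded as pass/total counts, the documenting comment can be generated rather than typed by hand. The sketch below mirrors the format of the second example above; `model_line` is a hypothetical helper, and it assumes the results dictionary is filled in the order the models were tested.

```python
from typing import Dict, Tuple


def model_line(model: str, results: Dict[str, Tuple[int, int]]) -> str:
    """Build a `model:` frontmatter line annotated with the measured results.

    `results` maps a model name to (tests passed, tests run).
    """
    parts = [
        f"{name.capitalize()} {passed}/{total} ({round(100 * passed / total)}%)"
        for name, (passed, total) in results.items()
    ]
    return f"model: {model} # Tested: {', '.join(parts)}"


print(model_line("sonnet", {"haiku": (3, 5), "sonnet": (4, 5)}))
```
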
## Common Patterns

### Pattern: Start Haiku, Upgrade if Needed

**Issue-worker agent evolution:**
1. **V1 (Haiku):** 70% success - struggled with pattern matching
2. **Analysis:** Added more examples, still 72%
3. **V2 (Sonnet):** 82% success - better code understanding
4. **Decision:** Keep Sonnet, document why

### Pattern: Haiku for Most, Sonnet for Complex

**Review-pr skill:**
- Static analysis steps: Haiku could handle
- Manual code review: Needs Sonnet judgment
- **Decision:** Use Sonnet for whole skill (simplicity)

### Pattern: Split Complex Skills

**Instead of:** One complex skill using Opus

**Try:** Split into:
- Haiku skill for orchestration
- Sonnet agent for complex subtask

This saves cost because most of the work runs on Haiku.

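To see how much a split can save, the sketch below compares a single-Opus workflow with a Haiku-plus-Sonnet split using the per-MTok prices from the cost table. The token counts are hypothetical, chosen only to illustrate a workflow where 80% of the tokens go through the orchestrating Haiku skill; `workflow_cost` is an illustrative helper.

```python
from typing import List, Tuple

# Prices in USD per million tokens (input, output), from the cost table.
PRICES = {
    "haiku": (0.25, 1.25),
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}

# One stage of a workflow: (model, input tokens, output tokens).
Stage = Tuple[str, int, int]


def workflow_cost(stages: List[Stage]) -> float:
    """Sum the USD cost of every stage in a workflow."""
    total = 0.0
    for model, input_tokens, output_tokens in stages:
        input_price, output_price = PRICES[model]
        total += (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return total


# Hypothetical workload: 20K input and 10K output tokens in total.
single_opus = [("opus", 20_000, 10_000)]
split = [
    ("haiku", 16_000, 8_000),   # orchestration: 80% of the tokens
    ("sonnet", 4_000, 2_000),   # complex subtask: remaining 20%
]

print(f"single Opus skill: ${workflow_cost(single_opus):.3f}")
print(f"Haiku + Sonnet:    ${workflow_cost(split):.3f}")
```

The saving depends entirely on how much of the work the cheaper model can absorb, and the split only pays off if the combined workflow still meets the success-rate bar.
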
## Model Selection Checklist

Before choosing a model:

- [ ] Tested with Haiku first
- [ ] Measured success rate on 3-5 test cases
- [ ] Tried improvements (scripts, validation, checklists)
- [ ] Documented why this model is needed
- [ ] Considered cost implications (12x/60x)
- [ ] Considered speed implications (2.5x/5x slower)
- [ ] Will re-test when Claude models improve

## Future-Proofing

**Models improve over time.**

Periodically re-test Sonnet/Opus skills with Haiku:
- A newer Haiku release might handle what the current one couldn't
- Cost savings compound over time
- Speed improvements are valuable

**Set a reminder:** Test Haiku again in 3-6 months.