architecture/skills/capability-writing/best-practices.md

# Skill Authoring Best Practices

Based on Anthropic's latest agent skills documentation (January 2025).

## Core Principles

### Concise is Key

> "The context window is a public good. Default assumption: Claude is already very smart."

**Only add context Claude doesn't already have.**

**Challenge each piece of information:**
- "Does Claude really need this explanation?"
- "Can I assume Claude knows this?"
- "Does this paragraph justify its token cost?"

**Good example (concise):**
```markdown
## Extract PDF text

Use pdfplumber:

\`\`\`python
import pdfplumber
with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()
\`\`\`
```

**Bad example (verbose):**
```markdown
## Extract PDF text

PDF (Portable Document Format) files are a common file format that contains text,
images, and other content. To extract text from a PDF, you'll need to use a library.
There are many libraries available for PDF processing, but we recommend pdfplumber
because it's easy to use and handles most cases well. First, you'll need to install
it using pip. Then you can use the code below...
```

The concise version assumes Claude knows what PDFs are and how libraries work.

### Set Appropriate Degrees of Freedom

Match the level of specificity to the task's fragility and variability.

#### High Freedom (Text-Based Instructions)

Use when multiple approaches are valid:

```markdown
## Code Review Process

1. Analyze code structure and organization
2. Check for potential bugs or edge cases
3. Suggest improvements for readability
4. Verify adherence to project conventions
```

#### Medium Freedom (Templates/Pseudocode)

Use when there's a preferred pattern but variation is acceptable:

```markdown
## Generate Report

Use this template and customize as needed:

\`\`\`python
def generate_report(data, format="markdown", include_charts=True):
    # Process data
    # Generate output in specified format
    # Optionally include visualizations
\`\`\`
```

#### Low Freedom (Exact Scripts)

Use when operations are fragile and error-prone:

```markdown
## Database Migration

Run exactly this script:

\`\`\`bash
python scripts/migrate.py --verify --backup
\`\`\`

Do not modify the command or add additional flags.
```

**Analogy:** Think of Claude as a robot exploring a path:
- **Narrow bridge with cliffs**: One safe way forward. Provide specific guardrails (low freedom)
- **Open field**: Many paths lead to success. Give general direction (high freedom)

### Progressive Disclosure

Split large skills into layers that load on-demand.

#### Three Levels of Loading

| Level | When Loaded | Token Cost | Content |
|-------|------------|------------|---------|
| **Level 1: Metadata** | Always (at startup) | ~100 tokens | `name` and `description` from frontmatter |
| **Level 2: Instructions** | When skill is triggered | Under 5k tokens | SKILL.md body with instructions |
| **Level 3: Resources** | As needed | Unlimited | Referenced files, scripts |

#### Organizing Large Skills

**Pattern 1: High-level guide with references**

```markdown
# PDF Processing

## Quick Start
\`\`\`python
import pdfplumber
with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()
\`\`\`

## Advanced Features
**Form filling**: See [FORMS.md](FORMS.md)
**API reference**: See [REFERENCE.md](REFERENCE.md)
**Examples**: See [EXAMPLES.md](EXAMPLES.md)
```

Claude loads FORMS.md, REFERENCE.md, or EXAMPLES.md only when needed.

**Pattern 2: Domain-specific organization**

For skills with multiple domains:

```
bigquery-skill/
├── SKILL.md (overview and navigation)
└── reference/
    ├── finance.md (revenue, billing metrics)
    ├── sales.md (opportunities, pipeline)
    ├── product.md (API usage, features)
    └── marketing.md (campaigns, attribution)
```

When user asks about revenue, Claude reads only `reference/finance.md`.

**Pattern 3: Conditional details**

```markdown
# DOCX Processing

## Creating Documents
Use docx-js. See [DOCX-JS.md](DOCX-JS.md).

## Editing Documents
For simple edits, modify XML directly.

**For tracked changes**: See [REDLINING.md](REDLINING.md)
**For OOXML details**: See [OOXML.md](OOXML.md)
```

#### Avoid Deeply Nested References

**Keep references one level deep from SKILL.md.**

**Bad (too deep):**
```
SKILL.md → advanced.md → details.md → actual info
```

**Good (one level):**
```
SKILL.md → {advanced.md, reference.md, examples.md}
```

#### Structure Longer Files with TOC

For reference files >100 lines, include a table of contents:

```markdown
# API Reference

## Contents
- Authentication and setup
- Core methods (create, read, update, delete)
- Advanced features (batch operations, webhooks)
- Error handling patterns
- Code examples

## Authentication and Setup
...
```

This ensures Claude can see the full scope even with partial reads.

## Script Bundling

### When to Bundle Scripts

Bundle scripts for:
- **Error-prone operations**: Complex bash with retry logic
- **Fragile sequences**: Operations requiring exact order
- **Validation steps**: Checking conditions before proceeding
- **Reusable utilities**: Operations used in multiple steps

**Benefits of bundled scripts:**
- More reliable than generated code
- Save tokens (no code in context)
- Save time (no code generation)
- Ensure consistency

### Script Structure

```bash
#!/bin/bash
# script-name.sh - Brief description
#
# Usage: script-name.sh <param1> <param2>
#
# Example: script-name.sh issue-42 "Fix bug"

set -e  # Exit on error

# Input validation
if [ $# -lt 2 ]; then
  echo "Usage: $0 <param1> <param2>"
  exit 1
fi

param1=$1
param2=$2

# Main logic with error handling
if ! some_command; then
  echo "ERROR: Command failed"
  exit 1
fi

# Success output
echo "SUCCESS: Operation completed"
```

### Referencing Scripts in Skills

**Make clear whether to execute or read:**

**Execute (most common):**
```markdown
7. **Create PR**: `./scripts/create-pr.sh $1 "$title"`
```

**Read as reference (for understanding complex logic):**
```markdown
See `./scripts/analyze-form.py` for the field extraction algorithm
```

### Solving, Not Punting

Scripts should handle error conditions, not punt to Claude.

**Good (handles errors):**
```python
def process_file(path):
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        print(f"File {path} not found, creating default")
        with open(path, 'w') as f:
            f.write('')
        return ''
    except PermissionError:
        print(f"Cannot access {path}, using default")
        return ''
```

**Bad (punts to Claude):**
```python
def process_file(path):
    return open(path).read()  # Fails, Claude has to figure it out
```

## Workflow Patterns

### Plan-Validate-Execute

Add verification checkpoints to catch errors early.

**Example: Workflow with validation**

```markdown
## PDF Form Filling

Copy this checklist:

\`\`\`
Progress:
- [ ] Step 1: Analyze form (run analyze_form.py)
- [ ] Step 2: Create field mapping (edit fields.json)
- [ ] Step 3: Validate mapping (run validate_fields.py)
- [ ] Step 4: Fill form (run fill_form.py)
- [ ] Step 5: Verify output (run verify_output.py)
\`\`\`

**Step 1: Analyze**
Run: `python scripts/analyze_form.py input.pdf`

**Step 2: Create Mapping**
Edit `fields.json`

**Step 3: Validate**
Run: `python scripts/validate_fields.py fields.json`
Fix any errors before continuing.

**Step 4: Fill**
Run: `python scripts/fill_form.py input.pdf fields.json output.pdf`

**Step 5: Verify**
Run: `python scripts/verify_output.py output.pdf`
If verification fails, return to Step 2.
```

### Feedback Loops

**Pattern:** Run validator → fix errors → repeat

**Example: Document editing**

```markdown
1. Make edits to `word/document.xml`
2. **Validate**: `python scripts/validate.py unpacked_dir/`
3. If validation fails:
   - Review error message
   - Fix issues
   - Run validation again
4. **Only proceed when validation passes**
5. Rebuild: `python scripts/pack.py unpacked_dir/ output.docx`
6. Test output document
```

## Model Selection

### Decision Framework

```
Start with Haiku
    |
    v
Test on 3-5 representative tasks
    |
    +-- Success rate ≥80%? ---------> Use Haiku ✓
    |
    +-- Success rate <80%? --------> Try Sonnet
                                          |
                                          v
                                    Test on same tasks
                                          |
                                          +-- Success ≥80%? --> Use Sonnet
                                          |
                                          +-- Still failing? --> Opus or redesign task
```

### Haiku Works Well When

- **Steps are simple and validated**
- **Instructions are concise** (no verbose explanations)
- **Error-prone operations use scripts** (deterministic)
- **Outputs have structured templates**
- **Checklists replace open-ended judgment**

### Testing with Multiple Models

Test skills with all models you plan to use:

1. **Create test cases:** 3-5 representative scenarios
2. **Run with Haiku:** Measure success rate, response quality
3. **Run with Sonnet:** Compare results
4. **Adjust instructions:** If Haiku struggles, add clarity or scripts

What works for Opus might need more detail for Haiku.

## Common Anti-Patterns

### Offering Too Many Options

**Bad (confusing):**
```markdown
You can use pypdf, or pdfplumber, or PyMuPDF, or pdf2image, or...
```

**Good (provide default):**
```markdown
Use pdfplumber for text extraction:
\`\`\`python
import pdfplumber
\`\`\`

For scanned PDFs requiring OCR, use pdf2image with pytesseract instead.
```

### Time-Sensitive Information

**Bad (will become wrong):**
```markdown
If you're doing this before August 2025, use the old API.
After August 2025, use the new API.
```

**Good (use "old patterns" section):**
```markdown
## Current Method
Use the v2 API: `api.example.com/v2/messages`

## Old Patterns
<details>
<summary>Legacy v1 API (deprecated 2025-08)</summary>

The v1 API used: `api.example.com/v1/messages`
This endpoint is no longer supported.
</details>
```

### Inconsistent Terminology

**Good (consistent):**
- Always "API endpoint"
- Always "field"
- Always "extract"

**Bad (inconsistent):**
- Mix "API endpoint", "URL", "API route", "path"
- Mix "field", "box", "element", "control"
- Mix "extract", "pull", "get", "retrieve"

### Windows-Style Paths

Always use forward slashes:

- ✓ **Good**: `scripts/helper.py`, `reference/guide.md`
- ✗ **Bad**: `scripts\helper.py`, `reference\guide.md`

Unix-style paths work cross-platform.

## Iterative Development

### Build Evaluations First

Create test cases BEFORE extensive documentation:

1. **Identify gaps**: Run Claude on tasks without skill, document failures
2. **Create evaluations**: Build 3-5 test scenarios
3. **Establish baseline**: Measure Claude's performance without skill
4. **Write minimal instructions**: Just enough to pass evaluations
5. **Iterate**: Execute evaluations, refine

### Develop Iteratively with Claude

**Use Claude to help write skills:**

1. **Complete a task without skill**: Work through problem, note what context you provide
2. **Identify reusable pattern**: What context is useful for similar tasks?
3. **Ask Claude to create skill**: "Create a skill that captures this pattern"
4. **Review for conciseness**: Remove unnecessary explanations
5. **Test on similar tasks**: Use skill with fresh Claude instance
6. **Iterate based on observation**: Where does Claude struggle?

Claude understands skill format natively - no special prompts needed.

## Checklist for Effective Skills

**Before publishing:**

### Core Quality
- [ ] Description is specific and includes key terms
- [ ] Description includes what skill does AND when to use it
- [ ] SKILL.md body under 500 lines
- [ ] Additional details in separate files (if needed)
- [ ] No time-sensitive information
- [ ] Consistent terminology throughout
- [ ] Examples are concrete, not abstract
- [ ] File references are one level deep
- [ ] Progressive disclosure used appropriately
- [ ] Workflows have clear steps

### Code and Scripts
- [ ] Scripts solve problems, don't punt to Claude
- [ ] Error handling is explicit and helpful
- [ ] No "magic numbers" (all values justified)
- [ ] Required packages listed and verified
- [ ] Scripts have clear documentation
- [ ] No Windows-style paths (all forward slashes)
- [ ] Validation steps for critical operations
- [ ] Feedback loops for quality-critical tasks

### Testing
- [ ] At least 3 test cases created
- [ ] Tested with Haiku (if that's the target)
- [ ] Tested with real usage scenarios
- [ ] Team feedback incorporated (if applicable)