# Skill Authoring Best Practices

Based on Anthropic's latest agent skills documentation (January 2025).
## Core Principles

### Concise is Key

> "The context window is a public good. Default assumption: Claude is already very smart."
Only add context Claude doesn't already have.
Challenge each piece of information:
- "Does Claude really need this explanation?"
- "Can I assume Claude knows this?"
- "Does this paragraph justify its token cost?"
**Good example (concise):**

````markdown
## Extract PDF text

Use pdfplumber:

```python
import pdfplumber
with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```
````
**Bad example (verbose):**

```markdown
## Extract PDF text

PDF (Portable Document Format) files are a common file format that contains text,
images, and other content. To extract text from a PDF, you'll need to use a library.
There are many libraries available for PDF processing, but we recommend pdfplumber
because it's easy to use and handles most cases well. First, you'll need to install
it using pip. Then you can use the code below...
```
The concise version assumes Claude knows what PDFs are and how libraries work.
### Set Appropriate Degrees of Freedom

Match the level of specificity to the task's fragility and variability.

#### High Freedom (Text-Based Instructions)

Use when multiple approaches are valid:
```markdown
## Code Review Process

1. Analyze code structure and organization
2. Check for potential bugs or edge cases
3. Suggest improvements for readability
4. Verify adherence to project conventions
```
#### Medium Freedom (Templates/Pseudocode)

Use when there's a preferred pattern but variation is acceptable:
````markdown
## Generate Report

Use this template and customize as needed:

```python
def generate_report(data, format="markdown", include_charts=True):
    # Process data
    # Generate output in specified format
    # Optionally include visualizations
    ...
```
````
#### Low Freedom (Exact Scripts)

Use when operations are fragile and error-prone:

````markdown
## Database Migration

Run exactly this script:

```bash
python scripts/migrate.py --verify --backup
```

Do not modify the command or add additional flags.
````
Analogy: Think of Claude as a robot exploring a path:
- Narrow bridge with cliffs: One safe way forward. Provide specific guardrails (low freedom)
- Open field: Many paths lead to success. Give general direction (high freedom)
### Progressive Disclosure

Split large skills into layers that load on-demand.

#### Three Levels of Loading
| Level | When Loaded | Token Cost | Content |
|---|---|---|---|
| Level 1: Metadata | Always (at startup) | ~100 tokens | name and description from frontmatter |
| Level 2: Instructions | When skill is triggered | Under 5k tokens | SKILL.md body with instructions |
| Level 3: Resources | As needed | Unlimited | Referenced files, scripts |
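
Level 1 costs almost nothing because only the frontmatter is read. A minimal sketch of that metadata; the name and description values here are illustrative:

```yaml
---
name: pdf-processing
description: Extract text and tables from PDF files. Use when the user asks to read, parse, or summarize a PDF.
---
```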
#### Organizing Large Skills

**Pattern 1: High-level guide with references**
````markdown
# PDF Processing

## Quick Start

```python
import pdfplumber
with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```

## Advanced Features

**Form filling**: See [FORMS.md](FORMS.md)
**API reference**: See [REFERENCE.md](REFERENCE.md)
**Examples**: See [EXAMPLES.md](EXAMPLES.md)
````
Claude loads FORMS.md, REFERENCE.md, or EXAMPLES.md only when needed.
**Pattern 2: Domain-specific organization**

For skills with multiple domains:

```
bigquery-skill/
├── SKILL.md (overview and navigation)
└── reference/
    ├── finance.md (revenue, billing metrics)
    ├── sales.md (opportunities, pipeline)
    ├── product.md (API usage, features)
    └── marketing.md (campaigns, attribution)
```
When user asks about revenue, Claude reads only reference/finance.md.
**Pattern 3: Conditional details**

```markdown
# DOCX Processing

## Creating Documents

Use docx-js. See [DOCX-JS.md](DOCX-JS.md).

## Editing Documents

For simple edits, modify XML directly.

**For tracked changes**: See [REDLINING.md](REDLINING.md)
**For OOXML details**: See [OOXML.md](OOXML.md)
```
#### Avoid Deeply Nested References

Keep references one level deep from SKILL.md.

**Bad (too deep):**

```
SKILL.md → advanced.md → details.md → actual info
```

**Good (one level):**

```
SKILL.md → {advanced.md, reference.md, examples.md}
```
#### Structure Longer Files with TOC

For reference files over 100 lines, include a table of contents:

```markdown
# API Reference

## Contents

- Authentication and setup
- Core methods (create, read, update, delete)
- Advanced features (batch operations, webhooks)
- Error handling patterns
- Code examples

## Authentication and Setup
...
```
This ensures Claude can see the full scope even with partial reads.
### Script Bundling

#### When to Bundle Scripts

Bundle scripts for:
- **Error-prone operations**: Complex bash with retry logic
- **Fragile sequences**: Operations requiring exact order
- **Validation steps**: Checking conditions before proceeding
- **Reusable utilities**: Operations used in multiple steps
Benefits of bundled scripts:
- More reliable than generated code
- Save tokens (no code in context)
- Save time (no code generation)
- Ensure consistency
#### Script Structure

```bash
#!/bin/bash
# script-name.sh - Brief description
#
# Usage: script-name.sh <param1> <param2>
#
# Example: script-name.sh issue-42 "Fix bug"

set -e  # Exit on error

# Input validation
if [ $# -lt 2 ]; then
    echo "Usage: $0 <param1> <param2>"
    exit 1
fi

param1=$1
param2=$2

# Main logic with error handling
if ! some_command; then
    echo "ERROR: Command failed"
    exit 1
fi

# Success output
echo "SUCCESS: Operation completed"
```
#### Referencing Scripts in Skills

Make clear whether to execute or read:

**Execute (most common):**

```markdown
7. **Create PR**: `./scripts/create-pr.sh $1 "$title"`
```

**Read as reference (for understanding complex logic):**

```markdown
See `./scripts/analyze-form.py` for the field extraction algorithm
```
#### Solving, Not Punting

Scripts should handle error conditions, not punt to Claude.

**Good (handles errors):**

```python
def process_file(path):
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        print(f"File {path} not found, creating default")
        with open(path, 'w') as f:
            f.write('')
        return ''
    except PermissionError:
        print(f"Cannot access {path}, using default")
        return ''
```

**Bad (punts to Claude):**

```python
def process_file(path):
    return open(path).read()  # Fails, Claude has to figure it out
```
## Workflow Patterns

### Plan-Validate-Execute

Add verification checkpoints to catch errors early.

**Example: Workflow with validation**
````markdown
## PDF Form Filling

Copy this checklist:

```
Progress:
- [ ] Step 1: Analyze form (run analyze_form.py)
- [ ] Step 2: Create field mapping (edit fields.json)
- [ ] Step 3: Validate mapping (run validate_fields.py)
- [ ] Step 4: Fill form (run fill_form.py)
- [ ] Step 5: Verify output (run verify_output.py)
```

**Step 1: Analyze**
Run: `python scripts/analyze_form.py input.pdf`

**Step 2: Create Mapping**
Edit `fields.json`

**Step 3: Validate**
Run: `python scripts/validate_fields.py fields.json`
Fix any errors before continuing.

**Step 4: Fill**
Run: `python scripts/fill_form.py input.pdf fields.json output.pdf`

**Step 5: Verify**
Run: `python scripts/verify_output.py output.pdf`
If verification fails, return to Step 2.
````
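
A minimal driver sketch of the same gated workflow in Python, assuming each bundled script exits non-zero on failure:

```python
import subprocess
import sys

def run_step(script, *args):
    """Run one workflow script; check=True halts the pipeline at a failed checkpoint."""
    subprocess.run([sys.executable, f"scripts/{script}", *args], check=True)

run_step("analyze_form.py", "input.pdf")
# ... create fields.json from the analysis ...
run_step("validate_fields.py", "fields.json")                       # gate: must pass
run_step("fill_form.py", "input.pdf", "fields.json", "output.pdf")
run_step("verify_output.py", "output.pdf")                          # final verification
```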
### Feedback Loops

Pattern: run validator → fix errors → repeat.

**Example: Document editing**
1. Make edits to `word/document.xml`
2. **Validate**: `python scripts/validate.py unpacked_dir/`
3. If validation fails:
   - Review the error message
   - Fix the issues
   - Run validation again
4. **Only proceed when validation passes**
5. Rebuild: `python scripts/pack.py unpacked_dir/ output.docx`
6. Test the output document
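
A sketch of that loop, assuming `scripts/validate.py` prints its errors and exits non-zero on failure:

```python
import subprocess
import sys

def validate_until_clean(unpacked_dir, max_attempts=5):
    """Re-run the validator after each round of fixes; stop once it passes."""
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(
            [sys.executable, "scripts/validate.py", unpacked_dir],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            return True  # validation passed; safe to rebuild and test
        print(f"Attempt {attempt} failed:\n{result.stdout}{result.stderr}")
        # ... fix the reported issues in word/document.xml, then loop ...
    return False
```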
## Model Selection

### Decision Framework

```
Start with Haiku
       |
       v
Test on 3-5 representative tasks
       |
       +-- Success rate ≥80%? --> Use Haiku ✓
       |
       +-- Success rate <80%? --> Try Sonnet
                                     |
                                     v
                            Test on the same tasks
                                     |
                                     +-- Success ≥80%?  --> Use Sonnet
                                     |
                                     +-- Still failing? --> Opus or redesign task
```
### Haiku Works Well When
- Steps are simple and validated
- Instructions are concise (no verbose explanations)
- Error-prone operations use scripts (deterministic)
- Outputs have structured templates
- Checklists replace open-ended judgment
### Testing with Multiple Models

Test skills with every model you plan to use:

1. **Create test cases**: 3-5 representative scenarios
2. **Run with Haiku**: Measure success rate and response quality
3. **Run with Sonnet**: Compare results
4. **Adjust instructions**: If Haiku struggles, add clarity or scripts
What works for Opus might need more detail for Haiku.
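
A model-agnostic harness sketch for the ≥80% decision rule; `run_task` and `judge` are hypothetical callables you supply per skill:

```python
def success_rate(run_task, test_cases, judge):
    """Fraction of representative cases a model passes.

    run_task(case) -> the model's output for one scenario (hypothetical callable).
    judge(case, output) -> True if the output meets that case's criteria.
    """
    passed = sum(judge(case, run_task(case)) for case in test_cases)
    return passed / len(test_cases)

# Decision rule from the framework: >= 0.8 with Haiku means ship on Haiku;
# otherwise re-run the same cases with Sonnet, then Opus or redesign the task.
```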
## Common Anti-Patterns

### Offering Too Many Options

**Bad (confusing):**

```
You can use pypdf, or pdfplumber, or PyMuPDF, or pdf2image, or...
```

**Good (provide a default):**
````markdown
Use pdfplumber for text extraction:

```python
import pdfplumber
```

For scanned PDFs requiring OCR, use pdf2image with pytesseract instead.
````
### Time-Sensitive Information

**Bad (will become wrong):**

```
If you're doing this before August 2025, use the old API.
After August 2025, use the new API.
```
Good (use "old patterns" section):
## Current Method
Use the v2 API: `api.example.com/v2/messages`
## Old Patterns
<details>
<summary>Legacy v1 API (deprecated 2025-08)</summary>
The v1 API used: `api.example.com/v1/messages`
This endpoint is no longer supported.
</details>
### Inconsistent Terminology

**Good (consistent):**
- Always "API endpoint"
- Always "field"
- Always "extract"
**Bad (inconsistent):**
- Mix "API endpoint", "URL", "API route", "path"
- Mix "field", "box", "element", "control"
- Mix "extract", "pull", "get", "retrieve"
### Windows-Style Paths

Always use forward slashes:

- ✓ Good: `scripts/helper.py`, `reference/guide.md`
- ✗ Bad: `scripts\helper.py`, `reference\guide.md`
Unix-style paths work cross-platform.
## Iterative Development

### Build Evaluations First

Create test cases BEFORE extensive documentation:

1. **Identify gaps**: Run Claude on tasks without the skill and document the failures
2. **Create evaluations**: Build 3-5 test scenarios (see the sketch below)
3. **Establish a baseline**: Measure Claude's performance without the skill
4. **Write minimal instructions**: Just enough to pass the evaluations
5. **Iterate**: Execute the evaluations and refine
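
One way such evaluation cases might look; the schema here is illustrative, not a required format:

```python
# Hypothetical evaluation cases for a PDF skill: each pairs a prompt with a
# crude pass/fail check on the model's output.
EVALS = [
    {
        "prompt": "Extract the text of page 1 from report.pdf",
        "passed": lambda output: "pdfplumber" in output,
    },
    {
        "prompt": "Summarize the tables in report.pdf",
        "passed": lambda output: len(output) > 0,
    },
    # ... 3-5 representative scenarios total ...
]
```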
### Develop Iteratively with Claude

Use Claude to help write skills:

1. **Complete a task without the skill**: Work through the problem and note what context you provide
2. **Identify the reusable pattern**: What context is useful for similar tasks?
3. **Ask Claude to create the skill**: "Create a skill that captures this pattern"
4. **Review for conciseness**: Remove unnecessary explanations
5. **Test on similar tasks**: Use the skill with a fresh Claude instance
6. **Iterate based on observation**: Where does Claude struggle?

Claude understands the skill format natively; no special prompts are needed.
## Checklist for Effective Skills

Before publishing:

### Core Quality

- [ ] Description is specific and includes key terms
- [ ] Description includes what skill does AND when to use it
- [ ] SKILL.md body under 500 lines
- [ ] Additional details in separate files (if needed)
- [ ] No time-sensitive information
- [ ] Consistent terminology throughout
- [ ] Examples are concrete, not abstract
- [ ] File references are one level deep
- [ ] Progressive disclosure used appropriately
- [ ] Workflows have clear steps
### Code and Scripts

- [ ] Scripts solve problems, don't punt to Claude
- [ ] Error handling is explicit and helpful
- [ ] No "magic numbers" (all values justified)
- [ ] Required packages listed and verified
- [ ] Scripts have clear documentation
- [ ] No Windows-style paths (all forward slashes)
- [ ] Validation steps for critical operations
- [ ] Feedback loops for quality-critical tasks
### Testing

- [ ] At least 3 test cases created
- [ ] Tested with Haiku (if that's the target)
- [ ] Tested with real usage scenarios
- [ ] Team feedback incorporated (if applicable)