aether/.product-strategy/BACKLOG_INDEX.md
Hugo Nijhuis 271f5db444
Move product strategy documentation to .product-strategy directory
Organize all product strategy and domain modeling documentation into a
dedicated .product-strategy directory for better separation from code.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-12 23:57:20 +01:00


Aether Executable Backlog: Index & Navigation

Date Generated: 2026-01-12
Total Issues: 67
Total Capabilities: 9
Total Bounded Contexts: 5
Total Phases: 4


Quick Start

For busy decision-makers:

  1. Read: BACKLOG_QUICK_REFERENCE.md (5 min)
  2. See: Critical path shows 13 P0 issues for MVP
  3. Plan: 4 phases, ~10 weeks for full scope, ~6 weeks for critical path

For engineers:

  1. Read: BACKLOG.md (comprehensive, 2600+ lines)
  2. Pick: Phase 1 issues (foundation, no dependencies)
  3. Check: Issue details for acceptance criteria, test cases, DDD guidance

For architects:

  1. Review: CAPABILITIES.md - Product capabilities mapped to domain
  2. Read: BOUNDED_CONTEXT_MAP.md - Context boundaries and isolation
  3. Study: DOMAIN_MODEL_*.md - Domain concepts and invariants

Document Map

Backlog Documents (Start Here)

| Document | Purpose | Audience | Length |
|---|---|---|---|
| BACKLOG.md | Complete executable backlog with all 67 issues | Engineers, PMs | 2600 lines |
| BACKLOG_QUICK_REFERENCE.md | Tables, dependency graph, metrics | Quick lookup | 300 lines |
| BACKLOG_INDEX.md | This file - navigation guide | Everyone | 400 lines |

Domain & Strategy (Background)

| Document | Purpose | Audience | When to Read |
|---|---|---|---|
| CAPABILITIES.md | 9 capabilities mapped to domain, value, success conditions | Architects, PMs | Before implementing |
| BOUNDED_CONTEXT_MAP.md | 5 contexts: isolation rules, language, lifecycle | Architects, Seniors | During design review |
| STRATEGY_CHAIN.md | Manifesto → Vision → Problem Space → Domains → Capabilities | Decision-makers | To understand "why" |
| PROBLEM_MAP.md | Event storming: user journeys, decisions, events | Product, Architects | Before Phase 1 |

Domain Models (Technical Reference)

| Document | Purpose | Scope |
|---|---|---|
| DOMAIN_MODEL_SUMMARY.md | 1-page overview of all domain models | All 5 contexts |
| DOMAIN_MODEL_EVENT_SOURCING.md | Event Sourcing context (aggregates, commands, events, invariants) | Deep dive: Context 1 |
| DOMAIN_MODEL_OCC.md | Optimistic Concurrency context | Deep dive: Context 2 |
| DOMAIN_MODEL_NAMESPACE_ISOLATION.md | Namespace Isolation context | Deep dive: Context 4 |
| BOUNDED_CONTEXT_MAP.md | Event Bus + Cluster coordination contexts (Contexts 3, 5) | Integrated view |

Cluster Documentation

| Document | Purpose | When to Read |
|---|---|---|
| cluster/DOMAIN_MODEL.md | Cluster coordination domain model (aggregates, commands, events) | Phase 3 |
| cluster/ARCHITECTURE.md | Cluster architecture (leader election, shards, failure recovery) | Phase 3 planning |
| cluster/PATTERNS.md | Distributed patterns used in cluster coordination | Phase 3 implementation |

How to Use This Backlog

Scenario 1: "I need to build this. Where do I start?"

  1. Read BACKLOG_QUICK_REFERENCE.md (5 min)
  2. Focus on Phase 1 (17 issues, foundation)
  3. Start with Issue 1.1 (SaveEvent)
  4. Dependencies show what unblocks what

Go to: BACKLOG.md for full details


Scenario 2: "I need to understand the domain before coding"

  1. Read CAPABILITIES.md (product value perspective)
  2. Read PROBLEM_MAP.md (user journeys and events)
  3. Read DOMAIN_MODEL_SUMMARY.md (1-page overview)
  4. Deep-dive into specific context models (DOMAIN_MODEL_*.md)

Go to: DOMAIN_MODEL_EVENT_SOURCING.md for Phase 1 focus


Scenario 3: "I'm implementing Phase 1. What do I need to know?"

Phase 1 covers: Event storage, replay, snapshots, OCC, retry patterns

  1. Start: Issue 1.1 (SaveEvent with version validation)

    • Acceptance criteria tell you exactly what to build
    • DDD guidance explains the invariant (monotonic versions)
    • Test cases show edge cases
  2. Then: Issues 1.2-1.5 (append-only, events, queries)

    • These depend on 1.1; implement in parallel where possible
  3. Learn: Read DOMAIN_MODEL_EVENT_SOURCING.md

    • Understand aggregates, commands, events, invariants
    • See how SaveEvent fits into the larger picture
  4. Check: BACKLOG.md, Issue 1.1, acceptance criteria

    • Concrete, testable, specific requirements

Go to: BACKLOG.md Phase 1 section (lines 48-300)
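To make the Issue 1.1 invariant concrete, here is a minimal in-memory sketch of SaveEvent with version validation. It is not Aether's implementation: the `Event` fields, `MemoryStore`, `LatestVersion`, and `ErrVersionNotMonotonic` are hypothetical names, and "monotonic" is assumed to mean "strictly greater than the latest stored version for the same actor". Check BACKLOG.md for the actual acceptance criteria.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Event is a hypothetical minimal event shape; the real type may differ.
type Event struct {
	ActorID string
	Version int64
	Type    string
}

// ErrVersionNotMonotonic is an illustrative error for a rejected version.
var ErrVersionNotMonotonic = errors.New("version must be greater than latest stored version")

// MemoryStore is an illustrative append-only store keyed by actor ID.
type MemoryStore struct {
	mu     sync.Mutex
	events map[string][]Event
}

func NewMemoryStore() *MemoryStore {
	return &MemoryStore{events: make(map[string][]Event)}
}

// latestLocked returns the last stored version for an actor (0 if none).
// The caller must hold s.mu.
func (s *MemoryStore) latestLocked(actorID string) int64 {
	evs := s.events[actorID]
	if len(evs) == 0 {
		return 0
	}
	return evs[len(evs)-1].Version
}

// LatestVersion returns the highest stored version for an actor.
func (s *MemoryStore) LatestVersion(actorID string) int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.latestLocked(actorID)
}

// SaveEvent appends the event only if its version is strictly greater
// than the latest stored version for the same actor (assumed rule).
func (s *MemoryStore) SaveEvent(e Event) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if latest := s.latestLocked(e.ActorID); e.Version <= latest {
		return fmt.Errorf("actor %q: got version %d, latest is %d: %w",
			e.ActorID, e.Version, latest, ErrVersionNotMonotonic)
	}
	s.events[e.ActorID] = append(s.events[e.ActorID], e)
	return nil
}

func main() {
	store := NewMemoryStore()
	// The first write succeeds; a second write with the same version is rejected.
	fmt.Println(store.SaveEvent(Event{ActorID: "order-1", Version: 1, Type: "OrderPlaced"}))
	fmt.Println(store.SaveEvent(Event{ActorID: "order-1", Version: 1, Type: "OrderPlaced"}))
}
```

Checking and appending under the same lock is what keeps the sketch append-only and free of gaps between "read latest" and "write".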


Scenario 4: "I'm planning Phase 3 (Clustering). Help me understand the domain."

Phase 3 covers: Node discovery, leader election, shard distribution, failure recovery

  1. Background: Read cluster/DOMAIN_MODEL.md

    • Aggregates: Cluster, LeadershipLease, ShardAssignment
    • Commands: JoinCluster, ElectLeader, RebalanceShards
    • Events: LeaderElected, NodeFailed, ShardMigrated
    • Invariants: single leader, no orphaned shards
  2. Architecture: Read cluster/ARCHITECTURE.md

    • How leader election works (lease-based, NATS heartbeats)
    • How consistent hashing minimizes reshuffling
    • How failures trigger rebalancing
  3. Patterns: Read cluster/PATTERNS.md

    • Distributed consensus patterns
    • Health check patterns
    • Migration patterns
  4. Issues: See BACKLOG.md Phase 3 (issues 3.1-3.17)

    • Decomposed into: topology, leadership, shards, failure recovery
    • Dependency order: discovery → election → assignment → health → rebalancing

Go to: BACKLOG.md Phase 3 section (lines 800-1200)
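The claim that consistent hashing minimizes reshuffling can be illustrated with a small hash ring. This is a generic sketch of the technique, not the design in cluster/ARCHITECTURE.md: `Ring`, `NewRing`, `NodeFor`, the FNV hash, and the virtual-node count are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is an illustrative consistent-hash ring with virtual nodes.
type Ring struct {
	points []uint32          // sorted positions on the ring
	owner  map[uint32]string // position -> node name
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places `replicas` virtual points per node on the ring.
func NewRing(nodes []string, replicas int) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			p := hashKey(fmt.Sprintf("%s#%d", n, i))
			if _, taken := r.owner[p]; taken {
				continue // rare hash collision: keep the first owner
			}
			r.owner[p] = n
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// NodeFor returns the node owning the first ring point at or after the
// shard's hash, wrapping around to the start of the ring.
func (r *Ring) NodeFor(shard string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hashKey(shard)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	before := NewRing([]string{"node-a", "node-b", "node-c"}, 64)
	after := NewRing([]string{"node-a", "node-c"}, 64) // node-b has failed
	moved := 0
	for i := 0; i < 100; i++ {
		shard := fmt.Sprintf("shard-%d", i)
		if before.NodeFor(shard) != after.NodeFor(shard) {
			moved++
		}
	}
	fmt.Printf("%d of 100 shards moved\n", moved)
}
```

Only shards that were owned by the removed node change owner; every other shard still finds the same ring point, which is the property that keeps rebalancing cheap after a failure.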


Scenario 5: "I need to present this to stakeholders. What's the pitch?"

Key messages:

  1. Why Aether? See vision.md

    • Solves: "building distributed, event-sourced systems in Go without heavyweight frameworks"
    • Principles: Primitives over frameworks, NATS-native, resource-conscious
  2. What are we building? See CAPABILITIES.md

    • 9 capabilities organized into 3 groups (event sourcing, cluster, event distribution)
    • Each eliminates a pain point and enables a job
  3. How much work? See BACKLOG_QUICK_REFERENCE.md

    • 67 issues in 4 phases
    • Critical path: 13 P0 issues for MVP (6 weeks aggressive)
    • Full scope: all 67 issues (10 weeks typical)
  4. Value timeline?

    • After Phase 1: Event sourcing with conflict detection
    • After Phase 2: Local pub/sub and filtering
    • After Phase 3: Distributed cluster with automatic recovery
    • After Phase 4: Multi-tenant NATS-native delivery

Slides: Reference CAPABILITIES.md success conditions, value map


Scenario 6: "I found a bug in existing code. Which issues cover this area?"

Use the dependency graph in BACKLOG_QUICK_REFERENCE.md to find the issues that cover the affected area.

Example: "SaveEvent isn't enforcing version validation"

  • Look for: Issue 1.1, 1.2, 1.4
  • Read: DOMAIN_MODEL_EVENT_SOURCING.md, monotonic version invariant
  • Fix: Implement version check in SaveEvent


Issue Numbering Scheme

Format: {Phase}.{Issue}

  • Phase: 1-4 (Event Sourcing, Event Bus, Cluster, Namespace/NATS)
  • Issue: 1-N (individual work item, numbered sequentially within the phase)
  • FeatureSet: a-z (subgrouping within a phase, e.g. 3c; it groups issues but does not appear in the ID)

Examples:

  • 1.1 = Phase 1, Feature Set 1a (Event Storage), first issue in the set
  • 3.13 = Phase 3, Feature Set 3c (Failure Recovery), sixth issue in the set
  • 4.5 = Phase 4, Feature Set 4b (NATS Delivery), first issue in the set
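Going by the examples, an ID such as 3.13 is a phase number followed by an issue number that runs sequentially within the phase. A small parser for that two-part shape might look like the sketch below; `ParseIssueID` is a hypothetical helper, and the 1-4 phase range is taken from the list above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParseIssueID splits an ID like "3.13" into phase 3 and issue 13.
// Hypothetical helper: it assumes the two-part form used in the examples.
func ParseIssueID(id string) (phase, issue int, err error) {
	parts := strings.Split(id, ".")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("issue ID %q: want the form <phase>.<issue>", id)
	}
	phase, err = strconv.Atoi(parts[0])
	if err != nil || phase < 1 || phase > 4 {
		return 0, 0, fmt.Errorf("issue ID %q: phase must be 1-4", id)
	}
	issue, err = strconv.Atoi(parts[1])
	if err != nil || issue < 1 {
		return 0, 0, fmt.Errorf("issue ID %q: issue must be a positive number", id)
	}
	return phase, issue, nil
}

func main() {
	phase, issue, err := ParseIssueID("3.13")
	fmt.Println(phase, issue, err) // 3 13 <nil>
}
```

Parsing the parts as numbers also avoids the string-sorting trap where "3.13" would order before "3.8".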

Issue Types

Each issue has a type that indicates what kind of work:

| Type | Example | Time Estimate |
|---|---|---|
| Command | SaveEvent, Subscribe | 2-5 days |
| Rule | Enforce append-only, fail-fast | 1-3 days |
| Event | Publish EventStored, LeaderElected | 1-2 days |
| Query | GetEvents, GetLeader | 2-3 days |
| Interface | SnapshotStore contract | 1 day |
| Validation | Namespace format checks | 1 day |
| Documentation | Retry patterns, cluster migration | 2-5 days |

Priority Levels

| Level | Meaning | Approach |
|---|---|---|
| P0 | Blocking; no alternative path | Must complete before next items |
| P1 | Important; can ship without it, but with limited value | Complete after P0 |
| P2 | Nice-to-have; polish, observability | Complete if time allows |

Recommendation: Focus on P0 issues first. They're blocking; P1 issues may be parallelizable.


Issue Status Tracking

Not yet in Gitea. Use this backlog to:

  1. Create issues with /issue-writing skill
  2. Set up dependencies in Gitea (tea issues deps add)
  3. Track progress per phase
  4. Measure velocity (issues/week)

Suggested milestone structure:

  • Milestone 1: Phase 1 (Event Sourcing Foundation)
  • Milestone 2: Phase 2 (Local Event Bus)
  • Milestone 3: Phase 3 (Cluster Coordination)
  • Milestone 4: Phase 4 (Namespace & NATS)

Context at a Glance

Context 1: Event Sourcing

  • Issues: 1.1-1.10 (foundational)
  • Key Invariant: Monotonic versions per actor
  • Key Command: SaveEvent(event)
  • Key Query: GetLatestVersion, GetEvents
  • What it enables: Immutable history, replay, OCC

Context 2: Optimistic Concurrency Control

  • Issues: 1.11-1.12
  • Key Invariant: Conflicts detected immediately
  • Key Command: AttemptWrite (via SaveEvent)
  • Key Error: VersionConflictError with context
  • What it enables: Multi-writer safety without locks
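A sketch of how a caller might react to a version conflict follows. The fields on `VersionConflictError`, the `versionStore` stand-in, and `SaveWithRetry` are assumptions for illustration, not Aether's API. The stand-in store is not safe for concurrent use; it exists only to produce conflicts for the retry loop.

```go
package main

import (
	"errors"
	"fmt"
)

// VersionConflictError carries the context a caller needs to retry.
// The field names are assumptions.
type VersionConflictError struct {
	ActorID          string
	AttemptedVersion int64
	CurrentVersion   int64
}

func (e *VersionConflictError) Error() string {
	return fmt.Sprintf("version conflict for %s: attempted %d, current %d",
		e.ActorID, e.AttemptedVersion, e.CurrentVersion)
}

// versionStore is a hypothetical stand-in that tracks only the latest
// version per actor.
type versionStore struct {
	latest map[string]int64
}

func (s *versionStore) save(actorID string, version int64) error {
	if cur := s.latest[actorID]; version <= cur {
		return &VersionConflictError{ActorID: actorID, AttemptedVersion: version, CurrentVersion: cur}
	}
	s.latest[actorID] = version
	return nil
}

// SaveWithRetry retries a write after a conflict, using the current
// version reported in the error. It returns the version actually written.
func SaveWithRetry(s *versionStore, actorID string, version int64, maxAttempts int) (int64, error) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err := s.save(actorID, version)
		if err == nil {
			return version, nil
		}
		var conflict *VersionConflictError
		if !errors.As(err, &conflict) {
			return 0, err // not a conflict: do not retry
		}
		// A real caller would reload state and re-validate the command
		// here before choosing the next version.
		version = conflict.CurrentVersion + 1
	}
	return 0, fmt.Errorf("gave up after %d attempts", maxAttempts)
}

func main() {
	s := &versionStore{latest: map[string]int64{"order-1": 3}}
	v, err := SaveWithRetry(s, "order-1", 2, 3)
	fmt.Println(v, err) // 4 <nil>
}
```

Because the error reports the current version, the writer can recover without taking a lock; it only pays for a retry when a conflict actually happens.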

Context 3: Event Bus (Local)

  • Issues: 2.1-2.9
  • Key Invariant: Exact subscriptions isolated; non-blocking delivery
  • Key Commands: Publish, Subscribe
  • Key Queries: GetSubscriptions, metrics
  • What it enables: Local pub/sub, loose coupling
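The non-blocking delivery invariant can be sketched with buffered channels and a `select` with a `default` branch. `Bus`, `Subscribe`, and `Publish` below are illustrative names, and dropping an event when a subscriber's buffer is full is an assumed policy; the backlog may specify something different (for example, metrics on dropped events).

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical minimal event for the local bus.
type Event struct {
	Topic   string
	Payload string
}

// Bus is an illustrative in-process bus with exact-topic subscriptions.
type Bus struct {
	mu   sync.RWMutex
	subs map[string][]chan Event
}

func NewBus() *Bus {
	return &Bus{subs: make(map[string][]chan Event)}
}

// Subscribe registers a buffered channel for exactly one topic.
func (b *Bus) Subscribe(topic string, buffer int) <-chan Event {
	ch := make(chan Event, buffer)
	b.mu.Lock()
	b.subs[topic] = append(b.subs[topic], ch)
	b.mu.Unlock()
	return ch
}

// Publish delivers to every subscriber of the event's topic without
// blocking and returns how many subscribers accepted the event.
func (b *Bus) Publish(e Event) int {
	b.mu.RLock()
	defer b.mu.RUnlock()
	delivered := 0
	for _, ch := range b.subs[e.Topic] {
		select {
		case ch <- e:
			delivered++
		default:
			// Subscriber buffer is full: drop instead of blocking the publisher.
		}
	}
	return delivered
}

func main() {
	bus := NewBus()
	orders := bus.Subscribe("orders", 1)
	fmt.Println(bus.Publish(Event{Topic: "orders", Payload: "created"})) // 1
	fmt.Println(bus.Publish(Event{Topic: "orders", Payload: "paid"}))    // 0: buffer full
	fmt.Println((<-orders).Payload)                                      // created
}
```

A slow subscriber loses events under this policy but can never stall the publisher or other subscribers.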

Context 4: Namespace Isolation

  • Issues: 4.1-4.4
  • Key Invariant: Events from namespace X invisible to Y
  • Key Mechanism: Stream prefixing ("tenant-a_events")
  • What it enables: Multi-tenancy, logical boundaries
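The stream-prefixing mechanism might be sketched as a single function that turns a namespace into a stream name. `StreamName` and the allowed-character pattern are assumptions; only the "tenant-a_events" example comes from this document. Rejecting underscores in the namespace keeps the prefix unambiguous, since the underscore separates the namespace from the suffix.

```go
package main

import (
	"fmt"
	"regexp"
)

// namespacePattern is an assumed format: lowercase letters, digits, and
// hyphens, starting with a letter or digit.
var namespacePattern = regexp.MustCompile(`^[a-z0-9][a-z0-9-]*$`)

// StreamName derives the per-namespace stream name, e.g. "tenant-a_events".
// Hypothetical helper; the real naming rules live in the namespace
// isolation domain model.
func StreamName(namespace string) (string, error) {
	if !namespacePattern.MatchString(namespace) {
		return "", fmt.Errorf("invalid namespace %q", namespace)
	}
	return namespace + "_events", nil
}

func main() {
	name, err := StreamName("tenant-a")
	fmt.Println(name, err) // tenant-a_events <nil>
}
```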

Context 5: Cluster Coordination

  • Issues: 3.1-3.17
  • Key Invariants: Single leader, no orphaned shards, no lost actors
  • Key Commands: JoinCluster, ElectLeader, RebalanceShards
  • Key Queries: GetLeader, GetShardAssignments
  • What it enables: Distributed deployment, HA, auto-recovery
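The single-leader invariant is often enforced with a time-limited lease. The sketch below is in-process only and uses plain millisecond timestamps so that it stays deterministic; the method names and TTL handling are assumptions, and a real cluster would keep the lease in shared storage and renew it via heartbeats as described in cluster/ARCHITECTURE.md.

```go
package main

import (
	"fmt"
	"sync"
)

// LeadershipLease is an illustrative lease: at most one holder at a time,
// and the holder must renew before the lease expires.
type LeadershipLease struct {
	mu        sync.Mutex
	holder    string
	expiresMs int64
	ttlMs     int64
}

func NewLeadershipLease(ttlMs int64) *LeadershipLease {
	return &LeadershipLease{ttlMs: ttlMs}
}

// TryAcquire grants the lease if it is free or expired, or renews it if
// the caller already holds it. It refuses while another node holds a
// live lease.
func (l *LeadershipLease) TryAcquire(node string, nowMs int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder != "" && l.holder != node && nowMs < l.expiresMs {
		return false
	}
	l.holder = node
	l.expiresMs = nowMs + l.ttlMs
	return true
}

// Leader reports the current leader, if the lease is still live.
func (l *LeadershipLease) Leader(nowMs int64) (string, bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == "" || nowMs >= l.expiresMs {
		return "", false
	}
	return l.holder, true
}

func main() {
	lease := NewLeadershipLease(1000)
	fmt.Println(lease.TryAcquire("node-a", 0))    // true: lease was free
	fmt.Println(lease.TryAcquire("node-b", 500))  // false: node-a holds a live lease
	fmt.Println(lease.TryAcquire("node-b", 1000)) // true: node-a did not renew in time
}
```

If the leader stops renewing (for example because the node failed), the lease expires and another node can take over without any explicit hand-off.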

Context 6: Event Bus (NATS)

  • Issues: 4.5-4.8
  • Key Invariant: Exactly-once cross-node delivery
  • Key Mechanism: NATS subjects, JetStream consumers
  • What it enables: Cross-node pub/sub, durability

Dependency Rules

Golden rule: Never implement an issue until its dependencies are complete.

Check dependencies:

  1. See issue detail in BACKLOG.md
  2. Look at "Dependencies" section
  3. Verify blockers are done
  4. Update status as you progress

Example: To implement Issue 3.13 (RebalanceShards):

  • ✓ Must have: Issue 3.8 (Consistent hashing)
  • ✓ Must have: Issue 3.12 (Health checks)
  • Then: Can implement 3.13
  • Then: Can implement 3.14-3.17 (validation, events)
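The dependency check above can be automated with a small lookup. The map below contains only the dependencies stated in this example (3.13 needs 3.8 and 3.12; 3.14 follows 3.13); `MissingDeps` is a hypothetical helper, and a real version would load the dependency list from BACKLOG.md or Gitea.

```go
package main

import "fmt"

// MissingDeps returns the dependencies of an issue that are not done yet.
// An empty result means the issue can be started.
func MissingDeps(issue string, deps map[string][]string, done map[string]bool) []string {
	var missing []string
	for _, d := range deps[issue] {
		if !done[d] {
			missing = append(missing, d)
		}
	}
	return missing
}

func main() {
	// Dependencies taken from the example in this section only.
	deps := map[string][]string{
		"3.13": {"3.8", "3.12"},
		"3.14": {"3.13"},
	}
	done := map[string]bool{"3.8": true}

	fmt.Println(MissingDeps("3.13", deps, done)) // [3.12]
	done["3.12"] = true
	fmt.Println(MissingDeps("3.13", deps, done)) // []
}
```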

Estimating Work

This backlog does NOT include per-issue time estimates (hours/days); the day ranges in the Issue Types table are rough per-type guides only. Reasoning:

  • Estimates are team-specific (experienced Go team vs. first-time)
  • Estimates can bias priority (easier wins first, not highest value)
  • Better to track velocity (issues/week) after a few sprints

For planning, use:

  • Story point ballpark: 2 (small work), 3 (medium), 5 (complex), 8 (very complex)
  • Typical issue: 2-5 story points
  • Range for Phase 1: 30-50 points
  • Range for full backlog: 150-250 points

Adjust based on team experience with:

  • Distributed systems
  • Go (language learning curve minimal)
  • Event sourcing (paradigm shift; budget time for learning)
  • NATS (simple; learning curve 1-2 weeks)

Minimum viable team:

  • 1 senior architect (domain design, tricky decisions)
  • 2 engineers (implementation, tests)
  • 1 DevOps/infra (NATS setup, integration tests)

Ideal team:

  • 1 tech lead (architecture, guidance, code review)
  • 3-4 engineers (parallel implementation)
  • 1 QA (integration tests, failure scenarios)
  • 1 DevOps (NATS, cluster setup, monitoring)

Phase-by-phase staffing:

  • Phase 1: 2 engineers (sequential, learning curve)
  • Phase 2: 2-3 engineers (parallelizable)
  • Phase 3: 3-4 engineers (complex, needs multi-node testing)
  • Phase 4: 2 engineers (NATS integration, can overlap Phase 3)

Risks & Mitigation

| Risk | Impact | Mitigation |
|---|---|---|
| Distributed systems unfamiliar | High | Spike on patterns, pair programming |
| Event sourcing complexity | High | Start with simple aggregates, read DOMAIN_MODEL_EVENT_SOURCING.md |
| NATS learning curve | Medium | Team pair with NATS expert, use existing integrations |
| Multi-node testing | Medium | Use Docker Compose for local cluster, integration tests first |
| Snapshot strategy | Low | Start simple (no snapshots), optimize later |
| Schema evolution | Low | Document event versioning strategy early |

Success Criteria (Big Picture)

Phase 1 complete: Developers can build event-sourced actors with OCC, no concurrent write bugs

Phase 2 complete: Developers can decouple components via local pub/sub, filter events

Phase 3 complete: Team can deploy distributed cluster, shards rebalance on node failure

Phase 4 complete: Multi-tenant SaaS can use Aether with complete isolation, events durable across cluster


Next Steps

  1. Triage: Review backlog with team, adjust priorities
  2. Create issues: Use /issue-writing skill to populate Gitea
  3. Set dependencies: Use tea issues deps add to link blockers
  4. Plan Phase 1: Create sprint, assign issues, start
  5. Monitor: Track velocity, adjust Phase 2 plan

Getting Help

Questions about this backlog?

Questions about requirements?

Questions about strategy?


Document Version: 1.0
Last Updated: 2026-01-12
Backlog Status: Ready for Gitea import
Approval Pending: Architecture review, team estimation