# Aether Executable Backlog: Index & Navigation

**Date Generated:** 2026-01-12
**Total Issues:** 67
**Total Capabilities:** 9
**Total Bounded Contexts:** 5
**Total Phases:** 4

---

## Quick Start

**For busy decision-makers:**
1. Read: [`BACKLOG_QUICK_REFERENCE.md`](./BACKLOG_QUICK_REFERENCE.md) (5 min)
2. See: Critical path shows 13 P0 issues for MVP
3. Plan: 4 phases, ~10 weeks for full scope, ~6 weeks for critical path

**For engineers:**
1. Read: [`BACKLOG.md`](./BACKLOG.md) (comprehensive, 2600+ lines)
2. Pick: Phase 1 issues (foundation, no dependencies)
3. Check: Issue details for acceptance criteria, test cases, DDD guidance

**For architects:**
1. Review: [`CAPABILITIES.md`](./CAPABILITIES.md) - Product capabilities mapped to domain
2. Read: [`BOUNDED_CONTEXT_MAP.md`](./BOUNDED_CONTEXT_MAP.md) - Context boundaries and isolation
3. Study: [`DOMAIN_MODEL_*.md`](./DOMAIN_MODEL_SUMMARY.md) - Domain concepts and invariants

---

## Document Map

### Backlog Documents (Start Here)

| Document | Purpose | Audience | Length |
|----------|---------|----------|--------|
| **BACKLOG.md** | Complete executable backlog with all 67 issues | Engineers, PMs | 2600 lines |
| **BACKLOG_QUICK_REFERENCE.md** | Tables, dependency graph, metrics | Quick lookup | 300 lines |
| **BACKLOG_INDEX.md** | This file - navigation guide | Everyone | 400 lines |

### Domain & Strategy (Background)

| Document | Purpose | Audience | When to Read |
|----------|---------|----------|--------------|
| [`CAPABILITIES.md`](./CAPABILITIES.md) | 9 capabilities mapped to domain, value, success conditions | Architects, PMs | Before implementing |
| [`BOUNDED_CONTEXT_MAP.md`](./BOUNDED_CONTEXT_MAP.md) | 5 contexts: isolation rules, language, lifecycle | Architects, Seniors | During design review |
| [`STRATEGY_CHAIN.md`](./STRATEGY_CHAIN.md) | Manifesto → Vision → Problem Space → Domains → Capabilities | Decision-makers | To understand "why" |
| [`PROBLEM_MAP.md`](./PROBLEM_MAP.md) | Event storming: user journeys, decisions, events | Product, Architects | Before Phase 1 |

### Domain Models (Technical Reference)

| Document | Purpose | Scope |
|----------|---------|-------|
| [`DOMAIN_MODEL_SUMMARY.md`](./DOMAIN_MODEL_SUMMARY.md) | 1-page overview of all domain models | All 5 contexts |
| [`DOMAIN_MODEL_EVENT_SOURCING.md`](./DOMAIN_MODEL_EVENT_SOURCING.md) | Event Sourcing context (aggregates, commands, events, invariants) | Deep dive: Context 1 |
| [`DOMAIN_MODEL_OCC.md`](./DOMAIN_MODEL_OCC.md) | Optimistic Concurrency context | Deep dive: Context 2 |
| [`DOMAIN_MODEL_NAMESPACE_ISOLATION.md`](./DOMAIN_MODEL_NAMESPACE_ISOLATION.md) | Namespace Isolation context | Deep dive: Context 4 |
| [`BOUNDED_CONTEXT_MAP.md`](./BOUNDED_CONTEXT_MAP.md) | Event Bus + Cluster coordination contexts (Contexts 3, 5) | Integrated view |

### Cluster Documentation

| Document | Purpose | When to Read |
|----------|---------|--------------|
| [`cluster/DOMAIN_MODEL.md`](./cluster/DOMAIN_MODEL.md) | Cluster coordination domain model (aggregates, commands, events) | Phase 3 |
| [`cluster/ARCHITECTURE.md`](./cluster/ARCHITECTURE.md) | Cluster architecture (leader election, shards, failure recovery) | Phase 3 planning |
| [`cluster/PATTERNS.md`](./cluster/PATTERNS.md) | Distributed patterns used in cluster coordination | Phase 3 implementation |

---

## How to Use This Backlog

### Scenario 1: "I need to build this. Where do I start?"

1. Read [`BACKLOG_QUICK_REFERENCE.md`](./BACKLOG_QUICK_REFERENCE.md) (5 min)
2. Focus on **Phase 1** (17 issues, foundation)
3. Start with **Issue 1.1** (SaveEvent)
4. Dependencies show what unblocks what

**Go to:** [`BACKLOG.md`](./BACKLOG.md) for full details

---

### Scenario 2: "I need to understand the domain before coding"

1. Read [`CAPABILITIES.md`](./CAPABILITIES.md) (product value perspective)
2. Read [`PROBLEM_MAP.md`](./PROBLEM_MAP.md) (user journeys and events)
3. Read [`DOMAIN_MODEL_SUMMARY.md`](./DOMAIN_MODEL_SUMMARY.md) (1-page overview)
4. Deep-dive into specific context models (DOMAIN_MODEL_*.md)

**Go to:** [`DOMAIN_MODEL_EVENT_SOURCING.md`](./DOMAIN_MODEL_EVENT_SOURCING.md) for Phase 1 focus

---

### Scenario 3: "I'm implementing Phase 1. What do I need to know?"

**Phase 1 covers:** Event storage, replay, snapshots, OCC, retry patterns

1. **Start:** Issue 1.1 (SaveEvent with version validation)
   - Acceptance criteria tell you exactly what to build
   - DDD guidance explains the invariant (monotonic versions)
   - Test cases show edge cases

2. **Then:** Issues 1.2-1.5 (append-only, events, queries)
   - These depend on 1.1; implement in parallel where possible

3. **Learn:** Read [`DOMAIN_MODEL_EVENT_SOURCING.md`](./DOMAIN_MODEL_EVENT_SOURCING.md)
   - Understand aggregates, commands, events, invariants
   - See how SaveEvent fits into the larger picture

4. **Check:** [`BACKLOG.md`](./BACKLOG.md), Issue 1.1, acceptance criteria
   - Concrete, testable, specific requirements

**Go to:** [`BACKLOG.md`](./BACKLOG.md) Phase 1 section (line 48-300)

---

### Scenario 4: "I'm planning Phase 3 (Clustering). Help me understand the domain."

**Phase 3 covers:** Node discovery, leader election, shard distribution, failure recovery

1. **Background:** Read [`cluster/DOMAIN_MODEL.md`](./cluster/DOMAIN_MODEL.md)
   - Aggregates: Cluster, LeadershipLease, ShardAssignment
   - Commands: JoinCluster, ElectLeader, RebalanceShards
   - Events: LeaderElected, NodeFailed, ShardMigrated
   - Invariants: single leader, no orphaned shards

2. **Architecture:** Read [`cluster/ARCHITECTURE.md`](./cluster/ARCHITECTURE.md)
   - How leader election works (lease-based, NATS heartbeats)
   - How consistent hashing minimizes reshuffling
   - How failures trigger rebalancing

3. **Patterns:** Read [`cluster/PATTERNS.md`](./cluster/PATTERNS.md)
   - Distributed consensus patterns
   - Health check patterns
   - Migration patterns

4. **Issues:** See [`BACKLOG.md`](./BACKLOG.md) Phase 3 (issues 3.1-3.17)
   - Decomposed into: topology, leadership, shards, failure recovery
   - Dependency order: discovery → election → assignment → health → rebalancing

**Go to:** [`BACKLOG.md`](./BACKLOG.md) Phase 3 section (line 800-1200)

---

### Scenario 5: "I need to present this to stakeholders. What's the pitch?"

**Key messages:**

1. **Why Aether?** See [`vision.md`](./vision.md)
   - Solves: "building distributed, event-sourced systems in Go without heavyweight frameworks"
   - Principles: Primitives over frameworks, NATS-native, resource-conscious

2. **What are we building?** See [`CAPABILITIES.md`](./CAPABILITIES.md)
   - 9 capabilities organized into 3 groups (event sourcing, cluster, event distribution)
   - Each eliminates a pain point and enables a job

3. **How much work?** See [`BACKLOG_QUICK_REFERENCE.md`](./BACKLOG_QUICK_REFERENCE.md)
   - 67 issues in 4 phases
   - Critical path: 13 P0 issues for MVP (6 weeks aggressive)
   - Full scope: all 67 issues (10 weeks typical)

4. **Value timeline?**
   - After Phase 1: Event sourcing with conflict detection
   - After Phase 2: Local pub/sub and filtering
   - After Phase 3: Distributed cluster with automatic recovery
   - After Phase 4: Multi-tenant NATS-native delivery

**Slides:** Reference [`CAPABILITIES.md`](./CAPABILITIES.md) success conditions, value map

---

### Scenario 6: "I found a bug in existing code. Which issues cover this area?"

**Use dependency graph** in [`BACKLOG_QUICK_REFERENCE.md`](./BACKLOG_QUICK_REFERENCE.md)

**Example:** "SaveEvent isn't enforcing version validation"
→ Look for: Issue 1.1, 1.2, 1.4
→ Read: [`DOMAIN_MODEL_EVENT_SOURCING.md`](./DOMAIN_MODEL_EVENT_SOURCING.md), monotonic version invariant
→ Fix: Implement version check in SaveEvent

---

## Issue Numbering Scheme

**Format:** `{Phase}.{Issue}`, with issues grouped into feature sets within each phase

- **Phase:** 1-4 (Event Sourcing, Event Bus, Cluster, Namespace/NATS)
- **FeatureSet:** a-z (subgrouping within phase; not part of the issue ID)
- **Issue:** 1-N (individual work item, numbered sequentially across the phase)

**Examples:**
- `1.1` = Phase 1, Feature Set 1a (Event Storage), first issue in the set
- `3.13` = Phase 3, Feature Set 3c (Failure Recovery), sixth issue in the set
- `4.5` = Phase 4, Feature Set 4b (NATS Delivery), first issue in the set

---

## Issue Types

Each issue has a type that indicates the kind of work involved:

| Type | Example | Time Estimate |
|------|---------|----------------|
| **Command** | SaveEvent, Subscribe | 2-5 days |
| **Rule** | Enforce append-only, fail-fast | 1-3 days |
| **Event** | Publish EventStored, LeaderElected | 1-2 days |
| **Query** | GetEvents, GetLeader | 2-3 days |
| **Interface** | SnapshotStore contract | 1 day |
| **Validation** | Namespace format checks | 1 day |
| **Documentation** | Retry patterns, cluster migration | 2-5 days |

---

## Priority Levels

| Level | Meaning | Approach |
|-------|---------|----------|
| **P0** | Blocking; no alternative path | Must complete before next items |
| **P1** | Important; can ship without it, but with limited value | Complete after P0 |
| **P2** | Nice-to-have; polish, observability | Complete if time allows |

**Recommendation:** Focus on P0 issues first, since they block other work. P1 issues may be parallelizable.

---

## Issue Status Tracking

**Not yet in Gitea.** Use this backlog to:

1. Create issues with the `/issue-writing` skill
2. Set up dependencies in Gitea (`tea issues deps add`)
3. Track progress per phase
4. Measure velocity (issues/week)

**Suggested milestone structure:**
- Milestone 1: Phase 1 (Event Sourcing Foundation)
- Milestone 2: Phase 2 (Local Event Bus)
- Milestone 3: Phase 3 (Cluster Coordination)
- Milestone 4: Phase 4 (Namespace & NATS)

---

## Context at a Glance

### Context 1: Event Sourcing
- **Issues:** 1.1-1.10 (foundational)
- **Key Invariant:** Monotonic versions per actor
- **Key Command:** SaveEvent(event)
- **Key Queries:** GetLatestVersion, GetEvents
- **What it enables:** Immutable history, replay, OCC

### Context 2: Optimistic Concurrency Control
- **Issues:** 1.11-1.12
- **Key Invariant:** Conflicts detected immediately
- **Key Command:** AttemptWrite (via SaveEvent)
- **Key Error:** VersionConflictError with context
- **What it enables:** Multi-writer safety without locks

### Context 3: Event Bus (Local)
- **Issues:** 2.1-2.9
- **Key Invariant:** Exact subscriptions isolated; non-blocking delivery
- **Key Commands:** Publish, Subscribe
- **Key Queries:** GetSubscriptions, metrics
- **What it enables:** Local pub/sub, loose coupling

### Context 4: Namespace Isolation
- **Issues:** 4.1-4.4
- **Key Invariant:** Events from namespace X invisible to Y
- **Key Mechanism:** Stream prefixing ("tenant-a_events")
- **What it enables:** Multi-tenancy, logical boundaries

### Context 5: Cluster Coordination
- **Issues:** 3.1-3.17
- **Key Invariants:** Single leader, no orphaned shards, no lost actors
- **Key Commands:** JoinCluster, ElectLeader, RebalanceShards
- **Key Queries:** GetLeader, GetShardAssignments
- **What it enables:** Distributed deployment, HA, auto-recovery

### Context 6: Event Bus (NATS)
- **Issues:** 4.5-4.8
- **Key Invariant:** Exactly-once cross-node delivery
- **Key Mechanism:** NATS subjects, JetStream consumers
- **What it enables:** Cross-node pub/sub, durability

---

## Dependency Rules

**Golden rule:** Never implement an issue until its dependencies are complete.

**Check dependencies:**
1. See issue detail in BACKLOG.md
2. Look at "Dependencies" section
3. Verify blockers are done
4. Update status as you progress

**Example:** To implement Issue 3.13 (RebalanceShards):
- ✓ Must have: Issue 3.8 (Consistent hashing)
- ✓ Must have: Issue 3.12 (Health checks)
- Then: Can implement 3.13
- Then: Can implement 3.14-3.17 (validation, events)

---

## Estimating Work

Individual issues in this backlog do NOT include time estimates (hours/days); the per-type ranges under Issue Types are rough guides only. Reasoning:
- Estimates are team-specific (experienced Go team vs. first-time)
- Estimates can bias priority (easier wins first, not highest value)
- Better to track velocity (issues/week) after a few sprints

**For planning, use:**
- **Story point ballpark:** 2 (small work), 3 (medium), 5 (complex), 8 (very complex)
- **Typical issue:** 2-5 story points
- **Range for Phase 1:** 30-50 points
- **Range for full backlog:** 150-250 points

**Adjust based on team experience with:**
- Distributed systems
- Go (language learning curve minimal)
- Event sourcing (paradigm shift; budget time for learning)
- NATS (simple; learning curve 1-2 weeks)

---

## Recommended Team Structure

**Minimum viable team:**
- 1 senior architect (domain design, tricky decisions)
- 2 engineers (implementation, tests)
- 1 DevOps/infra (NATS setup, integration tests)

**Ideal team:**
- 1 tech lead (architecture, guidance, code review)
- 3-4 engineers (parallel implementation)
- 1 QA (integration tests, failure scenarios)
- 1 DevOps (NATS, cluster setup, monitoring)

**Phase-by-phase staffing:**
- Phase 1: 2 engineers (sequential, learning curve)
- Phase 2: 2-3 engineers (parallelizable)
- Phase 3: 3-4 engineers (complex, needs multi-node testing)
- Phase 4: 2 engineers (NATS integration, can overlap Phase 3)

---

## Risks & Mitigation

| Risk | Impact | Mitigation |
|------|--------|-----------|
| Distributed systems unfamiliar | High | Spike on patterns, pair programming |
| Event sourcing complexity | High | Start with simple aggregates, read DOMAIN_MODEL_EVENT_SOURCING.md |
| NATS learning curve | Medium | Pair the team with a NATS expert, use existing integrations |
| Multi-node testing | Medium | Use Docker Compose for local cluster, integration tests first |
| Snapshot strategy | Low | Start simple (no snapshots), optimize later |
| Schema evolution | Low | Document event versioning strategy early |

---

## Success Criteria (Big Picture)

**Phase 1 complete:** Developers can build event-sourced actors with OCC, no concurrent write bugs

**Phase 2 complete:** Developers can decouple components via local pub/sub, filter events

**Phase 3 complete:** Team can deploy distributed cluster, shards rebalance on node failure

**Phase 4 complete:** Multi-tenant SaaS can use Aether with complete isolation, events durable across cluster

---

## Next Steps

1. **Triage:** Review backlog with team, adjust priorities
2. **Create issues:** Use [`/issue-writing`](./BACKLOG.md) skill to populate Gitea
3. **Set dependencies:** Use `tea issues deps add` to link blockers
4. **Plan Phase 1:** Create sprint, assign issues, start
5. **Monitor:** Track velocity, adjust Phase 2 plan

---

## Getting Help

**Questions about this backlog?**
- Issue detail: See [`BACKLOG.md`](./BACKLOG.md)
- Quick lookup: See [`BACKLOG_QUICK_REFERENCE.md`](./BACKLOG_QUICK_REFERENCE.md)
- Domain concepts: See [`DOMAIN_MODEL_*.md`](./DOMAIN_MODEL_SUMMARY.md)

**Questions about requirements?**
- Product value: See [`CAPABILITIES.md`](./CAPABILITIES.md)
- User context: See [`PROBLEM_MAP.md`](./PROBLEM_MAP.md)
- Vision: See [`vision.md`](./vision.md)

**Questions about strategy?**
- How we got here: See [`STRATEGY_CHAIN.md`](./STRATEGY_CHAIN.md)
- Organization context: See [Flowmade Manifesto](https://git.flowmade.one/flowmade-one/architecture/src/branch/main/manifesto.md)

---

**Document Version:** 1.0
**Last Updated:** 2026-01-12
**Backlog Status:** Ready for Gitea import
**Approval Pending:** Architecture review, team estimation