Move product strategy documentation to .product-strategy directory

Organize all product strategy and domain modeling documentation into a dedicated .product-strategy directory for better separation from code.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-12 23:57:11 +01:00
parent 18ea677585
commit 271f5db444
26 changed files with 16521 additions and 0 deletions
--- a/.product-strategy/BACKLOG_INDEX.md
+++ b/.product-strategy/BACKLOG_INDEX.md
@@ -0,0 +1,403 @@
+# Aether Executable Backlog: Index & Navigation
+
+**Date Generated:** 2026-01-12
+**Total Issues:** 67
+**Total Capabilities:** 9
+**Total Bounded Contexts:** 5
+**Total Phases:** 4
+
+---
+
+## Quick Start
+
+**For busy decision-makers:**
+1. Read: [`BACKLOG_QUICK_REFERENCE.md`](./BACKLOG_QUICK_REFERENCE.md) (5 min)
+2. See: Critical path shows 13 P0 issues for MVP
+3. Plan: 4 phases, ~10 weeks for full scope, ~6 weeks for critical path
+
+**For engineers:**
+1. Read: [`BACKLOG.md`](./BACKLOG.md) (comprehensive, 2600+ lines)
+2. Pick: Phase 1 issues (foundation, no dependencies)
+3. Check: Issue details for acceptance criteria, test cases, DDD guidance
+
+**For architects:**
+1. Review: [`CAPABILITIES.md`](./CAPABILITIES.md) - Product capabilities mapped to domain
+2. Read: [`BOUNDED_CONTEXT_MAP.md`](./BOUNDED_CONTEXT_MAP.md) - Context boundaries and isolation
+3. Study: [`DOMAIN_MODEL_*.md`](./DOMAIN_MODEL_SUMMARY.md) - Domain concepts and invariants
+
+---
+
+## Document Map
+
+### Backlog Documents (Start Here)
+
+| Document | Purpose | Audience | Length |
+|----------|---------|----------|--------|
+| **BACKLOG.md** | Complete executable backlog with all 67 issues | Engineers, PMs | 2600 lines |
+| **BACKLOG_QUICK_REFERENCE.md** | Tables, dependency graph, metrics | Quick lookup | 300 lines |
+| **BACKLOG_INDEX.md** | This file - navigation guide | Everyone | 400 lines |
+
+### Domain & Strategy (Background)
+
+| Document | Purpose | Audience | When to Read |
+|----------|---------|----------|--------------|
+| [`CAPABILITIES.md`](./CAPABILITIES.md) | 9 capabilities mapped to domain, value, success conditions | Architects, PMs | Before implementing |
+| [`BOUNDED_CONTEXT_MAP.md`](./BOUNDED_CONTEXT_MAP.md) | 5 contexts: isolation rules, language, lifecycle | Architects, Seniors | During design review |
+| [`STRATEGY_CHAIN.md`](./STRATEGY_CHAIN.md) | Manifesto → Vision → Problem Space → Domains → Capabilities | Decision-makers | To understand "why" |
+| [`PROBLEM_MAP.md`](./PROBLEM_MAP.md) | Event storming: user journeys, decisions, events | Product, Architects | Before Phase 1 |
+
+### Domain Models (Technical Reference)
+
+| Document | Purpose | Scope |
+|----------|---------|-------|
+| [`DOMAIN_MODEL_SUMMARY.md`](./DOMAIN_MODEL_SUMMARY.md) | 1-page overview of all domain models | All 5 contexts |
+| [`DOMAIN_MODEL_EVENT_SOURCING.md`](./DOMAIN_MODEL_EVENT_SOURCING.md) | Event Sourcing context (aggregates, commands, events, invariants) | Deep dive: Context 1 |
+| [`DOMAIN_MODEL_OCC.md`](./DOMAIN_MODEL_OCC.md) | Optimistic Concurrency context | Deep dive: Context 2 |
+| [`DOMAIN_MODEL_NAMESPACE_ISOLATION.md`](./DOMAIN_MODEL_NAMESPACE_ISOLATION.md) | Namespace Isolation context | Deep dive: Context 4 |
+| [`BOUNDED_CONTEXT_MAP.md`](./BOUNDED_CONTEXT_MAP.md) | Event Bus + Cluster coordination contexts (Contexts 3, 5) | Integrated view |
+
+### Cluster Documentation
+
+| Document | Purpose | When to Read |
+|----------|---------|--------------|
+| [`cluster/DOMAIN_MODEL.md`](./cluster/DOMAIN_MODEL.md) | Cluster coordination domain model (aggregates, commands, events) | Phase 3 |
+| [`cluster/ARCHITECTURE.md`](./cluster/ARCHITECTURE.md) | Cluster architecture (leader election, shards, failure recovery) | Phase 3 planning |
+| [`cluster/PATTERNS.md`](./cluster/PATTERNS.md) | Distributed patterns used in cluster coordination | Phase 3 implementation |
+
+---
+
+## How to Use This Backlog
+
+### Scenario 1: "I need to build this. Where do I start?"
+
+1. Read [`BACKLOG_QUICK_REFERENCE.md`](./BACKLOG_QUICK_REFERENCE.md) (5 min)
+2. Focus on **Phase 1** (17 issues, foundation)
+3. Start with **Issue 1.1** (SaveEvent)
+4. Dependencies show what unblocks what
+
+**Go to:** [`BACKLOG.md`](./BACKLOG.md) for full details
+
+---
+
+### Scenario 2: "I need to understand the domain before coding"
+
+1. Read [`CAPABILITIES.md`](./CAPABILITIES.md) (product value perspective)
+2. Read [`PROBLEM_MAP.md`](./PROBLEM_MAP.md) (user journeys and events)
+3. Read [`DOMAIN_MODEL_SUMMARY.md`](./DOMAIN_MODEL_SUMMARY.md) (1-page overview)
+4. Deep-dive into specific context models (DOMAIN_MODEL_*.md)
+
+**Go to:** [`DOMAIN_MODEL_EVENT_SOURCING.md`](./DOMAIN_MODEL_EVENT_SOURCING.md) for Phase 1 focus
+
+---
+
+### Scenario 3: "I'm implementing Phase 1. What do I need to know?"
+
+**Phase 1 covers:** Event storage, replay, snapshots, OCC, retry patterns
+
+1. **Start:** Issue 1.1 (SaveEvent with version validation)
+   - Acceptance criteria tell you exactly what to build
+   - DDD guidance explains the invariant (monotonic versions)
+   - Test cases show edge cases
+
+2. **Then:** Issues 1.2-1.5 (append-only, events, queries)
+   - These depend on 1.1; implement in parallel where possible
+
+3. **Learn:** Read [`DOMAIN_MODEL_EVENT_SOURCING.md`](./DOMAIN_MODEL_EVENT_SOURCING.md)
+   - Understand aggregates, commands, events, invariants
+   - See how SaveEvent fits into the larger picture
+
+4. **Check:** [`BACKLOG.md`](./BACKLOG.md), Issue 1.1, acceptance criteria
+   - Concrete, testable, specific requirements
+
+**Go to:** [`BACKLOG.md`](./BACKLOG.md) Phase 1 section (lines 48-300)
+
+---
+
+### Scenario 4: "I'm planning Phase 3 (Clustering). Help me understand the domain."
+
+**Phase 3 covers:** Node discovery, leader election, shard distribution, failure recovery
+
+1. **Background:** Read [`cluster/DOMAIN_MODEL.md`](./cluster/DOMAIN_MODEL.md)
+   - Aggregates: Cluster, LeadershipLease, ShardAssignment
+   - Commands: JoinCluster, ElectLeader, RebalanceShards
+   - Events: LeaderElected, NodeFailed, ShardMigrated
+   - Invariants: single leader, no orphaned shards
+
+2. **Architecture:** Read [`cluster/ARCHITECTURE.md`](./cluster/ARCHITECTURE.md)
+   - How leader election works (lease-based, NATS heartbeats)
+   - How consistent hashing minimizes reshuffling
+   - How failures trigger rebalancing
+
+3. **Patterns:** Read [`cluster/PATTERNS.md`](./cluster/PATTERNS.md)
+   - Distributed consensus patterns
+   - Health check patterns
+   - Migration patterns
+
+4. **Issues:** See [`BACKLOG.md`](./BACKLOG.md) Phase 3 (issues 3.1-3.17)
+   - Decomposed into: topology, leadership, shards, failure recovery
+   - Dependency order: discovery → election → assignment → health → rebalancing
+
+**Go to:** [`BACKLOG.md`](./BACKLOG.md) Phase 3 section (lines 800-1200)
+
+---
+
+### Scenario 5: "I need to present this to stakeholders. What's the pitch?"
+
+**Key messages:**
+
+1. **Why Aether?** See [`vision.md`](./vision.md)
+   - Solves: "building distributed, event-sourced systems in Go without heavyweight frameworks"
+   - Principles: Primitives over frameworks, NATS-native, resource-conscious
+
+2. **What are we building?** See [`CAPABILITIES.md`](./CAPABILITIES.md)
+   - 9 capabilities organized into 3 groups (event sourcing, cluster, event distribution)
+   - Each eliminates a pain point and enables a job
+
+3. **How much work?** See [`BACKLOG_QUICK_REFERENCE.md`](./BACKLOG_QUICK_REFERENCE.md)
+   - 67 issues in 4 phases
+   - Critical path: 13 P0 issues for MVP (6 weeks aggressive)
+   - Full scope: all 67 issues (10 weeks typical)
+
+4. **Value timeline?**
+   - After Phase 1: Event sourcing with conflict detection
+   - After Phase 2: Local pub/sub and filtering
+   - After Phase 3: Distributed cluster with automatic recovery
+   - After Phase 4: Multi-tenant NATS-native delivery
+
+**Slides:** Reference [`CAPABILITIES.md`](./CAPABILITIES.md) success conditions, value map
+
+---
+
+### Scenario 6: "I found a bug in existing code. Which issues cover this area?"
+
+**Use dependency graph** in [`BACKLOG_QUICK_REFERENCE.md`](./BACKLOG_QUICK_REFERENCE.md)
+
+**Example:** "SaveEvent isn't enforcing version validation"
+- Look for: Issues 1.1, 1.2, 1.4
+- Read: [`DOMAIN_MODEL_EVENT_SOURCING.md`](./DOMAIN_MODEL_EVENT_SOURCING.md), monotonic version invariant
+- Fix: Implement version check in SaveEvent
+
+---
+
+## Issue Numbering Scheme
+
+**Format:** `{Phase}.{Issue}`
+
+- **Phase:** 1-4 (Event Sourcing, Event Bus, Cluster, Namespace/NATS)
+- **Issue:** 1-N (individual work item, numbered sequentially within the phase)
+- **FeatureSet:** a-z (subgrouping within a phase; not part of the issue ID)
+
+**Examples:**
+- `1.1` = Phase 1, Feature Set 1a (Event Storage), first issue in that set
+- `3.13` = Phase 3, Feature Set 3c (Failure Recovery), sixth issue in that set
+- `4.5` = Phase 4, Feature Set 4b (NATS Delivery), first issue in that set
+
+---
+
+## Issue Types
+
+Each issue has a type that indicates the kind of work involved:
+
+| Type | Example | Time Estimate |
+|------|---------|----------------|
+| **Command** | SaveEvent, Subscribe | 2-5 days |
+| **Rule** | Enforce append-only, fail-fast | 1-3 days |
+| **Event** | Publish EventStored, LeaderElected | 1-2 days |
+| **Query** | GetEvents, GetLeader | 2-3 days |
+| **Interface** | SnapshotStore contract | 1 day |
+| **Validation** | Namespace format checks | 1 day |
+| **Documentation** | Retry patterns, cluster migration | 2-5 days |
+
+---
+
+## Priority Levels
+
+| Level | Meaning | Approach |
+|-------|---------|----------|
+| **P0** | Blocking; no alternative path | Must complete before next items |
+| **P1** | Important; can ship without it, but with limited value | Complete after P0 |
+| **P2** | Nice-to-have; polish, observability | Complete if time allows |
+
+**Recommendation:** Focus on P0 issues first. They're blocking; P1 issues may be parallelizable.
+
+---
+
+## Issue Status Tracking
+
+**Not yet in Gitea.** Use this backlog to:
+
+1. Create issues with the `/issue-writing` skill
+2. Set up dependencies in Gitea (`tea issues deps add`)
+3. Track progress per phase
+4. Measure velocity (issues/week)
+
+**Suggested milestone structure:**
+- Milestone 1: Phase 1 (Event Sourcing Foundation)
+- Milestone 2: Phase 2 (Local Event Bus)
+- Milestone 3: Phase 3 (Cluster Coordination)
+- Milestone 4: Phase 4 (Namespace & NATS)
+
+---
+
+## Context at a Glance
+
+### Context 1: Event Sourcing
+- **Issues:** 1.1-1.10 (foundational)
+- **Key Invariant:** Monotonic versions per actor
+- **Key Command:** SaveEvent(event)
+- **Key Query:** GetLatestVersion, GetEvents
+- **What it enables:** Immutable history, replay, OCC
+
+### Context 2: Optimistic Concurrency Control
+- **Issues:** 1.11-1.12
+- **Key Invariant:** Conflicts detected immediately
+- **Key Command:** AttemptWrite (via SaveEvent)
+- **Key Error:** VersionConflictError with context
+- **What it enables:** Multi-writer safety without locks
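+
+A minimal sketch of how the two invariants above (monotonic versions per actor, conflicts detected immediately) might fit together in Go. `Event`, `MemoryStore`, and the fields of `VersionConflictError` are illustrative assumptions, not Aether's actual API, and the sketch assumes "monotonic" means strictly increasing; the acceptance criteria in `BACKLOG.md` Issue 1.1 remain the authority.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+// Event is a hypothetical minimal event; Aether's real type may differ.
+type Event struct {
+	ActorID string
+	Version int64
+	Type    string
+}
+
+// VersionConflictError carries the context a caller needs to decide on a retry.
+type VersionConflictError struct {
+	ActorID          string
+	AttemptedVersion int64
+	CurrentVersion   int64
+}
+
+func (e *VersionConflictError) Error() string {
+	return fmt.Sprintf("version conflict for %s: attempted %d, current %d",
+		e.ActorID, e.AttemptedVersion, e.CurrentVersion)
+}
+
+// MemoryStore is an illustrative in-memory store. It is not safe for
+// concurrent use; a real store would make check-and-append atomic.
+type MemoryStore struct {
+	events map[string][]Event
+}
+
+func NewMemoryStore() *MemoryStore {
+	return &MemoryStore{events: make(map[string][]Event)}
+}
+
+// GetLatestVersion returns 0 when the actor has no events yet.
+func (s *MemoryStore) GetLatestVersion(actorID string) int64 {
+	evs := s.events[actorID]
+	if len(evs) == 0 {
+		return 0
+	}
+	return evs[len(evs)-1].Version
+}
+
+// SaveEvent appends only if the version is strictly greater than the latest one.
+func (s *MemoryStore) SaveEvent(e Event) error {
+	current := s.GetLatestVersion(e.ActorID)
+	if e.Version <= current {
+		return &VersionConflictError{
+			ActorID:          e.ActorID,
+			AttemptedVersion: e.Version,
+			CurrentVersion:   current,
+		}
+	}
+	s.events[e.ActorID] = append(s.events[e.ActorID], e)
+	return nil
+}
+
+func main() {
+	store := NewMemoryStore()
+	fmt.Println(store.SaveEvent(Event{ActorID: "order-1", Version: 1, Type: "Created"}))
+
+	// A second writer reusing version 1 is rejected immediately.
+	err := store.SaveEvent(Event{ActorID: "order-1", Version: 1, Type: "Paid"})
+	var conflict *VersionConflictError
+	fmt.Println(errors.As(err, &conflict), err)
+}
+```
+
+Returning a typed error rather than a bare string lets callers use `errors.As` to read the current version and decide whether to reload and retry.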
+
+### Context 3: Event Bus (Local)
+- **Issues:** 2.1-2.9
+- **Key Invariant:** Exact subscriptions isolated; non-blocking delivery
+- **Key Commands:** Publish, Subscribe
+- **Key Queries:** GetSubscriptions, metrics
+- **What it enables:** Local pub/sub, loose coupling
+
+### Context 4: Namespace Isolation
+- **Issues:** 4.1-4.4
+- **Key Invariant:** Events from namespace X invisible to Y
+- **Key Mechanism:** Stream prefixing ("tenant-a_events")
+- **What it enables:** Multi-tenancy, logical boundaries
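+
+The stream-prefixing mechanism above could be sketched as follows. `streamName` and its validation rule are assumptions made for illustration; only the `"tenant-a_events"` shape comes from this document. Rejecting `_` inside a namespace is one way to keep two namespaces from mapping to the same stream name, and `.`, `*`, `>` are rejected because NATS treats them specially in subjects.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// streamName is a hypothetical helper that prefixes a base stream name
+// with the namespace, e.g. "tenant-a" + "events" -> "tenant-a_events".
+func streamName(namespace, base string) (string, error) {
+	if namespace == "" {
+		return "", fmt.Errorf("namespace must not be empty")
+	}
+	// Assumed rule: the separator, NATS subject tokens, and spaces are reserved.
+	if strings.ContainsAny(namespace, "_.*> ") {
+		return "", fmt.Errorf("namespace %q contains a reserved character", namespace)
+	}
+	return namespace + "_" + base, nil
+}
+
+func main() {
+	name, err := streamName("tenant-a", "events")
+	fmt.Println(name, err)
+
+	// "tenant_a" is rejected so it cannot collide with namespace "tenant".
+	_, err = streamName("tenant_a", "events")
+	fmt.Println(err)
+}
+```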
+
+### Context 5: Cluster Coordination
+- **Issues:** 3.1-3.17
+- **Key Invariants:** Single leader, no orphaned shards, no lost actors
+- **Key Commands:** JoinCluster, ElectLeader, RebalanceShards
+- **Key Queries:** GetLeader, GetShardAssignments
+- **What it enables:** Distributed deployment, HA, auto-recovery
+
+### Context 6: Event Bus (NATS)
+- **Issues:** 4.5-4.8
+- **Key Invariant:** Exactly-once cross-node delivery
+- **Key Mechanism:** NATS subjects, JetStream consumers
+- **What it enables:** Cross-node pub/sub, durability
+
+---
+
+## Dependency Rules
+
+**Golden rule:** Never implement an issue until its dependencies are complete.
+
+**Check dependencies:**
+1. See issue detail in BACKLOG.md
+2. Look at "Dependencies" section
+3. Verify blockers are done
+4. Update status as you progress
+
+**Example:** To implement Issue 3.13 (RebalanceShards):
+- ✓ Must have: Issue 3.8 (Consistent hashing)
+- ✓ Must have: Issue 3.12 (Health checks)
+- Then: Can implement 3.13
+- Then: Can implement 3.14-3.17 (validation, events)
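+
+The check above can be automated once dependencies are recorded somewhere machine-readable. The sketch below is illustrative only: `blockers` is a hypothetical helper, the dependency map repeats the 3.13 example, and the set of completed issues is made up.
+
+```go
+package main
+
+import "fmt"
+
+// blockers returns the dependencies of issue that are not yet done.
+func blockers(issue string, deps map[string][]string, done map[string]bool) []string {
+	var missing []string
+	for _, d := range deps[issue] {
+		if !done[d] {
+			missing = append(missing, d)
+		}
+	}
+	return missing
+}
+
+func main() {
+	// Dependencies from the example above; the done set is invented.
+	deps := map[string][]string{"3.13": {"3.8", "3.12"}}
+	done := map[string]bool{"3.8": true}
+
+	fmt.Println(blockers("3.13", deps, done)) // [3.12]
+
+	done["3.12"] = true
+	fmt.Println(len(blockers("3.13", deps, done)) == 0) // true
+}
+```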
+
+---
+
+## Estimating Work
+
+Individual issues in this backlog do NOT carry time estimates (hours/days); the per-type ranges under Issue Types are rough guides only. Reasoning:
+
+- Estimates are team-specific (experienced Go team vs. first-time)
+- Estimates can bias priority (easier wins first, not highest value)
+- Better to track velocity (issues/week) after a few sprints
+
+**For planning, use:**
+- **Story point ballpark:** 2 (small work), 3 (medium), 5 (complex), 8 (very complex)
+- **Typical issue:** 2-5 story points
+- **Range for Phase 1:** 30-50 points
+- **Range for full backlog:** 150-250 points
+
+**Adjust based on team experience with:**
+- Distributed systems
+- Go (language learning curve minimal)
+- Event sourcing (paradigm shift; budget time for learning)
+- NATS (simple; learning curve 1-2 weeks)
+
+---
+
+## Recommended Team Structure
+
+**Minimum viable team:**
+- 1 senior architect (domain design, tricky decisions)
+- 2 engineers (implementation, tests)
+- 1 DevOps/infra (NATS setup, integration tests)
+
+**Ideal team:**
+- 1 tech lead (architecture, guidance, code review)
+- 3-4 engineers (parallel implementation)
+- 1 QA (integration tests, failure scenarios)
+- 1 DevOps (NATS, cluster setup, monitoring)
+
+**Phase-by-phase staffing:**
+- Phase 1: 2 engineers (sequential, learning curve)
+- Phase 2: 2-3 engineers (parallelizable)
+- Phase 3: 3-4 engineers (complex, needs multi-node testing)
+- Phase 4: 2 engineers (NATS integration, can overlap Phase 3)
+
+---
+
+## Risks & Mitigation
+
+| Risk | Impact | Mitigation |
+|------|--------|-----------|
+| Distributed systems unfamiliar | High | Spike on patterns, pair programming |
+| Event sourcing complexity | High | Start with simple aggregates, read DOMAIN_MODEL_EVENT_SOURCING.md |
+| NATS learning curve | Medium | Pair the team with a NATS expert, use existing integrations |
+| Multi-node testing | Medium | Use Docker Compose for local cluster, integration tests first |
+| Snapshot strategy | Low | Start simple (no snapshots), optimize later |
+| Schema evolution | Low | Document event versioning strategy early |
+
+---
+
+## Success Criteria (Big Picture)
+
+**Phase 1 complete:** Developers can build event-sourced actors with OCC, no concurrent write bugs
+
+**Phase 2 complete:** Developers can decouple components via local pub/sub, filter events
+
+**Phase 3 complete:** Team can deploy distributed cluster, shards rebalance on node failure
+
+**Phase 4 complete:** Multi-tenant SaaS can use Aether with complete isolation, events durable across cluster
+
+---
+
+## Next Steps
+
+1. **Triage:** Review backlog with team, adjust priorities
+2. **Create issues:** Use the [`/issue-writing`](./BACKLOG.md) skill to populate Gitea
+3. **Set dependencies:** Use `tea issues deps add` to link blockers
+4. **Plan Phase 1:** Create sprint, assign issues, start
+5. **Monitor:** Track velocity, adjust Phase 2 plan
+
+---
+
+## Getting Help
+
+**Questions about this backlog?**
+- Issue detail: See [`BACKLOG.md`](./BACKLOG.md)
+- Quick lookup: See [`BACKLOG_QUICK_REFERENCE.md`](./BACKLOG_QUICK_REFERENCE.md)
+- Domain concepts: See [`DOMAIN_MODEL_*.md`](./DOMAIN_MODEL_SUMMARY.md)
+
+**Questions about requirements?**
+- Product value: See [`CAPABILITIES.md`](./CAPABILITIES.md)
+- User context: See [`PROBLEM_MAP.md`](./PROBLEM_MAP.md)
+- Vision: See [`vision.md`](./vision.md)
+
+**Questions about strategy?**
+- How we got here: See [`STRATEGY_CHAIN.md`](./STRATEGY_CHAIN.md)
+- Organization context: See [Flowmade Manifesto](https://git.flowmade.one/flowmade-one/architecture/src/branch/main/manifesto.md)
+
+---
+
+**Document Version:** 1.0
+**Last Updated:** 2026-01-12
+**Backlog Status:** Ready for Gitea import
+**Approval Pending:** Architecture review, team estimation