Move product strategy documentation to .product-strategy directory

Organize all product strategy and domain modeling documentation into a
dedicated .product-strategy directory for better separation from code.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-12 23:57:11 +01:00
parent 18ea677585
commit 271f5db444
26 changed files with 16521 additions and 0 deletions

@@ -0,0 +1,403 @@
# Aether Executable Backlog: Index & Navigation
**Date Generated:** 2026-01-12
**Total Issues:** 67
**Total Capabilities:** 9
**Total Bounded Contexts:** 5
**Total Phases:** 4
---
## Quick Start
**For busy decision-makers:**
1. Read: [`BACKLOG_QUICK_REFERENCE.md`](./BACKLOG_QUICK_REFERENCE.md) (5 min)
2. See: Critical path shows 13 P0 issues for MVP
3. Plan: 4 phases, ~10 weeks for full scope, ~6 weeks for critical path
**For engineers:**
1. Read: [`BACKLOG.md`](./BACKLOG.md) (comprehensive, 2600+ lines)
2. Pick: Phase 1 issues (foundation, no dependencies)
3. Check: Issue details for acceptance criteria, test cases, DDD guidance
**For architects:**
1. Review: [`CAPABILITIES.md`](./CAPABILITIES.md) - Product capabilities mapped to domain
2. Read: [`BOUNDED_CONTEXT_MAP.md`](./BOUNDED_CONTEXT_MAP.md) - Context boundaries and isolation
3. Study: [`DOMAIN_MODEL_*.md`](./DOMAIN_MODEL_SUMMARY.md) - Domain concepts and invariants
---
## Document Map
### Backlog Documents (Start Here)
| Document | Purpose | Audience | Length |
|----------|---------|----------|--------|
| **BACKLOG.md** | Complete executable backlog with all 67 issues | Engineers, PMs | 2600 lines |
| **BACKLOG_QUICK_REFERENCE.md** | Tables, dependency graph, metrics | Quick lookup | 300 lines |
| **BACKLOG_INDEX.md** | This file - navigation guide | Everyone | 400 lines |
### Domain & Strategy (Background)
| Document | Purpose | Audience | When to Read |
|----------|---------|----------|--------------|
| [`CAPABILITIES.md`](./CAPABILITIES.md) | 9 capabilities mapped to domain, value, success conditions | Architects, PMs | Before implementing |
| [`BOUNDED_CONTEXT_MAP.md`](./BOUNDED_CONTEXT_MAP.md) | 5 contexts: isolation rules, language, lifecycle | Architects, Seniors | During design review |
| [`STRATEGY_CHAIN.md`](./STRATEGY_CHAIN.md) | Manifesto → Vision → Problem Space → Domains → Capabilities | Decision-makers | To understand "why" |
| [`PROBLEM_MAP.md`](./PROBLEM_MAP.md) | Event storming: user journeys, decisions, events | Product, Architects | Before Phase 1 |
### Domain Models (Technical Reference)
| Document | Purpose | Scope |
|----------|---------|-------|
| [`DOMAIN_MODEL_SUMMARY.md`](./DOMAIN_MODEL_SUMMARY.md) | 1-page overview of all domain models | All 5 contexts |
| [`DOMAIN_MODEL_EVENT_SOURCING.md`](./DOMAIN_MODEL_EVENT_SOURCING.md) | Event Sourcing context (aggregates, commands, events, invariants) | Deep dive: Context 1 |
| [`DOMAIN_MODEL_OCC.md`](./DOMAIN_MODEL_OCC.md) | Optimistic Concurrency context | Deep dive: Context 2 |
| [`DOMAIN_MODEL_NAMESPACE_ISOLATION.md`](./DOMAIN_MODEL_NAMESPACE_ISOLATION.md) | Namespace Isolation context | Deep dive: Context 4 |
| [`BOUNDED_CONTEXT_MAP.md`](./BOUNDED_CONTEXT_MAP.md) | Event Bus + Cluster coordination contexts (Contexts 3, 5) | Integrated view |
### Cluster Documentation
| Document | Purpose | When to Read |
|----------|---------|--------------|
| [`cluster/DOMAIN_MODEL.md`](./cluster/DOMAIN_MODEL.md) | Cluster coordination domain model (aggregates, commands, events) | Phase 3 |
| [`cluster/ARCHITECTURE.md`](./cluster/ARCHITECTURE.md) | Cluster architecture (leader election, shards, failure recovery) | Phase 3 planning |
| [`cluster/PATTERNS.md`](./cluster/PATTERNS.md) | Distributed patterns used in cluster coordination | Phase 3 implementation |
---
## How to Use This Backlog
### Scenario 1: "I need to build this. Where do I start?"
1. Read [`BACKLOG_QUICK_REFERENCE.md`](./BACKLOG_QUICK_REFERENCE.md) (5 min)
2. Focus on **Phase 1** (17 issues, foundation)
3. Start with **Issue 1.1** (SaveEvent)
4. Dependencies show what unblocks what
**Go to:** [`BACKLOG.md`](./BACKLOG.md) for full details
---
### Scenario 2: "I need to understand the domain before coding"
1. Read [`CAPABILITIES.md`](./CAPABILITIES.md) (product value perspective)
2. Read [`PROBLEM_MAP.md`](./PROBLEM_MAP.md) (user journeys and events)
3. Read [`DOMAIN_MODEL_SUMMARY.md`](./DOMAIN_MODEL_SUMMARY.md) (1-page overview)
4. Deep-dive into specific context models (DOMAIN_MODEL_*.md)
**Go to:** [`DOMAIN_MODEL_EVENT_SOURCING.md`](./DOMAIN_MODEL_EVENT_SOURCING.md) for Phase 1 focus
---
### Scenario 3: "I'm implementing Phase 1. What do I need to know?"
**Phase 1 covers:** Event storage, replay, snapshots, OCC, retry patterns
1. **Start:** Issue 1.1 (SaveEvent with version validation)
   - Acceptance criteria tell you exactly what to build
   - DDD guidance explains the invariant (monotonic versions)
   - Test cases show edge cases
2. **Then:** Issues 1.2-1.5 (append-only, events, queries)
   - These depend on 1.1; implement in parallel where possible
3. **Learn:** Read [`DOMAIN_MODEL_EVENT_SOURCING.md`](./DOMAIN_MODEL_EVENT_SOURCING.md)
   - Understand aggregates, commands, events, invariants
   - See how SaveEvent fits into the larger picture
4. **Check:** [`BACKLOG.md`](./BACKLOG.md), Issue 1.1, acceptance criteria
   - Concrete, testable, specific requirements
**Go to:** [`BACKLOG.md`](./BACKLOG.md) Phase 1 section (lines 48-300)
---
### Scenario 4: "I'm planning Phase 3 (Clustering). Help me understand the domain."
**Phase 3 covers:** Node discovery, leader election, shard distribution, failure recovery
1. **Background:** Read [`cluster/DOMAIN_MODEL.md`](./cluster/DOMAIN_MODEL.md)
   - Aggregates: Cluster, LeadershipLease, ShardAssignment
   - Commands: JoinCluster, ElectLeader, RebalanceShards
   - Events: LeaderElected, NodeFailed, ShardMigrated
   - Invariants: single leader, no orphaned shards
2. **Architecture:** Read [`cluster/ARCHITECTURE.md`](./cluster/ARCHITECTURE.md)
   - How leader election works (lease-based, NATS heartbeats)
   - How consistent hashing minimizes reshuffling
   - How failures trigger rebalancing
3. **Patterns:** Read [`cluster/PATTERNS.md`](./cluster/PATTERNS.md)
   - Distributed consensus patterns
   - Health check patterns
   - Migration patterns
4. **Issues:** See [`BACKLOG.md`](./BACKLOG.md) Phase 3 (issues 3.1-3.17)
   - Decomposed into: topology, leadership, shards, failure recovery
   - Dependency order: discovery → election → assignment → health → rebalancing
**Go to:** [`BACKLOG.md`](./BACKLOG.md) Phase 3 section (lines 800-1200)
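
The claim that consistent hashing minimizes reshuffling can be illustrated with a small hash ring. This is a minimal sketch, not Aether's implementation: the names `Ring`, `NewRing`, and `NodeFor`, the FNV-1a hash, and the virtual-node count are all assumptions chosen for illustration.

```go
package main

import (
	"hash/fnv"
	"sort"
	"strconv"
)

// Ring is a hypothetical consistent-hash ring mapping keys (for example,
// shard or actor IDs) to node names.
type Ring struct {
	points []uint32          // sorted hash positions on the ring
	owner  map[uint32]string // ring position -> owning node
}

// hashKey maps a string to a ring position using FNV-1a (an assumed choice).
func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places `replicas` virtual points per node on the ring.
func NewRing(nodes []string, replicas int) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			p := hashKey(n + "#" + strconv.Itoa(i))
			if _, taken := r.owner[p]; taken {
				continue // on a hash collision the earlier node keeps the point
			}
			r.owner[p] = n
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// NodeFor returns the node owning the first ring point at or after the key's
// hash, wrapping around to the start. It returns "" for an empty ring.
func (r *Ring) NodeFor(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hashKey(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}
```

In this sketch, removing a node deletes only that node's points, so keys owned by the surviving nodes keep their owner and only the failed node's keys are reassigned.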
---
### Scenario 5: "I need to present this to stakeholders. What's the pitch?"
**Key messages:**
1. **Why Aether?** See [`vision.md`](./vision.md)
   - Solves: "building distributed, event-sourced systems in Go without heavyweight frameworks"
   - Principles: Primitives over frameworks, NATS-native, resource-conscious
2. **What are we building?** See [`CAPABILITIES.md`](./CAPABILITIES.md)
   - 9 capabilities organized into 3 groups (event sourcing, cluster, event distribution)
   - Each eliminates a pain point and enables a job
3. **How much work?** See [`BACKLOG_QUICK_REFERENCE.md`](./BACKLOG_QUICK_REFERENCE.md)
   - 67 issues in 4 phases
   - Critical path: 13 P0 issues for MVP (6 weeks aggressive)
   - Full scope: all 67 issues (10 weeks typical)
4. **Value timeline?**
   - After Phase 1: Event sourcing with conflict detection
   - After Phase 2: Local pub/sub and filtering
   - After Phase 3: Distributed cluster with automatic recovery
   - After Phase 4: Multi-tenant NATS-native delivery
**Slides:** Reference [`CAPABILITIES.md`](./CAPABILITIES.md) success conditions, value map
---
### Scenario 6: "I found a bug in existing code. Which issues cover this area?"
**Use dependency graph** in [`BACKLOG_QUICK_REFERENCE.md`](./BACKLOG_QUICK_REFERENCE.md)
**Example:** "SaveEvent isn't enforcing version validation"
→ Look for: Issue 1.1, 1.2, 1.4
→ Read: [`DOMAIN_MODEL_EVENT_SOURCING.md`](./DOMAIN_MODEL_EVENT_SOURCING.md), monotonic version invariant
→ Fix: Implement version check in SaveEvent
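
The version check described in this example can be sketched as follows. This is an illustrative in-memory store under stated assumptions, not Aether's actual code: the `Event` fields, `InMemoryStore`, and the fields of `VersionConflictError` are hypothetical, and "monotonic" is assumed to mean strictly greater than the actor's latest stored version.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical minimal event shape; the real type may differ.
type Event struct {
	ActorID string
	Version int64
	Type    string
}

// VersionConflictError carries the context of a rejected write.
type VersionConflictError struct {
	ActorID          string
	CurrentVersion   int64
	AttemptedVersion int64
}

func (e *VersionConflictError) Error() string {
	return fmt.Sprintf("version conflict for actor %q: current=%d, attempted=%d",
		e.ActorID, e.CurrentVersion, e.AttemptedVersion)
}

// InMemoryStore is an illustrative append-only store keyed by actor ID.
type InMemoryStore struct {
	mu     sync.Mutex
	events map[string][]Event
}

func NewInMemoryStore() *InMemoryStore {
	return &InMemoryStore{events: make(map[string][]Event)}
}

// latestLocked returns the latest version for an actor (0 if none).
// The caller must hold s.mu.
func (s *InMemoryStore) latestLocked(actorID string) int64 {
	evs := s.events[actorID]
	if len(evs) == 0 {
		return 0
	}
	return evs[len(evs)-1].Version
}

// GetLatestVersion returns the highest stored version for the actor.
func (s *InMemoryStore) GetLatestVersion(actorID string) int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.latestLocked(actorID)
}

// SaveEvent appends the event only if its version is strictly greater than
// the actor's latest version; otherwise it returns a *VersionConflictError.
func (s *InMemoryStore) SaveEvent(e Event) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	current := s.latestLocked(e.ActorID)
	if e.Version <= current {
		return &VersionConflictError{
			ActorID:          e.ActorID,
			CurrentVersion:   current,
			AttemptedVersion: e.Version,
		}
	}
	s.events[e.ActorID] = append(s.events[e.ActorID], e)
	return nil
}
```

Checking and appending under the same lock is what keeps two concurrent writers from both succeeding with the same version; a persistent store would need an equivalent atomic check.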
---
## Issue Numbering Scheme
**Format:** `{Phase}.{Issue}`
- **Phase:** 1-4 (Event Sourcing, Event Bus, Cluster, Namespace/NATS)
- **Issue:** 1-N (individual work item, numbered sequentially within the phase)
- **FeatureSet:** a-z (subgrouping within a phase; used to organize issues, not part of the ID)
**Examples:**
- `1.1` = Phase 1, Feature Set 1a (Event Storage), first issue in the set
- `3.13` = Phase 3, Feature Set 3c (Failure Recovery), sixth issue in the set
- `4.5` = Phase 4, Feature Set 4b (NATS Delivery), first issue in the set
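
For tooling that reads issue IDs, the two-part form used in the examples (such as `3.13`) could be parsed roughly like this. `IssueID` and `ParseIssueID` are hypothetical names; the phase range 1-4 comes from this section.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// IssueID is a hypothetical parsed form of an ID such as "3.13".
type IssueID struct {
	Phase    int // 1-4
	Sequence int // position within the phase, starting at 1
}

// ParseIssueID parses "<phase>.<sequence>" and validates both parts.
func ParseIssueID(s string) (IssueID, error) {
	phaseStr, seqStr, ok := strings.Cut(s, ".")
	if !ok {
		return IssueID{}, fmt.Errorf("issue ID %q: expected the form phase.sequence", s)
	}
	phase, err := strconv.Atoi(phaseStr)
	if err != nil || phase < 1 || phase > 4 {
		return IssueID{}, fmt.Errorf("issue ID %q: phase must be 1-4", s)
	}
	seq, err := strconv.Atoi(seqStr)
	if err != nil || seq < 1 {
		return IssueID{}, fmt.Errorf("issue ID %q: sequence must be a positive integer", s)
	}
	return IssueID{Phase: phase, Sequence: seq}, nil
}
```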
---
## Issue Types
Each issue has a type that indicates the kind of work involved:
| Type | Example | Time Estimate |
|------|---------|----------------|
| **Command** | SaveEvent, Subscribe | 2-5 days |
| **Rule** | Enforce append-only, fail-fast | 1-3 days |
| **Event** | Publish EventStored, LeaderElected | 1-2 days |
| **Query** | GetEvents, GetLeader | 2-3 days |
| **Interface** | SnapshotStore contract | 1 day |
| **Validation** | Namespace format checks | 1 day |
| **Documentation** | Retry patterns, cluster migration | 2-5 days |
---
## Priority Levels
| Level | Meaning | Approach |
|-------|---------|----------|
| **P0** | Blocking; no alternative path | Must complete before next items |
| **P1** | Important; can ship without it, but with limited value | Complete after P0 |
| **P2** | Nice-to-have; polish, observability | Complete if time allows |
**Recommendation:** Focus on P0 issues first. They're blocking; P1 issues may be parallelizable.
---
## Issue Status Tracking
**Not yet in Gitea.** Use this backlog to:
1. Create issues with `/issue-writing` skill
2. Set up dependencies in Gitea (`tea issues deps add`)
3. Track progress per phase
4. Measure velocity (issues/week)
**Suggested milestone structure:**
- Milestone 1: Phase 1 (Event Sourcing Foundation)
- Milestone 2: Phase 2 (Local Event Bus)
- Milestone 3: Phase 3 (Cluster Coordination)
- Milestone 4: Phase 4 (Namespace & NATS)
---
## Context at a Glance
### Context 1: Event Sourcing
- **Issues:** 1.1-1.10 (foundational)
- **Key Invariant:** Monotonic versions per actor
- **Key Command:** SaveEvent(event)
- **Key Query:** GetLatestVersion, GetEvents
- **What it enables:** Immutable history, replay, OCC
### Context 2: Optimistic Concurrency Control
- **Issues:** 1.11-1.12
- **Key Invariant:** Conflicts detected immediately
- **Key Command:** AttemptWrite (via SaveEvent)
- **Key Error:** VersionConflictError with context
- **What it enables:** Multi-writer safety without locks
### Context 3: Event Bus (Local)
- **Issues:** 2.1-2.9
- **Key Invariant:** Exact subscriptions isolated; non-blocking delivery
- **Key Commands:** Publish, Subscribe
- **Key Queries:** GetSubscriptions, metrics
- **What it enables:** Local pub/sub, loose coupling
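
One way to read the invariant above is sketched below. It assumes that "exact subscription" means a subscriber receives only events published to the identical topic string, and that "non-blocking delivery" means a full subscriber buffer causes the event to be dropped for that subscriber rather than stalling the publisher. Both readings, and the `Bus` type with its string payloads, are assumptions for illustration.

```go
package main

import "sync"

// Bus is a hypothetical in-process event bus with exact-match topics.
type Bus struct {
	mu   sync.RWMutex
	subs map[string][]chan string // topic -> subscriber channels
}

func NewBus() *Bus {
	return &Bus{subs: make(map[string][]chan string)}
}

// Subscribe registers a buffered channel for exactly one topic.
func (b *Bus) Subscribe(topic string, buffer int) <-chan string {
	ch := make(chan string, buffer)
	b.mu.Lock()
	b.subs[topic] = append(b.subs[topic], ch)
	b.mu.Unlock()
	return ch
}

// Publish offers msg to every subscriber of topic without blocking.
// It returns how many subscribers were skipped because their buffer was full.
func (b *Bus) Publish(topic, msg string) (dropped int) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, ch := range b.subs[topic] {
		select {
		case ch <- msg:
		default:
			dropped++ // slow subscriber: skip instead of blocking the publisher
		}
	}
	return dropped
}
```

Returning the drop count gives callers a hook for the metrics query listed above; whether dropped events should instead be retried or logged is a design decision this sketch leaves open.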
### Context 4: Namespace Isolation
- **Issues:** 4.1-4.4
- **Key Invariant:** Events from namespace X invisible to Y
- **Key Mechanism:** Stream prefixing ("tenant-a_events")
- **What it enables:** Multi-tenancy, logical boundaries
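
Stream prefixing as in the `"tenant-a_events"` example might be implemented along these lines. The validation rule (lowercase letters, digits, and single hyphens, with no underscore so the `_` separator stays unambiguous) is an assumption, as are the names `StreamName` and `namespacePattern`; the actual namespace format rules are defined in the backlog issues, not here.

```go
package main

import (
	"fmt"
	"regexp"
)

// namespacePattern is an assumed format: lowercase alphanumeric segments
// separated by single hyphens, e.g. "tenant-a".
var namespacePattern = regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*$`)

// StreamName prefixes a base stream name with the namespace, so that
// StreamName("tenant-a", "events") yields "tenant-a_events".
func StreamName(namespace, base string) (string, error) {
	if !namespacePattern.MatchString(namespace) {
		return "", fmt.Errorf("invalid namespace %q", namespace)
	}
	return namespace + "_" + base, nil
}
```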
### Context 5: Cluster Coordination
- **Issues:** 3.1-3.17
- **Key Invariants:** Single leader, no orphaned shards, no lost actors
- **Key Commands:** JoinCluster, ElectLeader, RebalanceShards
- **Key Queries:** GetLeader, GetShardAssignments
- **What it enables:** Distributed deployment, HA, auto-recovery
### Context 6: Event Bus (NATS)
- **Issues:** 4.5-4.8
- **Key Invariant:** Exactly-once cross-node delivery
- **Key Mechanism:** NATS subjects, JetStream consumers
- **What it enables:** Cross-node pub/sub, durability
---
## Dependency Rules
**Golden rule:** Never implement an issue until its dependencies are complete.
**Check dependencies:**
1. See issue detail in BACKLOG.md
2. Look at "Dependencies" section
3. Verify blockers are done
4. Update status as you progress
**Example:** To implement Issue 3.13 (RebalanceShards):
- ✓ Must have: Issue 3.8 (Consistent hashing)
- ✓ Must have: Issue 3.12 (Health checks)
- Then: Can implement 3.13
- Then: Can implement 3.14-3.17 (validation, events)
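
The dependency check can also be automated. The sketch below assumes a hypothetical in-memory `Issue` record with `DependsOn` and `Done` fields; it is not tied to Gitea or to the BACKLOG.md format.

```go
package main

import "sort"

// Issue is a hypothetical backlog entry.
type Issue struct {
	ID        string
	DependsOn []string
	Done      bool
}

// Blockers returns the sorted IDs of dependencies of id that are unknown
// or not yet done.
func Blockers(backlog map[string]Issue, id string) []string {
	var open []string
	for _, dep := range backlog[id].DependsOn {
		if d, ok := backlog[dep]; !ok || !d.Done {
			open = append(open, dep)
		}
	}
	sort.Strings(open)
	return open
}

// Ready reports whether id exists and all of its dependencies are done.
func Ready(backlog map[string]Issue, id string) bool {
	_, known := backlog[id]
	return known && len(Blockers(backlog, id)) == 0
}
```

With Issue 3.13 depending on 3.8 and 3.12, `Ready` returns false until both are marked done, which matches the golden rule above.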
---
## Estimating Work
Individual issues in this backlog do NOT carry time estimates (hours/days); treat the per-type ranges under Issue Types as rough guides only. Reasoning:
- Estimates are team-specific (experienced Go team vs. first-time)
- Estimates can bias priority (easier wins first, not highest value)
- Better to track velocity (issues/week) after a few sprints
**For planning, use:**
- **Story point ballpark:** 2 (small work), 3 (medium), 5 (complex), 8 (very complex)
- **Typical issue:** 2-5 story points
- **Range for Phase 1:** 30-50 points
- **Range for full backlog:** 150-250 points
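
Once a velocity has been measured, the remaining duration is a ceiling division. The helper below is a hypothetical sketch (`WeeksToComplete` is not part of any existing tooling); it works equally for story points or issue counts, as long as both arguments use the same unit. For orientation, the 150-250 point range combined with the roughly 10-week full-scope figure quoted earlier would correspond to about 15-25 points per week.

```go
package main

import "errors"

// WeeksToComplete returns the number of whole weeks needed to finish
// `remaining` units of work at `perWeek` units per week, rounding up.
func WeeksToComplete(remaining, perWeek int) (int, error) {
	if remaining < 0 || perWeek <= 0 {
		return 0, errors.New("remaining must be >= 0 and perWeek must be > 0")
	}
	return (remaining + perWeek - 1) / perWeek, nil
}
```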
**Adjust based on team experience with:**
- Distributed systems
- Go (language learning curve minimal)
- Event sourcing (paradigm shift; budget time for learning)
- NATS (simple; learning curve 1-2 weeks)
---
## Recommended Team Structure
**Minimum viable team:**
- 1 senior architect (domain design, tricky decisions)
- 2 engineers (implementation, tests)
- 1 DevOps/infra (NATS setup, integration tests)
**Ideal team:**
- 1 tech lead (architecture, guidance, code review)
- 3-4 engineers (parallel implementation)
- 1 QA (integration tests, failure scenarios)
- 1 DevOps (NATS, cluster setup, monitoring)
**Phase-by-phase staffing:**
- Phase 1: 2 engineers (sequential, learning curve)
- Phase 2: 2-3 engineers (parallelizable)
- Phase 3: 3-4 engineers (complex, needs multi-node testing)
- Phase 4: 2 engineers (NATS integration, can overlap Phase 3)
---
## Risks & Mitigation
| Risk | Impact | Mitigation |
|------|--------|-----------|
| Distributed systems unfamiliar | High | Spike on patterns, pair programming |
| Event sourcing complexity | High | Start with simple aggregates, read DOMAIN_MODEL_EVENT_SOURCING.md |
| NATS learning curve | Medium | Pair the team with a NATS expert, use existing integrations |
| Multi-node testing | Medium | Use Docker Compose for local cluster, integration tests first |
| Snapshot strategy | Low | Start simple (no snapshots), optimize later |
| Schema evolution | Low | Document event versioning strategy early |
---
## Success Criteria (Big Picture)
**Phase 1 complete:** Developers can build event-sourced actors with OCC, no concurrent write bugs
**Phase 2 complete:** Developers can decouple components via local pub/sub, filter events
**Phase 3 complete:** Team can deploy distributed cluster, shards rebalance on node failure
**Phase 4 complete:** Multi-tenant SaaS can use Aether with complete isolation, events durable across cluster
---
## Next Steps
1. **Triage:** Review backlog with team, adjust priorities
2. **Create issues:** Use [`/issue-writing`](./BACKLOG.md) skill to populate Gitea
3. **Set dependencies:** Use `tea issues deps add` to link blockers
4. **Plan Phase 1:** Create sprint, assign issues, start
5. **Monitor:** Track velocity, adjust Phase 2 plan
---
## Getting Help
**Questions about this backlog?**
- Issue detail: See [`BACKLOG.md`](./BACKLOG.md)
- Quick lookup: See [`BACKLOG_QUICK_REFERENCE.md`](./BACKLOG_QUICK_REFERENCE.md)
- Domain concepts: See [`DOMAIN_MODEL_*.md`](./DOMAIN_MODEL_SUMMARY.md)
**Questions about requirements?**
- Product value: See [`CAPABILITIES.md`](./CAPABILITIES.md)
- User context: See [`PROBLEM_MAP.md`](./PROBLEM_MAP.md)
- Vision: See [`vision.md`](./vision.md)
**Questions about strategy?**
- How we got here: See [`STRATEGY_CHAIN.md`](./STRATEGY_CHAIN.md)
- Organization context: See [Flowmade Manifesto](https://git.flowmade.one/flowmade-one/architecture/src/branch/main/manifesto.md)
---
**Document Version:** 1.0
**Last Updated:** 2026-01-12
**Backlog Status:** Ready for Gitea import
**Approval Pending:** Architecture review, team estimation