aether/.product-strategy/BACKLOG_INDEX.md at 271f5db4449c1220c5e5013822fcf635896af243

flowmade-one/aether

Hugo Nijhuis 271f5db444

Move product strategy documentation to .product-strategy directory

Organize all product strategy and domain modeling documentation into a
dedicated .product-strategy directory for better separation from code.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-01-12 23:57:20 +01:00

Date Generated: 2026-01-12
Total Issues: 67
Total Capabilities: 9
Total Bounded Contexts: 5
Total Phases: 4

Quick Start

For busy decision-makers:

Read: BACKLOG_QUICK_REFERENCE.md (5 min)
See: Critical path shows 13 P0 issues for MVP
Plan: 4 phases, ~10 weeks for full scope, ~6 weeks for critical path

For engineers:

Read: BACKLOG.md (comprehensive, 2600+ lines)
Pick: Phase 1 issues (foundation, no dependencies)
Check: Issue details for acceptance criteria, test cases, DDD guidance

For architects:

Review: CAPABILITIES.md - Product capabilities mapped to domain
Read: BOUNDED_CONTEXT_MAP.md - Context boundaries and isolation
Study: DOMAIN_MODEL_*.md - Domain concepts and invariants

Document Map

Backlog Documents (Start Here)

Document	Purpose	Audience	Length
BACKLOG.md	Complete executable backlog with all 67 issues	Engineers, PMs	2600 lines
BACKLOG_QUICK_REFERENCE.md	Tables, dependency graph, metrics	Quick lookup	300 lines
BACKLOG_INDEX.md	This file - navigation guide	Everyone	400 lines

Domain & Strategy (Background)

Document	Purpose	Audience	When to Read
`CAPABILITIES.md`	9 capabilities mapped to domain, value, success conditions	Architects, PMs	Before implementing
`BOUNDED_CONTEXT_MAP.md`	5 contexts: isolation rules, language, lifecycle	Architects, Seniors	During design review
`STRATEGY_CHAIN.md`	Manifesto → Vision → Problem Space → Domains → Capabilities	Decision-makers	To understand "why"
`PROBLEM_MAP.md`	Event storming: user journeys, decisions, events	Product, Architects	Before Phase 1

Domain Models (Technical Reference)

Document	Purpose	Scope
`DOMAIN_MODEL_SUMMARY.md`	1-page overview of all domain models	All 5 contexts
`DOMAIN_MODEL_EVENT_SOURCING.md`	Event Sourcing context (aggregates, commands, events, invariants)	Deep dive: Context 1
`DOMAIN_MODEL_OCC.md`	Optimistic Concurrency context	Deep dive: Context 2
`DOMAIN_MODEL_NAMESPACE_ISOLATION.md`	Namespace Isolation context	Deep dive: Context 4
`BOUNDED_CONTEXT_MAP.md`	Event Bus + Cluster coordination contexts (Contexts 3, 5)	Integrated view

Cluster Documentation

Document	Purpose	When to Read
`cluster/DOMAIN_MODEL.md`	Cluster coordination domain model (aggregates, commands, events)	Phase 3
`cluster/ARCHITECTURE.md`	Cluster architecture (leader election, shards, failure recovery)	Phase 3 planning
`cluster/PATTERNS.md`	Distributed patterns used in cluster coordination	Phase 3 implementation

How to Use This Backlog

Scenario 1: "I need to build this. Where do I start?"

Read BACKLOG_QUICK_REFERENCE.md (5 min)
Focus on Phase 1 (17 issues, foundation)
Start with Issue 1.1 (SaveEvent)
Dependencies show what unblocks what

Go to: BACKLOG.md for full details

Scenario 2: "I need to understand the domain before coding"

Read CAPABILITIES.md (product value perspective)
Read PROBLEM_MAP.md (user journeys and events)
Read DOMAIN_MODEL_SUMMARY.md (1-page overview)
Deep-dive into specific context models (DOMAIN_MODEL_*.md)

Go to: DOMAIN_MODEL_EVENT_SOURCING.md for Phase 1 focus

Scenario 3: "I'm implementing Phase 1. What do I need to know?"

Phase 1 covers: Event storage, replay, snapshots, OCC, retry patterns

Start: Issue 1.1 (SaveEvent with version validation)
- Acceptance criteria tell you exactly what to build
- DDD guidance explains the invariant (monotonic versions)
- Test cases show edge cases
Then: Issues 1.2-1.5 (append-only, events, queries)
- These depend on 1.1; implement in parallel where possible
Learn: Read DOMAIN_MODEL_EVENT_SOURCING.md
- Understand aggregates, commands, events, invariants
- See how SaveEvent fits into the larger picture
Check: BACKLOG.md, Issue 1.1, acceptance criteria
- Concrete, testable, specific requirements

Go to: BACKLOG.md Phase 1 section (lines 48-300)
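
To make the monotonic-version invariant behind Issue 1.1 concrete, here is a minimal sketch of a version-checked SaveEvent. The `Event`, `MemoryStore`, and `ErrVersionConflict` names are illustrative assumptions, not Aether's actual API, and the sketch assumes "monotonic" means strictly greater than the latest stored version for the same actor.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a hypothetical minimal event; Aether's real type may differ.
type Event struct {
	ActorID string
	Version int64
	Type    string
}

// ErrVersionConflict is an assumed sentinel error for rejected versions.
var ErrVersionConflict = errors.New("version conflict")

// MemoryStore is an illustrative in-memory, append-only store.
// It is not safe for concurrent use; a real store would need locking.
type MemoryStore struct {
	events map[string][]Event
}

func NewMemoryStore() *MemoryStore {
	return &MemoryStore{events: make(map[string][]Event)}
}

// GetLatestVersion returns 0 when the actor has no events yet.
func (s *MemoryStore) GetLatestVersion(actorID string) int64 {
	evs := s.events[actorID]
	if len(evs) == 0 {
		return 0
	}
	return evs[len(evs)-1].Version
}

// SaveEvent appends the event only if its version is strictly greater
// than the latest version stored for the same actor.
func (s *MemoryStore) SaveEvent(e Event) error {
	latest := s.GetLatestVersion(e.ActorID)
	if e.Version <= latest {
		return fmt.Errorf("%w: actor %q is at version %d, got %d",
			ErrVersionConflict, e.ActorID, latest, e.Version)
	}
	s.events[e.ActorID] = append(s.events[e.ActorID], e)
	return nil
}

func main() {
	s := NewMemoryStore()
	fmt.Println(s.SaveEvent(Event{ActorID: "order-1", Version: 1, Type: "Created"}))
	fmt.Println(s.SaveEvent(Event{ActorID: "order-1", Version: 1, Type: "Duplicate"}))
}
```

The second call is rejected because version 1 is already stored for `order-1`. The acceptance criteria in BACKLOG.md remain the authority on the exact rule.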

Scenario 4: "I'm planning Phase 3 (Clustering). Help me understand the domain."

Phase 3 covers: Node discovery, leader election, shard distribution, failure recovery

Background: Read cluster/DOMAIN_MODEL.md
- Aggregates: Cluster, LeadershipLease, ShardAssignment
- Commands: JoinCluster, ElectLeader, RebalanceShards
- Events: LeaderElected, NodeFailed, ShardMigrated
- Invariants: single leader, no orphaned shards
Architecture: Read cluster/ARCHITECTURE.md
- How leader election works (lease-based, NATS heartbeats)
- How consistent hashing minimizes reshuffling
- How failures trigger rebalancing
Patterns: Read cluster/PATTERNS.md
- Distributed consensus patterns
- Health check patterns
- Migration patterns
Issues: See BACKLOG.md Phase 3 (issues 3.1-3.17)
- Decomposed into: topology, leadership, shards, failure recovery
- Dependency order: discovery → election → assignment → health → rebalancing

Go to: BACKLOG.md Phase 3 section (lines 800-1200)
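
As a rough illustration of why consistent hashing minimizes reshuffling, the sketch below builds a hash ring with virtual points and counts how many shards change owner when one node disappears. The `Ring` type, the FNV hash, and the 64 virtual points per node are assumptions made for the example; cluster/ARCHITECTURE.md describes the actual design.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a hypothetical consistent-hash ring; it is not Aether's actual type.
type Ring struct {
	points []uint32          // sorted positions on the ring
	owner  map[uint32]string // ring position -> node name
}

func hashOf(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places `replicas` virtual points per node on the ring.
func NewRing(nodes []string, replicas int) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			p := hashOf(fmt.Sprintf("%s#%d", n, i))
			if _, taken := r.owner[p]; taken {
				continue // rare hash collision: keep the first owner
			}
			r.owner[p] = n
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Lookup returns the node owning the first point at or after the key's hash.
func (r *Ring) Lookup(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hashOf(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.owner[r.points[i]]
}

// ShardKeys generates n illustrative shard names.
func ShardKeys(n int) []string {
	keys := make([]string, n)
	for i := range keys {
		keys[i] = fmt.Sprintf("shard-%d", i)
	}
	return keys
}

func main() {
	before := NewRing([]string{"node-a", "node-b", "node-c"}, 64)
	after := NewRing([]string{"node-a", "node-b"}, 64) // node-c has failed
	keys := ShardKeys(1000)
	moved := 0
	for _, k := range keys {
		if before.Lookup(k) != after.Lookup(k) {
			moved++
		}
	}
	fmt.Printf("%d of %d shards moved\n", moved, len(keys))
}
```

Only shards that lived on the removed node are reassigned; shards on surviving nodes keep their owner, which is the property that keeps rebalancing cheap.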

Scenario 5: "I need to present this to stakeholders. What's the pitch?"

Key messages:

Why Aether? See vision.md
- Solves: "building distributed, event-sourced systems in Go without heavyweight frameworks"
- Principles: Primitives over frameworks, NATS-native, resource-conscious
What are we building? See CAPABILITIES.md
- 9 capabilities organized into 3 groups (event sourcing, cluster, event distribution)
- Each eliminates a pain point and enables a job
How much work? See BACKLOG_QUICK_REFERENCE.md
- 67 issues in 4 phases
- Critical path: 13 P0 issues for MVP (6 weeks aggressive)
- Full scope: all 67 issues (10 weeks typical)
Value timeline?
- After Phase 1: Event sourcing with conflict detection
- After Phase 2: Local pub/sub and filtering
- After Phase 3: Distributed cluster with automatic recovery
- After Phase 4: Multi-tenant NATS-native delivery

Slides: Reference CAPABILITIES.md success conditions, value map

Scenario 6: "I found a bug in existing code. Which issues cover this area?"

Use dependency graph in BACKLOG_QUICK_REFERENCE.md

Example: "SaveEvent isn't enforcing version validation"
→ Look for: Issue 1.1, 1.2, 1.4
→ Read: DOMAIN_MODEL_EVENT_SOURCING.md, monotonic version invariant
→ Fix: Implement version check in SaveEvent

Issue Numbering Scheme

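Format: {Phase}.{Number}

Phase: 1-4 (Event Sourcing, Event Bus, Cluster, Namespace/NATS)
Number: 1-N (runs sequentially through the phase)
FeatureSet: a-z (subgrouping within phase; implied by the number, not written in the ID)
Issue: 1-N (position of the work item within its feature set)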

Examples:

1.1 = Phase 1, Feature Set 1a (Event Storage), Issue 1
3.13 = Phase 3, Feature Set 3c (Failure Recovery), Issue 6
4.5 = Phase 4, Feature Set 4b (NATS Delivery), Issue 1
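
Going by the examples above, an ID such as 3.13 can be read as a phase number followed by a sequential number within that phase. The sketch below parses that two-part form; `IssueID` and `ParseIssueID` are hypothetical helpers, and the mapping from number to feature set is left out because it lives in BACKLOG.md.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// IssueID is a hypothetical parsed form of IDs such as "3.13".
type IssueID struct {
	Phase  int // 1-4
	Number int // sequential number within the phase
}

// phaseNames follows the phase list in the numbering scheme.
var phaseNames = map[int]string{
	1: "Event Sourcing",
	2: "Event Bus",
	3: "Cluster",
	4: "Namespace/NATS",
}

// ParseIssueID splits "Phase.Number" and validates both parts.
func ParseIssueID(s string) (IssueID, error) {
	phaseStr, numStr, ok := strings.Cut(s, ".")
	if !ok {
		return IssueID{}, fmt.Errorf("issue ID %q: want Phase.Number", s)
	}
	phase, err := strconv.Atoi(phaseStr)
	if err != nil || phaseNames[phase] == "" {
		return IssueID{}, fmt.Errorf("issue ID %q: phase must be 1-4", s)
	}
	num, err := strconv.Atoi(numStr)
	if err != nil || num < 1 {
		return IssueID{}, fmt.Errorf("issue ID %q: number must be a positive integer", s)
	}
	return IssueID{Phase: phase, Number: num}, nil
}

func main() {
	for _, s := range []string{"1.1", "3.13", "4.5", "5.2"} {
		id, err := ParseIssueID(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> phase %d (%s), issue number %d\n",
			s, id.Phase, phaseNames[id.Phase], id.Number)
	}
}
```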

Issue Types

Each issue has a type that indicates what kind of work:

Type	Example	Time Estimate
Command	SaveEvent, Subscribe	2-5 days
Rule	Enforce append-only, fail-fast	1-3 days
Event	Publish EventStored, LeaderElected	1-2 days
Query	GetEvents, GetLeader	2-3 days
Interface	SnapshotStore contract	1 day
Validation	Namespace format checks	1 day
Documentation	Retry patterns, cluster migration	2-5 days

Priority Levels

Level	Meaning	Approach
P0	Blocking; no alternative path	Must complete before next items
P1	Important; can ship without it, but with limited value	Complete after P0
P2	Nice-to-have; polish, observability	Complete if time allows

Recommendation: Focus on P0 issues first. They're blocking; P1 issues may be parallelizable.

Issue Status Tracking

Not yet in Gitea. Use this backlog to:

Create issues with /issue-writing skill
Set up dependencies in Gitea (tea issues deps add)
Track progress per phase
Measure velocity (issues/week)

Suggested milestone structure:

Milestone 1: Phase 1 (Event Sourcing Foundation)
Milestone 2: Phase 2 (Local Event Bus)
Milestone 3: Phase 3 (Cluster Coordination)
Milestone 4: Phase 4 (Namespace & NATS)

Context at a Glance

Context 1: Event Sourcing

Issues: 1.1-1.10 (foundational)
Key Invariant: Monotonic versions per actor
Key Command: SaveEvent(event)
Key Query: GetLatestVersion, GetEvents
What it enables: Immutable history, replay, OCC
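
Replay, listed above as something this context enables, amounts to folding stored events into state in version order. The sketch below shows the idea with a made-up `Account` aggregate; the event types and fields are assumptions for illustration and do not come from Aether.

```go
package main

import "fmt"

// Event is a hypothetical event shape used only for this example.
type Event struct {
	Version int64
	Type    string
	Amount  int64
}

// Account is an illustrative aggregate rebuilt purely from events.
type Account struct {
	Balance int64
	Version int64
}

// Apply mutates the aggregate for one event and records its version.
func (a *Account) Apply(e Event) {
	switch e.Type {
	case "Deposited":
		a.Balance += e.Amount
	case "Withdrawn":
		a.Balance -= e.Amount
	}
	a.Version = e.Version
}

// Replay rebuilds state from scratch by applying events in stored order.
func Replay(events []Event) Account {
	var a Account
	for _, e := range events {
		a.Apply(e)
	}
	return a
}

func main() {
	history := []Event{
		{Version: 1, Type: "Deposited", Amount: 100},
		{Version: 2, Type: "Withdrawn", Amount: 30},
	}
	acct := Replay(history)
	fmt.Printf("balance=%d version=%d\n", acct.Balance, acct.Version)
}
```

Because the history is immutable, replaying the same events always yields the same state, and the resulting version is what a subsequent SaveEvent would be checked against.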

Context 2: Optimistic Concurrency Control

Issues: 1.11-1.12
Key Invariant: Conflicts detected immediately
Key Command: AttemptWrite (via SaveEvent)
Key Error: VersionConflictError with context
What it enables: Multi-writer safety without locks
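
One way to picture multi-writer safety without locks held by callers is a read-then-write loop that retries on conflict. The sketch below is an assumption-laden illustration: the `Store` type, the fields on `VersionConflictError`, and the `SaveWithRetry` helper are invented for the example, and the mutex only protects the in-memory map inside the store.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// VersionConflictError carries context about the rejected write.
// The field names are assumptions, not Aether's definition.
type VersionConflictError struct {
	ActorID   string
	Attempted int64
	Current   int64
}

func (e *VersionConflictError) Error() string {
	return fmt.Sprintf("version conflict for %s: attempted %d, current %d",
		e.ActorID, e.Attempted, e.Current)
}

// Store is an illustrative in-memory store that tracks only the latest version.
type Store struct {
	mu     sync.Mutex
	latest map[string]int64
}

func NewStore() *Store { return &Store{latest: make(map[string]int64)} }

func (s *Store) GetLatestVersion(actorID string) int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.latest[actorID]
}

// SaveEvent rejects any version that is not strictly greater than the current one.
func (s *Store) SaveEvent(actorID string, version int64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if cur := s.latest[actorID]; version <= cur {
		return &VersionConflictError{ActorID: actorID, Attempted: version, Current: cur}
	}
	s.latest[actorID] = version
	return nil
}

// SaveWithRetry re-reads the latest version and retries when it loses a race.
func SaveWithRetry(s *Store, actorID string, maxAttempts int) (int64, error) {
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		next := s.GetLatestVersion(actorID) + 1
		err := s.SaveEvent(actorID, next)
		if err == nil {
			return next, nil
		}
		var conflict *VersionConflictError
		if !errors.As(err, &conflict) {
			return 0, err // not a conflict: do not retry
		}
		lastErr = err
	}
	return 0, fmt.Errorf("gave up after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	s := NewStore()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := SaveWithRetry(s, "account-1", 20); err != nil {
				fmt.Println("write failed:", err)
			}
		}()
	}
	wg.Wait()
	fmt.Println("latest version:", s.GetLatestVersion("account-1"))
}
```

A conflict is detected at write time rather than prevented up front, so each writer either gets a unique version or learns immediately that it must re-read and try again.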

Context 3: Event Bus (Local)

Issues: 2.1-2.9
Key Invariant: Exact subscriptions isolated; non-blocking delivery
Key Commands: Publish, Subscribe
Key Queries: GetSubscriptions, metrics
What it enables: Local pub/sub, loose coupling
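
The non-blocking delivery invariant can be sketched with buffered channels and a `select` that drops an event when a subscriber's buffer is full. The `Bus` type and its drop-on-full policy are assumptions for illustration; the real bus may buffer, count, or report slow subscribers differently.

```go
package main

import (
	"fmt"
	"sync"
)

// Bus is a hypothetical local event bus with exact-match topics.
type Bus struct {
	mu      sync.Mutex
	subs    map[string][]chan string // topic -> subscriber channels
	dropped int                      // events dropped because a buffer was full
}

func NewBus() *Bus { return &Bus{subs: make(map[string][]chan string)} }

// Subscribe registers a buffered channel for exactly one topic.
func (b *Bus) Subscribe(topic string, buffer int) <-chan string {
	ch := make(chan string, buffer)
	b.mu.Lock()
	b.subs[topic] = append(b.subs[topic], ch)
	b.mu.Unlock()
	return ch
}

// Publish never blocks: a full subscriber buffer causes a drop instead.
// It returns how many subscribers received the event.
func (b *Bus) Publish(topic, event string) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	delivered := 0
	for _, ch := range b.subs[topic] {
		select {
		case ch <- event:
			delivered++
		default:
			b.dropped++
		}
	}
	return delivered
}

func main() {
	bus := NewBus()
	orders := bus.Subscribe("orders", 1)
	fmt.Println(bus.Publish("orders", "OrderPlaced"))    // buffer has room
	fmt.Println(bus.Publish("orders", "OrderShipped"))   // buffer full: dropped
	fmt.Println(bus.Publish("payments", "PaymentTaken")) // no subscribers on this topic
	fmt.Println(<-orders)
}
```

A slow subscriber therefore cannot stall the publisher, and a subscription on one topic never sees events published to another.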

Context 4: Namespace Isolation

Issues: 4.1-4.4
Key Invariant: Events from namespace X invisible to Y
Key Mechanism: Stream prefixing ("tenant-a_events")
What it enables: Multi-tenancy, logical boundaries
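
Stream prefixing can be sketched as a small naming function that validates the namespace before building the stream name. The `StreamName` helper and the allowed character set (lowercase letters, digits, hyphens) are assumptions; the actual format rules belong to the validation issues in BACKLOG.md.

```go
package main

import (
	"fmt"
	"regexp"
)

// namespacePattern is an assumed format: lowercase alphanumerics separated by
// single hyphens. Underscores are excluded so the "_" separator stays unambiguous.
var namespacePattern = regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*$`)

// StreamName builds a namespaced stream name such as "tenant-a_events".
func StreamName(namespace, base string) (string, error) {
	if !namespacePattern.MatchString(namespace) {
		return "", fmt.Errorf("invalid namespace %q", namespace)
	}
	return namespace + "_" + base, nil
}

func main() {
	for _, ns := range []string{"tenant-a", "tenant_b", ""} {
		name, err := StreamName(ns, "events")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name)
	}
}
```

Rejecting the separator character inside namespaces keeps two tenants from ever mapping to the same stream name.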

Context 5: Cluster Coordination

Issues: 3.1-3.17
Key Invariants: Single leader, no orphaned shards, no lost actors
Key Commands: JoinCluster, ElectLeader, RebalanceShards
Key Queries: GetLeader, GetShardAssignments
What it enables: Distributed deployment, HA, auto-recovery
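
The single-leader invariant is often enforced with a time-bounded lease: a node may take leadership only when no unexpired lease is held by someone else. The sketch below models that rule in one process with explicit timestamps; `LeadershipLease` borrows its name from the aggregate list, but the fields, the 10-second TTL, and the `at` helper are assumptions. In a real cluster the lease would live in shared storage rather than in local memory.

```go
package main

import (
	"fmt"
	"time"
)

// leaseTTL is an assumed lease duration for the example.
const leaseTTL = 10 * time.Second

// LeadershipLease is an illustrative single-process model of a leader lease.
type LeadershipLease struct {
	Holder    string
	ExpiresAt time.Time
}

// TryAcquire grants or renews the lease unless another node holds an
// unexpired lease. The caller supplies `now` to keep the logic deterministic.
func (l *LeadershipLease) TryAcquire(node string, now time.Time, ttl time.Duration) bool {
	if l.Holder != "" && l.Holder != node && now.Before(l.ExpiresAt) {
		return false
	}
	l.Holder = node
	l.ExpiresAt = now.Add(ttl)
	return true
}

// Leader returns the current leader, or "" when the lease has expired.
func (l *LeadershipLease) Leader(now time.Time) string {
	if l.Holder == "" || !now.Before(l.ExpiresAt) {
		return ""
	}
	return l.Holder
}

// at is a helper that converts seconds into a fixed point in time.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	lease := &LeadershipLease{}
	fmt.Println(lease.TryAcquire("node-a", at(0), leaseTTL))  // node-a becomes leader
	fmt.Println(lease.TryAcquire("node-b", at(5), leaseTTL))  // rejected: lease still valid
	fmt.Println(lease.TryAcquire("node-b", at(11), leaseTTL)) // node-a missed its renewal
	fmt.Println(lease.Leader(at(12)))
}
```

Renewal by the current holder plays the role of a heartbeat: as long as the leader renews before expiry, no other node can take over.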

Context 6: Event Bus (NATS)

Issues: 4.5-4.8
Key Invariant: Exactly-once cross-node delivery
Key Mechanism: NATS subjects, JetStream consumers
What it enables: Cross-node pub/sub, durability

Dependency Rules

Golden rule: Never implement an issue until its dependencies are complete.

Check dependencies:

See issue detail in BACKLOG.md
Look at "Dependencies" section
Verify blockers are done
Update status as you progress

Example: To implement Issue 3.13 (RebalanceShards):

✓ Must have: Issue 3.8 (Consistent hashing)
✓ Must have: Issue 3.12 (Health checks)
Then: Can implement 3.13
Then: Can implement 3.14-3.17 (validation, events)
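
The golden rule can be checked mechanically once dependencies are written down. The sketch below encodes the 3.13 example as a map and lists whichever blockers are still open; the `Blockers` helper and the map layout are illustrative assumptions, and only the dependencies named in the example are included.

```go
package main

import "fmt"

// deps maps an issue to the issues it depends on.
// Only the dependencies from the 3.13 example are filled in.
var deps = map[string][]string{
	"3.13": {"3.8", "3.12"},
	"3.14": {"3.13"},
}

// Blockers returns the dependencies of `issue` that are not done yet.
// An empty result means the issue can be started.
func Blockers(issue string, deps map[string][]string, done map[string]bool) []string {
	var open []string
	for _, d := range deps[issue] {
		if !done[d] {
			open = append(open, d)
		}
	}
	return open
}

func main() {
	done := map[string]bool{"3.8": true}
	fmt.Println("3.13 blocked by:", Blockers("3.13", deps, done))

	done["3.12"] = true
	fmt.Println("3.13 blocked by:", Blockers("3.13", deps, done))
	fmt.Println("3.14 blocked by:", Blockers("3.14", deps, done))
}
```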

Estimating Work

This backlog does NOT attach time estimates (hours/days) to individual issues; the ranges under Issue Types are rough per-type ballparks. Reasoning:

Estimates are team-specific (experienced Go team vs. first-time)
Estimates can bias priority (easier wins first, not highest value)
Better to track velocity (issues/week) after a few sprints

For planning, use:

Story point ballpark: 2 (small work), 3 (medium), 5 (complex), 8 (very complex)
Typical issue: 2-5 story points
Range for Phase 1: 30-50 points
Range for full backlog: 150-250 points

Adjust based on team experience with:

Distributed systems
Go (language learning curve minimal)
Event sourcing (paradigm shift; budget time for learning)
NATS (simple; learning curve 1-2 weeks)

Recommended Team Structure

Minimum viable team:

1 senior architect (domain design, tricky decisions)
2 engineers (implementation, tests)
1 DevOps/infra (NATS setup, integration tests)

Ideal team:

1 tech lead (architecture, guidance, code review)
3-4 engineers (parallel implementation)
1 QA (integration tests, failure scenarios)
1 DevOps (NATS, cluster setup, monitoring)

Phase-by-phase staffing:

Phase 1: 2 engineers (sequential, learning curve)
Phase 2: 2-3 engineers (parallelizable)
Phase 3: 3-4 engineers (complex, needs multi-node testing)
Phase 4: 2 engineers (NATS integration, can overlap Phase 3)

Risks & Mitigation

Risk	Impact	Mitigation
Distributed systems unfamiliar	High	Spike on patterns, pair programming
Event sourcing complexity	High	Start with simple aggregates, read DOMAIN_MODEL_EVENT_SOURCING.md
NATS learning curve	Medium	Pair the team with a NATS expert, use existing integrations
Multi-node testing	Medium	Use Docker Compose for local cluster, integration tests first
Snapshot strategy	Low	Start simple (no snapshots), optimize later
Schema evolution	Low	Document event versioning strategy early

Success Criteria (Big Picture)

Phase 1 complete: Developers can build event-sourced actors with OCC, no concurrent write bugs

Phase 2 complete: Developers can decouple components via local pub/sub, filter events

Phase 3 complete: Team can deploy distributed cluster, shards rebalance on node failure

Phase 4 complete: Multi-tenant SaaS can use Aether with complete isolation, events durable across cluster

Next Steps

Triage: Review backlog with team, adjust priorities
Create issues: Use /issue-writing skill to populate Gitea
Set dependencies: Use tea issues deps add to link blockers
Plan Phase 1: Create sprint, assign issues, start
Monitor: Track velocity, adjust Phase 2 plan

Getting Help

Questions about this backlog?

Issue detail: See BACKLOG.md
Quick lookup: See BACKLOG_QUICK_REFERENCE.md
Domain concepts: See DOMAIN_MODEL_*.md

Questions about requirements?

Product value: See CAPABILITIES.md
User context: See PROBLEM_MAP.md
Vision: See vision.md

Questions about strategy?

How we got here: See STRATEGY_CHAIN.md
Organization context: See Flowmade Manifesto

Document Version: 1.0
Last Updated: 2026-01-12
Backlog Status: Ready for Gitea import
Approval Pending: Architecture review, team estimation


Aether Executable Backlog: Index & Navigation