Move product strategy documentation to .product-strategy directory

Organize all product strategy and domain modeling documentation into a
dedicated .product-strategy directory for better separation from code.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

# Bounded Context Map: Aether Distributed Actor System
## Summary
Aether has **five distinct bounded contexts** cut by language boundaries, lifecycle differences, ownership patterns, and scaling needs. The contexts emerge from the problem space: single-node event sourcing, distributed clustering, logical isolation, optimistic concurrency control, and event distribution.
**Key insight:** Each context has its own ubiquitous language (different meanings for similar terms) and its own lifecycle (actors persist forever; leases expire; subscriptions have independent lifetimes). Boundaries are enforced by language/data ownership, not by organizational structure.
---
## Bounded Contexts
### Context 1: Event Sourcing
**Purpose:** Persist events as immutable source of truth; enable state rebuild through replay.
**Core Responsibility:**
- Events are facts (immutable, append-only)
- Versions are monotonically increasing per actor
- Snapshots are optional optimization hints, not required
- Replay reconstructs state from history
**Language (Ubiquitous Language):**
- **Event**: Immutable fact about what happened; identified by ID, type, actor, version
- **Version**: Monotonically increasing sequence number per actor; used for optimistic locking
- **Snapshot**: Point-in-time state capture at a specific version; optional; can always replay
- **ActorID**: Identifier for the entity whose events we're storing; unique within namespace
- **Replay**: Process of reading events from start version, applying each, to rebuild state
**Key Entities (Event-Based, not Object-Based):**
- Event (immutable, versioned)
- ActorSnapshot (optional state cache)
- EventStore interface (multiple implementations)
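To ground this vocabulary, here is a minimal sketch of the shapes implied by the terms and entities above. Field and method names are assumptions for illustration only; the actual definitions live in `/aether/event.go` and the store implementations.

```go
package aether

import (
	"context"
	"time"
)

// Event is an immutable fact identified by ID, type, actor, and version.
// Field names here are illustrative, not the library's exact definition.
type Event struct {
	ID        string
	Type      string
	ActorID   string
	Version   int64          // monotonically increasing per actor
	Timestamp time.Time
	Data      map[string]any // domain payload; schema is the application's concern
}

// EventStore is the persistence primitive; InMemoryEventStore and
// JetStreamEventStore would both satisfy an interface of roughly this shape.
type EventStore interface {
	// SaveEvent appends the event iff event.Version > the current version,
	// otherwise it returns a version-conflict error.
	SaveEvent(ctx context.Context, e Event) error
	// GetEvents returns an actor's events from fromVersion onward, in order.
	GetEvents(ctx context.Context, actorID string, fromVersion int64) ([]Event, error)
	// GetLatestVersion returns the highest stored version for the actor (0 if none).
	GetLatestVersion(ctx context.Context, actorID string) (int64, error)
}
```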
**Key Events Published:**
- `EventStored` - Event successfully persisted (triggered when SaveEvent succeeds)
- `VersionConflict` - Attempted version <= current; optimistic lock lost (caller must reload and retry)
- `SnapshotCreated` - State snapshot saved (optional; developers decide when)
**Key Events Consumed:**
- None (this context is a source of truth; others consume from it)
**Interfaces to Other Contexts:**
- **Cluster Coordination**: Cluster leader queries latest versions to assign shards
- **Namespace Isolation**: Stores can be namespaced; queries filtered by namespace
- **Optimistic Concurrency**: Version conflicts trigger retry logic in application
- **Event Bus**: Events stored here are published to bus subscribers
**Lifecycle:**
- Event creation: Triggered by application business logic (domain events)
- Event persistence: Synchronous SaveEvent call (writes to store)
- Event durability: Persists forever (or until retention policy expires in JetStream)
- Snapshot lifecycle: Optional; created by application decision or rebalancing; can be safely discarded (replay recovers)
**Owner:** Developer (application layer) owns writing events; Aether library owns storage
**Current Code Locations:**
- `/aether/event.go` - Event struct, VersionConflictError, ReplayError
- `/aether/store/memory.go` - InMemoryEventStore implementation
- `/aether/store/jetstream.go` - JetStreamEventStore implementation (production)
**Scaling Concerns:**
- Single node: Full replay is fast for actors with fewer than ~100 events; snapshots help beyond that
- Cluster: Events stored in JetStream (durable across nodes); replay happens on failover
- Multi-tenant: Events namespaced; separate streams per namespace avoid cross-contamination
**Alignment with Vision:**
- **Primitives over Frameworks**: EventStore is interface; multiple implementations
- **NATS-Native**: JetStreamEventStore uses JetStream durability
- **Events as Complete History**: Events are source of truth; state is derived
**Gaps/Observations:**
- Snapshot strategy is entirely application's responsibility (no built-in triggering)
- Schema evolution for events not discussed (backward compatibility on deserialization)
- Corruption recovery (ReplayError handling) is application's responsibility
**Boundary Rules:**
- Inside: Event persistence, version validation, replay logic
- Outside: Domain logic that generates events, retry policy on conflicts, snapshot triggering
- Cannot cross: No shared models between Event Sourcing and other contexts; translation happens via events
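The replay boundary above reduces to a fold over stored events. A sketch, assuming the illustrative `Event`/`EventStore` shapes from the earlier block and an application-supplied `apply` function (the domain logic that stays outside this context):

```go
// Rebuild reconstructs actor state by applying every stored event in order.
// The state type and apply function belong to the application; the store only
// supplies history. Returns the rebuilt state and the last applied version.
func Rebuild[S any](ctx context.Context, store EventStore, actorID string,
	initial S, apply func(S, Event) S) (S, int64, error) {

	events, err := store.GetEvents(ctx, actorID, 0)
	if err != nil {
		return initial, 0, err // replay/corruption errors are the application's to handle
	}
	state, version := initial, int64(0)
	for _, e := range events {
		state = apply(state, e)
		version = e.Version
	}
	return state, version, nil
}
```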
---
### Context 2: Optimistic Concurrency Control
**Purpose:** Detect and signal concurrent write conflicts; let application choose retry strategy.
**Core Responsibility:**
- Protect against lost writes from concurrent writers
- Detect conflicts early (version mismatch)
- Provide detailed error context for retry logic
- Enable at-least-once semantics for idempotent operations
**Language (Ubiquitous Language):**
- **Version**: Sequential number tracking writer's view of current state
- **Conflict**: Condition where attempted version <= current version (another writer won)
- **Optimistic Lock**: Assumption that conflicts are rare; detect when they happen
- **Retry**: Application's response to conflict; reload state and attempt again
- **AttemptedVersion**: Version proposed by current writer
- **CurrentVersion**: Version that actually won the race
**Key Entities:**
- VersionConflictError (detailed error with actor ID, attempted, current versions)
- OptimisticLock pattern (implicit; not a first-class entity)
**Key Events Published:**
- `VersionConflict` - SaveEvent rejected due to version <= current (developer retries)
**Key Events Consumed:**
- None directly; consumes version state from Event Sourcing
**Interfaces to Other Contexts:**
- **Event Sourcing**: Reads latest version; detects conflicts on save
- **Application Logic**: Application handles conflict and decides retry strategy
**Lifecycle:**
- Conflict detection: Synchronous in SaveEvent (fast check: version > current)
- Conflict lifecycle: Temporary; conflict happens then application retries with new version
- Error lifecycle: Returned immediately; application decides next action
**Owner:** Aether library (detects conflicts); Application (implements retry strategy)
**Current Code Locations:**
- `/aether/event.go` - ErrVersionConflict sentinel, VersionConflictError type
- `/aether/store/jetstream.go` - SaveEvent validation (lines checking version)
- `/aether/store/memory.go` - SaveEvent validation
**Scaling Concerns:**
- High contention: If many writers target same actor, conflicts spike; application must implement backoff
- Retry storms: Naive retry (tight loop) causes cascade failures; exponential backoff mitigates
- Metrics: Track conflict rate to detect unexpected contention
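A sketch of one possible application-side retry policy (exponential backoff, capped attempts). The import path, the `decide` callback, and the exact store method signatures are assumptions; `errors.Is` presumes the `ErrVersionConflict` sentinel listed under Current Code Locations above wraps correctly.

```go
package app

import (
	"context"
	"errors"
	"fmt"
	"time"

	"example.com/aether" // import path is hypothetical
)

// retrySave is one possible caller-side retry policy: reload the current
// version, let decide() propose the next event, save, and back off
// exponentially on conflict. Aether deliberately ships no such helper.
func retrySave(ctx context.Context, store aether.EventStore, actorID string,
	decide func(currentVersion int64) (aether.Event, error)) error {

	const maxAttempts = 5
	backoff := 50 * time.Millisecond

	for attempt := 1; attempt <= maxAttempts; attempt++ {
		current, err := store.GetLatestVersion(ctx, actorID)
		if err != nil {
			return err
		}
		event, err := decide(current) // rebuild state, propose event at current+1
		if err != nil {
			return err
		}
		if err := store.SaveEvent(ctx, event); err == nil {
			return nil
		} else if !errors.Is(err, aether.ErrVersionConflict) {
			return err // not a conflict: do not retry
		}
		select { // lost the race: wait, then retry against fresh state
		case <-time.After(backoff):
			backoff *= 2
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("gave up on %s after %d conflicting attempts", actorID, maxAttempts)
}
```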
**Alignment with Vision:**
- **Primitives over Frameworks**: Aether returns error; application decides what to do
- Does NOT impose retry strategy (that would be a framework opinion)
**Gaps/Observations:**
- No built-in retry mechanism (intentional design choice)
- No conflict metrics in library (application must instrument)
- No guidance on retry backoff strategies in code (documented in PROBLEM_MAP, not in API)
**Boundary Rules:**
- Inside: Detect conflict, validate version > current, return detailed error
- Outside: Retry logic, backoff strategy, exponential delays, giving up after N attempts
- Cannot cross: Each context owns its retry behavior; no global retry handler
---
### Context 3: Namespace Isolation
**Purpose:** Provide logical data boundaries without opinionated multi-tenancy framework.
**Core Responsibility:**
- Route events to subscribers matching namespace pattern
- Isolate event stores by namespace prefix
- Support hierarchical namespace naming (e.g., "prod.tenant-abc", "staging.orders")
- Warn about wildcard bypass of isolation (explicit decision)
**Language (Ubiquitous Language):**
- **Namespace**: Logical boundary (tenant, domain, environment, bounded context)
- **Namespace Pattern**: NATS-style wildcard matching: "*" (single token), ">" (multi-token)
- **Isolation**: Guarantee that events in namespace-A cannot be read from namespace-B (except via wildcard)
- **Wildcard Subscription**: Cross-namespace visibility for trusted components (logging, monitoring)
- **Subject**: NATS subject for routing (e.g., "aether.events.{namespace}")
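The wildcard semantics can be illustrated with a token matcher of roughly this shape; this is an illustration of NATS-style matching, not the actual `MatchNamespacePattern` in `/aether/pattern.go`.

```go
package pattern

import "strings"

// match checks a concrete namespace against a NATS-style pattern:
// "*" matches exactly one dot-delimited token, ">" matches all remaining
// tokens (and must match at least one). Illustrative; edge cases may differ
// from the library's MatchNamespacePattern.
func match(pattern, namespace string) bool {
	p := strings.Split(pattern, ".")
	n := strings.Split(namespace, ".")
	for i, tok := range p {
		switch {
		case tok == ">":
			return len(n) > i // consumes the rest, e.g. "prod.>" vs "prod.orders.created"
		case i >= len(n):
			return false // pattern has more tokens than the namespace
		case tok == "*" || tok == n[i]:
			continue // single-token wildcard or exact token match
		default:
			return false
		}
	}
	return len(p) == len(n) // no wildcard tail: lengths must agree
}
```

Under these rules, `prod.*` matches `prod.tenant-abc` but not `prod.tenant-abc.orders`, while `prod.>` matches both.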
**Key Entities:**
- Namespace (just a string; meaning is application's)
- JetStreamConfig with Namespace field (storage isolation)
- SubscriptionFilter with namespace pattern (matching)
- NATSEventBus subject routing
**Key Events Published:**
- `EventPublished` - Event sent to namespace subscribers (via EventBus.Publish)
**Key Events Consumed:**
- Events from Event Sourcing, filtered by namespace pattern
**Interfaces to Other Contexts:**
- **Event Sourcing**: Stores can be namespaced (prefix in stream name)
- **Event Bus**: Publishes to namespace; subscribers match by pattern
- **Cluster Coordination**: Might use namespaced subscriptions to isolate tenant events
**Lifecycle:**
- Namespace definition: Application decides; typically per-tenant or per-domain
- Namespace creation: Implicit when first store/subscription uses it (no explicit schema)
- Namespace deletion: Not supported; namespaces persist if events exist
- Stream lifetime: JetStream stream "namespace_events" persists until deleted
**Owner:** Application layer (defines namespace boundaries); Library (enforces routing)
**Current Code Locations:**
- `/aether/eventbus.go` - EventBus exact vs wildcard subscriber routing
- `/aether/nats_eventbus.go` - NATSEventBus subject formatting (line 89: `fmt.Sprintf("aether.events.%s", namespacePattern)`)
- `/aether/store/jetstream.go` - JetStreamConfig.Namespace field, stream name sanitization (line 83)
- `/aether/pattern.go` - MatchNamespacePattern, IsWildcardPattern functions
**Scaling Concerns:**
- Single namespace: All events in one stream; scales with event volume
- Multi-namespace: Separate streams per namespace; scales horizontally (add namespaces independently)
- Wildcard subscriptions: Cross-namespace visibility; careful with security (documented warnings)
**Alignment with Vision:**
- **Primitives over Frameworks**: Namespaces are primitives; no opinionated multi-tenancy layer
- Non-goal: "Opinionated multi-tenancy" - this library provides isolation primitives, not tenant management
**Gaps/Observations:**
- Namespace collision: No validation that namespace names are unique (risk: "orders" used by two teams)
- Wildcard security: Extensively documented in code (SECURITY WARNING appears multiple times); good
- No namespace registry or allow-list (application must enforce naming conventions)
- Sanitization of namespace names happens in JetStreamEventStore (spaces → underscores) but not documented
**Boundary Rules:**
- Inside: Namespace pattern matching, subject routing, stream prefixing
- Outside: Defining namespace semantics (tenant, domain, environment), enforcing conventions
- Cannot cross: Events in namespace-A published to namespace-A only (except wildcard subscribers)
---
### Context 4: Cluster Coordination
**Purpose:** Distribute actors across cluster nodes; elect leader; rebalance on topology changes.
**Core Responsibility:**
- Discover nodes in cluster (NATS-based, no external coordinator)
- Elect one leader using lease-based coordination
- Distribute shards across nodes via consistent hash ring
- Detect node failures and trigger rebalancing
- Provide shard assignment for actor placement
**Language (Ubiquitous Language):**
- **Node**: Physical or logical computer in cluster; has ID, address, capacity, status
- **Leader**: Single node responsible for coordination and rebalancing decisions
- **Term**: Monotonically increasing leadership election round (prevents split-brain)
- **Shard**: Virtual partition (1024 by default); actors hash to shards; shards assigned to nodes
- **Consistent Hash Ring**: Algorithm mapping shards to nodes such that node failures cause minimal rebalancing
- **Rebalancing**: Reassignment of shards when topology changes (node join/fail)
- **ShardMap**: Current state of which shards live on which nodes
- **Heartbeat**: Periodic signal from leader renewing its lease (proves still alive)
- **Lease**: Time window during which leader's authority is valid (TTL-based, not quorum)
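A sketch of how an actor lands on a shard under these terms; the FNV hash and the function shape are illustrative, not necessarily what ConsistentHashRing uses internally.

```go
package cluster

import "hash/fnv"

// shardFor maps an actor ID onto one of numShards virtual partitions
// (1024 by default, per the Shard definition above).
func shardFor(actorID string, numShards uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(actorID)) // fnv's Write never returns an error
	return h.Sum32() % numShards
}
```

The ring then assigns each shard to nodes; because actors hash to stable shards, a node failure only moves the shards that node owned, which is the "minimal rebalancing" property above.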
**Key Entities:**
- NodeInfo (cluster node details: ID, address, capacity, status)
- ShardMap (shard → nodes mapping; versioned)
- LeadershipLease (leader ID, term, expiration)
- ActorMigration (migration record for actor during rebalancing)
**Key Events Published:**
- `NodeJoined` - New node added to cluster
- `NodeFailed` - Node stopped responding (detected by heartbeat timeout)
- `LeaderElected` - Leader selected (term incremented)
- `LeadershipLost` - Leader lease expired (old leader can no longer coordinate)
- `ShardAssigned` - Leader assigns shard to nodes
- `ShardMigrated` - Shard moved from one node to another (during rebalancing)
**Key Events Consumed:**
- Node topology changes (new nodes, failures) → trigger rebalancing
- Leader election results → shard assignments
**Interfaces to Other Contexts:**
- **Namespace Isolation**: Could use namespaced subscriptions for cluster-internal events
- **Event Sourcing**: Cluster queries latest version to assign shards; failures trigger replay on new node
- **Event Bus**: Cluster messages published to event bus; subscribers on each node act on them
**Lifecycle:**
- Cluster formation: Nodes join; first leader elected
- Leadership duration: Until lease expires (~10 seconds in config)
- Shard assignment: Decided by leader; persists in ShardMap
- Node failure: Detected after heartbeat timeout (~90 seconds implied by lease config)
- Rebalancing: Triggered by topology change; completes when ShardMap versioned and distributed
**Owner:** ClusterManager (coordination); LeaderElection (election); ShardManager (placement)
**Current Code Locations:**
- `/aether/cluster/types.go` - NodeInfo, ShardMap, LeadershipLease, ActorMigration types
- `/aether/cluster/manager.go` - ClusterManager, node discovery, rebalancing loop
- `/aether/cluster/leader.go` - LeaderElection (lease-based using NATS KV)
- `/aether/cluster/hashring.go` - ConsistentHashRing (shard → node mapping)
- `/aether/cluster/shard.go` - ShardManager (actor placement, shard assignment)
**Scaling Concerns:**
- Leader election latency: 10s lease, 3s heartbeat → ~13s to detect failure (tunable)
- Rebalancing overhead: Consistent hash minimizes movements (only affects shards from failed node)
- Shard count: 1024 default; tune based on cluster size and actor count
**Alignment with Vision:**
- **NATS-Native**: Leader election uses NATS KV store (lease-based); cluster discovery via NATS
- **Primitives over Frameworks**: ShardManager and LeaderElection are composable; can swap algorithms
**Gaps/Observations:**
- Rebalancing is triggered but algorithm not fully shown in code excerpt ("would rebalance across N nodes")
- Actor migration during rebalancing: ShardManager has PlacementStrategy interface but sample migration handler not shown
- Split-brain prevention: Lease-based (no concurrent leaders) but old leader could execute stale rebalancing
- No explicit actor state migration during shard rebalancing (where does actor state go during move?)
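The stale-rebalancing observation above can be guarded against by checking both lease expiry and term before any coordination action. A minimal sketch with assumed field names (the actual LeadershipLease in `/aether/cluster/types.go` may differ):

```go
package cluster

import "time"

// leaseView mirrors the lease fields described above; names are assumed.
type leaseView struct {
	LeaderID  string
	Term      uint64
	ExpiresAt time.Time
}

// mayCoordinate returns true only while this node holds an unexpired lease at
// the highest term it has observed, so a deposed leader's stale rebalancing
// decisions are rejected locally.
func mayCoordinate(l leaseView, selfID string, highestSeenTerm uint64, now time.Time) bool {
	return l.LeaderID == selfID &&
		l.Term >= highestSeenTerm &&
		now.Before(l.ExpiresAt)
}
```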
**Boundary Rules:**
- Inside: Node discovery, leader election, shard assignment, rebalancing decisions
- Outside: Actor state migration (that's Event Sourcing's replay), actual actor message delivery
- Cannot cross: Cluster decisions are made once per cluster (not per namespace or actor)
---
### Context 5: Event Bus (Pub/Sub Distribution)
**Purpose:** Route events from producers to subscribers; support filtering and cross-node propagation.
**Core Responsibility:**
- Local event distribution (in-process subscriptions)
- Cross-node event distribution via NATS
- Filter events by type and actor pattern
- Support exact and wildcard namespace patterns
- Non-blocking delivery (drop event if channel full, don't block publisher)
**Language (Ubiquitous Language):**
- **Publish**: Send event to namespace subscribers (synchronous call, non-blocking delivery; may drop if subscribers are slow)
- **Subscribe**: Register interest in namespace pattern (returns channel)
- **Filter**: Criteria for event delivery (EventTypes list, ActorPattern wildcard)
- **Wildcard Pattern**: "*" (single token), ">" (multi-token) matching
- **Subject**: NATS subject for routing (e.g., "aether.events.{namespace}")
- **Subscriber**: Entity receiving events from channel (has local reference to channel)
- **Deliver**: Attempt to send event to subscriber's channel; non-blocking (may drop)
**Key Entities:**
- EventBroadcaster interface (local or NATS-backed)
- EventBus (in-memory, local subscriptions only)
- NATSEventBus (extends EventBus; adds NATS forwarding)
- SubscriptionFilter (event types + actor pattern)
- filteredSubscription (internal; tracks channel, pattern, filter)
**Key Events Published:**
- `EventPublished` - Event sent via EventBus.Publish (may be delivered to subscribers)
**Key Events Consumed:**
- Events from Event Sourcing context
**Interfaces to Other Contexts:**
- **Event Sourcing**: Reads events to publish; triggered after SaveEvent
- **Namespace Isolation**: Uses namespace pattern for routing
- **Cluster Coordination**: Cluster messages flow through event bus
**Lifecycle:**
- Subscription creation: Caller invokes Subscribe/SubscribeWithFilter; gets channel
- Subscription duration: Lifetime of channel (caller controls)
- Subscription cleanup: Unsubscribe closes channel
- Event delivery: Synchronous Publish → deliver to all matching subscribers
- Dropped events: Non-blocking delivery; full channel = dropped event (metrics recorded)
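The delivery rules above reduce to a buffered channel and a `select` with a `default` branch. A sketch with illustrative type shapes; the real filter matching lives in `/aether/pattern.go`, and the actor-pattern check is approximated here with `path.Match`.

```go
package eventbus

import (
	"path"
	"slices"
)

// Event and SubscriptionFilter are illustrative shapes for this sketch.
type Event struct {
	Type    string
	ActorID string
}

type SubscriptionFilter struct {
	EventTypes   []string // empty = all types
	ActorPattern string   // empty = all actors
}

// matches applies the two application-level criteria: event-type allow-list
// and actor pattern.
func matches(f SubscriptionFilter, evt Event) bool {
	if len(f.EventTypes) > 0 && !slices.Contains(f.EventTypes, evt.Type) {
		return false
	}
	if f.ActorPattern != "" {
		ok, _ := path.Match(f.ActorPattern, evt.ActorID)
		return ok
	}
	return true
}

// deliver never blocks the publisher: if the subscriber's buffered channel is
// full, the event is dropped and the drop hook (metrics) is invoked instead.
func deliver(ch chan<- Event, evt Event, f SubscriptionFilter, onDrop func()) {
	if !matches(f, evt) {
		return
	}
	select {
	case ch <- evt:
		// delivered; the subscriber drains at its own pace
	default:
		onDrop() // full buffer: drop rather than block Publish
	}
}
```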
**Owner:** Library (EventBus implementation); Callers (subscribe/unsubscribe)
**Current Code Locations:**
- `/aether/eventbus.go` - EventBus (local in-process pub/sub)
- `/aether/nats_eventbus.go` - NATSEventBus (NATS-backed cross-node)
- `/aether/pattern.go` - MatchNamespacePattern, SubscriptionFilter matching logic
- Metrics tracking in both implementations
**Scaling Concerns:**
- Local bus: In-memory channels; scales with subscriber count (no network overhead)
- NATS bus: One NATS subscription per pattern; scales with unique patterns
- Channel buffering: 100-element buffer (configurable); full = dropped events
- Metrics: Track published, delivered, dropped per namespace
**Alignment with Vision:**
- **Primitives over Frameworks**: EventBroadcaster is interface; swappable implementations
- **NATS-Native**: NATSEventBus uses NATS subjects for routing
**Gaps/Observations:**
- Dropped events are silent (metrics recorded but no callback); might surprise subscribers
- Filter matching is string-based (no compile-time safety for event types)
- Two-level filtering: Namespace at NATS level, EventTypes/ActorPattern at application level
- NATSEventBus creates subscription per unique pattern (could be optimized with pattern hierarchy)
**Boundary Rules:**
- Inside: Event routing, filter matching, non-blocking delivery
- Outside: Semantics of events (that's Event Sourcing); decisions on what to do when event received
- Cannot cross: Subscribers are responsible for their channels; publisher doesn't know who consumes
---
## Context Relationships
### Event Sourcing ↔ Event Bus
**Type:** Producer/Consumer (one-to-many)
**Direction:** Event Sourcing produces events; Event Bus distributes them
**Integration:**
- Application saves event to store (SaveEvent)
- Application publishes same event to bus (Publish)
- Subscribers receive event from bus channel
- Events are same object (Event struct)
**Decoupling:**
- Store and bus are independent (application coordinates)
- Bus subscribers don't know about storage
- Replay doesn't trigger bus publish (events already stored)
**Safety:**
- No shared transaction (save and publish are separate)
- Risk: Event saved but publish fails (or vice versa) → bus has stale view
- Mitigation: Application's responsibility to ensure consistency
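A sketch of the application-side coordination described above, reusing the illustrative `Event`/`EventStore` shapes from the Event Sourcing section; the `Publisher` interface and its `Publish` signature are assumptions about the bus API.

```go
// Publisher is the minimal bus capability this sketch needs; EventBus and
// NATSEventBus expose something of roughly this shape.
type Publisher interface {
	Publish(namespace string, e Event) error
}

// saveAndPublish writes the event to the store and then broadcasts the same
// object on the bus. There is no shared transaction: a failed Publish after a
// successful SaveEvent leaves subscribers with a stale view until they
// re-read or replay, exactly the risk noted above.
func saveAndPublish(ctx context.Context, store EventStore, bus Publisher, namespace string, e Event) error {
	if err := store.SaveEvent(ctx, e); err != nil {
		return err // nothing was published, so store and bus stay consistent
	}
	if err := bus.Publish(namespace, e); err != nil {
		// Durable but not broadcast; the application decides whether to retry
		// the publish, log and alert, or rely on subscribers replaying.
		return fmt.Errorf("stored v%d for %s but publish failed: %w", e.Version, e.ActorID, err)
	}
	return nil
}
```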
---
### Event Sourcing → Optimistic Concurrency Control
**Type:** Dependency (nested)
**Direction:** SaveEvent validates version using Optimistic Concurrency
**Integration:**
- SaveEvent calls GetLatestVersion (read current)
- Checks event.Version > currentVersion (optimistic lock)
- Returns VersionConflictError if not
**Decoupling:**
- Optimistic Concurrency is not a separate context; it's logic within Event Sourcing
- Version validation is inline in SaveEvent, not a separate call
**Note:** Initially these seem like separate contexts (different language, different lifecycle). But Version is Event Sourcing's concern; Conflict is just an error condition (not a separate state machine). Optimistic locking is a **pattern**, not a **context**.
---
### Event Sourcing → Namespace Isolation
**Type:** Containment (namespaces contain event streams)
**Direction:** Namespace Isolation scopes Event Sourcing
**Integration:**
- JetStreamEventStore accepts Namespace in config
- Actual stream name becomes "{namespace}_{streamName}"
- GetEvents, GetLatestVersion, SaveEvent are namespace-scoped
**Decoupling:**
- Each namespace has independent version sequences
- No cross-namespace reads in Event Sourcing context
- EventBus.Publish specifies namespace
**Safety:**
- Complete isolation at storage level (different JetStream streams)
- Events from namespace-A cannot appear in namespace-B queries
- Wildcard subscriptions bypass this (documented risk)
---
### Cluster Coordination → Event Sourcing
**Type:** Consumer (reads version state)
**Direction:** Cluster queries Event Sourcing for actor state
**Integration:**
- ClusterManager might query GetLatestVersion to determine if shard can migrate
- Nodes track which actors (shards) are assigned locally
- On failover, new node replays events from store to rebuild state
**Decoupling:**
- Cluster doesn't manage event storage (Event Sourcing owns that)
- Cluster doesn't decide when to snapshot
- Cluster doesn't know about versions (Event Sourcing concept)
---
### Cluster Coordination → Namespace Isolation
**Type:** Orthogonal (can combine, but not required)
**Direction:** Cluster can use namespaced subscriptions; not required
**Integration:**
- Cluster could publish node-join events to namespaced topics (e.g., "cluster.{tenant}")
- Different tenants can have independent clusters (each with own cluster messages)
**Decoupling:**
- Cluster doesn't care about namespace semantics
- Namespace doesn't enforce cluster topology
---
### Event Bus → (All contexts)
**Type:** Cross-cutting concern
**Direction:** Event Bus distributes events from all contexts
**Integration:**
- Event Sourcing publishes to bus after SaveEvent
- Cluster Coordination publishes shard assignments to bus
- Namespace Isolation is a parameter to Publish/Subscribe
- Subscribers receive events and can filter by type/actor
**Decoupling:**
- Bus delivery is fire-and-forget (events are lost if no subscriber is listening)
- Subscribers don't block publishers
- No ordering guarantee across namespaces
---
## Boundary Rules Summary
### By Language
| Term | Context | Meaning |
|----------|---------|---------|
| **Event** | Event Sourcing | Immutable fact; identified by ID, type, actor, version |
| **Version** | Event Sourcing | Monotonically increasing sequence per actor; also used for optimistic locking |
| **Snapshot** | Event Sourcing | Optional state cache at specific version; always disposable |
| **Node** | Cluster Coordination | Physical computer in cluster; has ID, address, capacity |
| **Leader** | Cluster Coordination | Single node elected for coordination (not per-namespace, not per-actor) |
| **Shard** | Cluster Coordination | Virtual partition for actor placement; 1024 by default |
| **Namespace** | Namespace Isolation | Logical boundary (tenant, domain, context); application-defined meaning |
| **Wildcard** | Both Event Bus & Namespace | "*" (single token) and ">" (multi-token) NATS pattern matching |
| **Subject** | Event Bus | NATS subject for message routing |
| **Conflict** | Optimistic Concurrency | Condition where write failed due to version being stale |
| **Retry** | Optimistic Concurrency | Application's decision to reload and try again |
| **Subscribe** | Event Bus | Register interest in namespace pattern; returns channel |
| **Publish** | Event Bus | Send event to namespace subscribers; non-blocking |
### By Lifecycle
| Entity | Created | Destroyed | Owner | Context |
|--------|---------|-----------|-------|---------|
| Event | SaveEvent | Never (persists forever) | Application writes, Aether stores | Event Sourcing |
| Version | Per-event | With event | Automatic (monotonic) | Event Sourcing |
| Snapshot | Application decision | Application decision | Application | Event Sourcing |
| Node | Join cluster | Explicit leave | Infrastructure | Cluster Coordination |
| Leader | Election completes | Lease expires | Automatic (election) | Cluster Coordination |
| Shard | Created with cluster | With cluster | ClusterManager | Cluster Coordination |
| Namespace | First use | Never (persist) | Application | Namespace Isolation |
| Subscription | Subscribe() call | Unsubscribe() call | Caller | Event Bus |
| Channel | Subscribe() returns | Unsubscribe() closes | Caller | Event Bus |
### By Ownership
| Context | Who Decides | What They Decide |
|---------|-------------|------------------|
| Event Sourcing | Application (developer) | When to save events, event schema, snapshot strategy |
| Optimistic Concurrency | Application | Retry strategy, backoff, giving up |
| Namespace Isolation | Application | Namespace semantics (tenant, domain, env), naming convention |
| Cluster Coordination | ClusterManager & LeaderElection | Node discovery, leader election, shard assignment |
| Event Bus | Application | What to subscribe to, filtering criteria |
### By Scaling Boundary
| Context | Scales By | Limits | Tuning |
|---------|-----------|--------|--------|
| Event Sourcing | Event volume per actor | Replay latency grows with version count | Snapshots help |
| Cluster Coordination | Node count | Leader election latency, rebalancing overhead | Lease TTL, heartbeat interval |
| Namespace Isolation | Namespace count | Stream count, NATS resource usage | Separate JetStream streams |
| Event Bus | Subscriber count | Channel buffering (100 elements) | Queue depth, metrics |
---
## Code vs. Intended: Alignment Analysis
### Intended → Actual: Good Alignment
**Context: Event Sourcing**
- Intended: EventStore interface with multiple implementations
- Actual: InMemoryEventStore (testing) and JetStreamEventStore (production) both exist
- ✓ Good: Matches vision of "primitives over frameworks"
**Context: Optimistic Concurrency**
- Intended: Detect conflicts, return error, let app retry
- Actual: SaveEvent returns VersionConflictError; no built-in retry
- ✓ Good: Aligns with vision of primitives (app owns retry logic)
**Context: Namespace Isolation**
- Intended: Logical boundaries without opinionated multi-tenancy
- Actual: JetStreamConfig.Namespace, EventBus namespace patterns
- ✓ Good: Primitives provided; semantics left to app
**Context: Cluster Coordination**
- Intended: Node discovery, leader election, shard assignment
- Actual: ClusterManager, LeaderElection, ConsistentHashRing all present
- ✓ Good: Primitives implemented
**Context: Event Bus**
- Intended: Local and cross-node pub/sub with filtering
- Actual: EventBus (local) and NATSEventBus (NATS) both present
- ✓ Good: Extensible via interface
### Intended → Actual: Gaps
**Context: Cluster Coordination**
- Intended: Actor migration during shard rebalancing
- Actual: ShardManager has PlacementStrategy; ActorMigration type defined
- Gap: Migration handler logic not shown; where does actor state go during a rebalance?
- Impact: Cluster context is foundational but incomplete; application must implement actor handoff
**Context: Event Sourcing**
- Intended: Snapshot strategy guidance
- Actual: SnapshotStore interface; SaveSnapshot exists; no built-in strategy
- Gap: No adaptive snapshotting, no time-based snapshotting
- Impact: App must choose snapshot frequency (documented in PROBLEM_MAP, not enforced)
**Context: Namespace Isolation**
- Intended: Warn about wildcard security risks
- Actual: SECURITY WARNING in docstrings (excellent)
- Gap: No namespace registry or allow-list to prevent collisions
- Impact: Risk of two teams using same namespace (e.g., "orders") unintentionally
**Context: Optimistic Concurrency**
- Intended: Guide app on retry strategy
- Actual: Returns VersionConflictError with details
- Gap: No retry helper, no backoff library
- Impact: Each app implements own retry (fine; primitives approach)
---
## Refactoring Backlog (if brownfield)
### No Major Refactoring Required
The code structure already aligns well with intended bounded contexts:
- Event Sourcing lives in `/event.go` and `/store/`
- Cluster lives in `/cluster/`
- Event Bus lives in `/eventbus.go` and `/nats_eventbus.go`
- Pattern matching lives in `/pattern.go`
### Minor Improvements
**Issue 1: Document Actor Migration During Rebalancing**
- Current: ShardManager.AssignShard exists; ActorMigration type defined
- Gap: No example code showing how actor state moves between nodes
- Suggestion: Add sample migration handler in cluster package
**Issue 2: Add Namespace Validation/Registry**
- Current: Namespace is just a string; no collision detection
- Gap: Risk of two teams using same namespace
- Suggestion: Document naming convention (e.g., "env.team.context"); optionally add schema/enum
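One possible shape for the suggested convention check; the `env.team.context` pattern is taken from the suggestion above, and the validator itself is purely application-side.

```go
package app

import (
	"fmt"
	"regexp"
)

// namespaceRe encodes an "env.team.context" convention: exactly three
// lowercase dot-separated tokens, so wildcard characters can never appear in
// a concrete namespace name.
var namespaceRe = regexp.MustCompile(`^[a-z0-9-]+\.[a-z0-9-]+\.[a-z0-9-]+$`)

// validateNamespace rejects names that break the convention before they reach
// a store or a subscription.
func validateNamespace(ns string) error {
	if !namespaceRe.MatchString(ns) {
		return fmt.Errorf("namespace %q does not match env.team.context", ns)
	}
	return nil
}
```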
**Issue 3: Snapshot Strategy Recipes**
- Current: SnapshotStore interface; app responsible for strategy
- Gap: Documentation could provide sample strategies (time-based, count-based, adaptive)
- Suggestion: Add `/examples/snapshot_strategies.go` with reference implementations
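A sketch of the kind of reference strategy the suggestion asks for: count-based and time-based triggers combined in one policy. Names are chosen for illustration; the real SnapshotStore stays strategy-free.

```go
package app

import "time"

// SnapshotPolicy triggers a snapshot every N events or every interval,
// whichever comes first. An adaptive variant could instead watch replay latency.
type SnapshotPolicy struct {
	EveryNEvents  int64
	EveryInterval time.Duration
}

// ShouldSnapshot is called after applying an event; the application persists
// a snapshot via SnapshotStore when it returns true.
func (p SnapshotPolicy) ShouldSnapshot(version, lastSnapshotVersion int64, lastSnapshotAt, now time.Time) bool {
	if p.EveryNEvents > 0 && version-lastSnapshotVersion >= p.EveryNEvents {
		return true
	}
	if p.EveryInterval > 0 && now.Sub(lastSnapshotAt) >= p.EveryInterval {
		return true
	}
	return false
}
```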
**Issue 4: Metrics for Concurrency Context**
- Current: Version conflict detection exists; no metrics
- Gap: Apps can't easily observe conflict rate
- Suggestion: Add conflict metrics to EventStore (or provide hooks)
---
## Recommendations
### For Product Strategy
1. **Confirm Bounded Contexts**: Review this map with team. Are these five contexts the right cut? Missing any? Too many?
2. **Define Invariants per Context**:
- Event Sourcing: "Version must be strictly monotonic per actor" ✓ (enforced)
- Cluster Coordination: "Only one leader can have valid lease at a time" ✓ (lease-based)
- Namespace Isolation: "Events in namespace-A cannot be queried from namespace-B context" ✓ (separate streams)
- Optimistic Concurrency: "Conflict detection is synchronous; resolution is async" ✓ (error returned immediately)
- Event Bus: "Delivery is non-blocking; events may be dropped if subscriber slow" ✓ (metrics track this)
3. **Map Capabilities to Contexts**:
- "Store events durably" → Event Sourcing context
- "Detect concurrent writes" → Optimistic Concurrency context
- "Isolate logical domains" → Namespace Isolation context
- "Distribute actors across nodes" → Cluster Coordination context
- "Route events to subscribers" → Event Bus context
4. **Test Boundaries**:
- Single-node: Event Sourcing + Optimistic Concurrency + Event Bus (no Cluster)
- Multi-node: Add Cluster Coordination (but cluster decisions don't affect other contexts)
- Multi-tenant: Add Namespace Isolation (orthogonal to other contexts)
### For Architecture
1. **Complete Cluster Context Documentation**:
- Show actor migration lifecycle during shard rebalancing
- Document when state moves (during rebalance, during failover)
- Provide sample ShardManager implementation
2. **Add Snapshot Strategy Guidance**:
- Time-based: Snapshot every hour
- Count-based: Snapshot every 100 events
- Adaptive: Snapshot when replay latency exceeds threshold
3. **Namespace Isolation Checklist**:
- Define naming convention (document in README)
- Add compile-time checks (optional enum for known namespaces)
- Test multi-tenant isolation (integration test suite)
4. **Concurrency Context Testing**:
- Add concurrent writer tests to store tests
- Verify VersionConflictError details are accurate
- Benchmark conflict detection performance
### For Docs
1. **Add Context Diagram**: Show five contexts as boxes; arrows for relationships
2. **Add Per-Context Glossary**: Define ubiquitous language per context (terms table above)
3. **Add Lifecycle Diagrams**: Show event lifetime, node lifetime, subscription lifetime, shard lifetime
4. **Security Section**: Expand wildcard subscription warnings; document trust model
---
## Anti-Patterns Avoided
### Pattern: "One Big Event Model"
- **Anti-pattern**: Single Event struct used everywhere with union types
- **What we do**: Event is generic; domain language lives in EventType strings and Data map
- **Why**: Primitives approach; library doesn't impose domain model
### Pattern: "Shared Mutable State Across Contexts"
- **Anti-pattern**: ClusterManager directly mutates EventStore data structures
- **What we do**: Contexts communicate via events (if they need to) or via explicit queries
- **Why**: Clean boundaries; each context owns its data
### Pattern: "Automatic Retry for Optimistic Locks"
- **Anti-pattern**: Library retries internally on version conflict
- **What we do**: Return error to caller; caller decides retry strategy
- **Why**: Primitives approach; retry policy is app's concern, not library's
### Pattern: "Opinionated Snapshot Strategy"
- **Anti-pattern**: "Snapshot every 100 events" hardcoded
- **What we do**: SnapshotStore interface; app decides when to snapshot
- **Why**: Different apps have different replay latency requirements
### Pattern: "Wildcard Subscriptions by Default"
- **Anti-pattern**: All subscriptions use ">" by default (receive everything)
- **What we do**: Explicit namespaces; wildcard is optional and warned about
- **Why**: Security-first; isolation is default
---
## Conclusion
Aether's five bounded contexts are **well-aligned** with the problem space and the codebase:
1. **Event Sourcing** - Store events as immutable history; enable replay
2. **Optimistic Concurrency** - Detect conflicts; let app retry
3. **Namespace Isolation** - Logical boundaries without opinionated multi-tenancy
4. **Cluster Coordination** - Distribute actors, elect leader, rebalance on failure
5. **Event Bus** - Route events from producers to subscribers
Each context has:
- Clear **language boundaries** (different terms, different meanings)
- Clear **lifecycle boundaries** (different creation/deletion patterns)
- Clear **ownership** (who decides what within each context)
- Clear **scaling boundaries** (why this context must be separate)
The implementation **matches the vision** of "primitives over frameworks": the library provides composition points (interfaces); applications wire them together.
Next step in product strategy: **Define domain models within each context** (Step 4 of strategy chain). For now, Aether provides primitives; applications build their domain models on top.