# Cluster Coordination: Architecture Reference

## High-Level Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                    Aether Cluster Runtime                     │
├──────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌──────────────────────────────────────────────────────┐    │
│  │ DistributedVM (Orchestrator - not an aggregate)      │    │
│  │  ├─ Local Runtime (executes actors)                  │    │
│  │  ├─ NodeDiscovery (heartbeat → cluster awareness)    │    │
│  │  ├─ ClusterManager (Cluster aggregate root)          │    │
│  │  │   ├─ nodes: Map[ID → NodeInfo]                    │    │
│  │  │   ├─ shardMap: ShardMap (current assignments)     │    │
│  │  │   ├─ hashRing: ConsistentHashRing (util)          │    │
│  │  │   └─ election: LeaderElection                     │    │
│  │  └─ ShardManager (ShardAssignment aggregate)         │    │
│  │      ├─ shardCount: int                              │    │
│  │      ├─ shardMap: ShardMap                           │    │
│  │      └─ placement: PlacementStrategy                 │    │
│  └──────────────────────────────────────────────────────┘    │
│                          │ NATS                               │
│                          ▼                                    │
│  ┌──────────────────────────────────────────────────────┐    │
│  │ NATS Cluster                                          │    │
│  │  ├─ Subject: aether.discovery (heartbeats)            │    │
│  │  ├─ Subject: aether.cluster.* (messages)              │    │
│  │  └─ KeyValue: aether-leader-election (lease)          │    │
│  └──────────────────────────────────────────────────────┘    │
│                                                               │
└──────────────────────────────────────────────────────────────┘
```
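
To make the composition concrete, the sketch below shows one plausible way the orchestrator wires these collaborators together. It is illustrative only, not the actual `distributed.go` code; the constructor shape, the `nc` field, and the import set are assumptions, and the project's own types (`Runtime`, `NodeDiscovery`, `ClusterManager`) are referenced rather than defined.

```go
// Illustrative sketch only; mirrors the diagram, not the real distributed.go.
type DistributedVM struct {
    runtime   *Runtime        // local actor execution
    discovery *NodeDiscovery  // heartbeats → cluster awareness
    cluster   *ClusterManager // Cluster aggregate root (owns ShardManager, LeaderElection)
    nc        *nats.Conn      // shared NATS connection (assumed field name)
}

// Start brings up discovery first so the ClusterManager sees heartbeats,
// then starts the election, node-monitoring, and rebalancing loops.
func (vm *DistributedVM) Start(ctx context.Context) error {
    if err := vm.discovery.Start(ctx); err != nil {
        return err
    }
    return vm.cluster.Start(ctx)
}
```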

---

## Aggregate Boundaries

### Aggregate 1: Cluster (Root)

Owns node topology, shard assignments, and rebalancing decisions.

```
Cluster Aggregate
├─ Entities
│   └─ Cluster (root)
│       ├─ nodes: Map[NodeID → NodeInfo]
│       ├─ shardMap: ShardMap
│       ├─ hashRing: ConsistentHashRing
│       └─ currentLeaderID: string
│
├─ Commands
│   ├─ JoinCluster(nodeInfo)
│   ├─ MarkNodeFailed(nodeID)
│   ├─ AssignShards(shardMap)
│   └─ RebalanceShards(reason)
│
├─ Events
│   ├─ NodeJoined
│   ├─ NodeFailed
│   ├─ NodeLeft
│   ├─ ShardAssigned
│   ├─ ShardMigrated
│   └─ RebalancingTriggered
│
├─ Invariants Enforced
│   ├─ I2: All active shards have owners
│   ├─ I3: Shards only on healthy nodes
│   └─ I4: Assignments stable during lease
│
└─ Code Location: ClusterManager (cluster/manager.go)
```
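
A minimal Go sketch of this aggregate's shape, assuming the entity fields and event names listed above. It is not the actual `ClusterManager` code: the event payloads, status constant, and `hashRing.RemoveNode` helper are assumptions, and package/import clauses plus the project's own types (`NodeInfo`, `ShardMap`, `ConsistentHashRing`) are omitted.

```go
// Sketch of the Cluster aggregate root's state (names from the table above).
type Cluster struct {
    mu              sync.RWMutex
    nodes           map[string]*NodeInfo // NodeID → NodeInfo
    shardMap        *ShardMap
    hashRing        *ConsistentHashRing
    currentLeaderID string
}

// Events emitted by the aggregate (payload fields are assumptions).
type NodeJoined struct{ Node NodeInfo }
type NodeFailed struct{ NodeID string }

// Commands mutate state under the write lock and return events to publish.
func (c *Cluster) MarkNodeFailed(nodeID string) ([]any, error) {
    c.mu.Lock()
    defer c.mu.Unlock()
    n, ok := c.nodes[nodeID]
    if !ok {
        return nil, fmt.Errorf("unknown node %q", nodeID)
    }
    n.Status = NodeStatusFailed   // I3: failed nodes may not own shards
    c.hashRing.RemoveNode(nodeID) // keep the ring consistent with topology
    return []any{NodeFailed{NodeID: nodeID}}, nil
}
```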

### Aggregate 2: LeadershipLease (Root)

Owns leadership claim and ensures single leader per term.

```
LeadershipLease Aggregate
├─ Entities
│   └─ LeadershipLease (root)
│       ├─ leaderID: string
│       ├─ term: uint64
│       ├─ expiresAt: time.Time
│       └─ startedAt: time.Time
│
├─ Commands
│   ├─ ElectLeader(nodeID)
│   └─ RenewLeadership(nodeID)
│
├─ Events
│   ├─ LeaderElected
│   ├─ LeadershipRenewed
│   └─ LeadershipLost
│
├─ Invariants Enforced
│   ├─ I1: Single leader per term
│   └─ I5: Leader is active node
│
└─ Code Location: LeaderElection (cluster/leader.go)
```
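
As an illustration of the lease entity, here is a hedged sketch of the lease value and two helpers. Field names follow the entity above; the JSON tags and helper methods are assumptions, not the actual `leader.go` code.

```go
// Sketch of the lease record stored under the "leader" key.
type LeadershipLease struct {
    LeaderID  string    `json:"leader_id"`
    Term      uint64    `json:"term"`
    StartedAt time.Time `json:"started_at"`
    ExpiresAt time.Time `json:"expires_at"`
}

// Expired reports whether the lease is no longer valid (I1: a new leader may
// only be elected once the previous term's lease has expired).
func (l LeadershipLease) Expired(now time.Time) bool {
    return now.After(l.ExpiresAt)
}

// Renew extends the lease by the configured timeout without changing the term.
func (l LeadershipLease) Renew(now time.Time, leaseTimeout time.Duration) LeadershipLease {
    l.ExpiresAt = now.Add(leaseTimeout)
    return l
}
```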

### Aggregate 3: ShardAssignment (Root)

Owns shard-to-node mappings and validates assignments.

```
ShardAssignment Aggregate
├─ Entities
│   └─ ShardAssignment (root)
│       ├─ version: uint64
│       ├─ assignments: Map[ShardID → []NodeID]
│       ├─ nodes: Map[NodeID → NodeInfo]
│       └─ updateTime: time.Time
│
├─ Commands
│   ├─ AssignShard(shardID, nodeList)
│   └─ RebalanceFromTopology(nodes)
│
├─ Events
│   ├─ ShardAssigned
│   └─ ShardMigrated
│
├─ Invariants Enforced
│   ├─ I2: All shards assigned
│   └─ I3: Only healthy nodes
│
└─ Code Location: ShardManager (cluster/shard.go)
```

---

## Command Flow Diagrams

### Scenario 1: Node Joins Cluster

```
┌─────────┐  NodeJoined  ┌──────────────────┐
│New Node │─────────────▶│ClusterManager    │
└─────────┘              │.JoinCluster()    │
                         └────────┬─────────┘
                                  │
                    ┌─────────────┼─────────────┐
                    ▼             ▼             ▼
              ┌──────────┐  ┌──────────┐  ┌──────────┐
              │Validate  │  │Update    │  │Publish   │
              │ID unique │  │topology  │  │NodeJoined│
              │Capacity>0│  │hashRing  │  │event     │
              └────┬─────┘  └──────────┘  └──────────┘
                   │
              ┌────▼────────────────────────┐
              │Is this node leader?         │
              │If yes: trigger rebalance    │
              └─────────────┬───────────────┘
                            │
                ┌───────────┴───────────┐
                ▼                       ▼
       ┌──────────────────┐    ┌──────────────────┐
       │RebalanceShards   │    │(nothing)         │
       │ command          │    │                  │
       └────────┬─────────┘    └──────────────────┘
                │
                ▼
       ┌──────────────────────────────────┐
       │ConsistentHashPlacement           │
       │  .RebalanceShards()              │
       │  (compute new assignments)       │
       └────────────┬─────────────────────┘
                    │
                    ▼
       ┌──────────────────────────────────┐
       │ShardManager.AssignShards()       │
       │  (validate & apply)              │
       └────────────┬─────────────────────┘
                    │
           ┌────────┴──────────────────┐
           ▼                           ▼
    ┌──────────────┐          ┌─────────────────┐
    │For each      │          │Publish          │
    │shard moved   │          │ShardMigrated    │
    │              │          │event per shard  │
    └──────────────┘          └─────────────────┘
```
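
The join path summarizes to validate → update topology → publish → maybe rebalance. The sketch below is a hedged outline of that flow, not the real `JoinCluster` implementation; helper names such as `publish` and `triggerShardRebalancing` follow the diagrams, but their signatures, the capacity field, and the status constant are assumptions.

```go
func (cm *ClusterManager) JoinCluster(node *NodeInfo) error {
    cm.mutex.Lock()
    // Validate: unique ID and positive capacity.
    if _, exists := cm.nodes[node.ID]; exists {
        cm.mutex.Unlock()
        return fmt.Errorf("node %q already joined", node.ID)
    }
    if node.Capacity <= 0 {
        cm.mutex.Unlock()
        return fmt.Errorf("node %q has no capacity", node.ID)
    }
    // Update topology and hash ring.
    node.Status = NodeStatusActive
    cm.nodes[node.ID] = node
    cm.hashRing.AddNode(node.ID)
    isLeader := cm.election.IsLeader()
    cm.mutex.Unlock()

    cm.publish(NodeJoined{Node: *node}) // announce to the rest of the cluster

    // Only the leader recomputes shard placement (see Decision 2).
    if isLeader {
        cm.triggerShardRebalancing("node joined")
    }
    return nil
}
```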

### Scenario 2: Node Failure Detected

```
┌──────────────────────┐
│Heartbeat timeout     │
│(LastSeen > 90s)      │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────────────┐
│ClusterManager                │
│.MarkNodeFailed()             │
│ ├─ Mark status=Failed        │
│ ├─ Remove from hashRing      │
│ └─ Publish NodeFailed event  │
└────────────┬─────────────────┘
             │
    ┌────────▼────────────┐
    │Is this node leader? │
    └────────┬────────────┘
             │
    ┌────────┴─────────────────┐
    │ YES                      │ NO
    ▼                          ▼
┌──────────────┐      ┌──────────────────┐
│Trigger       │      │(nothing)         │
│Rebalance     │      │                  │
└──────┬───────┘      └──────────────────┘
       │
       └─▶ [Same as Scenario 1 from RebalanceShards]
```
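
A hedged sketch of the failure-detection loop behind this scenario (the `monitorNodes` ticker from the concurrency section): nodes whose last heartbeat is older than the 90s timeout are marked failed. The field and method names are assumptions about the real code.

```go
// monitorNodes runs on a 30s ticker and applies the 90s failure timeout.
func (cm *ClusterManager) monitorNodes(ctx context.Context) {
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case now := <-ticker.C:
            cm.mutex.RLock()
            var stale []string
            for id, n := range cm.nodes {
                if n.Status == NodeStatusActive && now.Sub(n.LastSeen) > 90*time.Second {
                    stale = append(stale, id)
                }
            }
            cm.mutex.RUnlock()
            for _, id := range stale {
                // Publishes NodeFailed; the leader then triggers rebalancing.
                _ = cm.MarkNodeFailed(id)
            }
        }
    }
}
```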

### Scenario 3: Leader Election (Implicit)

```
┌─────────────────────────────────────┐
│All nodes: electionLoop runs every 2s│
└────────────┬────────────────────────┘
             │
    ┌────────▼────────────┐
    │Am I leader?         │
    └────────┬────────────┘
             │
    ┌────────┴──────────────────────────┐
    │ YES                               │ NO
    ▼                                   ▼
┌──────────────┐          ┌─────────────────────────┐
│Do nothing    │          │Should try election?     │
│(already      │          │ ├─ No leader exists     │
│leading)      │          │ ├─ Lease expired        │
└──────────────┘          │ └─ (other conditions)   │
                          └────────┬────────────────┘
                                   │
                         ┌─────────▼──────────┐
                         │try AtomicCreate    │
                         │"leader" key in KV  │
                         └────────┬───────────┘
                                  │
                    ┌─────────────┴──────────────┐
                    │ SUCCESS                    │ FAILED
                    ▼                            ▼
          ┌──────────────────┐         ┌──────────────────┐
          │Became Leader!    │         │Try claim expired │
          │Publish           │         │lease; if success,│
          │LeaderElected     │         │become leader     │
          └────────┬─────────┘         │Else: stay on     │
                   │                   │bench             │
                   ▼                   └──────────────────┘
          ┌──────────────────┐
          │Start lease       │
          │renewal loop      │
          │(every 3s)        │
          └──────────────────┘
```
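
The "AtomicCreate" step maps naturally onto a create-if-absent operation in the NATS JetStream key-value bucket named in the architecture diagram (`aether-leader-election`). The sketch below shows one plausible shape of that attempt using the `nats.go` KeyValue API; the term derivation, the lease payload, and the error handling are simplified and partly assumed.

```go
// tryBecomeLeader attempts the atomic create of the "leader" key.
// KeyValue.Create fails if the key already exists, so at most one node wins.
func tryBecomeLeader(kv nats.KeyValue, nodeID string, leaseTimeout time.Duration) (bool, error) {
    lease := LeadershipLease{
        LeaderID:  nodeID,
        Term:      1, // real code would derive the term from the previous lease
        StartedAt: time.Now(),
        ExpiresAt: time.Now().Add(leaseTimeout),
    }
    data, err := json.Marshal(lease)
    if err != nil {
        return false, err
    }
    if _, err := kv.Create("leader", data); err != nil {
        return false, nil // another node holds the key; not an error for us
    }
    return true, nil
}
```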

---

## Decision Trees

### Decision 1: Is Node Healthy?

```
Query: Is Node X healthy?

┌─────────────────────────────────────┐
│Get node status from Cluster.nodes   │
└────────────┬────────────────────────┘
             │
    ┌────────▼────────────────┐
    │Check node.Status field  │
    └────────┬────────────────┘
             │
    ┌────────┴───────┬─────────────────────┬──────────────┐
    │                │                     │              │
    ▼                ▼                     ▼              ▼
┌────────┐    ┌────────────┐      ┌──────────────┐   ┌─────────┐
│Active  │    │Draining    │      │Failed        │   │Unknown  │
├────────┤    ├────────────┤      ├──────────────┤   ├─────────┤
│✓Healthy│    │⚠ Draining  │      │✗ Unhealthy   │   │✗ Error  │
│Can host│    │Should not  │      │Don't use for │   │         │
│shards  │    │get new     │      │sharding      │   │         │
└────────┘    │shards, but │      │Delete shards │   └─────────┘
              │existing OK │      │ASAP          │
              └────────────┘      └──────────────┘
```
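
This decision collapses to a small helper over the node status enum. The sketch below is illustrative only; the constant names mirror the states in the diagram but are assumptions about the actual `cluster` package.

```go
type NodeStatus int

const (
    NodeStatusActive NodeStatus = iota
    NodeStatusDraining
    NodeStatusFailed
)

// CanHostNewShards answers Decision 1 for shard placement:
// only Active nodes accept new shards.
func CanHostNewShards(s NodeStatus) bool {
    return s == NodeStatusActive
}

// KeepsExistingShards reports whether shards already on the node may stay;
// Draining nodes keep what they have but receive nothing new.
func KeepsExistingShards(s NodeStatus) bool {
    return s == NodeStatusActive || s == NodeStatusDraining
}
```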

### Decision 2: Should This Node Rebalance Shards?

```
Command: RebalanceShards(nodeID, reason)

┌──────────────────────────────────┐
│Is nodeID the current leader?     │
└────────┬─────────────────────────┘
         │
    ┌────┴──────────────────┐
    │ YES                   │ NO
    ▼                       ▼
┌────────────┐     ┌──────────────────────┐
│Continue    │     │REJECT: NotLeader     │
│rebalancing │     │                      │
└─────┬──────┘     │Only leader can       │
      │            │initiate rebalancing  │
      │            └──────────────────────┘
      ▼
┌─────────────────────────────────────┐
│Are there active nodes?              │
└────────┬────────────────────────────┘
         │
    ┌────┴──────────────────┐
    │ YES                   │ NO
    ▼                       ▼
┌────────────┐     ┌──────────────────────┐
│Continue    │     │REJECT: NoActiveNodes │
│rebalancing │     │                      │
└─────┬──────┘     │Can't assign shards   │
      │            │with no healthy nodes │
      │            └──────────────────────┘
      ▼
┌──────────────────────────────────────────┐
│Call PlacementStrategy.RebalanceShards()  │
│  (compute new assignments)               │
└────────┬─────────────────────────────────┘
         │
         ▼
┌──────────────────────────────────────────┐
│Call ShardManager.AssignShards()          │
│  (validate & apply new assignments)      │
└────────┬─────────────────────────────────┘
         │
    ┌────┴──────────────────┐
    │ SUCCESS               │ FAILURE
    ▼                       ▼
┌────────────┐     ┌──────────────────────┐
│Publish     │     │Publish               │
│Shard       │     │RebalancingFailed     │
│Migrated    │     │event                 │
│events      │     │                      │
│            │     │Log error, backoff,   │
│Publish     │     │try again in 5 min    │
│Rebalancing │     │                      │
│Completed   │     │                      │
└────────────┘     └──────────────────────┘
```
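
In code form, this decision tree is a pair of guard clauses before delegating to the placement strategy. A hedged sketch follows, assuming helper names (`activeNodes`, `publish`, `shardManager.Current`) and error variables that the document does not itself name.

```go
var (
    ErrNotLeader     = errors.New("only the leader can initiate rebalancing")
    ErrNoActiveNodes = errors.New("cannot assign shards with no healthy nodes")
)

func (cm *ClusterManager) RebalanceShards(reason string) error {
    if !cm.election.IsLeader() {
        return ErrNotLeader
    }
    active := cm.activeNodes() // snapshot of Active nodes taken under RLock
    if len(active) == 0 {
        return ErrNoActiveNodes
    }
    // Compute the new placement, then validate & apply it.
    newMap, err := cm.placement.RebalanceShards(cm.shardManager.Current(), active)
    if err == nil {
        err = cm.shardManager.AssignShards(newMap)
    }
    if err != nil {
        cm.publish(RebalancingFailed{Reason: err.Error()}) // caller logs and backs off
        return err
    }
    cm.publish(RebalancingCompleted{Reason: reason})
    return nil
}
```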

### Decision 3: Can We Assign This Shard to This Node?

```
Command: AssignShard(shardID, nodeID)

┌──────────────────────────────────┐
│Is nodeID in Cluster.nodes?       │
└────────┬─────────────────────────┘
         │
    ┌────┴──────────────────┐
    │ YES                   │ NO
    ▼                       ▼
┌────────────┐     ┌──────────────────────┐
│Continue    │     │REJECT: NodeNotFound  │
│assignment  │     │                      │
└─────┬──────┘     │Can't assign shard    │
      │            │to non-existent node  │
      │            └──────────────────────┘
      ▼
┌──────────────────────────────────────┐
│Check node.Status                     │
└────────┬─────────────────────────────┘
         │
    ┌────┴──────────────────┐
    │Active or Draining     │ Failed
    ▼                       ▼
┌────────────┐     ┌──────────────────────┐
│Continue    │     │REJECT: UnhealthyNode │
│assignment  │     │                      │
└─────┬──────┘     │Can't assign to       │
      │            │failed node; it can't │
      │            │execute shards        │
      │            └──────────────────────┘
      ▼
┌──────────────────────────────────────┐
│Check replication factor              │
│ (existing nodes < replication limit?)│
└────────┬─────────────────────────────┘
         │
    ┌────┴──────────────────┐
    │ YES                   │ NO
    ▼                       ▼
┌────────────┐     ┌───────────────────────┐
│ACCEPT      │     │REJECT: TooManyReplicas│
│Add node to │     │                       │
│shard's     │     │Already have max       │
│replica     │     │replicas for shard     │
│list        │     │                       │
└────────────┘     └───────────────────────┘
```
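
The same checks, expressed as a hedged validation sketch on the ShardAssignment aggregate; the replication-factor parameter and the exact error wording are assumptions rather than the real `shard.go` code.

```go
func (sa *ShardAssignment) AssignShard(shardID int, nodeID string, replicationFactor int) error {
    node, ok := sa.nodes[nodeID]
    if !ok {
        return fmt.Errorf("node not found: %s", nodeID)
    }
    if node.Status == NodeStatusFailed {
        return fmt.Errorf("unhealthy node: %s cannot execute shards", nodeID)
    }
    replicas := sa.assignments[shardID]
    if len(replicas) >= replicationFactor {
        return fmt.Errorf("shard %d already has %d replicas", shardID, len(replicas))
    }
    sa.assignments[shardID] = append(replicas, nodeID)
    sa.version++ // bump version so consumers of ShardMigrated can detect staleness
    sa.updateTime = time.Now()
    return nil
}
```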

---

## State Transitions

### Cluster State Machine

```
┌────────────────┐
│  Initializing  │
│  (no nodes)    │
└────────┬───────┘
         │ NodeJoined
         ▼
┌────────────────┐
│  Single Node   │
│ (one node only)│
└────────┬───────┘
         │ NodeJoined
         ▼
┌────────────────────────────────────────────┐
│ Multi-Node Cluster                         │
│ ├─ Stable (healthy nodes, shards assigned) │
│ ├─ Rebalancing (shards moving)             │
│ └─ Degraded (failed node waiting for heal) │
└────────────┬───────────────────────────────┘
             │ (All nodes left or failed)
             ▼
┌────────────────┐
│   No Nodes     │
│ (cluster dead) │
└────────────────┘
```

### Node State Machine (per node)

```
┌────────────────┐
│  Discovered    │
│ (new heartbeat)│
└────────┬───────┘
         │ JoinCluster
         ▼
┌────────────────┐
│  Active        │
│  (healthy,     │
│   can host     │
│   shards)      │
└────────┬───────┘
         │
    ┌────┴──────────────────┬──────────────────────┐
    │                       │                      │
    │ (graceful)            │ (heartbeat miss)     │ (admin/health check)
    ▼                       ▼                      ▼
┌────────────┐     ┌────────────────┐     ┌────────────────┐
│ Draining   │     │ Failed         │     │ Failed         │
│ (stop new  │     │ (timeout: 90s) │     │ (detected)     │
│  shards,   │     │                │     │                │
│  preserve  │     │ Rebalance      │     │ Rebalance      │
│  existing) │     │ shards ASAP    │     │ shards ASAP    │
│            │     │                │     │                │
│            │     │ Recover?       │     │ Recover?       │
│            │     │  ├─ Yes:       │     │  ├─ Yes:       │
│            │     │  │   → Active  │     │  │   → Active  │
│            │     │  └─ No:        │     │  └─ No:        │
│            │     │      → Deleted │     │      → Deleted │
└────┬───────┘     └────────────────┘     └────────────────┘
     │ Removed
     ▼
┌────────────────┐
│  Deleted       │
│ (left cluster) │
└────────────────┘
```
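
One compact way to enforce this state machine is a whitelist of legal transitions. The sketch below is an assumption about how that could look, not existing code; it also assumes `NodeStatusDiscovered` and `NodeStatusDeleted` constants alongside the ones sketched under Decision 1.

```go
// legalTransitions whitelists the edges drawn in the state machine above.
var legalTransitions = map[NodeStatus][]NodeStatus{
    NodeStatusDiscovered: {NodeStatusActive},
    NodeStatusActive:     {NodeStatusDraining, NodeStatusFailed},
    NodeStatusDraining:   {NodeStatusDeleted},
    NodeStatusFailed:     {NodeStatusActive, NodeStatusDeleted}, // recover or remove
}

func canTransition(from, to NodeStatus) bool {
    for _, next := range legalTransitions[from] {
        if next == to {
            return true
        }
    }
    return false
}
```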

### Leadership State Machine (per node)

```
┌──────────────────┐
│  Not a Leader    │
│  (waiting)       │
└────────┬─────────┘
         │ Try Election (every 2s)
         │ Atomic create "leader" succeeds
         ▼
┌──────────────────┐
│  Candidate       │
│  (won election)  │
└────────┬─────────┘
         │ Start lease renewal loop
         ▼
┌──────────────────┐
│  Leader          │
│  (holding lease) │
└────────┬─────────┘
         │
  ┌──────┴───────────┬──────────────────────┐
  │                  │                      │
  │ Renew lease      │ Lease expires        │ Graceful
  │ (every 3s)       │ (10s timeout)        │ shutdown
  │ ✓ Success        │ ✗ Failure            │
  ▼                  ▼                      ▼
[stays]    ┌──────────────────┐   ┌──────────────────┐
           │ Lost Leadership  │   │ Lost Leadership  │
           │ (lease expired)  │   │ (graceful)       │
           └────────┬─────────┘   └────────┬─────────┘
                    │                      │
                    └──────────┬───────────┘
                               │
                               ▼
                    ┌──────────────────┐
                    │  Not a Leader    │
                    │ (back to bench)  │
                    └────────┬─────────┘
                             │
                             └─▶ [Back to top]
```

---

## Concurrency Model

### Thread Safety

All aggregates use `sync.RWMutex` for thread safety:

```go
type ClusterManager struct {
    mutex    sync.RWMutex         // Protects access to:
    nodes    map[string]*NodeInfo //  - nodes
    shardMap *ShardMap            //  - shardMap
    // ...
}

// Read operation (multiple goroutines)
func (cm *ClusterManager) GetClusterTopology() map[string]*NodeInfo {
    cm.mutex.RLock() // Shared lock
    defer cm.mutex.RUnlock()
    // ...
}

// Write operation (exclusive)
func (cm *ClusterManager) JoinCluster(nodeInfo *NodeInfo) error {
    cm.mutex.Lock() // Exclusive lock
    defer cm.mutex.Unlock()
    // ... (only one writer at a time)
}
```

### Background Goroutines

```
┌──────────────────────────────────────────────────┐
│ DistributedVM.Start()                            │
├──────────────────────────────────────────────────┤
│                                                  │
│  ┌───────────────────────────────────────────┐   │
│  │ ClusterManager.Start()                    │   │
│  │  ├─ go election.Start()                   │   │
│  │  │   ├─ go electionLoop()     [ticker: 2s]│   │
│  │  │   ├─ go leaseRenewalLoop() [ticker: 3s]│   │
│  │  │   └─ go monitorLeadership()  [watcher] │   │
│  │  │                                        │   │
│  │  ├─ go monitorNodes()        [ticker: 30s]│   │
│  │  └─ go rebalanceLoop()        [ticker: 5m]│   │
│  └───────────────────────────────────────────┘   │
│                                                  │
│  ┌───────────────────────────────────────────┐   │
│  │ NodeDiscovery.Start()                     │   │
│  │  ├─ go announceNode()        [ticker: 30s]│   │
│  │  └─ Subscribe to "aether.discovery"       │   │
│  └───────────────────────────────────────────┘   │
│                                                  │
│  ┌───────────────────────────────────────────┐   │
│  │ NATS subscriptions                        │   │
│  │  ├─ "aether.cluster.*" → messages         │   │
│  │  ├─ "aether.discovery" → node updates     │   │
│  │  └─ "aether-leader-election" → KV watch   │   │
│  └───────────────────────────────────────────┘   │
│                                                  │
└──────────────────────────────────────────────────┘
```
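
The loops above follow a common ticker-plus-context pattern. Below is a hedged sketch of how `ClusterManager.Start` might launch them; the method names match the diagram, while the context plumbing and the `loop` helper are assumptions.

```go
func (cm *ClusterManager) Start(ctx context.Context) error {
    // Election, lease renewal, and the KV watcher live inside election.Start.
    if err := cm.election.Start(ctx); err != nil {
        return err
    }
    go cm.loop(ctx, 30*time.Second, cm.checkNodeHealth)  // monitorNodes
    go cm.loop(ctx, 5*time.Minute, cm.periodicRebalance) // rebalanceLoop
    return nil
}

// loop is a tiny helper: run fn on every tick until the context is cancelled.
func (cm *ClusterManager) loop(ctx context.Context, every time.Duration, fn func()) {
    ticker := time.NewTicker(every)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            fn()
        }
    }
}
```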

---

## Event Sequences

### Event: Node Join, Rebalance, Shard Assignment

```
Timeline:

T=0s
├─ Node-4 joins cluster
├─ NodeDiscovery announces NodeJoined via NATS
└─ ClusterManager receives and processes

T=0.1s
├─ ClusterManager.JoinCluster() executes
├─ Updates nodes map, hashRing
├─ Publishes NodeJoined event
└─ If leader: triggers rebalancing

T=0.5s (if leader)
├─ ClusterManager.rebalanceLoop() fires (or triggerShardRebalancing)
├─ PlacementStrategy.RebalanceShards() computes new assignments
└─ ShardManager.AssignShards() applies new assignments

T=1.0s
├─ Publishes ShardMigrated events (one per shard moved)
├─ All nodes subscribe to these events
├─ Each node's routing table is updated
└─ Actors aware of new shard locations

T=1.5s onwards
├─ Actors on moved shards migrated (application layer)
├─ Actor Runtime subscribes to ShardMigrated
├─ Triggers actor migration via ActorMigration
└─ Eventually: rebalancing complete

T=5m
├─ Periodic rebalance check (5m timer)
├─ If no changes: no-op
└─ If imbalance detected: trigger rebalance again
```
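
For the routing-table update at T=1.0s, each node needs a subscriber on the cluster subject that reacts to ShardMigrated. Here is a hedged sketch using a plain NATS subscription; the concrete subject name, payload shape, and `RoutingTable` type are assumptions (the document only specifies the `aether.cluster.*` namespace).

```go
// subscribeShardMigrations updates the local routing table as shards move.
func subscribeShardMigrations(nc *nats.Conn, routes *RoutingTable) (*nats.Subscription, error) {
    return nc.Subscribe("aether.cluster.shard.migrated", func(msg *nats.Msg) {
        var ev struct {
            ShardID  int    `json:"shard_id"`
            FromNode string `json:"from_node"`
            ToNode   string `json:"to_node"`
            Version  uint64 `json:"version"`
        }
        if err := json.Unmarshal(msg.Data, &ev); err != nil {
            return // ignore malformed events
        }
        // Ignore stale events from older shard-map versions.
        routes.UpdateIfNewer(ev.ShardID, ev.ToNode, ev.Version)
    })
}
```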

### Event: Node Failure Detection and Recovery

```
Timeline:

T=0s
├─ Node-2 healthy, last heartbeat received
└─ Node-2.LastSeen = now

T=30s
├─ Node-2 healthcheck runs (every 30s timer)
├─ Publishes heartbeat
└─ Node-2.LastSeen updated

T=60s
├─ (Node-2 still healthy)
└─ Heartbeat received, LastSeen updated

T=65s
├─ Node-2 CRASH (network failure or process crash)
├─ No more heartbeats sent
└─ Node-2.LastSeen stays at 60s

T=150s (timeout reached)
├─ ClusterManager.checkNodeHealth() detects timeout
├─ now - LastSeen > 90s → mark node Failed
├─ ClusterManager.MarkNodeFailed() executes
├─ Publishes NodeFailed event
├─ If leader: triggers rebalancing
└─ (If not leader: waits for leader to rebalance)

T=151s (if leader)
├─ RebalanceShards triggered
├─ PlacementStrategy computes new topology without Node-2
├─ ShardManager.AssignShards() reassigns shards
└─ Publishes ShardMigrated events

T=152s onwards
├─ Actors migrated from Node-2 to healthy nodes
└─ No actor loss (assuming replication or migration succeeded)

T=180s (Node-2 recovery)
├─ Node-2 process restarts
├─ NodeDiscovery announces NodeJoined again
├─ Status: Active
└─ (Back to the Node Join sequence; leader rebalances again)
```

---

## Configuration & Tuning

### Key Parameters

```go
// From cluster/types.go
const (
    // LeaderLeaseTimeout: how long before leader must renew
    LeaderLeaseTimeout = 10 * time.Second

    // HeartbeatInterval: how often leader renews
    HeartbeatInterval = 3 * time.Second

    // ElectionTimeout: how often nodes try election
    ElectionTimeout = 2 * time.Second

    // Node failure detection (in manager.go)
    nodeFailureTimeout = 90 * time.Second
)

// From cluster/types.go
const (
    // DefaultNumShards: total shards in cluster
    DefaultNumShards = 1024

    // DefaultVirtualNodes: per-node virtual replicas for distribution
    DefaultVirtualNodes = 150
)
```

### Tuning Guide

| Parameter | Current | Rationale | Trade-off |
|-----------|---------|-----------|-----------|
| LeaderLeaseTimeout | 10s | Fast failure detection | May cause thrashing in high-latency networks |
| HeartbeatInterval | 3s | Leader alive signal every 3s | ~3 renewal writes per 10s lease window |
| ElectionTimeout | 2s | Retry elections frequently | CPU cost, but quick recovery |
| NodeFailureTimeout | 90s | 3x the 30s node announce interval | Tolerance for temporary network issues |
| DefaultNumShards | 1024 | Good granularity for large clusters | More shards = more metadata |
| DefaultVirtualNodes | 150 | Balance between distribution and overhead | Lower = worse distribution, higher = more ring operations |

---

## Failure Scenarios & Recovery

### Scenario A: Single Node Fails

```
Before: [A (leader), B, C, D] with 1024 shards
├─ A: 256 shards (+ leader)
├─ B: 256 shards
├─ C: 256 shards
└─ D: 256 shards

B crashes (no recovery) → waits 90s → marked Failed

After Rebalance:
[A (leader), C, D] with 1024 shards
├─ A: 341 shards (+ leader)
├─ C: 341 shards
└─ D: 342 shards
```

**Recovery:** Only the failed node's ~256 shards (about 1/4 of the total) are reshuffled; consistent hashing with virtual nodes keeps the movement close to that minimum

---

### Scenario B: Leader Fails

```
Before: [A (leader), B, C, D]

A crashes → no lease renewal → lease expires after 10s
          → no heartbeats → marked Failed after 90s

B, C, or D wins election → new leader
                         → triggers rebalance
                         → reshuffles A's shards

After: [B (leader), C, D]
```

**Recovery:** New leader elected within ~10s; shards rebalanced within ~100s (after the 90s failure timeout); no loss if replicas are present

---

### Scenario C: Network Partition

```
Before: [A (leader), B, C, D]
Partition: {A} isolated | {B, C, D} connected

At T=10s (lease expires):
├─ A: can't reach NATS, can't renew → loses leadership
├─ B, C, D: A's lease expired, one wins election
└─ New leader coordinates rebalance

Risk: If A can still reach NATS (only isolated from the app), it might try to
      renew, but the atomic update fails because of a term mismatch

Result: Single leader maintained; no split-brain
```
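
The split-brain guard in this scenario relies on the renewal being conditional: a stale leader's write must fail if anyone else has touched the key since it last saw it. With the JetStream KV bucket used for the lease, that maps onto an update carrying the last known revision. The sketch below is illustrative, with simplified error handling and the `LeadershipLease` shape assumed from the earlier sketch.

```go
// renewLease only succeeds if the "leader" key is still at the revision we
// last observed, so a leader stranded on the wrong side of a partition cannot
// overwrite a newer lease written by the node that won the next election.
func renewLease(kv nats.KeyValue, lease LeadershipLease, lastRevision uint64,
    leaseTimeout time.Duration) (uint64, error) {

    lease.ExpiresAt = time.Now().Add(leaseTimeout)
    data, err := json.Marshal(lease)
    if err != nil {
        return 0, err
    }
    newRev, err := kv.Update("leader", data, lastRevision)
    if err != nil {
        return 0, err // revision mismatch: we are no longer the leader
    }
    return newRev, nil
}
```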

---

## Monitoring & Observability

### Key Metrics to Track

```
# Cluster Topology
gauge:   cluster.nodes.count                      [active|draining|failed]
gauge:   cluster.shards.assigned                  [0, 1024]
gauge:   cluster.shards.orphaned                  [0, 1024]

# Leadership
gauge:   cluster.leader.is_leader                 [0, 1]
gauge:   cluster.leader.term                      [0, ∞]
gauge:   cluster.leader.lease_expires_in_seconds  [0, 10]

# Rebalancing
counter: cluster.rebalancing.triggered            [reason]
gauge:   cluster.rebalancing.active               [0, 1]
counter: cluster.rebalancing.completed            [shards_moved]
counter: cluster.rebalancing.failed               [reason]

# Node Health
gauge:   cluster.node.heartbeat_latency_ms        [per node]
gauge:   cluster.node.load                        [per node]
gauge:   cluster.node.vm_count                    [per node]
counter: cluster.node.failures                    [reason]
```

### Alerts

```
- Leader heartbeat missing > 5s     → election may be stuck
- Rebalancing in progress > 5 min   → likely stuck or repeatedly failing
- Orphaned shards > 0               → invariant violation
- Node failures > 50% of cluster    → investigate infrastructure
```

---

## References

- [DOMAIN_MODEL.md](./DOMAIN_MODEL.md) - Full domain model
- [REFACTORING_SUMMARY.md](./REFACTORING_SUMMARY.md) - Implementation roadmap
- [manager.go](./manager.go) - ClusterManager implementation
- [leader.go](./leader.go) - LeaderElection implementation
- [shard.go](./shard.go) - ShardManager implementation
- [discovery.go](./discovery.go) - NodeDiscovery implementation
- [distributed.go](./distributed.go) - DistributedVM orchestrator
|