# Issue: Implement Actor Migration Between Cluster Nodes

## Problem

When nodes join or leave the cluster, actors need to be migrated to maintain even distribution. Currently:

- `handleRebalanceRequest` in `cluster/manager.go:150` is empty
- `handleMigrationRequest` in `cluster/manager.go:167` is empty
- `RebalanceShards` in `cluster/shard.go:211` returns unchanged map
- `SendMessage` in `cluster/distributed.go:139` ignores sharding

## Required Implementation

### 1. Rebalance Algorithm (cluster/shard.go)
Implement `ConsistentHashPlacement.RebalanceShards` to:
- Calculate new shard assignments based on active nodes
- Identify actors needing migration
- Generate migration plan with source/dest nodes
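
The three steps above can be sketched as follows. This is a minimal illustration, not the existing `ConsistentHashPlacement` code: it swaps in rendezvous (highest-random-weight) hashing because the real hash ring is not shown in this issue, and the names `Move`, `score`, `owner`, and `planRebalance` are hypothetical. Actors that need migration are then the actors living in the shards listed in the plan.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Move is a hypothetical migration-plan entry: one shard changing owner.
type Move struct {
	ShardID  int
	From, To string
}

// score gives every (node, shard) pair a stable pseudo-random weight.
func score(node string, shard int) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%s/%d", node, shard)
	return h.Sum64()
}

// owner returns the active node with the highest score for the shard
// (rendezvous hashing). Ties are broken by node name so the result is stable.
func owner(shard int, active []string) string {
	best, bestScore := "", uint64(0)
	for _, n := range active {
		s := score(n, shard)
		if best == "" || s > bestScore || (s == bestScore && n < best) {
			best, bestScore = n, s
		}
	}
	return best
}

// planRebalance computes the new shard map and the moves needed to reach it.
// With no active nodes it returns a copy of the current map and no moves.
func planRebalance(current map[int]string, active []string) (map[int]string, []Move) {
	next := make(map[int]string, len(current))
	var moves []Move
	for shard, from := range current {
		to := from
		if len(active) > 0 {
			to = owner(shard, active)
		}
		next[shard] = to
		if to != from {
			moves = append(moves, Move{ShardID: shard, From: from, To: to})
		}
	}
	// Map iteration order is random, so sort for a deterministic plan.
	sort.Slice(moves, func(i, j int) bool { return moves[i].ShardID < moves[j].ShardID })
	return next, moves
}

func main() {
	all := []string{"node-a", "node-b", "node-c"}
	current := make(map[int]string)
	for shard := 0; shard < 8; shard++ {
		current[shard] = owner(shard, all)
	}
	// node-c leaves the cluster; only its shards should appear in the plan.
	_, moves := planRebalance(current, []string{"node-a", "node-b"})
	for _, m := range moves {
		fmt.Printf("shard %d: %s -> %s\n", m.ShardID, m.From, m.To)
	}
}
```

A useful property of this placement is that removing a node only moves the shards that node owned, which keeps the migration plan small.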

### 2. Migration Coordinator (cluster/manager.go)
Implement `handleRebalanceRequest` to:
- Accept migration plan from leader
- For each actor in plan:
    1. Pause incoming messages
    2. Capture actor state (replay events up to current version)
    3. Serialize state
    4. Send migration request to destination node
    5. Wait for ack
    6. Delete actor from current node
- Track migration status via `ActorMigration.Status`
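
One possible shape for the per-actor sequence is sketched below. Everything except the `ActorMigration.Status` field name is an assumption made for illustration: the status values, the `Steps` interface, the `migrate` function, and the `fakeNode` test double do not come from the codebase. The sketch folds capture and serialization into one call, and sending plus waiting for the ack into another.

```go
package main

import (
	"errors"
	"fmt"
)

// MigrationStatus and its values are assumptions; the issue only says that
// ActorMigration has a Status field.
type MigrationStatus string

const (
	StatusInProgress MigrationStatus = "in_progress"
	StatusCompleted  MigrationStatus = "completed"
	StatusFailed     MigrationStatus = "failed"
)

// ActorMigration is a stand-in for the struct in cluster/types.go.
type ActorMigration struct {
	ActorID, From, To string
	Status            MigrationStatus
}

// Steps is a hypothetical interface over the node-local operations.
type Steps interface {
	Pause(actorID string) error
	Resume(actorID string) error
	Capture(actorID string) ([]byte, error)          // replay events, serialize
	Transfer(to, actorID string, state []byte) error // send request, wait for ack
	Delete(actorID string) error
}

// migrate runs the steps for one actor and records the outcome in m.Status.
func migrate(m *ActorMigration, s Steps) error {
	m.Status = StatusInProgress
	err := s.Pause(m.ActorID)
	if err == nil {
		var state []byte
		if state, err = s.Capture(m.ActorID); err == nil {
			err = s.Transfer(m.To, m.ActorID, state)
		}
		if err != nil {
			// No ack: the source still owns the actor, so let it run again.
			if rerr := s.Resume(m.ActorID); rerr != nil {
				err = fmt.Errorf("%w (resume failed: %v)", err, rerr)
			}
		}
	}
	if err == nil {
		// Acked: the destination owns the actor now; drop the local copy.
		err = s.Delete(m.ActorID)
	}
	if err != nil {
		m.Status = StatusFailed
		return fmt.Errorf("migrate %s to %s: %w", m.ActorID, m.To, err)
	}
	m.Status = StatusCompleted
	return nil
}

// fakeNode records calls so the sketch can run without a cluster.
type fakeNode struct {
	calls        []string
	failTransfer bool
}

func (f *fakeNode) Pause(string) error  { f.calls = append(f.calls, "pause"); return nil }
func (f *fakeNode) Resume(string) error { f.calls = append(f.calls, "resume"); return nil }
func (f *fakeNode) Delete(string) error { f.calls = append(f.calls, "delete"); return nil }
func (f *fakeNode) Capture(string) ([]byte, error) {
	f.calls = append(f.calls, "capture")
	return []byte(`{}`), nil
}
func (f *fakeNode) Transfer(string, string, []byte) error {
	f.calls = append(f.calls, "transfer")
	if f.failTransfer {
		return errors.New("no ack from destination")
	}
	return nil
}

func main() {
	m := &ActorMigration{ActorID: "actor-1", From: "node-a", To: "node-b"}
	node := &fakeNode{failTransfer: true}
	err := migrate(m, node)
	fmt.Println(m.Status, err, node.calls)
}
```

The sketch resumes the actor on the source whenever no ack arrives, so a failed transfer never leaves the actor paused. A delete failure after the ack is reported as failed here; a real coordinator would likely retry the cleanup instead, since the destination already owns the actor.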

### 3. Cross-Node Message Routing (cluster/distributed.go)
Implement proper routing in `SendMessage`:
- Use `GetActorNode(actorID)` to determine target node
- If remote: marshal message, send via NATS to target node
- If local: send to local runtime
- Route response back to caller if needed
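
A rough sketch of the local/remote branch follows. It assumes a `GetActorNode` signature of `func(string) (string, error)`, a JSON envelope, and a per-node subject layout (`cluster.node.<id>.messages`); none of these are confirmed by the existing code. The `Publisher` interface has the same method shape as `Publish(subject, data)` on a NATS connection, so a real connection could be passed in, while the fakes keep the example runnable without a server. Routing a response back to the caller is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is a hypothetical envelope; the real message type is not shown here.
type Message struct {
	ActorID string `json:"actor_id"`
	Payload string `json:"payload"`
}

// Publisher is the one method this sketch needs from a NATS connection.
type Publisher interface {
	Publish(subject string, data []byte) error
}

// LocalRuntime is a hypothetical handle to the actors hosted on this node.
type LocalRuntime interface {
	Deliver(msg Message) error
}

// Router holds what SendMessage needs to pick a destination.
type Router struct {
	NodeID       string
	GetActorNode func(actorID string) (string, error) // signature assumed
	Local        LocalRuntime
	Bus          Publisher
}

// subjectFor uses a made-up per-node subject layout.
func subjectFor(nodeID string) string {
	return "cluster.node." + nodeID + ".messages"
}

// SendMessage delivers locally when this node owns the actor and otherwise
// marshals the message and publishes it to the owning node's subject.
func (r *Router) SendMessage(msg Message) error {
	node, err := r.GetActorNode(msg.ActorID)
	if err != nil {
		return fmt.Errorf("locate actor %s: %w", msg.ActorID, err)
	}
	if node == r.NodeID {
		return r.Local.Deliver(msg)
	}
	data, err := json.Marshal(msg)
	if err != nil {
		return fmt.Errorf("marshal message for %s: %w", msg.ActorID, err)
	}
	return r.Bus.Publish(subjectFor(node), data)
}

// fakeBus and fakeLocal are in-memory stand-ins for NATS and the runtime.
type fakeBus struct {
	subject string
	data    []byte
}

func (b *fakeBus) Publish(subject string, data []byte) error {
	b.subject, b.data = subject, data
	return nil
}

type fakeLocal struct{ delivered []Message }

func (l *fakeLocal) Deliver(msg Message) error {
	l.delivered = append(l.delivered, msg)
	return nil
}

// demoRouter wires the fakes together with a fixed actor placement.
func demoRouter(bus *fakeBus, local *fakeLocal) *Router {
	placement := map[string]string{"actor-1": "node-a", "actor-2": "node-b"}
	return &Router{NodeID: "node-a", Local: local, Bus: bus,
		GetActorNode: func(id string) (string, error) {
			node, ok := placement[id]
			if !ok {
				return "", fmt.Errorf("unknown actor %s", id)
			}
			return node, nil
		}}
}

func main() {
	bus, local := &fakeBus{}, &fakeLocal{}
	r := demoRouter(bus, local)
	fmt.Println(r.SendMessage(Message{ActorID: "actor-1", Payload: "hi"}), len(local.delivered))
	fmt.Println(r.SendMessage(Message{ActorID: "actor-2", Payload: "hi"}), bus.subject)
}
```

Keeping the bus behind a one-method interface also makes `SendMessage` easy to test with a mock, which fits the "mock NATS" testing approach suggested for this issue.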

## Suggested Approach

1. **Define message types** for actor migration requests/responses in `cluster/types.go`
2. **Implement state capture** - replay events to get current state
3. **Implement state restore** - deserialize and restore actor state
4. **Implement coordinator** - manage migration phases
5. **Add error handling** - handle failed migrations, retries, cleanup
6. **Add tests** - test migration with mock NATS
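
Steps 2 and 3 can be illustrated with a toy actor whose state is a running balance. The `Event` and `Snapshot` types, the JSON encoding, and the `capture` and `restore` functions are all hypothetical; the real event and state types are not part of this issue. The sketch assumes events arrive sorted by version and rejects gaps, so a snapshot is never built from an incomplete history.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a hypothetical stored event for a toy "balance" actor.
type Event struct {
	Version int64
	Delta   int
}

// Snapshot is the hypothetical payload sent to the destination node.
type Snapshot struct {
	ActorID string `json:"actor_id"`
	Version int64  `json:"version"`
	Balance int    `json:"balance"`
}

// capture replays events (assumed sorted by version) up to and including
// upTo, then serializes the resulting state.
func capture(actorID string, events []Event, upTo int64) ([]byte, error) {
	snap := Snapshot{ActorID: actorID}
	for _, e := range events {
		if e.Version > upTo {
			break
		}
		if e.Version != snap.Version+1 {
			return nil, fmt.Errorf("actor %s: expected version %d, got %d",
				actorID, snap.Version+1, e.Version)
		}
		snap.Balance += e.Delta
		snap.Version = e.Version
	}
	return json.Marshal(snap)
}

// restore deserializes a snapshot on the destination node.
func restore(data []byte) (Snapshot, error) {
	var snap Snapshot
	err := json.Unmarshal(data, &snap)
	return snap, err
}

func main() {
	events := []Event{{Version: 1, Delta: 5}, {Version: 2, Delta: -2}, {Version: 3, Delta: 10}}
	data, err := capture("actor-1", events, 2)
	if err != nil {
		fmt.Println("capture failed:", err)
		return
	}
	snap, err := restore(data)
	fmt.Printf("%+v %v\n", snap, err)
}
```

Carrying the version in the snapshot lets the destination check that it resumes from exactly the point where the source paused.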

## Related Files

- `cluster/manager.go:150` - handleRebalanceRequest (empty)
- `cluster/manager.go:167` - handleMigrationRequest (empty)
- `cluster/shard.go:211` - RebalanceShards (stub)
- `cluster/distributed.go:139` - SendMessage (simplified)
- `cluster/types.go:108` - ActorMigration struct

## Acceptance Criteria

- [ ] `RebalanceShards` returns new shard map with actor assignments
- [ ] `handleRebalanceRequest` processes migration plan
- [ ] `handleMigrationRequest` accepts actor migrations
- [ ] `SendMessage` routes to correct node
- [ ] Actors can be migrated with state preserved
- [ ] Failed migrations are handled gracefully
- [ ] Integration test with multi-node cluster