Implement actor migration between cluster nodes #146

HugoNijhuis · 2026-05-15T09:35:25Z

Problem

When nodes join or leave the cluster, actors need to be migrated to maintain even distribution. Currently:

handleRebalanceRequest in cluster/manager.go:150 is empty
handleMigrationRequest in cluster/manager.go:167 is empty
RebalanceShards in cluster/shard.go:211 returns the shard map unchanged
SendMessage in cluster/distributed.go:139 ignores sharding

Required Implementation

1. Rebalance Algorithm (cluster/shard.go)

Implement ConsistentHashPlacement.RebalanceShards to:

Calculate new shard assignments based on active nodes
Identify actors needing migration
Generate migration plan with source/dest nodes
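
A minimal sketch of the three steps above, under loud assumptions: the shard map is modelled as `map[int]string` (shard ID to node ID), `Move` and `Rebalance` are hypothetical names, and rendezvous (highest-random-weight) hashing stands in for whatever ring `ConsistentHashPlacement` actually uses. The real types in `cluster/shard.go` may look different.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Move is a hypothetical migration-plan entry: one shard changing owner.
type Move struct {
	Shard    int
	From, To string
}

// score hashes a (shard, node) pair; the node with the highest score owns the shard.
func score(shard int, node string) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%d/%s", shard, node)
	return h.Sum64()
}

// owner picks the owner of a shard among the active nodes.
// It returns "" when no nodes are active.
func owner(shard int, nodes []string) string {
	best, bestScore := "", uint64(0)
	for _, n := range nodes {
		if s := score(shard, n); best == "" || s > bestScore {
			best, bestScore = n, s
		}
	}
	return best
}

// Rebalance returns the new shard map and the moves needed to reach it.
func Rebalance(current map[int]string, nodes []string) (map[int]string, []Move) {
	next := make(map[int]string, len(current))
	var plan []Move
	for shard, from := range current {
		to := owner(shard, nodes)
		next[shard] = to
		if to != from {
			plan = append(plan, Move{Shard: shard, From: from, To: to})
		}
	}
	// Sort so the plan is deterministic regardless of map iteration order.
	sort.Slice(plan, func(i, j int) bool { return plan[i].Shard < plan[j].Shard })
	return next, plan
}

func main() {
	current := map[int]string{0: "node-a", 1: "node-a", 2: "node-b", 3: "node-b"}
	// node-b has left the cluster; its shards must move to a surviving node.
	next, plan := Rebalance(current, []string{"node-a", "node-c"})
	fmt.Println(next)
	for _, m := range plan {
		fmt.Printf("shard %d: %s -> %s\n", m.Shard, m.From, m.To)
	}
}
```

The plan here is per shard; the actors needing migration would be those whose shard appears in it. A shard whose computed owner already matches its current owner produces no move, so a second call with the same node list returns an empty plan.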

2. Migration Coordinator (cluster/manager.go)

Implement handleRebalanceRequest to:

Accept migration plan from leader
For each actor in plan:
1. Pause incoming messages
2. Capture actor state (replay events up to current version)
3. Serialize state
4. Send migration request to destination node
5. Wait for ack
6. Delete actor from current node
Track migration status via ActorMigration.Status
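
The per-actor steps could be sketched roughly as follows. Everything here is an assumption made for illustration: `Migration` stands in for `ActorMigration`, the `Status` values are invented, and `Runtime` and `Transport` are hypothetical interfaces rather than existing project APIs. The one design choice it tries to show is that the actor is deleted from the source only after the destination acknowledges, and is resumed locally on any failure.

```go
package main

import (
	"errors"
	"fmt"
)

// Status values are hypothetical; the real ActorMigration.Status may differ.
type Status string

const (
	StatusPending   Status = "pending"
	StatusInFlight  Status = "in_flight"
	StatusCompleted Status = "completed"
	StatusFailed    Status = "failed"
)

// Migration is a stand-in for ActorMigration in cluster/types.go.
type Migration struct {
	ActorID, From, To string
	Status            Status
}

// Runtime is an assumed view of the local actor runtime.
type Runtime interface {
	Pause(actorID string) error
	Resume(actorID string)
	Snapshot(actorID string) ([]byte, error) // replay events, then serialize
	Remove(actorID string)
}

// Transport is an assumed sender; Migrate blocks until the destination acks.
type Transport interface {
	Migrate(to, actorID string, state []byte) error
}

// migrate runs the six steps for one actor and records the outcome in m.Status.
func migrate(rt Runtime, tr Transport, m *Migration) error {
	m.Status = StatusInFlight
	if err := rt.Pause(m.ActorID); err != nil {
		m.Status = StatusFailed
		return fmt.Errorf("pause %s: %w", m.ActorID, err)
	}
	state, err := rt.Snapshot(m.ActorID)
	if err == nil {
		err = tr.Migrate(m.To, m.ActorID, state)
	}
	if err != nil {
		rt.Resume(m.ActorID) // keep serving from the source node
		m.Status = StatusFailed
		return fmt.Errorf("migrate %s to %s: %w", m.ActorID, m.To, err)
	}
	rt.Remove(m.ActorID) // only after the destination has acknowledged
	m.Status = StatusCompleted
	return nil
}

// fakeRuntime and fakeTransport exist only to make the sketch runnable.
type fakeRuntime struct{ paused, removed map[string]bool }

func (f *fakeRuntime) Pause(id string) error              { f.paused[id] = true; return nil }
func (f *fakeRuntime) Resume(id string)                   { f.paused[id] = false }
func (f *fakeRuntime) Snapshot(id string) ([]byte, error) { return []byte(`{"balance":42}`), nil }
func (f *fakeRuntime) Remove(id string)                   { f.removed[id] = true }

type fakeTransport struct{ fail bool }

func (f fakeTransport) Migrate(to, id string, state []byte) error {
	if f.fail {
		return errors.New("no ack from destination")
	}
	return nil
}

func main() {
	rt := &fakeRuntime{paused: map[string]bool{}, removed: map[string]bool{}}
	m := &Migration{ActorID: "order-1", From: "node-a", To: "node-b", Status: StatusPending}
	fmt.Println(migrate(rt, fakeTransport{fail: true}, m), m.Status)
	fmt.Println(migrate(rt, fakeTransport{}, m), m.Status)
}
```

A real coordinator would also need a timeout on the ack and a decision about messages that arrive while the actor is paused (buffer and forward, or reject); neither is shown here.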

3. Cross-Node Message Routing (cluster/distributed.go)

Implement proper routing in SendMessage:

Use GetActorNode(actorID) to determine target node
If remote: marshal message, send via NATS to target node
If local: send to local runtime
Route response back to caller if needed
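
One possible shape for the routing decision, as a sketch only. `Router`, `Message`, the JSON envelope, and the subject format `cluster.node.<id>.messages` are all hypothetical; the signature of `GetActorNode` is assumed. The `Publish` field has the same shape as `(*nats.Conn).Publish(subject string, data []byte) error`, so a NATS connection could be plugged in, but the block itself uses only the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is a hypothetical envelope; the project's real message type may differ.
type Message struct {
	ActorID string          `json:"actor_id"`
	Type    string          `json:"type"`
	Payload json.RawMessage `json:"payload"`
}

// Router holds what SendMessage needs. All fields are assumptions.
type Router struct {
	NodeID       string
	GetActorNode func(actorID string) (string, error)
	DeliverLocal func(msg Message) error
	Publish      func(subject string, data []byte) error
}

// SendMessage delivers locally when this node owns the actor and
// publishes to the owning node's subject otherwise.
func (r *Router) SendMessage(msg Message) error {
	node, err := r.GetActorNode(msg.ActorID)
	if err != nil {
		return fmt.Errorf("locate actor %s: %w", msg.ActorID, err)
	}
	if node == r.NodeID {
		return r.DeliverLocal(msg)
	}
	data, err := json.Marshal(msg)
	if err != nil {
		return fmt.Errorf("marshal message: %w", err)
	}
	return r.Publish("cluster.node."+node+".messages", data)
}

func main() {
	r := &Router{
		NodeID: "node-a",
		GetActorNode: func(id string) (string, error) {
			if id == "local-actor" {
				return "node-a", nil
			}
			return "node-b", nil
		},
		DeliverLocal: func(m Message) error { fmt.Println("local:", m.ActorID); return nil },
		Publish:      func(s string, d []byte) error { fmt.Println("remote:", s); return nil },
	}
	_ = r.SendMessage(Message{ActorID: "local-actor", Type: "ping"})
	_ = r.SendMessage(Message{ActorID: "remote-actor", Type: "ping"})
}
```

Routing a response back to the caller is left out. With NATS, the request/reply pattern would be a natural fit for that part, at the cost of a timeout to choose.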

Suggested Approach

Define message types for actor migration requests/responses in cluster/types.go
Implement state capture - replay events to get current state
Implement state restore - deserialize and restore actor state
Implement coordinator - manage migration phases
Add error handling - handle failed migrations, retries, cleanup
Add tests - test migration with mock NATS
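
For the state capture and restore steps, a small sketch of the replay-then-serialize idea. The `Event` and `Snapshot` types, the counter-style state, and the choice of JSON are illustrative assumptions; the project's event and state types are not shown in this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a hypothetical versioned event for a counter-like actor.
type Event struct {
	Version int64 `json:"version"`
	Delta   int   `json:"delta"`
}

// Snapshot is the hypothetical serialized form sent to the destination node.
type Snapshot struct {
	ActorID string `json:"actor_id"`
	Version int64  `json:"version"`
	Total   int    `json:"total"`
}

// Capture replays events (assumed sorted by version) up to and including
// upTo, then serializes the resulting state.
func Capture(actorID string, events []Event, upTo int64) ([]byte, error) {
	snap := Snapshot{ActorID: actorID}
	for _, e := range events {
		if e.Version > upTo {
			break
		}
		snap.Total += e.Delta
		snap.Version = e.Version
	}
	return json.Marshal(snap)
}

// Restore deserializes a snapshot on the destination node.
func Restore(data []byte) (Snapshot, error) {
	var snap Snapshot
	err := json.Unmarshal(data, &snap)
	return snap, err
}

func main() {
	events := []Event{{1, 5}, {2, -2}, {3, 10}}
	data, err := Capture("counter-1", events, 2)
	if err != nil {
		panic(err)
	}
	snap, err := Restore(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", snap) // prints {ActorID:counter-1 Version:2 Total:3}
}
```

Carrying the version in the snapshot lets the destination detect a stale or duplicate migration request before it overwrites newer state.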

Related Files

cluster/manager.go:150 - handleRebalanceRequest (empty)
cluster/manager.go:167 - handleMigrationRequest (empty)
cluster/shard.go:211 - RebalanceShards (stub)
cluster/distributed.go:139 - SendMessage (simplified)
cluster/types.go:108 - ActorMigration struct

Acceptance Criteria

RebalanceShards returns new shard map with actor assignments
handleRebalanceRequest processes migration plan
handleMigrationRequest accepts actor migrations
SendMessage routes to correct node
Actors can be migrated with state preserved
Failed migrations are handled gracefully
Integration test with multi-node cluster


Reference: flowmade-one/aether#146