Initial aether repository structure

Distributed actor system with event sourcing for Go:
- event.go - Event, ActorSnapshot, EventStore interface
- eventbus.go - EventBus, EventBroadcaster for pub/sub
- nats_eventbus.go - NATS-backed cross-node event broadcasting
- store/ - InMemoryEventStore (testing), JetStreamEventStore (production)
- cluster/ - Node discovery, leader election, shard distribution
- model/ - EventStorming model types

Extracted from arcadia as an open-source infrastructure component.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-08 19:30:02 +01:00
commit e9e50c021f
22 changed files with 2588 additions and 0 deletions

.gitea/workflows/ci.yaml Normal file

@@ -0,0 +1,19 @@
name: CI
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: '1.23'
      - name: Build
        run: go build ./...
      - name: Test
        run: go test ./...

.gitignore vendored Normal file

@@ -0,0 +1,20 @@
# IDE
.idea/
.vscode/
*.swp
# OS
.DS_Store
Thumbs.db
# Build artifacts
/dist/
/build/
/bin/
# Go
/vendor/
# Test artifacts
*.test
coverage.out

CLAUDE.md Normal file

@@ -0,0 +1,116 @@
# Aether
Distributed actor system with event sourcing for Go, powered by NATS.
## Organization Context
This repo is part of Flowmade. See:
- [Organization manifesto](../architecture/manifesto.md) - who we are, what we believe
- [Repository map](../architecture/repos.md) - how this fits in the bigger picture
- [Vision](./vision.md) - what this specific product does
## Setup
```bash
git clone git@git.flowmade.one:flowmade-one/aether.git
cd aether
go mod download
```
Requires a running NATS server for integration tests:
```bash
# Install NATS
brew install nats-server
# Run with JetStream enabled
nats-server -js
```
## Project Structure
```
aether/
├── event.go # Event, ActorSnapshot, EventStore interface
├── eventbus.go # EventBus, EventBroadcaster interface
├── nats_eventbus.go # NATSEventBus - cross-node event broadcasting
├── store/
│ ├── memory.go # InMemoryEventStore (testing)
│ └── jetstream.go # JetStreamEventStore (production)
├── cluster/
│ ├── manager.go # ClusterManager
│ ├── discovery.go # NodeDiscovery
│ ├── hashring.go # ConsistentHashRing
│ ├── shard.go # ShardManager
│ ├── leader.go # LeaderElection
│ └── types.go # Cluster types
└── model/
└── model.go # EventStorming model types
```
## Development
```bash
make build # Build the library
make test # Run tests
make lint # Run linters
```
## Architecture
### Event Sourcing
Events are the source of truth. State is derived by replaying events.
```go
// Create an event
event := &aether.Event{
ID: uuid.New().String(),
EventType: "OrderPlaced",
ActorID: "order-123",
Version: 1,
Data: map[string]interface{}{"total": 100.00},
Timestamp: time.Now(),
}
// Persist to event store
store.SaveEvent(event)
// Replay events to rebuild state
events, _ := store.GetEvents("order-123", 0)
```
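Rebuilding the current state is then a fold over the returned slice. A minimal sketch, assuming a hypothetical `applyEvent` reducer supplied by the application (aether only stores and delivers events):
```go
// Fold the event history into the current state.
state := make(map[string]interface{})
for _, e := range events {
    state = applyEvent(state, e) // hypothetical reducer: switch on e.EventType and merge e.Data
}
```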
### Namespace Isolation
Namespaces provide logical boundaries for events and subscriptions:
```go
// Subscribe to events in a namespace
ch := eventBus.Subscribe("tenant-abc")
// Events are isolated per namespace
eventBus.Publish("tenant-abc", event) // Only tenant-abc subscribers see this
```
### Clustering
Aether handles node discovery, leader election, and shard distribution:
```go
// Create cluster manager (the context controls its lifetime)
ctx := context.Background()
manager, err := cluster.NewClusterManager(nodeID, natsConn, ctx)
if err != nil {
    log.Fatal(err)
}
// Join cluster
manager.Start()
// Leader election happens automatically
if manager.IsLeader() {
    // Coordinate shard assignments
}
```
## Key Patterns
- **Events are immutable** - Never modify, only append
- **Snapshots for performance** - Periodically snapshot state to avoid full replay (see the sketch after this list)
- **Namespaces for isolation** - Not multi-tenancy, just logical boundaries
- **NATS for everything** - Events, pub/sub, clustering all use NATS
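A sketch of the snapshot pattern, assuming an available `SnapshotStore` implementation and the same hypothetical `applyEvent` reducer as above:
```go
// Start from the latest snapshot instead of replaying from version 0.
snapshot, err := snapshotStore.GetLatestSnapshot("order-123")
if err != nil || snapshot == nil {
    snapshot = &aether.ActorSnapshot{ActorID: "order-123", State: map[string]interface{}{}}
}
state := snapshot.State
// Replay only the events recorded after the snapshot version.
events, _ := snapshotStore.GetEvents("order-123", snapshot.Version)
for _, e := range events {
    state = applyEvent(state, e)
}
```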

LICENSE Normal file

@@ -0,0 +1,190 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to the Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
Copyright 2024-2026 Flowmade
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Makefile Normal file

@@ -0,0 +1,13 @@
.PHONY: build test lint clean

build:
	go build ./...

test:
	go test ./...

lint:
	golangci-lint run

clean:
	go clean

cluster/cluster.go Normal file

@@ -0,0 +1,48 @@
// Package cluster provides distributed computing capabilities for the Aether VM runtime.
//
// This package implements a distributed actor system using NATS for coordination,
// featuring consistent hashing for shard distribution, leader election for
// coordination, and fault-tolerant actor migration between nodes.
//
// Key Components:
//
// - ConsistentHashRing: Distributes actors across cluster nodes using consistent hashing
// - LeaderElection: NATS-based leader election with lease-based coordination
// - ClusterManager: Coordinates distributed operations and shard rebalancing
// - NodeDiscovery: Manages cluster membership and node health monitoring
// - ShardManager: Handles actor placement and distribution across shards
// - DistributedVM: Main entry point for distributed VM cluster operations
//
// Usage:
//
// // Create a distributed VM node
// distributedVM, err := cluster.NewDistributedVM("node-1", []string{"nats://localhost:4222"}, localRuntime)
// if err != nil {
// log.Fatal(err)
// }
//
// // Start the cluster node
// if err := distributedVM.Start(); err != nil {
// log.Fatal(err)
// }
//
// // Load a model across the cluster
// if err := distributedVM.LoadModel(eventStormingModel); err != nil {
// log.Fatal(err)
// }
//
// Architecture:
//
// The cluster package implements a distributed actor system where each node
// runs a local VM runtime and coordinates with other nodes through NATS.
// Actors are sharded across nodes using consistent hashing, and the system
// supports dynamic rebalancing when nodes join or leave the cluster.
//
// Fault Tolerance:
//
// - Automatic node failure detection through heartbeat monitoring
// - Leader election ensures coordination continues despite node failures
// - Actor migration allows rebalancing when cluster topology changes
// - Graceful shutdown with proper resource cleanup
//
package cluster

cluster/discovery.go Normal file

@@ -0,0 +1,118 @@
package cluster
import (
"context"
"encoding/json"
"time"
"github.com/nats-io/nats.go"
)
// NodeDiscovery manages cluster membership using NATS
type NodeDiscovery struct {
nodeID string
nodeInfo *NodeInfo
natsConn *nats.Conn
heartbeat time.Duration
timeout time.Duration
updates chan NodeUpdate
ctx context.Context
}
// NewNodeDiscovery creates a node discovery service
func NewNodeDiscovery(nodeID string, natsConn *nats.Conn, ctx context.Context) *NodeDiscovery {
nodeInfo := &NodeInfo{
ID: nodeID,
Status: NodeStatusActive,
Capacity: 1000, // Default capacity
Load: 0,
LastSeen: time.Now(),
Metadata: make(map[string]string),
}
return &NodeDiscovery{
nodeID: nodeID,
nodeInfo: nodeInfo,
natsConn: natsConn,
heartbeat: 30 * time.Second,
timeout: 90 * time.Second,
updates: make(chan NodeUpdate, 100),
ctx: ctx,
}
}
// Start begins node discovery and heartbeating
func (nd *NodeDiscovery) Start() {
// Announce this node joining
nd.announceNode(NodeJoined)
// Start heartbeat
ticker := time.NewTicker(nd.heartbeat)
defer ticker.Stop()
// Subscribe to node announcements
nd.natsConn.Subscribe("aether.discovery", func(msg *nats.Msg) {
var update NodeUpdate
if err := json.Unmarshal(msg.Data, &update); err != nil {
return
}
select {
case nd.updates <- update:
case <-nd.ctx.Done():
}
})
for {
select {
case <-ticker.C:
nd.announceNode(NodeUpdated)
case <-nd.ctx.Done():
nd.announceNode(NodeLeft)
return
}
}
}
// GetUpdates returns the channel for receiving node updates
func (nd *NodeDiscovery) GetUpdates() <-chan NodeUpdate {
return nd.updates
}
// GetNodeInfo returns the current node information
func (nd *NodeDiscovery) GetNodeInfo() *NodeInfo {
return nd.nodeInfo
}
// UpdateLoad updates the node's current load
func (nd *NodeDiscovery) UpdateLoad(load float64) {
nd.nodeInfo.Load = load
}
// UpdateVMCount updates the number of VMs on this node
func (nd *NodeDiscovery) UpdateVMCount(count int) {
nd.nodeInfo.VMCount = count
}
// announceNode publishes node status to the cluster
func (nd *NodeDiscovery) announceNode(updateType NodeUpdateType) {
nd.nodeInfo.LastSeen = time.Now()
update := NodeUpdate{
Type: updateType,
Node: nd.nodeInfo,
}
data, err := json.Marshal(update)
if err != nil {
return
}
nd.natsConn.Publish("aether.discovery", data)
}
// Stop gracefully stops the node discovery service
func (nd *NodeDiscovery) Stop() {
nd.announceNode(NodeLeft)
}
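A brief usage sketch for the discovery API above (editorial note, not part of this file; `natsConn` is assumed to be an established `*nats.Conn`):
```go
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

nd := cluster.NewNodeDiscovery("node-1", natsConn, ctx)
go nd.Start() // announces the node, then heartbeats every 30s until ctx is cancelled

// React to joins, leaves, and heartbeat updates from other nodes.
for update := range nd.GetUpdates() {
    log.Printf("%s: %s", update.Type, update.Node.ID)
}
```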

cluster/distributed.go Normal file

@@ -0,0 +1,221 @@
package cluster
import (
"context"
"encoding/json"
"fmt"
"github.com/nats-io/nats.go"
)
// DistributedVM manages a cluster of runtime nodes with VM-per-instance architecture
type DistributedVM struct {
nodeID string
cluster *ClusterManager
localRuntime Runtime // Interface to avoid import cycles
sharding *ShardManager
discovery *NodeDiscovery
natsConn *nats.Conn
ctx context.Context
cancel context.CancelFunc
}
// Runtime interface to avoid import cycles with main aether package
type Runtime interface {
Start() error
LoadModel(model interface{}) error
SendMessage(message interface{}) error
}
// DistributedVMRegistry implements VMRegistry using DistributedVM's local runtime and sharding
type DistributedVMRegistry struct {
runtime interface{} // Runtime interface to avoid import cycles
sharding *ShardManager
}
// NewDistributedVM creates a distributed VM runtime cluster node
func NewDistributedVM(nodeID string, natsURLs []string, localRuntime Runtime) (*DistributedVM, error) {
ctx, cancel := context.WithCancel(context.Background())
// Connect to NATS cluster
natsURL := natsURLs[0] // Use first URL for simplicity
natsConn, err := nats.Connect(natsURL,
nats.Name(fmt.Sprintf("aether-runtime-%s", nodeID)))
if err != nil {
cancel()
return nil, fmt.Errorf("failed to connect to NATS: %w", err)
}
// Create cluster components
discovery := NewNodeDiscovery(nodeID, natsConn, ctx)
sharding := NewShardManager(1024, 3) // 1024 shards, 3 replicas
cluster, err := NewClusterManager(nodeID, natsConn, ctx)
if err != nil {
cancel()
natsConn.Close()
return nil, fmt.Errorf("failed to create cluster manager: %w", err)
}
dvm := &DistributedVM{
nodeID: nodeID,
cluster: cluster,
localRuntime: localRuntime,
sharding: sharding,
discovery: discovery,
natsConn: natsConn,
ctx: ctx,
cancel: cancel,
}
// Create VM registry and connect it to cluster manager
vmRegistry := &DistributedVMRegistry{
runtime: localRuntime,
sharding: sharding,
}
cluster.SetVMRegistry(vmRegistry)
return dvm, nil
}
// Start begins the distributed VM cluster node
func (dvm *DistributedVM) Start() error {
// Start local runtime
if err := dvm.localRuntime.Start(); err != nil {
return fmt.Errorf("failed to start local runtime: %w", err)
}
// Start cluster services
go dvm.discovery.Start()
go dvm.cluster.Start()
// Start message routing
go dvm.startMessageRouting()
return nil
}
// Stop gracefully shuts down the distributed VM node
func (dvm *DistributedVM) Stop() {
dvm.cancel()
dvm.cluster.Stop()
dvm.discovery.Stop()
dvm.natsConn.Close()
}
// LoadModel distributes EventStorming model across the cluster with VM templates
func (dvm *DistributedVM) LoadModel(model interface{}) error {
// Load model locally first
if err := dvm.localRuntime.LoadModel(model); err != nil {
return fmt.Errorf("failed to load model locally: %w", err)
}
// Broadcast model to other cluster nodes
msg := ClusterMessage{
Type: "load_model",
From: dvm.nodeID,
To: "broadcast",
Payload: model,
}
return dvm.publishClusterMessage(msg)
}
// SendMessage routes messages across the distributed cluster
func (dvm *DistributedVM) SendMessage(message interface{}) error {
// This is a simplified implementation
// In practice, this would determine the target node based on sharding
// and route the message appropriately
return dvm.localRuntime.SendMessage(message)
}
// GetActorNode determines which node should handle a specific actor
func (dvm *DistributedVM) GetActorNode(actorID string) string {
// Use consistent hashing to determine the target node
return dvm.cluster.hashRing.GetNode(actorID)
}
// IsLocalActor checks if an actor should be handled by this node
func (dvm *DistributedVM) IsLocalActor(actorID string) bool {
targetNode := dvm.GetActorNode(actorID)
return targetNode == dvm.nodeID
}
// GetActorsInShard returns actors that belong to a specific shard on this node
func (dvm *DistributedVM) GetActorsInShard(shardID int) []string {
return dvm.cluster.GetActorsInShard(shardID)
}
// startMessageRouting begins routing messages between cluster nodes
func (dvm *DistributedVM) startMessageRouting() {
// Subscribe to cluster messages
dvm.natsConn.Subscribe("aether.distributed.*", dvm.handleClusterMessage)
}
// handleClusterMessage processes incoming cluster coordination messages
func (dvm *DistributedVM) handleClusterMessage(msg *nats.Msg) {
var clusterMsg ClusterMessage
if err := json.Unmarshal(msg.Data, &clusterMsg); err != nil {
return
}
switch clusterMsg.Type {
case "load_model":
// Handle model loading from other nodes
if model := clusterMsg.Payload; model != nil {
dvm.localRuntime.LoadModel(model)
}
case "route_message":
// Handle message routing from other nodes
if message := clusterMsg.Payload; message != nil {
dvm.localRuntime.SendMessage(message)
}
case "rebalance":
// Handle shard rebalancing requests
dvm.handleRebalanceRequest(clusterMsg)
}
}
// handleRebalanceRequest processes shard rebalancing requests
func (dvm *DistributedVM) handleRebalanceRequest(msg ClusterMessage) {
// Simplified rebalancing logic
// In practice, this would implement complex actor migration
}
// publishClusterMessage sends a message to other cluster nodes
func (dvm *DistributedVM) publishClusterMessage(msg ClusterMessage) error {
data, err := json.Marshal(msg)
if err != nil {
return err
}
subject := fmt.Sprintf("aether.distributed.%s", msg.Type)
return dvm.natsConn.Publish(subject, data)
}
// GetClusterInfo returns information about the cluster state
func (dvm *DistributedVM) GetClusterInfo() map[string]interface{} {
nodes := dvm.cluster.GetNodes()
return map[string]interface{}{
"nodeId": dvm.nodeID,
"isLeader": dvm.cluster.IsLeader(),
"leader": dvm.cluster.GetLeader(),
"nodeCount": len(nodes),
"nodes": nodes,
}
}
// GetActiveVMs returns a map of active VMs (implementation depends on runtime)
func (dvr *DistributedVMRegistry) GetActiveVMs() map[string]interface{} {
// This would need to access the actual runtime's VM registry
// For now, return empty map to avoid import cycles
return make(map[string]interface{})
}
// GetShard returns the shard number for the given actor ID
func (dvr *DistributedVMRegistry) GetShard(actorID string) int {
return dvr.sharding.GetShard(actorID)
}
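An editorial sketch of how a caller might use the routing helpers above, assuming `dvm` is a started `*DistributedVM`:
```go
actorID := "order-123"
if dvm.IsLocalActor(actorID) {
    // This node owns the actor: deliver to the local runtime.
    _ = dvm.SendMessage(map[string]interface{}{"actorId": actorID, "type": "PlaceOrder"})
} else {
    // Another node owns it according to the consistent hash ring.
    log.Printf("actor %s is owned by node %s", actorID, dvm.GetActorNode(actorID))
}
```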

cluster/hashring.go Normal file

@@ -0,0 +1,105 @@
package cluster
import (
"crypto/sha256"
"encoding/binary"
"fmt"
"sort"
)
// ConsistentHashRing implements a consistent hash ring for shard distribution
type ConsistentHashRing struct {
ring map[uint32]string // hash -> node ID
sortedHashes []uint32 // sorted hash keys
nodes map[string]bool // active nodes
}
// NewConsistentHashRing creates a new consistent hash ring
func NewConsistentHashRing() *ConsistentHashRing {
return &ConsistentHashRing{
ring: make(map[uint32]string),
nodes: make(map[string]bool),
}
}
// AddNode adds a node to the hash ring
func (chr *ConsistentHashRing) AddNode(nodeID string) {
if chr.nodes[nodeID] {
return // Node already exists
}
chr.nodes[nodeID] = true
// Add virtual nodes for better distribution
for i := 0; i < VirtualNodes; i++ {
virtualKey := fmt.Sprintf("%s:%d", nodeID, i)
hash := chr.hash(virtualKey)
chr.ring[hash] = nodeID
chr.sortedHashes = append(chr.sortedHashes, hash)
}
sort.Slice(chr.sortedHashes, func(i, j int) bool {
return chr.sortedHashes[i] < chr.sortedHashes[j]
})
}
// RemoveNode removes a node from the hash ring
func (chr *ConsistentHashRing) RemoveNode(nodeID string) {
if !chr.nodes[nodeID] {
return // Node doesn't exist
}
delete(chr.nodes, nodeID)
// Remove all virtual nodes for this physical node
newHashes := make([]uint32, 0)
for _, hash := range chr.sortedHashes {
if chr.ring[hash] != nodeID {
newHashes = append(newHashes, hash)
} else {
delete(chr.ring, hash)
}
}
chr.sortedHashes = newHashes
}
// GetNode returns the node responsible for a given key
func (chr *ConsistentHashRing) GetNode(key string) string {
if len(chr.sortedHashes) == 0 {
return ""
}
hash := chr.hash(key)
// Find the first node with hash >= key hash (clockwise)
idx := sort.Search(len(chr.sortedHashes), func(i int) bool {
return chr.sortedHashes[i] >= hash
})
// Wrap around to the first node if we've gone past the end
if idx == len(chr.sortedHashes) {
idx = 0
}
return chr.ring[chr.sortedHashes[idx]]
}
// hash computes a hash for the given key
func (chr *ConsistentHashRing) hash(key string) uint32 {
h := sha256.Sum256([]byte(key))
return binary.BigEndian.Uint32(h[:4])
}
// GetNodes returns all active nodes in the ring
func (chr *ConsistentHashRing) GetNodes() []string {
nodes := make([]string, 0, len(chr.nodes))
for nodeID := range chr.nodes {
nodes = append(nodes, nodeID)
}
return nodes
}
// IsEmpty returns true if the ring has no nodes
func (chr *ConsistentHashRing) IsEmpty() bool {
return len(chr.nodes) == 0
}
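A small behavioural sketch (editorial, not part of this file): a key maps to the same node while membership is stable, and removing a node only remaps the keys that node owned.
```go
ring := cluster.NewConsistentHashRing()
ring.AddNode("node-1")
ring.AddNode("node-2")
ring.AddNode("node-3")

fmt.Println(ring.GetNode("order-123")) // stable owner while membership is unchanged

ring.RemoveNode("node-2")
fmt.Println(ring.GetNode("order-123")) // changes only if node-2 was the owner
```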

cluster/leader.go Normal file

@@ -0,0 +1,414 @@
package cluster
import (
"context"
"encoding/json"
"fmt"
"log"
"os"
"sync"
"time"
"github.com/nats-io/nats.go"
)
// LeaderElection manages NATS-based leader election using lease-based coordination
type LeaderElection struct {
nodeID string
natsConn *nats.Conn
js nats.JetStreamContext
kv nats.KeyValue
isLeader bool
currentLeader string
leaderTerm uint64
ctx context.Context
cancel context.CancelFunc
mutex sync.RWMutex
logger *log.Logger
callbacks LeaderElectionCallbacks
}
// NewLeaderElection creates a new NATS-based leader election system
func NewLeaderElection(nodeID string, natsConn *nats.Conn, callbacks LeaderElectionCallbacks) (*LeaderElection, error) {
ctx, cancel := context.WithCancel(context.Background())
// Create JetStream context
js, err := natsConn.JetStream()
if err != nil {
cancel()
return nil, fmt.Errorf("failed to create JetStream context: %w", err)
}
// Create or get KV store for leader election
kv, err := js.CreateKeyValue(&nats.KeyValueConfig{
Bucket: "aether-leader-election",
Description: "Aether cluster leader election coordination",
TTL: LeaderLeaseTimeout * 2, // Auto-cleanup expired leases
MaxBytes: 1024 * 1024, // 1MB max
Replicas: 1, // Single replica for simplicity
})
if err != nil {
// Try to get existing KV store
kv, err = js.KeyValue("aether-leader-election")
if err != nil {
cancel()
return nil, fmt.Errorf("failed to create/get KV store: %w", err)
}
}
return &LeaderElection{
nodeID: nodeID,
natsConn: natsConn,
js: js,
kv: kv,
ctx: ctx,
cancel: cancel,
logger: log.New(os.Stdout, fmt.Sprintf("[Leader %s] ", nodeID), log.LstdFlags),
callbacks: callbacks,
}, nil
}
// Start begins the leader election process
func (le *LeaderElection) Start() {
le.logger.Printf("🗳️ Starting leader election")
// Start election loop in background
go le.electionLoop()
// Start lease renewal loop in background
go le.leaseRenewalLoop()
// Start leader monitoring
go le.monitorLeadership()
}
// Stop stops the leader election process
func (le *LeaderElection) Stop() {
le.logger.Printf("🛑 Stopping leader election")
le.cancel()
// If we're the leader, resign gracefully
if le.IsLeader() {
le.resignLeadership()
}
}
// IsLeader returns whether this node is currently the leader
func (le *LeaderElection) IsLeader() bool {
le.mutex.RLock()
defer le.mutex.RUnlock()
return le.isLeader
}
// GetLeader returns the current leader ID
func (le *LeaderElection) GetLeader() string {
le.mutex.RLock()
defer le.mutex.RUnlock()
return le.currentLeader
}
// GetTerm returns the current leadership term
func (le *LeaderElection) GetTerm() uint64 {
le.mutex.RLock()
defer le.mutex.RUnlock()
return le.leaderTerm
}
// electionLoop runs the main election process
func (le *LeaderElection) electionLoop() {
ticker := time.NewTicker(ElectionTimeout)
defer ticker.Stop()
// Try to become leader immediately
le.tryBecomeLeader()
for {
select {
case <-le.ctx.Done():
return
case <-ticker.C:
// Periodically check if we should try to become leader
if !le.IsLeader() && le.shouldTryElection() {
le.tryBecomeLeader()
}
}
}
}
// leaseRenewalLoop renews the leadership lease if we're the leader
func (le *LeaderElection) leaseRenewalLoop() {
ticker := time.NewTicker(HeartbeatInterval)
defer ticker.Stop()
for {
select {
case <-le.ctx.Done():
return
case <-ticker.C:
if le.IsLeader() {
if err := le.renewLease(); err != nil {
le.logger.Printf("❌ Failed to renew leadership lease: %v", err)
le.loseLeadership()
}
}
}
}
}
// monitorLeadership watches for leadership changes
func (le *LeaderElection) monitorLeadership() {
watcher, err := le.kv.Watch("leader")
if err != nil {
le.logger.Printf("❌ Failed to watch leadership: %v", err)
return
}
defer watcher.Stop()
for {
select {
case <-le.ctx.Done():
return
case entry := <-watcher.Updates():
if entry == nil {
continue
}
le.handleLeadershipUpdate(entry)
}
}
}
// tryBecomeLeader attempts to acquire leadership
func (le *LeaderElection) tryBecomeLeader() {
le.logger.Printf("🗳️ Attempting to become leader")
now := time.Now()
newLease := LeadershipLease{
LeaderID: le.nodeID,
Term: le.leaderTerm + 1,
ExpiresAt: now.Add(LeaderLeaseTimeout),
StartedAt: now,
}
leaseData, err := json.Marshal(newLease)
if err != nil {
le.logger.Printf("❌ Failed to marshal lease: %v", err)
return
}
// Try to create the leader key (atomic operation)
_, err = le.kv.Create("leader", leaseData)
if err != nil {
// Leader key exists, check if it's expired
if le.tryClaimExpiredLease() {
return // Successfully claimed expired lease
}
// Another node is leader
return
}
// Successfully became leader!
le.becomeLeader(newLease.Term)
}
// tryClaimExpiredLease attempts to claim an expired leadership lease
func (le *LeaderElection) tryClaimExpiredLease() bool {
entry, err := le.kv.Get("leader")
if err != nil {
return false
}
var currentLease LeadershipLease
if err := json.Unmarshal(entry.Value(), &currentLease); err != nil {
return false
}
// Check if lease is expired
if time.Now().Before(currentLease.ExpiresAt) {
// Lease is still valid
le.updateCurrentLeader(currentLease.LeaderID, currentLease.Term)
return false
}
// Lease is expired, try to claim it
le.logger.Printf("🕐 Attempting to claim expired lease from %s", currentLease.LeaderID)
now := time.Now()
newLease := LeadershipLease{
LeaderID: le.nodeID,
Term: currentLease.Term + 1,
ExpiresAt: now.Add(LeaderLeaseTimeout),
StartedAt: now,
}
leaseData, err := json.Marshal(newLease)
if err != nil {
return false
}
// Atomically update the lease
_, err = le.kv.Update("leader", leaseData, entry.Revision())
if err != nil {
return false
}
// Successfully claimed expired lease!
le.becomeLeader(newLease.Term)
return true
}
// renewLease renews the current leadership lease
func (le *LeaderElection) renewLease() error {
entry, err := le.kv.Get("leader")
if err != nil {
return err
}
var currentLease LeadershipLease
if err := json.Unmarshal(entry.Value(), &currentLease); err != nil {
return err
}
// Verify we're still the leader
if currentLease.LeaderID != le.nodeID {
return fmt.Errorf("no longer leader, current leader is %s", currentLease.LeaderID)
}
// Renew the lease
renewedLease := currentLease
renewedLease.ExpiresAt = time.Now().Add(LeaderLeaseTimeout)
leaseData, err := json.Marshal(renewedLease)
if err != nil {
return err
}
_, err = le.kv.Update("leader", leaseData, entry.Revision())
if err != nil {
return fmt.Errorf("failed to renew lease: %w", err)
}
le.logger.Printf("💓 Renewed leadership lease until %s", renewedLease.ExpiresAt.Format(time.RFC3339))
return nil
}
// becomeLeader handles becoming the cluster leader
func (le *LeaderElection) becomeLeader(term uint64) {
le.mutex.Lock()
le.isLeader = true
le.currentLeader = le.nodeID
le.leaderTerm = term
le.mutex.Unlock()
le.logger.Printf("👑 Became cluster leader (term %d)", term)
if le.callbacks.OnBecameLeader != nil {
le.callbacks.OnBecameLeader()
}
}
// loseLeadership handles losing leadership
func (le *LeaderElection) loseLeadership() {
le.mutex.Lock()
wasLeader := le.isLeader
le.isLeader = false
le.mutex.Unlock()
if wasLeader {
le.logger.Printf("📉 Lost cluster leadership")
if le.callbacks.OnLostLeader != nil {
le.callbacks.OnLostLeader()
}
}
}
// resignLeadership gracefully resigns from leadership
func (le *LeaderElection) resignLeadership() {
if !le.IsLeader() {
return
}
le.logger.Printf("👋 Resigning from cluster leadership")
// Delete the leadership key
err := le.kv.Delete("leader")
if err != nil {
le.logger.Printf("⚠️ Failed to delete leadership key: %v", err)
}
le.loseLeadership()
}
// shouldTryElection determines if this node should attempt to become leader
func (le *LeaderElection) shouldTryElection() bool {
// Always try if no current leader
if le.GetLeader() == "" {
return true
}
// Check if current lease is expired
entry, err := le.kv.Get("leader")
if err != nil {
// Can't read lease, try to become leader
return true
}
var currentLease LeadershipLease
if err := json.Unmarshal(entry.Value(), &currentLease); err != nil {
// Invalid lease, try to become leader
return true
}
// Try if lease is expired
return time.Now().After(currentLease.ExpiresAt)
}
// handleLeadershipUpdate processes leadership change notifications
func (le *LeaderElection) handleLeadershipUpdate(entry nats.KeyValueEntry) {
if entry.Operation() == nats.KeyValueDelete {
// Leadership was vacated
le.updateCurrentLeader("", 0)
return
}
var lease LeadershipLease
if err := json.Unmarshal(entry.Value(), &lease); err != nil {
le.logger.Printf("⚠️ Invalid leadership lease: %v", err)
return
}
le.updateCurrentLeader(lease.LeaderID, lease.Term)
}
// updateCurrentLeader updates the current leader information
func (le *LeaderElection) updateCurrentLeader(leaderID string, term uint64) {
le.mutex.Lock()
oldLeader := le.currentLeader
le.currentLeader = leaderID
le.leaderTerm = term
// Update our leadership status
if leaderID == le.nodeID {
le.isLeader = true
} else {
if le.isLeader {
le.isLeader = false
le.mutex.Unlock()
if le.callbacks.OnLostLeader != nil {
le.callbacks.OnLostLeader()
}
le.mutex.Lock()
} else {
le.isLeader = false
}
}
le.mutex.Unlock()
// Notify of leader change
if oldLeader != leaderID && leaderID != "" && leaderID != le.nodeID {
le.logger.Printf("🔄 New cluster leader: %s (term %d)", leaderID, term)
if le.callbacks.OnNewLeader != nil {
le.callbacks.OnNewLeader(leaderID)
}
}
}
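A usage sketch for the election API above (editorial, not part of this file; requires a NATS server with JetStream enabled, since the lease lives in a KV bucket):
```go
callbacks := cluster.LeaderElectionCallbacks{
    OnBecameLeader: func() { log.Println("became leader") },
    OnLostLeader:   func() { log.Println("lost leadership") },
    OnNewLeader:    func(id string) { log.Printf("new leader: %s", id) },
}
le, err := cluster.NewLeaderElection("node-1", natsConn, callbacks)
if err != nil {
    log.Fatal(err)
}
le.Start()
defer le.Stop()

// Election runs in the background; callbacks fire on transitions,
// and IsLeader()/GetLeader() can be polled at any time.
```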

cluster/manager.go Normal file

@@ -0,0 +1,331 @@
package cluster
import (
"context"
"encoding/json"
"fmt"
"log"
"os"
"sync"
"time"
"github.com/nats-io/nats.go"
)
// VMRegistry provides access to local VM information for cluster operations
type VMRegistry interface {
GetActiveVMs() map[string]interface{} // VirtualMachine interface to avoid import cycles
GetShard(actorID string) int
}
// ClusterManager coordinates distributed VM operations across the cluster
type ClusterManager struct {
nodeID string
nodes map[string]*NodeInfo
nodeUpdates chan NodeUpdate
shardMap *ShardMap
hashRing *ConsistentHashRing
election *LeaderElection
natsConn *nats.Conn
ctx context.Context
mutex sync.RWMutex
logger *log.Logger
vmRegistry VMRegistry // Interface to access local VMs
}
// NewClusterManager creates a cluster coordination manager
func NewClusterManager(nodeID string, natsConn *nats.Conn, ctx context.Context) (*ClusterManager, error) {
cm := &ClusterManager{
nodeID: nodeID,
nodes: make(map[string]*NodeInfo),
nodeUpdates: make(chan NodeUpdate, 100),
shardMap: &ShardMap{Shards: make(map[int][]string), Nodes: make(map[string]NodeInfo)},
hashRing: NewConsistentHashRing(),
natsConn: natsConn,
ctx: ctx,
logger: log.New(os.Stdout, fmt.Sprintf("[ClusterMgr %s] ", nodeID), log.LstdFlags),
vmRegistry: nil, // Will be set later via SetVMRegistry
}
// Create leadership election with callbacks
callbacks := LeaderElectionCallbacks{
OnBecameLeader: func() {
cm.logger.Printf("👑 This node became the cluster leader - can initiate rebalancing")
},
OnLostLeader: func() {
cm.logger.Printf("📉 This node lost cluster leadership")
},
OnNewLeader: func(leaderID string) {
cm.logger.Printf("🔄 Cluster leadership changed to: %s", leaderID)
},
}
election, err := NewLeaderElection(nodeID, natsConn, callbacks)
if err != nil {
return nil, fmt.Errorf("failed to create leader election: %w", err)
}
cm.election = election
return cm, nil
}
// Start begins cluster management operations
func (cm *ClusterManager) Start() {
cm.logger.Printf("🚀 Starting cluster manager")
// Start leader election
cm.election.Start()
// Subscribe to cluster messages
cm.natsConn.Subscribe("aether.cluster.*", cm.handleClusterMessage)
// Start node monitoring
go cm.monitorNodes()
// Start shard rebalancing (only if leader)
go cm.rebalanceLoop()
}
// Stop gracefully stops the cluster manager
func (cm *ClusterManager) Stop() {
cm.logger.Printf("🛑 Stopping cluster manager")
if cm.election != nil {
cm.election.Stop()
}
}
// IsLeader returns whether this node is the cluster leader
func (cm *ClusterManager) IsLeader() bool {
if cm.election == nil {
return false
}
return cm.election.IsLeader()
}
// GetLeader returns the current cluster leader ID
func (cm *ClusterManager) GetLeader() string {
if cm.election == nil {
return ""
}
return cm.election.GetLeader()
}
// SetVMRegistry sets the VM registry for accessing local VM information
func (cm *ClusterManager) SetVMRegistry(registry VMRegistry) {
cm.vmRegistry = registry
}
// GetActorsInShard returns actors that belong to a specific shard on this node
func (cm *ClusterManager) GetActorsInShard(shardID int) []string {
if cm.vmRegistry == nil {
return []string{}
}
activeVMs := cm.vmRegistry.GetActiveVMs()
var actors []string
for actorID := range activeVMs {
if cm.vmRegistry.GetShard(actorID) == shardID {
actors = append(actors, actorID)
}
}
return actors
}
// handleClusterMessage processes incoming cluster coordination messages
func (cm *ClusterManager) handleClusterMessage(msg *nats.Msg) {
var clusterMsg ClusterMessage
if err := json.Unmarshal(msg.Data, &clusterMsg); err != nil {
cm.logger.Printf("⚠️ Invalid cluster message: %v", err)
return
}
switch clusterMsg.Type {
case "rebalance":
cm.handleRebalanceRequest(clusterMsg)
case "migrate":
cm.handleMigrationRequest(clusterMsg)
case "node_update":
if update, ok := clusterMsg.Payload.(NodeUpdate); ok {
cm.handleNodeUpdate(update)
}
default:
cm.logger.Printf("⚠️ Unknown cluster message type: %s", clusterMsg.Type)
}
}
// handleNodeUpdate processes node status updates
func (cm *ClusterManager) handleNodeUpdate(update NodeUpdate) {
cm.mutex.Lock()
defer cm.mutex.Unlock()
switch update.Type {
case NodeJoined:
cm.nodes[update.Node.ID] = update.Node
cm.hashRing.AddNode(update.Node.ID)
cm.logger.Printf(" Node joined: %s", update.Node.ID)
case NodeLeft:
delete(cm.nodes, update.Node.ID)
cm.hashRing.RemoveNode(update.Node.ID)
cm.logger.Printf(" Node left: %s", update.Node.ID)
case NodeUpdated:
if node, exists := cm.nodes[update.Node.ID]; exists {
// Update existing node info
*node = *update.Node
} else {
// New node
cm.nodes[update.Node.ID] = update.Node
cm.hashRing.AddNode(update.Node.ID)
}
}
// Check for failed nodes and mark them
now := time.Now()
for _, node := range cm.nodes {
if now.Sub(node.LastSeen) > 90*time.Second && node.Status != NodeStatusFailed {
node.Status = NodeStatusFailed
cm.logger.Printf("❌ Node marked as failed: %s (last seen: %s)",
node.ID, node.LastSeen.Format(time.RFC3339))
}
}
// Trigger rebalancing if we're the leader and there are significant changes
if cm.IsLeader() {
activeNodeCount := 0
for _, node := range cm.nodes {
if node.Status == NodeStatusActive {
activeNodeCount++
}
}
// Simple trigger: rebalance if we have different number of active nodes
// than shards assigned (this is a simplified logic)
if activeNodeCount > 0 {
cm.triggerShardRebalancing("node topology changed")
}
}
}
// handleRebalanceRequest processes cluster rebalancing requests
func (cm *ClusterManager) handleRebalanceRequest(msg ClusterMessage) {
cm.logger.Printf("🔄 Handling rebalance request from %s", msg.From)
// Implementation would handle the specific rebalancing logic
// This is a simplified version
}
// handleMigrationRequest processes actor migration requests
func (cm *ClusterManager) handleMigrationRequest(msg ClusterMessage) {
cm.logger.Printf("🚚 Handling migration request from %s", msg.From)
// Implementation would handle the specific migration logic
// This is a simplified version
}
// triggerShardRebalancing initiates shard rebalancing across the cluster
func (cm *ClusterManager) triggerShardRebalancing(reason string) {
if !cm.IsLeader() {
return // Only leader can initiate rebalancing
}
cm.logger.Printf("⚖️ Triggering shard rebalancing: %s", reason)
// Get active nodes
var activeNodes []*NodeInfo
cm.mutex.RLock()
for _, node := range cm.nodes {
if node.Status == NodeStatusActive {
activeNodes = append(activeNodes, node)
}
}
cm.mutex.RUnlock()
if len(activeNodes) == 0 {
cm.logger.Printf("⚠️ No active nodes available for rebalancing")
return
}
// This would implement the actual rebalancing logic
cm.logger.Printf("🎯 Would rebalance across %d active nodes", len(activeNodes))
}
// monitorNodes periodically checks node health and updates
func (cm *ClusterManager) monitorNodes() {
ticker := time.NewTicker(30 * time.Second)
defer ticker.Stop()
for {
select {
case <-ticker.C:
// Health check logic would go here
cm.checkNodeHealth()
case <-cm.ctx.Done():
return
}
}
}
// checkNodeHealth verifies the health of known nodes
func (cm *ClusterManager) checkNodeHealth() {
cm.mutex.Lock()
defer cm.mutex.Unlock()
now := time.Now()
for _, node := range cm.nodes {
if now.Sub(node.LastSeen) > 90*time.Second && node.Status == NodeStatusActive {
node.Status = NodeStatusFailed
cm.logger.Printf("💔 Node failed: %s", node.ID)
}
}
}
// rebalanceLoop runs periodic rebalancing checks (leader only)
func (cm *ClusterManager) rebalanceLoop() {
ticker := time.NewTicker(5 * time.Minute)
defer ticker.Stop()
for {
select {
case <-ticker.C:
if cm.IsLeader() {
cm.triggerShardRebalancing("periodic rebalance check")
}
case <-cm.ctx.Done():
return
}
}
}
// GetNodes returns a copy of the current cluster nodes
func (cm *ClusterManager) GetNodes() map[string]*NodeInfo {
cm.mutex.RLock()
defer cm.mutex.RUnlock()
nodes := make(map[string]*NodeInfo)
for id, node := range cm.nodes {
// Create a copy to prevent external mutation
nodeCopy := *node
nodes[id] = &nodeCopy
}
return nodes
}
// GetShardMap returns a copy of the current shard mapping
func (cm *ClusterManager) GetShardMap() *ShardMap {
	cm.mutex.RLock()
	defer cm.mutex.RUnlock()
	// Return a deep copy to prevent external mutation
	shardMapCopy := &ShardMap{
		Version:    cm.shardMap.Version,
		Shards:     make(map[int][]string),
		Nodes:      make(map[string]NodeInfo),
		UpdateTime: cm.shardMap.UpdateTime,
	}
	for shardID, nodes := range cm.shardMap.Shards {
		shardMapCopy.Shards[shardID] = append([]string(nil), nodes...)
	}
	for nodeID, nodeInfo := range cm.shardMap.Nodes {
		shardMapCopy.Nodes[nodeID] = nodeInfo
	}
	return shardMapCopy
}

cluster/shard.go Normal file

@@ -0,0 +1,188 @@
package cluster
import (
"crypto/sha256"
"encoding/binary"
"fmt"
"hash"
"hash/fnv"
)
// MigrationStatus tracks actor migration progress
type MigrationStatus string
const (
MigrationPending MigrationStatus = "pending"
MigrationInProgress MigrationStatus = "in_progress"
MigrationCompleted MigrationStatus = "completed"
MigrationFailed MigrationStatus = "failed"
)
// PlacementStrategy determines where to place new actors
type PlacementStrategy interface {
PlaceActor(actorID string, shardMap *ShardMap, nodes map[string]*NodeInfo) (string, error)
RebalanceShards(shardMap *ShardMap, nodes map[string]*NodeInfo) (*ShardMap, error)
}
// ShardManager handles actor placement and distribution
type ShardManager struct {
shardCount int
shardMap *ShardMap
hasher hash.Hash
placement PlacementStrategy
replication int
}
// NewShardManager creates a new shard manager
func NewShardManager(shardCount, replication int) *ShardManager {
return &ShardManager{
shardCount: shardCount,
shardMap: &ShardMap{Shards: make(map[int][]string), Nodes: make(map[string]NodeInfo)},
hasher: fnv.New64a(),
placement: &ConsistentHashPlacement{},
replication: replication,
}
}
// GetShard returns the shard number for a given actor ID
func (sm *ShardManager) GetShard(actorID string) int {
h := sha256.Sum256([]byte(actorID))
shardID := binary.BigEndian.Uint32(h[:4]) % uint32(sm.shardCount)
return int(shardID)
}
// GetShardNodes returns the nodes responsible for a shard
func (sm *ShardManager) GetShardNodes(shardID int) []string {
if nodes, exists := sm.shardMap.Shards[shardID]; exists {
return nodes
}
return []string{}
}
// AssignShard assigns a shard to specific nodes
func (sm *ShardManager) AssignShard(shardID int, nodes []string) {
if sm.shardMap.Shards == nil {
sm.shardMap.Shards = make(map[int][]string)
}
sm.shardMap.Shards[shardID] = nodes
}
// GetPrimaryNode returns the primary node for a shard
func (sm *ShardManager) GetPrimaryNode(shardID int) string {
nodes := sm.GetShardNodes(shardID)
if len(nodes) > 0 {
return nodes[0] // First node is primary
}
return ""
}
// GetReplicaNodes returns the replica nodes for a shard
func (sm *ShardManager) GetReplicaNodes(shardID int) []string {
nodes := sm.GetShardNodes(shardID)
if len(nodes) > 1 {
return nodes[1:] // All nodes except first are replicas
}
return []string{}
}
// UpdateShardMap updates the entire shard map
func (sm *ShardManager) UpdateShardMap(newShardMap *ShardMap) {
sm.shardMap = newShardMap
}
// GetShardMap returns a copy of the current shard map
func (sm *ShardManager) GetShardMap() *ShardMap {
// Return a deep copy to prevent external mutation
copy := &ShardMap{
Version: sm.shardMap.Version,
Shards: make(map[int][]string),
Nodes: make(map[string]NodeInfo),
UpdateTime: sm.shardMap.UpdateTime,
}
// Copy the shard assignments
for shardID, nodes := range sm.shardMap.Shards {
copy.Shards[shardID] = append([]string(nil), nodes...)
}
// Copy the node info
for nodeID, nodeInfo := range sm.shardMap.Nodes {
copy.Nodes[nodeID] = nodeInfo
}
return copy
}
// RebalanceShards redistributes shards across available nodes
func (sm *ShardManager) RebalanceShards(nodes map[string]*NodeInfo) (*ShardMap, error) {
if sm.placement == nil {
return nil, fmt.Errorf("no placement strategy configured")
}
return sm.placement.RebalanceShards(sm.shardMap, nodes)
}
// PlaceActor determines which node should handle a new actor
func (sm *ShardManager) PlaceActor(actorID string, nodes map[string]*NodeInfo) (string, error) {
if sm.placement == nil {
return "", fmt.Errorf("no placement strategy configured")
}
return sm.placement.PlaceActor(actorID, sm.shardMap, nodes)
}
// GetActorsInShard returns actors that belong to a specific shard on a specific node
func (sm *ShardManager) GetActorsInShard(shardID int, nodeID string, vmRegistry VMRegistry) []string {
if vmRegistry == nil {
return []string{}
}
activeVMs := vmRegistry.GetActiveVMs()
var actors []string
for actorID := range activeVMs {
if sm.GetShard(actorID) == shardID {
actors = append(actors, actorID)
}
}
return actors
}
// ConsistentHashPlacement implements PlacementStrategy using consistent hashing
type ConsistentHashPlacement struct{}
// PlaceActor places an actor using consistent hashing
func (chp *ConsistentHashPlacement) PlaceActor(actorID string, shardMap *ShardMap, nodes map[string]*NodeInfo) (string, error) {
if len(nodes) == 0 {
return "", fmt.Errorf("no nodes available for placement")
}
// Simple consistent hash placement - in a real implementation,
// this would use the consistent hash ring
h := sha256.Sum256([]byte(actorID))
nodeIndex := binary.BigEndian.Uint32(h[:4]) % uint32(len(nodes))
i := 0
for nodeID := range nodes {
if i == int(nodeIndex) {
return nodeID, nil
}
i++
}
// Fallback to first node
for nodeID := range nodes {
return nodeID, nil
}
return "", fmt.Errorf("failed to place actor")
}
// RebalanceShards rebalances shards across nodes
func (chp *ConsistentHashPlacement) RebalanceShards(currentMap *ShardMap, nodes map[string]*NodeInfo) (*ShardMap, error) {
// This is a simplified implementation
// In practice, this would implement sophisticated rebalancing logic
return currentMap, nil
}
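An editorial sketch of the shard bookkeeping above: actor IDs hash to a stable shard, and each shard is assigned a primary plus replicas.
```go
sm := cluster.NewShardManager(cluster.NumShards, 3)

shardID := sm.GetShard("order-123") // stable sha256-based mapping into [0, NumShards)
sm.AssignShard(shardID, []string{"node-1", "node-2", "node-3"})

fmt.Println(sm.GetPrimaryNode(shardID))  // node-1
fmt.Println(sm.GetReplicaNodes(shardID)) // [node-2 node-3]
```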

cluster/types.go Normal file

@@ -0,0 +1,110 @@
package cluster
import (
"time"
)
const (
// NumShards defines the total number of shards in the cluster
NumShards = 1024
// VirtualNodes defines the number of virtual nodes per physical node for consistent hashing
VirtualNodes = 150
// Leadership election constants
LeaderLeaseTimeout = 10 * time.Second // How long a leader lease lasts
HeartbeatInterval = 3 * time.Second // How often leader sends heartbeats
ElectionTimeout = 2 * time.Second // How long to wait for election
)
// NodeStatus represents the health status of a node
type NodeStatus string
const (
NodeStatusActive NodeStatus = "active"
NodeStatusDraining NodeStatus = "draining"
NodeStatusFailed NodeStatus = "failed"
)
// NodeInfo represents information about a cluster node
type NodeInfo struct {
ID string `json:"id"`
Address string `json:"address"`
Port int `json:"port"`
Status NodeStatus `json:"status"`
Capacity float64 `json:"capacity"` // Maximum load capacity
Load float64 `json:"load"` // Current CPU/memory load
LastSeen time.Time `json:"lastSeen"` // Last heartbeat timestamp
Timestamp time.Time `json:"timestamp"`
Metadata map[string]string `json:"metadata"`
IsLeader bool `json:"isLeader"`
VMCount int `json:"vmCount"` // Number of VMs on this node
ShardIDs []int `json:"shardIds"` // Shards assigned to this node
}
// NodeUpdateType represents the type of node update
type NodeUpdateType string
const (
NodeJoined NodeUpdateType = "joined"
NodeLeft NodeUpdateType = "left"
NodeUpdated NodeUpdateType = "updated"
)
// NodeUpdate represents a node status update
type NodeUpdate struct {
Type NodeUpdateType `json:"type"`
Node *NodeInfo `json:"node"`
}
// ShardMap represents the distribution of shards across cluster nodes
type ShardMap struct {
Version uint64 `json:"version"` // Incremented on each change
Shards map[int][]string `json:"shards"` // shard ID -> [primary, replica1, replica2]
Nodes map[string]NodeInfo `json:"nodes"` // node ID -> node info
UpdateTime time.Time `json:"updateTime"`
}
// ClusterMessage represents inter-node communication
type ClusterMessage struct {
Type string `json:"type"`
From string `json:"from"`
To string `json:"to"`
Payload interface{} `json:"payload"`
Timestamp time.Time `json:"timestamp"`
}
// RebalanceRequest represents a request to rebalance shards
type RebalanceRequest struct {
RequestID string `json:"requestId"`
FromNode string `json:"fromNode"`
ToNode string `json:"toNode"`
ShardIDs []int `json:"shardIds"`
Reason string `json:"reason"`
Migrations []ActorMigration `json:"migrations"`
}
// ActorMigration represents the migration of an actor between nodes
type ActorMigration struct {
ActorID string `json:"actorId"`
FromNode string `json:"fromNode"`
ToNode string `json:"toNode"`
ShardID int `json:"shardId"`
State map[string]interface{} `json:"state"`
Version int64 `json:"version"`
Status string `json:"status"` // "pending", "in_progress", "completed", "failed"
}
// LeaderElectionCallbacks defines callbacks for leadership changes
type LeaderElectionCallbacks struct {
OnBecameLeader func()
OnLostLeader func()
OnNewLeader func(leaderID string)
}
// LeadershipLease represents a leadership lease in the cluster
type LeadershipLease struct {
LeaderID string `json:"leaderId"`
Term uint64 `json:"term"`
ExpiresAt time.Time `json:"expiresAt"`
StartedAt time.Time `json:"startedAt"`
}

event.go Normal file

@@ -0,0 +1,38 @@
package aether
import (
"time"
)
// Event represents a domain event in the system
type Event struct {
ID string `json:"id"`
EventType string `json:"eventType"`
ActorID string `json:"actorId"`
CommandID string `json:"commandId,omitempty"` // Correlation ID for command that triggered this event
Version int64 `json:"version"`
Data map[string]interface{} `json:"data"`
Timestamp time.Time `json:"timestamp"`
}
// ActorSnapshot represents a point-in-time state snapshot
type ActorSnapshot struct {
ActorID string `json:"actorId"`
Version int64 `json:"version"`
State map[string]interface{} `json:"state"`
Timestamp time.Time `json:"timestamp"`
}
// EventStore defines the interface for event persistence
type EventStore interface {
SaveEvent(event *Event) error
GetEvents(actorID string, fromVersion int64) ([]*Event, error)
GetLatestVersion(actorID string) (int64, error)
}
// SnapshotStore extends EventStore with snapshot capabilities
type SnapshotStore interface {
EventStore
GetLatestSnapshot(actorID string) (*ActorSnapshot, error)
SaveSnapshot(snapshot *ActorSnapshot) error
}
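The `store/` package in this commit provides `InMemoryEventStore` and `JetStreamEventStore`; as an editorial illustration of what the `EventStore` contract asks for, a minimal, non-concurrent in-memory sketch:
```go
// memStore is a sketch only; the real in-memory implementation lives in store/.
type memStore struct {
    events map[string][]*aether.Event // actorID -> ordered events
}

func (s *memStore) SaveEvent(event *aether.Event) error {
    s.events[event.ActorID] = append(s.events[event.ActorID], event)
    return nil
}

func (s *memStore) GetEvents(actorID string, fromVersion int64) ([]*aether.Event, error) {
    var out []*aether.Event
    for _, e := range s.events[actorID] {
        if e.Version >= fromVersion {
            out = append(out, e)
        }
    }
    return out, nil
}

func (s *memStore) GetLatestVersion(actorID string) (int64, error) {
    evs := s.events[actorID]
    if len(evs) == 0 {
        return 0, nil
    }
    return evs[len(evs)-1].Version, nil
}
```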

eventbus.go Normal file

@@ -0,0 +1,106 @@
package aether
import (
"context"
"sync"
)
// EventBroadcaster defines the interface for publishing and subscribing to events
type EventBroadcaster interface {
Subscribe(namespaceID string) <-chan *Event
Unsubscribe(namespaceID string, ch <-chan *Event)
Publish(namespaceID string, event *Event)
Stop()
SubscriberCount(namespaceID string) int
}
// EventBus broadcasts events to multiple subscribers within a namespace
type EventBus struct {
subscribers map[string][]chan *Event // namespaceID -> channels
mutex sync.RWMutex
ctx context.Context
cancel context.CancelFunc
}
// NewEventBus creates a new event bus
func NewEventBus() *EventBus {
ctx, cancel := context.WithCancel(context.Background())
return &EventBus{
subscribers: make(map[string][]chan *Event),
ctx: ctx,
cancel: cancel,
}
}
// Subscribe creates a new subscription channel for a namespace
func (eb *EventBus) Subscribe(namespaceID string) <-chan *Event {
eb.mutex.Lock()
defer eb.mutex.Unlock()
// Create buffered channel to prevent blocking publishers
ch := make(chan *Event, 100)
eb.subscribers[namespaceID] = append(eb.subscribers[namespaceID], ch)
return ch
}
// Unsubscribe removes a subscription channel
func (eb *EventBus) Unsubscribe(namespaceID string, ch <-chan *Event) {
eb.mutex.Lock()
defer eb.mutex.Unlock()
subs := eb.subscribers[namespaceID]
for i, subscriber := range subs {
if subscriber == ch {
// Remove channel from slice
eb.subscribers[namespaceID] = append(subs[:i], subs[i+1:]...)
close(subscriber)
break
}
}
// Clean up empty namespace entries
if len(eb.subscribers[namespaceID]) == 0 {
delete(eb.subscribers, namespaceID)
}
}
// Publish sends an event to all subscribers of a namespace
func (eb *EventBus) Publish(namespaceID string, event *Event) {
eb.mutex.RLock()
defer eb.mutex.RUnlock()
subscribers := eb.subscribers[namespaceID]
for _, ch := range subscribers {
select {
case ch <- event:
// Event delivered
default:
// Channel full, skip this subscriber (non-blocking)
}
}
}
// Stop closes the event bus
func (eb *EventBus) Stop() {
eb.mutex.Lock()
defer eb.mutex.Unlock()
eb.cancel()
// Close all subscriber channels
for _, subs := range eb.subscribers {
for _, ch := range subs {
close(ch)
}
}
eb.subscribers = make(map[string][]chan *Event)
}
// SubscriberCount returns the number of subscribers for a namespace
func (eb *EventBus) SubscriberCount(namespaceID string) int {
eb.mutex.RLock()
defer eb.mutex.RUnlock()
return len(eb.subscribers[namespaceID])
}
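
`Publish` is deliberately non-blocking: each subscriber channel is buffered at 100 and events are dropped for subscribers that fall behind, so consumers should drain their channel in a dedicated goroutine and exit when the bus closes it. A minimal consumer-side sketch under those assumptions; `consume` and its callback are illustrative names, not part of the library.

```go
package example

import (
	"log"

	"git.flowmade.one/flowmade-one/aether"
)

// consume drains a namespace subscription until the bus closes the channel
// (via Unsubscribe or Stop) and returns a cancel function for the caller.
func consume(bus *aether.EventBus, namespaceID string, handle func(*aether.Event)) func() {
	ch := bus.Subscribe(namespaceID)
	done := make(chan struct{})

	go func() {
		defer close(done)
		for event := range ch { // ends once the channel is closed
			handle(event)
		}
	}()

	return func() {
		bus.Unsubscribe(namespaceID, ch) // closes ch and detaches the subscriber
		<-done
		log.Printf("subscribers left in %s: %d", namespaceID, bus.SubscriberCount(namespaceID))
	}
}
```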

16
go.mod Normal file
View File

@@ -0,0 +1,16 @@
module git.flowmade.one/flowmade-one/aether
go 1.23
require (
github.com/google/uuid v1.6.0
github.com/nats-io/nats.go v1.37.0
)
require (
github.com/klauspost/compress v1.17.2 // indirect
github.com/nats-io/nkeys v0.4.7 // indirect
github.com/nats-io/nuid v1.0.1 // indirect
golang.org/x/crypto v0.18.0 // indirect
golang.org/x/sys v0.16.0 // indirect
)

14
go.sum Normal file
View File

@@ -0,0 +1,14 @@
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/klauspost/compress v1.17.2 h1:RlWWUY/Dr4fL8qk9YG7DTZ7PDgME2V4csBXA8L/ixi4=
github.com/klauspost/compress v1.17.2/go.mod h1:ntbaceVETuRiXiv4DpjP66DpAtAGkEQskQzEyD//IeE=
github.com/nats-io/nats.go v1.37.0 h1:07rauXbVnnJvv1gfIyghFEo6lUcYRY0WXc3x7x0vUxE=
github.com/nats-io/nats.go v1.37.0/go.mod h1:Ubdu4Nh9exXdSz0RVWRFBbRfrbSxOYd26oF0wkWclB8=
github.com/nats-io/nkeys v0.4.7 h1:RwNJbbIdYCoClSDNY7QVKZlyb/wfT6ugvFCiKy6vDvI=
github.com/nats-io/nkeys v0.4.7/go.mod h1:kqXRgRDPlGy7nGaEDMuYzmiJCIAAWDK0IMBtDmGD0nc=
github.com/nats-io/nuid v1.0.1 h1:5iA8DT8V7q8WK2EScv2padNa/rTESc1KdnPw4TC2paw=
github.com/nats-io/nuid v1.0.1/go.mod h1:19wcPz3Ph3q0Jbyiqsd0kePYG7A95tJPxeL+1OSON2c=
golang.org/x/crypto v0.18.0 h1:PGVlW0xEltQnzFZ55hkuX5+KLyrMYhHld1YHO4AKcdc=
golang.org/x/crypto v0.18.0/go.mod h1:R0j02AL6hcrfOiy9T4ZYp/rcWeMxM3L6QYxlOuEG1mg=
golang.org/x/sys v0.16.0 h1:xWw16ngr6ZMtmxDyKyIgsE93KNKz5HKmMa3b8ALHidU=
golang.org/x/sys v0.16.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=

47
model/model.go Normal file
View File

@@ -0,0 +1,47 @@
package model
// EventStorming model types
// Model represents an event storming model
type Model struct {
ID string `json:"id"`
Name string `json:"name"`
Events []DomainEvent `json:"events"`
Commands []Command `json:"commands"`
Aggregates []Aggregate `json:"aggregates"`
Processes []BusinessProcess `json:"processes"`
}
// DomainEvent represents a domain event definition
type DomainEvent struct {
ID string `json:"id"`
Name string `json:"name"`
Description string `json:"description"`
Data map[string]string `json:"data"`
}
// Command represents a command definition
type Command struct {
ID string `json:"id"`
Name string `json:"name"`
Actor string `json:"actor"`
TriggersEvent string `json:"triggersEvent"`
Data map[string]string `json:"data"`
}
// Aggregate represents an aggregate definition
type Aggregate struct {
ID string `json:"id"`
Name string `json:"name"`
Events []string `json:"events"`
Commands []string `json:"commands"`
Invariants []string `json:"invariants"`
}
// BusinessProcess represents a business process definition
type BusinessProcess struct {
ID string `json:"id"`
Name string `json:"name"`
TriggerEvents []string `json:"triggerEvents"`
OutputCommands []string `json:"outputCommands"`
}
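
These types are plain data describing an EventStorming session rather than runtime behaviour. A minimal sketch of a model for a small ordering domain; every identifier and field value below is illustrative.

```go
package example

import "git.flowmade.one/flowmade-one/aether/model"

// orderModel describes a tiny ordering domain: one command, the event it
// triggers, the aggregate that owns both, and a follow-up process.
var orderModel = model.Model{
	ID:   "orders",
	Name: "Order Management",
	Commands: []model.Command{{
		ID:            "cmd-place-order",
		Name:          "PlaceOrder",
		Actor:         "Customer",
		TriggersEvent: "OrderPlaced",
		Data:          map[string]string{"total": "number"},
	}},
	Events: []model.DomainEvent{{
		ID:          "evt-order-placed",
		Name:        "OrderPlaced",
		Description: "A customer committed to an order",
		Data:        map[string]string{"total": "number"},
	}},
	Aggregates: []model.Aggregate{{
		ID:         "agg-order",
		Name:       "Order",
		Commands:   []string{"PlaceOrder"},
		Events:     []string{"OrderPlaced"},
		Invariants: []string{"total must be positive"},
	}},
	Processes: []model.BusinessProcess{{
		ID:             "proc-fulfilment",
		Name:           "Fulfilment",
		TriggerEvents:  []string{"OrderPlaced"},
		OutputCommands: []string{"ShipOrder"},
	}},
}
```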

159
nats_eventbus.go Normal file
View File

@@ -0,0 +1,159 @@
package aether
import (
"context"
"encoding/json"
"fmt"
"log"
"sync"
"github.com/google/uuid"
"github.com/nats-io/nats.go"
)
// NATSEventBus is an EventBus that broadcasts events across all cluster nodes using NATS
type NATSEventBus struct {
*EventBus // Embed base EventBus for local subscriptions
nc *nats.Conn // NATS connection
subscriptions []*nats.Subscription
namespaceSubscribers map[string]int // Track number of subscribers per namespace
nodeID string // Unique ID for this node
mutex sync.Mutex
ctx context.Context
cancel context.CancelFunc
}
// eventMessage is the wire format for events sent over NATS
type eventMessage struct {
NodeID string `json:"node_id"`
NamespaceID string `json:"namespace_id"`
Event *Event `json:"event"`
}
// NewNATSEventBus creates a new NATS-backed event bus
func NewNATSEventBus(nc *nats.Conn) (*NATSEventBus, error) {
ctx, cancel := context.WithCancel(context.Background())
neb := &NATSEventBus{
EventBus: NewEventBus(),
nc: nc,
nodeID: uuid.New().String(),
subscriptions: make([]*nats.Subscription, 0),
namespaceSubscribers: make(map[string]int),
ctx: ctx,
cancel: cancel,
}
return neb, nil
}
// Subscribe creates a local subscription and ensures a NATS subscription exists for the namespace
func (neb *NATSEventBus) Subscribe(namespaceID string) <-chan *Event {
neb.mutex.Lock()
defer neb.mutex.Unlock()
// Create local subscription first
ch := neb.EventBus.Subscribe(namespaceID)
// Check if this is the first subscriber for this namespace
count := neb.namespaceSubscribers[namespaceID]
if count == 0 {
// First subscriber - create NATS subscription
subject := fmt.Sprintf("aether.events.%s", namespaceID)
sub, err := neb.nc.Subscribe(subject, func(msg *nats.Msg) {
neb.handleNATSEvent(msg)
})
if err != nil {
log.Printf("[NATSEventBus] Failed to subscribe to NATS subject %s: %v", subject, err)
} else {
neb.subscriptions = append(neb.subscriptions, sub)
log.Printf("[NATSEventBus] Node %s subscribed to %s", neb.nodeID, subject)
}
}
neb.namespaceSubscribers[namespaceID] = count + 1
return ch
}
// Unsubscribe removes a local subscription and decrements the namespace subscriber count; NATS subscriptions are released when the bus stops
func (neb *NATSEventBus) Unsubscribe(namespaceID string, ch <-chan *Event) {
neb.mutex.Lock()
defer neb.mutex.Unlock()
neb.EventBus.Unsubscribe(namespaceID, ch)
count := neb.namespaceSubscribers[namespaceID]
if count > 0 {
count--
neb.namespaceSubscribers[namespaceID] = count
if count == 0 {
delete(neb.namespaceSubscribers, namespaceID)
log.Printf("[NATSEventBus] No more subscribers for namespace %s on node %s", namespaceID, neb.nodeID)
}
}
}
// handleNATSEvent processes events received from NATS
func (neb *NATSEventBus) handleNATSEvent(msg *nats.Msg) {
var eventMsg eventMessage
if err := json.Unmarshal(msg.Data, &eventMsg); err != nil {
log.Printf("[NATSEventBus] Failed to unmarshal event: %v", err)
return
}
// Skip events that originated from this node (already delivered locally)
if eventMsg.NodeID == neb.nodeID {
return
}
// Forward to local EventBus subscribers
neb.EventBus.Publish(eventMsg.NamespaceID, eventMsg.Event)
}
// Publish publishes an event both locally and to NATS for cross-node broadcasting
func (neb *NATSEventBus) Publish(namespaceID string, event *Event) {
// First publish locally
neb.EventBus.Publish(namespaceID, event)
// Then publish to NATS for other nodes
subject := fmt.Sprintf("aether.events.%s", namespaceID)
eventMsg := eventMessage{
NodeID: neb.nodeID,
NamespaceID: namespaceID,
Event: event,
}
data, err := json.Marshal(eventMsg)
if err != nil {
log.Printf("[NATSEventBus] Failed to marshal event for NATS: %v", err)
return
}
if err := neb.nc.Publish(subject, data); err != nil {
log.Printf("[NATSEventBus] Failed to publish event to NATS: %v", err)
return
}
}
// Stop closes the NATS event bus and all subscriptions
func (neb *NATSEventBus) Stop() {
neb.mutex.Lock()
defer neb.mutex.Unlock()
neb.cancel()
for _, sub := range neb.subscriptions {
if err := sub.Unsubscribe(); err != nil {
log.Printf("[NATSEventBus] Error unsubscribing: %v", err)
}
}
neb.subscriptions = nil
neb.EventBus.Stop()
log.Printf("[NATSEventBus] Node %s stopped", neb.nodeID)
}
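
Because every node publishes to `aether.events.<namespace>` and skips messages carrying its own node ID, a local `Publish` reaches local subscribers exactly once and remote subscribers once via NATS. A minimal wiring sketch, assuming a reachable NATS server on the default URL; the namespace and event values are illustrative.

```go
package main

import (
	"log"
	"time"

	"git.flowmade.one/flowmade-one/aether"
	"github.com/nats-io/nats.go"
)

func main() {
	// Assumes nats-server is running on nats://127.0.0.1:4222.
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	bus, err := aether.NewNATSEventBus(nc)
	if err != nil {
		log.Fatal(err)
	}
	defer bus.Stop()

	// Subscribers on any node sharing this NATS cluster see the same stream.
	ch := bus.Subscribe("orders-eu")
	go func() {
		for event := range ch {
			log.Printf("got %s for %s", event.EventType, event.ActorID)
		}
	}()

	bus.Publish("orders-eu", &aether.Event{
		ID:        "evt-1",
		EventType: "OrderPlaced",
		ActorID:   "order-789",
		Version:   1,
		Timestamp: time.Now(),
	})

	time.Sleep(100 * time.Millisecond) // give the consumer goroutine a moment to log
}
```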

218
store/jetstream.go Normal file
View File

@@ -0,0 +1,218 @@
package store
import (
"encoding/json"
"fmt"
"strings"
"time"
"git.flowmade.one/flowmade-one/aether"
"github.com/nats-io/nats.go"
)
// JetStreamEventStore implements EventStore using NATS JetStream for persistence
type JetStreamEventStore struct {
js nats.JetStreamContext
streamName string
}
// NewJetStreamEventStore creates a new JetStream-based event store
func NewJetStreamEventStore(natsConn *nats.Conn, streamName string) (*JetStreamEventStore, error) {
js, err := natsConn.JetStream()
if err != nil {
return nil, fmt.Errorf("failed to get JetStream context: %w", err)
}
// Create or update the stream
stream := &nats.StreamConfig{
Name: streamName,
Subjects: []string{fmt.Sprintf("%s.events.>", streamName), fmt.Sprintf("%s.snapshots.>", streamName)},
Storage: nats.FileStorage,
Retention: nats.LimitsPolicy,
MaxAge: 365 * 24 * time.Hour, // Keep events for 1 year
Replicas: 1, // Can be increased for HA
}
_, err = js.AddStream(stream)
if err != nil && !strings.Contains(err.Error(), "already exists") {
return nil, fmt.Errorf("failed to create stream: %w", err)
}
return &JetStreamEventStore{
js: js,
streamName: streamName,
}, nil
}
// SaveEvent persists an event to JetStream
func (jes *JetStreamEventStore) SaveEvent(event *aether.Event) error {
// Serialize event to JSON
data, err := json.Marshal(event)
if err != nil {
return fmt.Errorf("failed to marshal event: %w", err)
}
// Create subject: stream.events.actorType.actorID
subject := fmt.Sprintf("%s.events.%s.%s",
jes.streamName,
sanitizeSubject(extractActorType(event.ActorID)),
sanitizeSubject(event.ActorID))
// Publish with event ID as message ID for deduplication
_, err = jes.js.Publish(subject, data, nats.MsgId(event.ID))
if err != nil {
return fmt.Errorf("failed to publish event to JetStream: %w", err)
}
return nil
}
// GetEvents retrieves all events for an actor with a version greater than fromVersion
func (jes *JetStreamEventStore) GetEvents(actorID string, fromVersion int64) ([]*aether.Event, error) {
// Create subject filter for this actor
subject := fmt.Sprintf("%s.events.%s.%s",
jes.streamName,
sanitizeSubject(extractActorType(actorID)),
sanitizeSubject(actorID))
// Create consumer to read events
consumer, err := jes.js.PullSubscribe(subject, "")
if err != nil {
return nil, fmt.Errorf("failed to create consumer: %w", err)
}
defer consumer.Unsubscribe()
var events []*aether.Event
// Fetch messages in batches
for {
msgs, err := consumer.Fetch(100, nats.MaxWait(time.Second))
if err != nil {
if err == nats.ErrTimeout {
break // No more messages
}
return nil, fmt.Errorf("failed to fetch messages: %w", err)
}
for _, msg := range msgs {
var event aether.Event
if err := json.Unmarshal(msg.Data, &event); err != nil {
continue // Skip malformed events
}
// Filter by version
if event.Version > fromVersion {
events = append(events, &event)
}
msg.Ack()
}
if len(msgs) < 100 {
break // No more messages
}
}
return events, nil
}
// GetLatestVersion returns the latest version for an actor
func (jes *JetStreamEventStore) GetLatestVersion(actorID string) (int64, error) {
events, err := jes.GetEvents(actorID, 0)
if err != nil {
return 0, err
}
if len(events) == 0 {
return 0, nil
}
latestVersion := int64(0)
for _, event := range events {
if event.Version > latestVersion {
latestVersion = event.Version
}
}
return latestVersion, nil
}
// GetLatestSnapshot gets the most recent snapshot for an actor
func (jes *JetStreamEventStore) GetLatestSnapshot(actorID string) (*aether.ActorSnapshot, error) {
// Create subject for snapshots
subject := fmt.Sprintf("%s.snapshots.%s.%s",
jes.streamName,
sanitizeSubject(extractActorType(actorID)),
sanitizeSubject(actorID))
// Try to get the latest snapshot
consumer, err := jes.js.PullSubscribe(subject, "", nats.DeliverLast())
if err != nil {
return nil, fmt.Errorf("failed to create snapshot consumer: %w", err)
}
defer consumer.Unsubscribe()
msgs, err := consumer.Fetch(1, nats.MaxWait(time.Second))
if err != nil {
if err == nats.ErrTimeout {
return nil, fmt.Errorf("no snapshot found for actor %s", actorID)
}
return nil, fmt.Errorf("failed to fetch snapshot: %w", err)
}
if len(msgs) == 0 {
return nil, fmt.Errorf("no snapshot found for actor %s", actorID)
}
var snapshot aether.ActorSnapshot
if err := json.Unmarshal(msgs[0].Data, &snapshot); err != nil {
return nil, fmt.Errorf("failed to unmarshal snapshot: %w", err)
}
msgs[0].Ack()
return &snapshot, nil
}
// SaveSnapshot saves a snapshot of actor state
func (jes *JetStreamEventStore) SaveSnapshot(snapshot *aether.ActorSnapshot) error {
// Serialize snapshot to JSON
data, err := json.Marshal(snapshot)
if err != nil {
return fmt.Errorf("failed to marshal snapshot: %w", err)
}
// Create subject for snapshots
subject := fmt.Sprintf("%s.snapshots.%s.%s",
jes.streamName,
sanitizeSubject(extractActorType(snapshot.ActorID)),
sanitizeSubject(snapshot.ActorID))
// Publish snapshot
_, err = jes.js.Publish(subject, data)
if err != nil {
return fmt.Errorf("failed to publish snapshot to JetStream: %w", err)
}
return nil
}
// Helper functions
// extractActorType extracts the actor type from an actor ID
func extractActorType(actorID string) string {
for i, c := range actorID {
if c == '-' && i > 0 {
return actorID[:i]
}
}
return "unknown"
}
// sanitizeSubject sanitizes a string for use in NATS subjects
func sanitizeSubject(s string) string {
s = strings.ReplaceAll(s, " ", "_")
s = strings.ReplaceAll(s, ".", "_")
s = strings.ReplaceAll(s, "*", "_")
s = strings.ReplaceAll(s, ">", "_")
return s
}
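
A minimal end-to-end sketch against a local JetStream-enabled server (`nats-server -js`); the stream name `AETHER` and the event values are illustrative. Note that the event ID is passed as the JetStream message ID, so re-publishing the same event within the stream's deduplication window is a no-op.

```go
package main

import (
	"log"
	"time"

	"git.flowmade.one/flowmade-one/aether"
	"git.flowmade.one/flowmade-one/aether/store"
	"github.com/google/uuid"
	"github.com/nats-io/nats.go"
)

func main() {
	// Assumes a JetStream-enabled server on the default URL.
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	es, err := store.NewJetStreamEventStore(nc, "AETHER")
	if err != nil {
		log.Fatal(err)
	}

	// Persist an event for an actor.
	err = es.SaveEvent(&aether.Event{
		ID:        uuid.New().String(),
		EventType: "OrderShipped",
		ActorID:   "order-456",
		Version:   2,
		Data:      map[string]interface{}{"carrier": "DHL"},
		Timestamp: time.Now(),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Periodic snapshots keep replays short.
	_ = es.SaveSnapshot(&aether.ActorSnapshot{
		ActorID:   "order-456",
		Version:   2,
		State:     map[string]interface{}{"status": "shipped"},
		Timestamp: time.Now(),
	})

	version, _ := es.GetLatestVersion("order-456")
	log.Printf("order-456 is at version %d", version)
}
```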

60
store/memory.go Normal file
View File

@@ -0,0 +1,60 @@
package store
import (
"git.flowmade.one/flowmade-one/aether"
)
// InMemoryEventStore provides a simple in-memory event store for testing; it is not safe for concurrent use
type InMemoryEventStore struct {
events map[string][]*aether.Event // actorID -> events
}
// NewInMemoryEventStore creates a new in-memory event store
func NewInMemoryEventStore() *InMemoryEventStore {
return &InMemoryEventStore{
events: make(map[string][]*aether.Event),
}
}
// SaveEvent saves an event to the in-memory store
func (es *InMemoryEventStore) SaveEvent(event *aether.Event) error {
if _, exists := es.events[event.ActorID]; !exists {
es.events[event.ActorID] = make([]*aether.Event, 0)
}
es.events[event.ActorID] = append(es.events[event.ActorID], event)
return nil
}
// GetEvents retrieves events for an actor with a version greater than or equal to fromVersion
func (es *InMemoryEventStore) GetEvents(actorID string, fromVersion int64) ([]*aether.Event, error) {
events, exists := es.events[actorID]
if !exists {
return []*aether.Event{}, nil
}
var filteredEvents []*aether.Event
for _, event := range events {
if event.Version >= fromVersion {
filteredEvents = append(filteredEvents, event)
}
}
return filteredEvents, nil
}
// GetLatestVersion returns the latest version for an actor
func (es *InMemoryEventStore) GetLatestVersion(actorID string) (int64, error) {
events, exists := es.events[actorID]
if !exists || len(events) == 0 {
return 0, nil
}
latestVersion := int64(0)
for _, event := range events {
if event.Version > latestVersion {
latestVersion = event.Version
}
}
return latestVersion, nil
}
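
The in-memory store keeps everything in a map keyed by actor ID, which makes it convenient for unit tests. A minimal test sketch; the test name and versions are illustrative. Note that its `GetEvents` treats `fromVersion` as inclusive, unlike the JetStream store above.

```go
package store_test

import (
	"testing"

	"git.flowmade.one/flowmade-one/aether"
	"git.flowmade.one/flowmade-one/aether/store"
)

func TestInMemoryEventStoreVersions(t *testing.T) {
	es := store.NewInMemoryEventStore()

	for v := int64(1); v <= 3; v++ {
		if err := es.SaveEvent(&aether.Event{ActorID: "order-1", Version: v}); err != nil {
			t.Fatalf("save event: %v", err)
		}
	}

	// fromVersion is inclusive for the in-memory store.
	events, _ := es.GetEvents("order-1", 2)
	if len(events) != 2 {
		t.Fatalf("expected 2 events, got %d", len(events))
	}

	latest, _ := es.GetLatestVersion("order-1")
	if latest != 3 {
		t.Fatalf("expected latest version 3, got %d", latest)
	}
}
```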

37
vision.md Normal file
View File

@@ -0,0 +1,37 @@
# Aether Vision
Distributed actor system with event sourcing for Go, powered by NATS.
## Organization Context
This repo is part of Flowmade. See [organization manifesto](../architecture/manifesto.md) for who we are and what we believe.
## What This Is
Aether is an open-source infrastructure library for building distributed, event-sourced systems in Go. It provides:
- **Event sourcing primitives** - Event, EventStore interface, snapshots
- **Event stores** - In-memory (testing) and JetStream (production)
- **Event bus** - Local and NATS-backed pub/sub with namespace isolation
- **Cluster management** - Node discovery, leader election, shard distribution
- **Namespace isolation** - Logical boundaries for running multiple isolated scopes (e.g. tenants or environments) in one deployment
## Who This Serves
- **Go developers** building distributed systems
- **Teams** implementing event sourcing and CQRS patterns
- **Projects** needing actor-based concurrency with event persistence
## Goals
1. **Simple event sourcing** - Clear primitives that compose well
2. **NATS-native** - Built for JetStream, not bolted on
3. **Horizontal scaling** - Consistent hashing, shard migration, leader election
4. **Namespace isolation** - Logical boundaries without infrastructure overhead
## Non-Goals
- Opinionated multi-tenancy (product layer concern)
- Domain-specific abstractions (use the primitives)
- GraphQL/REST API generation (build on top)
- UI components (see iris)