Initial aether repository structure
All checks were successful
CI / build (push) Successful in 1m13s
Distributed actor system with event sourcing for Go:

- event.go - Event, ActorSnapshot, EventStore interface
- eventbus.go - EventBus, EventBroadcaster for pub/sub
- nats_eventbus.go - NATS-backed cross-node event broadcasting
- store/ - InMemoryEventStore (testing), JetStreamEventStore (production)
- cluster/ - Node discovery, leader election, shard distribution
- model/ - EventStorming model types

Extracted from arcadia as open-source infrastructure component.

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
19
.gitea/workflows/ci.yaml
Normal file
@@ -0,0 +1,19 @@
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: '1.23'
      - name: Build
        run: go build ./...
      - name: Test
        run: go test ./...
20
.gitignore
vendored
Normal file
@@ -0,0 +1,20 @@
# IDE
.idea/
.vscode/
*.swp

# OS
.DS_Store
Thumbs.db

# Build artifacts
/dist/
/build/
/bin/

# Go
/vendor/

# Test artifacts
*.test
coverage.out
116
CLAUDE.md
Normal file
@@ -0,0 +1,116 @@
# Aether

Distributed actor system with event sourcing for Go, powered by NATS.

## Organization Context

This repo is part of Flowmade. See:
- [Organization manifesto](../architecture/manifesto.md) - who we are, what we believe
- [Repository map](../architecture/repos.md) - how this fits in the bigger picture
- [Vision](./vision.md) - what this specific product does

## Setup

```bash
git clone git@git.flowmade.one:flowmade-one/aether.git
cd aether
go mod download
```

Requires NATS server for integration tests:
```bash
# Install NATS
brew install nats-server

# Run with JetStream enabled
nats-server -js
```

## Project Structure

```
aether/
├── event.go           # Event, ActorSnapshot, EventStore interface
├── eventbus.go        # EventBus, EventBroadcaster interface
├── nats_eventbus.go   # NATSEventBus - cross-node event broadcasting
├── store/
│   ├── memory.go      # InMemoryEventStore (testing)
│   └── jetstream.go   # JetStreamEventStore (production)
├── cluster/
│   ├── manager.go     # ClusterManager
│   ├── discovery.go   # NodeDiscovery
│   ├── hashring.go    # ConsistentHashRing
│   ├── shard.go       # ShardManager
│   ├── leader.go      # LeaderElection
│   └── types.go       # Cluster types
└── model/
    └── model.go       # EventStorming model types
```

## Development

```bash
make build   # Build the library
make test    # Run tests
make lint    # Run linters
```

## Architecture

### Event Sourcing

Events are the source of truth. State is derived by replaying events.

```go
// Create an event
event := &aether.Event{
	ID:        uuid.New().String(),
	EventType: "OrderPlaced",
	ActorID:   "order-123",
	Version:   1,
	Data:      map[string]interface{}{"total": 100.00},
	Timestamp: time.Now(),
}

// Persist to event store
store.SaveEvent(event)

// Replay events to rebuild state
events, _ := store.GetEvents("order-123", 0)
```

### Namespace Isolation

Namespaces provide logical boundaries for events and subscriptions:

```go
// Subscribe to events in a namespace
ch := eventBus.Subscribe("tenant-abc")

// Events are isolated per namespace
eventBus.Publish("tenant-abc", event) // Only tenant-abc subscribers see this
```

### Clustering

Aether handles node discovery, leader election, and shard distribution:

```go
// Create cluster manager
manager := cluster.NewClusterManager(natsConn, nodeID)

// Join cluster
manager.Start()

// Leader election happens automatically
if manager.IsLeader() {
	// Coordinate shard assignments
}
```

## Key Patterns

- **Events are immutable** - Never modify, only append
- **Snapshots for performance** - Periodically snapshot state to avoid full replay (a hedged sketch follows below)
- **Namespaces for isolation** - Not multi-tenancy, just logical boundaries
- **NATS for everything** - Events, pub/sub, clustering all use NATS
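The snapshot pattern listed above has no example in this file. A minimal sketch of what it could look like, assuming a hypothetical `ActorSnapshot` shape (`State`, `Version`) and hypothetical store methods `SaveSnapshot` / `GetLatestSnapshot` that are not shown in this commit:

```go
// Hypothetical sketch - field and method names are assumptions, not the
// actual aether API. Periodically persist a snapshot so recovery can skip
// replaying the full event history.
snapshot := &aether.ActorSnapshot{
	ActorID:   "order-123",
	Version:   42, // version of the last event folded into State
	State:     map[string]interface{}{"total": 100.00},
	Timestamp: time.Now(),
}
store.SaveSnapshot(snapshot)

// On recovery: load the latest snapshot, then replay only newer events.
snap, _ := store.GetLatestSnapshot("order-123")
events, _ := store.GetEvents("order-123", snap.Version)
```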
190
LICENSE
Normal file
@@ -0,0 +1,190 @@
                              Apache License
                        Version 2.0, January 2004
                     http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

   "License" shall mean the terms and conditions for use, reproduction,
   and distribution as defined by Sections 1 through 9 of this document.

   "Licensor" shall mean the copyright owner or entity authorized by
   the copyright owner that is granting the License.

   "Legal Entity" shall mean the union of the acting entity and all
   other entities that control, are controlled by, or are under common
   control with that entity. For the purposes of this definition,
   "control" means (i) the power, direct or indirect, to cause the
   direction or management of such entity, whether by contract or
   otherwise, or (ii) ownership of fifty percent (50%) or more of the
   outstanding shares, or (iii) beneficial ownership of such entity.

   "You" (or "Your") shall mean an individual or Legal Entity
   exercising permissions granted by this License.

   "Source" form shall mean the preferred form for making modifications,
   including but not limited to software source code, documentation
   source, and configuration files.

   "Object" form shall mean any form resulting from mechanical
   transformation or translation of a Source form, including but
   not limited to compiled object code, generated documentation,
   and conversions to other media types.

   "Work" shall mean the work of authorship, whether in Source or
   Object form, made available under the License, as indicated by a
   copyright notice that is included in or attached to the work
   (an example is provided in the Appendix below).

   "Derivative Works" shall mean any work, whether in Source or Object
   form, that is based on (or derived from) the Work and for which the
   editorial revisions, annotations, elaborations, or other modifications
   represent, as a whole, an original work of authorship. For the purposes
   of this License, Derivative Works shall not include works that remain
   separable from, or merely link (or bind by name) to the interfaces of,
   the Work and Derivative Works thereof.

   "Contribution" shall mean any work of authorship, including
   the original version of the Work and any modifications or additions
   to that Work or Derivative Works thereof, that is intentionally
   submitted to the Licensor for inclusion in the Work by the copyright owner
   or by an individual or Legal Entity authorized to submit on behalf of
   the copyright owner. For the purposes of this definition, "submitted"
   means any form of electronic, verbal, or written communication sent
   to the Licensor or its representatives, including but not limited to
   communication on electronic mailing lists, source code control systems,
   and issue tracking systems that are managed by, or on behalf of, the
   Licensor for the purpose of discussing and improving the Work, but
   excluding communication that is conspicuously marked or otherwise
   designated in writing by the copyright owner as "Not a Contribution."

   "Contributor" shall mean Licensor and any individual or Legal Entity
   on behalf of whom a Contribution has been received by Licensor and
   subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
   this License, each Contributor hereby grants to You a perpetual,
   worldwide, non-exclusive, no-charge, royalty-free, irrevocable
   copyright license to reproduce, prepare Derivative Works of,
   publicly display, publicly perform, sublicense, and distribute the
   Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
   this License, each Contributor hereby grants to You a perpetual,
   worldwide, non-exclusive, no-charge, royalty-free, irrevocable
   (except as stated in this section) patent license to make, have made,
   use, offer to sell, sell, import, and otherwise transfer the Work,
   where such license applies only to those patent claims licensable
   by such Contributor that are necessarily infringed by their
   Contribution(s) alone or by combination of their Contribution(s)
   with the Work to which such Contribution(s) was submitted. If You
   institute patent litigation against any entity (including a
   cross-claim or counterclaim in a lawsuit) alleging that the Work
   or a Contribution incorporated within the Work constitutes direct
   or contributory patent infringement, then any patent licenses
   granted to You under this License for that Work shall terminate
   as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
   Work or Derivative Works thereof in any medium, with or without
   modifications, and in Source or Object form, provided that You
   meet the following conditions:

   (a) You must give any other recipients of the Work or
       Derivative Works a copy of this License; and

   (b) You must cause any modified files to carry prominent notices
       stating that You changed the files; and

   (c) You must retain, in the Source form of any Derivative Works
       that You distribute, all copyright, patent, trademark, and
       attribution notices from the Source form of the Work,
       excluding those notices that do not pertain to any part of
       the Derivative Works; and

   (d) If the Work includes a "NOTICE" text file as part of its
       distribution, then any Derivative Works that You distribute must
       include a readable copy of the attribution notices contained
       within such NOTICE file, excluding those notices that do not
       pertain to any part of the Derivative Works, in at least one
       of the following places: within a NOTICE text file distributed
       as part of the Derivative Works; within the Source form or
       documentation, if provided along with the Derivative Works; or,
       within a display generated by the Derivative Works, if and
       wherever such third-party notices normally appear. The contents
       of the NOTICE file are for informational purposes only and
       do not modify the License. You may add Your own attribution
       notices within Derivative Works that You distribute, alongside
       or as an addendum to the NOTICE text from the Work, provided
       that such additional attribution notices cannot be construed
       as modifying the License.

   You may add Your own copyright statement to Your modifications and
   may provide additional or different license terms and conditions
   for use, reproduction, or distribution of Your modifications, or
   for any such Derivative Works as a whole, provided Your use,
   reproduction, and distribution of the Work otherwise complies with
   the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
   any Contribution intentionally submitted for inclusion in the Work
   by You to the Licensor shall be under the terms and conditions of
   this License, without any additional terms or conditions.
   Notwithstanding the above, nothing herein shall supersede or modify
   the terms of any separate license agreement you may have executed
   with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
   names, trademarks, service marks, or product names of the Licensor,
   except as required for reasonable and customary use in describing the
   origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
   agreed to in writing, Licensor provides the Work (and each
   Contributor provides its Contributions) on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
   implied, including, without limitation, any warranties or conditions
   of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
   PARTICULAR PURPOSE. You are solely responsible for determining the
   appropriateness of using or redistributing the Work and assume any
   risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
   whether in tort (including negligence), contract, or otherwise,
   unless required by applicable law (such as deliberate and grossly
   negligent acts) or agreed to in writing, shall any Contributor be
   liable to You for damages, including any direct, indirect, special,
   incidental, or consequential damages of any character arising as a
   result of this License or out of the use or inability to use the
   Work (including but not limited to damages for loss of goodwill,
   work stoppage, computer failure or malfunction, or any and all
   other commercial damages or losses), even if such Contributor
   has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
   the Work or Derivative Works thereof, You may choose to offer,
   and charge a fee for, acceptance of support, warranty, indemnity,
   or other liability obligations and/or rights consistent with this
   License. However, in accepting such obligations, You may act only
   on Your own behalf and on Your sole responsibility, not on behalf
   of any other Contributor, and only if You agree to indemnify,
   defend, and hold each Contributor harmless for any liability
   incurred by, or claims asserted against, such Contributor by reason
   of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

Copyright 2024-2026 Flowmade

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
13
Makefile
Normal file
@@ -0,0 +1,13 @@
.PHONY: build test lint clean

build:
	go build ./...

test:
	go test ./...

lint:
	golangci-lint run

clean:
	go clean
48
cluster/cluster.go
Normal file
@@ -0,0 +1,48 @@
// Package cluster provides distributed computing capabilities for the Aether VM runtime.
//
// This package implements a distributed actor system using NATS for coordination,
// featuring consistent hashing for shard distribution, leader election for
// coordination, and fault-tolerant actor migration between nodes.
//
// Key Components:
//
//   - ConsistentHashRing: Distributes actors across cluster nodes using consistent hashing
//   - LeaderElection: NATS-based leader election with lease-based coordination
//   - ClusterManager: Coordinates distributed operations and shard rebalancing
//   - NodeDiscovery: Manages cluster membership and node health monitoring
//   - ShardManager: Handles actor placement and distribution across shards
//   - DistributedVM: Main entry point for distributed VM cluster operations
//
// Usage:
//
//	// Create a distributed VM node
//	distributedVM, err := cluster.NewDistributedVM("node-1", []string{"nats://localhost:4222"}, localRuntime)
//	if err != nil {
//		log.Fatal(err)
//	}
//
//	// Start the cluster node
//	if err := distributedVM.Start(); err != nil {
//		log.Fatal(err)
//	}
//
//	// Load a model across the cluster
//	if err := distributedVM.LoadModel(eventStormingModel); err != nil {
//		log.Fatal(err)
//	}
//
// Architecture:
//
// The cluster package implements a distributed actor system where each node
// runs a local VM runtime and coordinates with other nodes through NATS.
// Actors are sharded across nodes using consistent hashing, and the system
// supports dynamic rebalancing when nodes join or leave the cluster.
//
// Fault Tolerance:
//
//   - Automatic node failure detection through heartbeat monitoring
//   - Leader election ensures coordination continues despite node failures
//   - Actor migration allows rebalancing when cluster topology changes
//   - Graceful shutdown with proper resource cleanup
package cluster
118
cluster/discovery.go
Normal file
@@ -0,0 +1,118 @@
package cluster

import (
	"context"
	"encoding/json"
	"time"

	"github.com/nats-io/nats.go"
)

// NodeDiscovery manages cluster membership using NATS
type NodeDiscovery struct {
	nodeID    string
	nodeInfo  *NodeInfo
	natsConn  *nats.Conn
	heartbeat time.Duration
	timeout   time.Duration
	updates   chan NodeUpdate
	ctx       context.Context
}

// NewNodeDiscovery creates a node discovery service
func NewNodeDiscovery(nodeID string, natsConn *nats.Conn, ctx context.Context) *NodeDiscovery {
	nodeInfo := &NodeInfo{
		ID:       nodeID,
		Status:   NodeStatusActive,
		Capacity: 1000, // Default capacity
		Load:     0,
		LastSeen: time.Now(),
		Metadata: make(map[string]string),
	}

	return &NodeDiscovery{
		nodeID:    nodeID,
		nodeInfo:  nodeInfo,
		natsConn:  natsConn,
		heartbeat: 30 * time.Second,
		timeout:   90 * time.Second,
		updates:   make(chan NodeUpdate, 100),
		ctx:       ctx,
	}
}

// Start begins node discovery and heartbeating
func (nd *NodeDiscovery) Start() {
	// Announce this node joining
	nd.announceNode(NodeJoined)

	// Start heartbeat
	ticker := time.NewTicker(nd.heartbeat)
	defer ticker.Stop()

	// Subscribe to node announcements
	nd.natsConn.Subscribe("aether.discovery", func(msg *nats.Msg) {
		var update NodeUpdate
		if err := json.Unmarshal(msg.Data, &update); err != nil {
			return
		}

		select {
		case nd.updates <- update:
		case <-nd.ctx.Done():
		}
	})

	for {
		select {
		case <-ticker.C:
			nd.announceNode(NodeUpdated)

		case <-nd.ctx.Done():
			nd.announceNode(NodeLeft)
			return
		}
	}
}

// GetUpdates returns the channel for receiving node updates
func (nd *NodeDiscovery) GetUpdates() <-chan NodeUpdate {
	return nd.updates
}

// GetNodeInfo returns the current node information
func (nd *NodeDiscovery) GetNodeInfo() *NodeInfo {
	return nd.nodeInfo
}

// UpdateLoad updates the node's current load
func (nd *NodeDiscovery) UpdateLoad(load float64) {
	nd.nodeInfo.Load = load
}

// UpdateVMCount updates the number of VMs on this node
func (nd *NodeDiscovery) UpdateVMCount(count int) {
	nd.nodeInfo.VMCount = count
}

// announceNode publishes node status to the cluster
func (nd *NodeDiscovery) announceNode(updateType NodeUpdateType) {
	nd.nodeInfo.LastSeen = time.Now()

	update := NodeUpdate{
		Type: updateType,
		Node: nd.nodeInfo,
	}

	data, err := json.Marshal(update)
	if err != nil {
		return
	}

	nd.natsConn.Publish("aether.discovery", data)
}

// Stop gracefully stops the node discovery service
func (nd *NodeDiscovery) Stop() {
	nd.announceNode(NodeLeft)
}
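A possible consumer of `NodeDiscovery.GetUpdates`, sketched here for illustration only (not part of this commit); it mirrors what `ClusterManager.handleNodeUpdate` does with the hash ring, and assumes `discovery` and `ring` are the types defined in this commit:

```go
// Illustrative sketch: keep a ConsistentHashRing in sync with membership
// updates published by NodeDiscovery over NATS.
go func() {
	for update := range discovery.GetUpdates() {
		switch update.Type {
		case NodeJoined, NodeUpdated:
			ring.AddNode(update.Node.ID) // AddNode ignores already-known nodes
		case NodeLeft:
			ring.RemoveNode(update.Node.ID)
		}
	}
}()
```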
221
cluster/distributed.go
Normal file
@@ -0,0 +1,221 @@
package cluster

import (
	"context"
	"encoding/json"
	"fmt"

	"github.com/nats-io/nats.go"
)

// DistributedVM manages a cluster of runtime nodes with VM-per-instance architecture
type DistributedVM struct {
	nodeID       string
	cluster      *ClusterManager
	localRuntime Runtime // Interface to avoid import cycles
	sharding     *ShardManager
	discovery    *NodeDiscovery
	natsConn     *nats.Conn
	ctx          context.Context
	cancel       context.CancelFunc
}

// Runtime interface to avoid import cycles with main aether package
type Runtime interface {
	Start() error
	LoadModel(model interface{}) error
	SendMessage(message interface{}) error
}

// DistributedVMRegistry implements VMRegistry using DistributedVM's local runtime and sharding
type DistributedVMRegistry struct {
	runtime  interface{} // Runtime interface to avoid import cycles
	sharding *ShardManager
}

// NewDistributedVM creates a distributed VM runtime cluster node
func NewDistributedVM(nodeID string, natsURLs []string, localRuntime Runtime) (*DistributedVM, error) {
	ctx, cancel := context.WithCancel(context.Background())

	// Connect to NATS cluster
	natsURL := natsURLs[0] // Use first URL for simplicity
	natsConn, err := nats.Connect(natsURL,
		nats.Name(fmt.Sprintf("aether-runtime-%s", nodeID)))
	if err != nil {
		cancel()
		return nil, fmt.Errorf("failed to connect to NATS: %w", err)
	}

	// Create cluster components
	discovery := NewNodeDiscovery(nodeID, natsConn, ctx)
	sharding := NewShardManager(1024, 3) // 1024 shards, 3 replicas
	cluster, err := NewClusterManager(nodeID, natsConn, ctx)
	if err != nil {
		cancel()
		natsConn.Close()
		return nil, fmt.Errorf("failed to create cluster manager: %w", err)
	}

	dvm := &DistributedVM{
		nodeID:       nodeID,
		cluster:      cluster,
		localRuntime: localRuntime,
		sharding:     sharding,
		discovery:    discovery,
		natsConn:     natsConn,
		ctx:          ctx,
		cancel:       cancel,
	}

	// Create VM registry and connect it to cluster manager
	vmRegistry := &DistributedVMRegistry{
		runtime:  localRuntime,
		sharding: sharding,
	}
	cluster.SetVMRegistry(vmRegistry)

	return dvm, nil
}

// Start begins the distributed VM cluster node
func (dvm *DistributedVM) Start() error {
	// Start local runtime
	if err := dvm.localRuntime.Start(); err != nil {
		return fmt.Errorf("failed to start local runtime: %w", err)
	}

	// Start cluster services
	go dvm.discovery.Start()
	go dvm.cluster.Start()

	// Start message routing
	go dvm.startMessageRouting()

	return nil
}

// Stop gracefully shuts down the distributed VM node
func (dvm *DistributedVM) Stop() {
	dvm.cancel()
	dvm.cluster.Stop()
	dvm.discovery.Stop()
	dvm.natsConn.Close()
}

// LoadModel distributes EventStorming model across the cluster with VM templates
func (dvm *DistributedVM) LoadModel(model interface{}) error {
	// Load model locally first
	if err := dvm.localRuntime.LoadModel(model); err != nil {
		return fmt.Errorf("failed to load model locally: %w", err)
	}

	// Broadcast model to other cluster nodes
	msg := ClusterMessage{
		Type:    "load_model",
		From:    dvm.nodeID,
		To:      "broadcast",
		Payload: model,
	}

	return dvm.publishClusterMessage(msg)
}

// SendMessage routes messages across the distributed cluster
func (dvm *DistributedVM) SendMessage(message interface{}) error {
	// This is a simplified implementation
	// In practice, this would determine the target node based on sharding
	// and route the message appropriately

	return dvm.localRuntime.SendMessage(message)
}

// GetActorNode determines which node should handle a specific actor
func (dvm *DistributedVM) GetActorNode(actorID string) string {
	// Use consistent hashing to determine the target node
	return dvm.cluster.hashRing.GetNode(actorID)
}

// IsLocalActor checks if an actor should be handled by this node
func (dvm *DistributedVM) IsLocalActor(actorID string) bool {
	targetNode := dvm.GetActorNode(actorID)
	return targetNode == dvm.nodeID
}

// GetActorsInShard returns actors that belong to a specific shard on this node
func (dvm *DistributedVM) GetActorsInShard(shardID int) []string {
	return dvm.cluster.GetActorsInShard(shardID)
}

// startMessageRouting begins routing messages between cluster nodes
func (dvm *DistributedVM) startMessageRouting() {
	// Subscribe to cluster messages
	dvm.natsConn.Subscribe("aether.distributed.*", dvm.handleClusterMessage)
}

// handleClusterMessage processes incoming cluster coordination messages
func (dvm *DistributedVM) handleClusterMessage(msg *nats.Msg) {
	var clusterMsg ClusterMessage
	if err := json.Unmarshal(msg.Data, &clusterMsg); err != nil {
		return
	}

	switch clusterMsg.Type {
	case "load_model":
		// Handle model loading from other nodes
		if model := clusterMsg.Payload; model != nil {
			dvm.localRuntime.LoadModel(model)
		}

	case "route_message":
		// Handle message routing from other nodes
		if message := clusterMsg.Payload; message != nil {
			dvm.localRuntime.SendMessage(message)
		}

	case "rebalance":
		// Handle shard rebalancing requests
		dvm.handleRebalanceRequest(clusterMsg)
	}
}

// handleRebalanceRequest processes shard rebalancing requests
func (dvm *DistributedVM) handleRebalanceRequest(msg ClusterMessage) {
	// Simplified rebalancing logic
	// In practice, this would implement complex actor migration
}

// publishClusterMessage sends a message to other cluster nodes
func (dvm *DistributedVM) publishClusterMessage(msg ClusterMessage) error {
	data, err := json.Marshal(msg)
	if err != nil {
		return err
	}

	subject := fmt.Sprintf("aether.distributed.%s", msg.Type)
	return dvm.natsConn.Publish(subject, data)
}

// GetClusterInfo returns information about the cluster state
func (dvm *DistributedVM) GetClusterInfo() map[string]interface{} {
	nodes := dvm.cluster.GetNodes()

	return map[string]interface{}{
		"nodeId":    dvm.nodeID,
		"isLeader":  dvm.cluster.IsLeader(),
		"leader":    dvm.cluster.GetLeader(),
		"nodeCount": len(nodes),
		"nodes":     nodes,
	}
}

// GetActiveVMs returns a map of active VMs (implementation depends on runtime)
func (dvr *DistributedVMRegistry) GetActiveVMs() map[string]interface{} {
	// This would need to access the actual runtime's VM registry
	// For now, return empty map to avoid import cycles
	return make(map[string]interface{})
}

// GetShard returns the shard number for the given actor ID
func (dvr *DistributedVMRegistry) GetShard(actorID string) int {
	return dvr.sharding.GetShard(actorID)
}
105
cluster/hashring.go
Normal file
@@ -0,0 +1,105 @@
package cluster

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// ConsistentHashRing implements a consistent hash ring for shard distribution
type ConsistentHashRing struct {
	ring         map[uint32]string // hash -> node ID
	sortedHashes []uint32          // sorted hash keys
	nodes        map[string]bool   // active nodes
}

// NewConsistentHashRing creates a new consistent hash ring
func NewConsistentHashRing() *ConsistentHashRing {
	return &ConsistentHashRing{
		ring:  make(map[uint32]string),
		nodes: make(map[string]bool),
	}
}

// AddNode adds a node to the hash ring
func (chr *ConsistentHashRing) AddNode(nodeID string) {
	if chr.nodes[nodeID] {
		return // Node already exists
	}

	chr.nodes[nodeID] = true

	// Add virtual nodes for better distribution
	for i := 0; i < VirtualNodes; i++ {
		virtualKey := fmt.Sprintf("%s:%d", nodeID, i)
		hash := chr.hash(virtualKey)
		chr.ring[hash] = nodeID
		chr.sortedHashes = append(chr.sortedHashes, hash)
	}

	sort.Slice(chr.sortedHashes, func(i, j int) bool {
		return chr.sortedHashes[i] < chr.sortedHashes[j]
	})
}

// RemoveNode removes a node from the hash ring
func (chr *ConsistentHashRing) RemoveNode(nodeID string) {
	if !chr.nodes[nodeID] {
		return // Node doesn't exist
	}

	delete(chr.nodes, nodeID)

	// Remove all virtual nodes for this physical node
	newHashes := make([]uint32, 0)
	for _, hash := range chr.sortedHashes {
		if chr.ring[hash] != nodeID {
			newHashes = append(newHashes, hash)
		} else {
			delete(chr.ring, hash)
		}
	}
	chr.sortedHashes = newHashes
}

// GetNode returns the node responsible for a given key
func (chr *ConsistentHashRing) GetNode(key string) string {
	if len(chr.sortedHashes) == 0 {
		return ""
	}

	hash := chr.hash(key)

	// Find the first node with hash >= key hash (clockwise)
	idx := sort.Search(len(chr.sortedHashes), func(i int) bool {
		return chr.sortedHashes[i] >= hash
	})

	// Wrap around to the first node if we've gone past the end
	if idx == len(chr.sortedHashes) {
		idx = 0
	}

	return chr.ring[chr.sortedHashes[idx]]
}

// hash computes a hash for the given key
func (chr *ConsistentHashRing) hash(key string) uint32 {
	h := sha256.Sum256([]byte(key))
	return binary.BigEndian.Uint32(h[:4])
}

// GetNodes returns all active nodes in the ring
func (chr *ConsistentHashRing) GetNodes() []string {
	nodes := make([]string, 0, len(chr.nodes))
	for nodeID := range chr.nodes {
		nodes = append(nodes, nodeID)
	}
	return nodes
}

// IsEmpty returns true if the ring has no nodes
func (chr *ConsistentHashRing) IsEmpty() bool {
	return len(chr.nodes) == 0
}
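For illustration only (not part of the commit), this is how the ring is meant to be used for actor placement; removing a node only remaps the keys that node owned, which is what makes rebalancing cheap:

```go
// Illustrative sketch of ConsistentHashRing usage within package cluster.
ring := NewConsistentHashRing()
ring.AddNode("node-1")
ring.AddNode("node-2")
ring.AddNode("node-3")

owner := ring.GetNode("order-123") // the same key always maps to the same node
isLocal := owner == "node-1"       // the check DistributedVM.IsLocalActor performs

ring.RemoveNode("node-2") // only keys previously owned by node-2 move elsewhere
_ = isLocal
```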
414
cluster/leader.go
Normal file
@@ -0,0 +1,414 @@
package cluster

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"os"
	"sync"
	"time"

	"github.com/nats-io/nats.go"
)

// LeaderElection manages NATS-based leader election using lease-based coordination
type LeaderElection struct {
	nodeID        string
	natsConn      *nats.Conn
	js            nats.JetStreamContext
	kv            nats.KeyValue
	isLeader      bool
	currentLeader string
	leaderTerm    uint64
	ctx           context.Context
	cancel        context.CancelFunc
	mutex         sync.RWMutex
	logger        *log.Logger
	callbacks     LeaderElectionCallbacks
}

// NewLeaderElection creates a new NATS-based leader election system
func NewLeaderElection(nodeID string, natsConn *nats.Conn, callbacks LeaderElectionCallbacks) (*LeaderElection, error) {
	ctx, cancel := context.WithCancel(context.Background())

	// Create JetStream context
	js, err := natsConn.JetStream()
	if err != nil {
		cancel()
		return nil, fmt.Errorf("failed to create JetStream context: %w", err)
	}

	// Create or get KV store for leader election
	kv, err := js.CreateKeyValue(&nats.KeyValueConfig{
		Bucket:      "aether-leader-election",
		Description: "Aether cluster leader election coordination",
		TTL:         LeaderLeaseTimeout * 2, // Auto-cleanup expired leases
		MaxBytes:    1024 * 1024,            // 1MB max
		Replicas:    1,                      // Single replica for simplicity
	})
	if err != nil {
		// Try to get existing KV store
		kv, err = js.KeyValue("aether-leader-election")
		if err != nil {
			cancel()
			return nil, fmt.Errorf("failed to create/get KV store: %w", err)
		}
	}

	return &LeaderElection{
		nodeID:    nodeID,
		natsConn:  natsConn,
		js:        js,
		kv:        kv,
		ctx:       ctx,
		cancel:    cancel,
		logger:    log.New(os.Stdout, fmt.Sprintf("[Leader %s] ", nodeID), log.LstdFlags),
		callbacks: callbacks,
	}, nil
}

// Start begins the leader election process
func (le *LeaderElection) Start() {
	le.logger.Printf("🗳️ Starting leader election")

	// Start election loop in background
	go le.electionLoop()

	// Start lease renewal loop in background
	go le.leaseRenewalLoop()

	// Start leader monitoring
	go le.monitorLeadership()
}

// Stop stops the leader election process
func (le *LeaderElection) Stop() {
	le.logger.Printf("🛑 Stopping leader election")

	le.cancel()

	// If we're the leader, resign gracefully
	if le.IsLeader() {
		le.resignLeadership()
	}
}

// IsLeader returns whether this node is currently the leader
func (le *LeaderElection) IsLeader() bool {
	le.mutex.RLock()
	defer le.mutex.RUnlock()
	return le.isLeader
}

// GetLeader returns the current leader ID
func (le *LeaderElection) GetLeader() string {
	le.mutex.RLock()
	defer le.mutex.RUnlock()
	return le.currentLeader
}

// GetTerm returns the current leadership term
func (le *LeaderElection) GetTerm() uint64 {
	le.mutex.RLock()
	defer le.mutex.RUnlock()
	return le.leaderTerm
}

// electionLoop runs the main election process
func (le *LeaderElection) electionLoop() {
	ticker := time.NewTicker(ElectionTimeout)
	defer ticker.Stop()

	// Try to become leader immediately
	le.tryBecomeLeader()

	for {
		select {
		case <-le.ctx.Done():
			return
		case <-ticker.C:
			// Periodically check if we should try to become leader
			if !le.IsLeader() && le.shouldTryElection() {
				le.tryBecomeLeader()
			}
		}
	}
}

// leaseRenewalLoop renews the leadership lease if we're the leader
func (le *LeaderElection) leaseRenewalLoop() {
	ticker := time.NewTicker(HeartbeatInterval)
	defer ticker.Stop()

	for {
		select {
		case <-le.ctx.Done():
			return
		case <-ticker.C:
			if le.IsLeader() {
				if err := le.renewLease(); err != nil {
					le.logger.Printf("❌ Failed to renew leadership lease: %v", err)
					le.loseLeadership()
				}
			}
		}
	}
}

// monitorLeadership watches for leadership changes
func (le *LeaderElection) monitorLeadership() {
	watcher, err := le.kv.Watch("leader")
	if err != nil {
		le.logger.Printf("❌ Failed to watch leadership: %v", err)
		return
	}
	defer watcher.Stop()

	for {
		select {
		case <-le.ctx.Done():
			return
		case entry := <-watcher.Updates():
			if entry == nil {
				continue
			}
			le.handleLeadershipUpdate(entry)
		}
	}
}

// tryBecomeLeader attempts to acquire leadership
func (le *LeaderElection) tryBecomeLeader() {
	le.logger.Printf("🗳️ Attempting to become leader")

	now := time.Now()
	newLease := LeadershipLease{
		LeaderID:  le.nodeID,
		Term:      le.leaderTerm + 1,
		ExpiresAt: now.Add(LeaderLeaseTimeout),
		StartedAt: now,
	}

	leaseData, err := json.Marshal(newLease)
	if err != nil {
		le.logger.Printf("❌ Failed to marshal lease: %v", err)
		return
	}

	// Try to create the leader key (atomic operation)
	_, err = le.kv.Create("leader", leaseData)
	if err != nil {
		// Leader key exists, check if it's expired
		if le.tryClaimExpiredLease() {
			return // Successfully claimed expired lease
		}
		// Another node is leader
		return
	}

	// Successfully became leader!
	le.becomeLeader(newLease.Term)
}

// tryClaimExpiredLease attempts to claim an expired leadership lease
func (le *LeaderElection) tryClaimExpiredLease() bool {
	entry, err := le.kv.Get("leader")
	if err != nil {
		return false
	}

	var currentLease LeadershipLease
	if err := json.Unmarshal(entry.Value(), &currentLease); err != nil {
		return false
	}

	// Check if lease is expired
	if time.Now().Before(currentLease.ExpiresAt) {
		// Lease is still valid
		le.updateCurrentLeader(currentLease.LeaderID, currentLease.Term)
		return false
	}

	// Lease is expired, try to claim it
	le.logger.Printf("🕐 Attempting to claim expired lease from %s", currentLease.LeaderID)

	now := time.Now()
	newLease := LeadershipLease{
		LeaderID:  le.nodeID,
		Term:      currentLease.Term + 1,
		ExpiresAt: now.Add(LeaderLeaseTimeout),
		StartedAt: now,
	}

	leaseData, err := json.Marshal(newLease)
	if err != nil {
		return false
	}

	// Atomically update the lease
	_, err = le.kv.Update("leader", leaseData, entry.Revision())
	if err != nil {
		return false
	}

	// Successfully claimed expired lease!
	le.becomeLeader(newLease.Term)
	return true
}

// renewLease renews the current leadership lease
func (le *LeaderElection) renewLease() error {
	entry, err := le.kv.Get("leader")
	if err != nil {
		return err
	}

	var currentLease LeadershipLease
	if err := json.Unmarshal(entry.Value(), &currentLease); err != nil {
		return err
	}

	// Verify we're still the leader
	if currentLease.LeaderID != le.nodeID {
		return fmt.Errorf("no longer leader, current leader is %s", currentLease.LeaderID)
	}

	// Renew the lease
	renewedLease := currentLease
	renewedLease.ExpiresAt = time.Now().Add(LeaderLeaseTimeout)

	leaseData, err := json.Marshal(renewedLease)
	if err != nil {
		return err
	}

	_, err = le.kv.Update("leader", leaseData, entry.Revision())
	if err != nil {
		return fmt.Errorf("failed to renew lease: %w", err)
	}

	le.logger.Printf("💓 Renewed leadership lease until %s", renewedLease.ExpiresAt.Format(time.RFC3339))
	return nil
}

// becomeLeader handles becoming the cluster leader
func (le *LeaderElection) becomeLeader(term uint64) {
	le.mutex.Lock()
	le.isLeader = true
	le.currentLeader = le.nodeID
	le.leaderTerm = term
	le.mutex.Unlock()

	le.logger.Printf("👑 Became cluster leader (term %d)", term)

	if le.callbacks.OnBecameLeader != nil {
		le.callbacks.OnBecameLeader()
	}
}

// loseLeadership handles losing leadership
func (le *LeaderElection) loseLeadership() {
	le.mutex.Lock()
	wasLeader := le.isLeader
	le.isLeader = false
	le.mutex.Unlock()

	if wasLeader {
		le.logger.Printf("📉 Lost cluster leadership")
		if le.callbacks.OnLostLeader != nil {
			le.callbacks.OnLostLeader()
		}
	}
}

// resignLeadership gracefully resigns from leadership
func (le *LeaderElection) resignLeadership() {
	if !le.IsLeader() {
		return
	}

	le.logger.Printf("👋 Resigning from cluster leadership")

	// Delete the leadership key
	err := le.kv.Delete("leader")
	if err != nil {
		le.logger.Printf("⚠️ Failed to delete leadership key: %v", err)
	}

	le.loseLeadership()
}

// shouldTryElection determines if this node should attempt to become leader
func (le *LeaderElection) shouldTryElection() bool {
	// Always try if no current leader
	if le.GetLeader() == "" {
		return true
	}

	// Check if current lease is expired
	entry, err := le.kv.Get("leader")
	if err != nil {
		// Can't read lease, try to become leader
		return true
	}

	var currentLease LeadershipLease
	if err := json.Unmarshal(entry.Value(), &currentLease); err != nil {
		// Invalid lease, try to become leader
		return true
	}

	// Try if lease is expired
	return time.Now().After(currentLease.ExpiresAt)
}

// handleLeadershipUpdate processes leadership change notifications
func (le *LeaderElection) handleLeadershipUpdate(entry nats.KeyValueEntry) {
	if entry.Operation() == nats.KeyValueDelete {
		// Leadership was vacated
		le.updateCurrentLeader("", 0)
		return
	}

	var lease LeadershipLease
	if err := json.Unmarshal(entry.Value(), &lease); err != nil {
		le.logger.Printf("⚠️ Invalid leadership lease: %v", err)
		return
	}

	le.updateCurrentLeader(lease.LeaderID, lease.Term)
}

// updateCurrentLeader updates the current leader information
func (le *LeaderElection) updateCurrentLeader(leaderID string, term uint64) {
	le.mutex.Lock()
	oldLeader := le.currentLeader
	le.currentLeader = leaderID
	le.leaderTerm = term

	// Update our leadership status
	if leaderID == le.nodeID {
		le.isLeader = true
	} else {
		if le.isLeader {
			le.isLeader = false
			le.mutex.Unlock()
			if le.callbacks.OnLostLeader != nil {
				le.callbacks.OnLostLeader()
			}
			le.mutex.Lock()
		} else {
			le.isLeader = false
		}
	}
	le.mutex.Unlock()

	// Notify of leader change
	if oldLeader != leaderID && leaderID != "" && leaderID != le.nodeID {
		le.logger.Printf("🔄 New cluster leader: %s (term %d)", leaderID, term)
		if le.callbacks.OnNewLeader != nil {
			le.callbacks.OnNewLeader(leaderID)
		}
	}
}
331
cluster/manager.go
Normal file
@@ -0,0 +1,331 @@
|
|||||||
|
package cluster
|
||||||
|
|
||||||
|
import (
|
||||||
|
"context"
|
||||||
|
"encoding/json"
|
||||||
|
"fmt"
|
||||||
|
"log"
|
||||||
|
"os"
|
||||||
|
"sync"
|
||||||
|
"time"
|
||||||
|
|
||||||
|
"github.com/nats-io/nats.go"
|
||||||
|
)
|
||||||
|
|
||||||
|
// VMRegistry provides access to local VM information for cluster operations
|
||||||
|
type VMRegistry interface {
|
||||||
|
GetActiveVMs() map[string]interface{} // VirtualMachine interface to avoid import cycles
|
||||||
|
GetShard(actorID string) int
|
||||||
|
}
|
||||||
|
|
||||||
|
// ClusterManager coordinates distributed VM operations across the cluster
|
||||||
|
type ClusterManager struct {
|
||||||
|
nodeID string
|
||||||
|
nodes map[string]*NodeInfo
|
||||||
|
nodeUpdates chan NodeUpdate
|
||||||
|
shardMap *ShardMap
|
||||||
|
hashRing *ConsistentHashRing
|
||||||
|
election *LeaderElection
|
||||||
|
natsConn *nats.Conn
|
||||||
|
ctx context.Context
|
||||||
|
mutex sync.RWMutex
|
||||||
|
logger *log.Logger
|
||||||
|
vmRegistry VMRegistry // Interface to access local VMs
|
||||||
|
}
|
||||||
|
|
||||||
|
// NewClusterManager creates a cluster coordination manager
|
||||||
|
func NewClusterManager(nodeID string, natsConn *nats.Conn, ctx context.Context) (*ClusterManager, error) {
|
||||||
|
cm := &ClusterManager{
|
||||||
|
nodeID: nodeID,
|
||||||
|
nodes: make(map[string]*NodeInfo),
|
||||||
|
nodeUpdates: make(chan NodeUpdate, 100),
|
||||||
|
shardMap: &ShardMap{Shards: make(map[int][]string), Nodes: make(map[string]NodeInfo)},
|
||||||
|
hashRing: NewConsistentHashRing(),
|
||||||
|
natsConn: natsConn,
|
||||||
|
ctx: ctx,
|
||||||
|
logger: log.New(os.Stdout, fmt.Sprintf("[ClusterMgr %s] ", nodeID), log.LstdFlags),
|
||||||
|
vmRegistry: nil, // Will be set later via SetVMRegistry
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create leadership election with callbacks
|
||||||
|
callbacks := LeaderElectionCallbacks{
|
||||||
|
OnBecameLeader: func() {
|
||||||
|
cm.logger.Printf("👑 This node became the cluster leader - can initiate rebalancing")
|
||||||
|
},
|
||||||
|
OnLostLeader: func() {
|
||||||
|
cm.logger.Printf("📉 This node lost cluster leadership")
|
||||||
|
},
|
||||||
|
OnNewLeader: func(leaderID string) {
|
||||||
|
cm.logger.Printf("🔄 Cluster leadership changed to: %s", leaderID)
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
election, err := NewLeaderElection(nodeID, natsConn, callbacks)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to create leader election: %w", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
cm.election = election
|
||||||
|
return cm, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// Start begins cluster management operations
|
||||||
|
func (cm *ClusterManager) Start() {
|
||||||
|
cm.logger.Printf("🚀 Starting cluster manager")
|
||||||
|
|
||||||
|
// Start leader election
|
||||||
|
cm.election.Start()
|
||||||
|
|
||||||
|
// Subscribe to cluster messages
|
||||||
|
cm.natsConn.Subscribe("aether.cluster.*", cm.handleClusterMessage)
|
||||||
|
|
||||||
|
// Start node monitoring
|
||||||
|
go cm.monitorNodes()
|
||||||
|
|
||||||
|
// Start shard rebalancing (only if leader)
|
||||||
|
go cm.rebalanceLoop()
|
||||||
|
}
|
||||||
|
|
||||||
|
// Stop gracefully stops the cluster manager
func (cm *ClusterManager) Stop() {
	cm.logger.Printf("🛑 Stopping cluster manager")

	if cm.election != nil {
		cm.election.Stop()
	}
}

// IsLeader returns whether this node is the cluster leader
func (cm *ClusterManager) IsLeader() bool {
	if cm.election == nil {
		return false
	}
	return cm.election.IsLeader()
}

// GetLeader returns the current cluster leader ID
func (cm *ClusterManager) GetLeader() string {
	if cm.election == nil {
		return ""
	}
	return cm.election.GetLeader()
}

// SetVMRegistry sets the VM registry for accessing local VM information
func (cm *ClusterManager) SetVMRegistry(registry VMRegistry) {
	cm.vmRegistry = registry
}

// GetActorsInShard returns actors that belong to a specific shard on this node
func (cm *ClusterManager) GetActorsInShard(shardID int) []string {
	if cm.vmRegistry == nil {
		return []string{}
	}

	activeVMs := cm.vmRegistry.GetActiveVMs()
	var actors []string

	for actorID := range activeVMs {
		if cm.vmRegistry.GetShard(actorID) == shardID {
			actors = append(actors, actorID)
		}
	}

	return actors
}

// handleClusterMessage processes incoming cluster coordination messages
func (cm *ClusterManager) handleClusterMessage(msg *nats.Msg) {
	var clusterMsg ClusterMessage
	if err := json.Unmarshal(msg.Data, &clusterMsg); err != nil {
		cm.logger.Printf("⚠️ Invalid cluster message: %v", err)
		return
	}

	switch clusterMsg.Type {
	case "rebalance":
		cm.handleRebalanceRequest(clusterMsg)
	case "migrate":
		cm.handleMigrationRequest(clusterMsg)
	case "node_update":
		if update, ok := clusterMsg.Payload.(NodeUpdate); ok {
			cm.handleNodeUpdate(update)
		}
	default:
		cm.logger.Printf("⚠️ Unknown cluster message type: %s", clusterMsg.Type)
	}
}

// handleNodeUpdate processes node status updates
func (cm *ClusterManager) handleNodeUpdate(update NodeUpdate) {
	cm.mutex.Lock()
	defer cm.mutex.Unlock()

	switch update.Type {
	case NodeJoined:
		cm.nodes[update.Node.ID] = update.Node
		cm.hashRing.AddNode(update.Node.ID)
		cm.logger.Printf("➕ Node joined: %s", update.Node.ID)

	case NodeLeft:
		delete(cm.nodes, update.Node.ID)
		cm.hashRing.RemoveNode(update.Node.ID)
		cm.logger.Printf("➖ Node left: %s", update.Node.ID)

	case NodeUpdated:
		if node, exists := cm.nodes[update.Node.ID]; exists {
			// Update existing node info
			*node = *update.Node
		} else {
			// New node
			cm.nodes[update.Node.ID] = update.Node
			cm.hashRing.AddNode(update.Node.ID)
		}
	}

	// Check for failed nodes and mark them
	now := time.Now()
	for _, node := range cm.nodes {
		if now.Sub(node.LastSeen) > 90*time.Second && node.Status != NodeStatusFailed {
			node.Status = NodeStatusFailed
			cm.logger.Printf("❌ Node marked as failed: %s (last seen: %s)",
				node.ID, node.LastSeen.Format(time.RFC3339))
		}
	}

	// Trigger rebalancing if we're the leader and there are significant changes
	if cm.IsLeader() {
		activeNodeCount := 0
		for _, node := range cm.nodes {
			if node.Status == NodeStatusActive {
				activeNodeCount++
			}
		}

		// Simplified trigger: any node topology change with at least one
		// active node kicks off a rebalance pass
		if activeNodeCount > 0 {
			cm.triggerShardRebalancing("node topology changed")
		}
	}
}

// handleRebalanceRequest processes cluster rebalancing requests
func (cm *ClusterManager) handleRebalanceRequest(msg ClusterMessage) {
	cm.logger.Printf("🔄 Handling rebalance request from %s", msg.From)

	// Implementation would handle the specific rebalancing logic
	// This is a simplified version
}

// handleMigrationRequest processes actor migration requests
func (cm *ClusterManager) handleMigrationRequest(msg ClusterMessage) {
	cm.logger.Printf("🚚 Handling migration request from %s", msg.From)

	// Implementation would handle the specific migration logic
	// This is a simplified version
}

// triggerShardRebalancing initiates shard rebalancing across the cluster
func (cm *ClusterManager) triggerShardRebalancing(reason string) {
	if !cm.IsLeader() {
		return // Only leader can initiate rebalancing
	}

	cm.logger.Printf("⚖️ Triggering shard rebalancing: %s", reason)

	// Get active nodes
	var activeNodes []*NodeInfo
	cm.mutex.RLock()
	for _, node := range cm.nodes {
		if node.Status == NodeStatusActive {
			activeNodes = append(activeNodes, node)
		}
	}
	cm.mutex.RUnlock()

	if len(activeNodes) == 0 {
		cm.logger.Printf("⚠️ No active nodes available for rebalancing")
		return
	}

	// This would implement the actual rebalancing logic
	cm.logger.Printf("🎯 Would rebalance across %d active nodes", len(activeNodes))
}

// monitorNodes periodically checks node health and updates
func (cm *ClusterManager) monitorNodes() {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			// Health check logic would go here
			cm.checkNodeHealth()

		case <-cm.ctx.Done():
			return
		}
	}
}

// checkNodeHealth verifies the health of known nodes
func (cm *ClusterManager) checkNodeHealth() {
	cm.mutex.Lock()
	defer cm.mutex.Unlock()

	now := time.Now()
	for _, node := range cm.nodes {
		if now.Sub(node.LastSeen) > 90*time.Second && node.Status == NodeStatusActive {
			node.Status = NodeStatusFailed
			cm.logger.Printf("💔 Node failed: %s", node.ID)
		}
	}
}

// rebalanceLoop runs periodic rebalancing checks (leader only)
func (cm *ClusterManager) rebalanceLoop() {
	ticker := time.NewTicker(5 * time.Minute)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			if cm.IsLeader() {
				cm.triggerShardRebalancing("periodic rebalance check")
			}

		case <-cm.ctx.Done():
			return
		}
	}
}

// GetNodes returns a copy of the current cluster nodes
func (cm *ClusterManager) GetNodes() map[string]*NodeInfo {
	cm.mutex.RLock()
	defer cm.mutex.RUnlock()

	nodes := make(map[string]*NodeInfo)
	for id, node := range cm.nodes {
		// Create a copy to prevent external mutation
		nodeCopy := *node
		nodes[id] = &nodeCopy
	}
	return nodes
}

// GetShardMap returns the current shard mapping
func (cm *ClusterManager) GetShardMap() *ShardMap {
	cm.mutex.RLock()
	defer cm.mutex.RUnlock()

	// Return a copy (including shard assignments and node info) to prevent
	// external mutation of the internal map
	shardMapCopy := &ShardMap{
		Version:    cm.shardMap.Version,
		Shards:     make(map[int][]string, len(cm.shardMap.Shards)),
		Nodes:      make(map[string]NodeInfo, len(cm.shardMap.Nodes)),
		UpdateTime: cm.shardMap.UpdateTime,
	}
	for shardID, nodes := range cm.shardMap.Shards {
		shardMapCopy.Shards[shardID] = append([]string(nil), nodes...)
	}
	for nodeID, nodeInfo := range cm.shardMap.Nodes {
		shardMapCopy.Nodes[nodeID] = nodeInfo
	}
	return shardMapCopy
}
188
cluster/shard.go
Normal file
@@ -0,0 +1,188 @@
package cluster

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"hash"
	"hash/fnv"
)

// MigrationStatus tracks actor migration progress
type MigrationStatus string

const (
	MigrationPending    MigrationStatus = "pending"
	MigrationInProgress MigrationStatus = "in_progress"
	MigrationCompleted  MigrationStatus = "completed"
	MigrationFailed     MigrationStatus = "failed"
)

// PlacementStrategy determines where to place new actors
type PlacementStrategy interface {
	PlaceActor(actorID string, shardMap *ShardMap, nodes map[string]*NodeInfo) (string, error)
	RebalanceShards(shardMap *ShardMap, nodes map[string]*NodeInfo) (*ShardMap, error)
}

// ShardManager handles actor placement and distribution
type ShardManager struct {
	shardCount  int
	shardMap    *ShardMap
	hasher      hash.Hash
	placement   PlacementStrategy
	replication int
}

// NewShardManager creates a new shard manager
func NewShardManager(shardCount, replication int) *ShardManager {
	return &ShardManager{
		shardCount:  shardCount,
		shardMap:    &ShardMap{Shards: make(map[int][]string), Nodes: make(map[string]NodeInfo)},
		hasher:      fnv.New64a(),
		placement:   &ConsistentHashPlacement{},
		replication: replication,
	}
}

// GetShard returns the shard number for a given actor ID
func (sm *ShardManager) GetShard(actorID string) int {
	h := sha256.Sum256([]byte(actorID))
	shardID := binary.BigEndian.Uint32(h[:4]) % uint32(sm.shardCount)
	return int(shardID)
}

// GetShardNodes returns the nodes responsible for a shard
func (sm *ShardManager) GetShardNodes(shardID int) []string {
	if nodes, exists := sm.shardMap.Shards[shardID]; exists {
		return nodes
	}
	return []string{}
}

// AssignShard assigns a shard to specific nodes
func (sm *ShardManager) AssignShard(shardID int, nodes []string) {
	if sm.shardMap.Shards == nil {
		sm.shardMap.Shards = make(map[int][]string)
	}
	sm.shardMap.Shards[shardID] = nodes
}

// GetPrimaryNode returns the primary node for a shard
func (sm *ShardManager) GetPrimaryNode(shardID int) string {
	nodes := sm.GetShardNodes(shardID)
	if len(nodes) > 0 {
		return nodes[0] // First node is primary
	}
	return ""
}

// GetReplicaNodes returns the replica nodes for a shard
func (sm *ShardManager) GetReplicaNodes(shardID int) []string {
	nodes := sm.GetShardNodes(shardID)
	if len(nodes) > 1 {
		return nodes[1:] // All nodes except first are replicas
	}
	return []string{}
}

// UpdateShardMap updates the entire shard map
func (sm *ShardManager) UpdateShardMap(newShardMap *ShardMap) {
	sm.shardMap = newShardMap
}

// GetShardMap returns a copy of the current shard map
func (sm *ShardManager) GetShardMap() *ShardMap {
	// Return a deep copy to prevent external mutation
	shardMapCopy := &ShardMap{
		Version:    sm.shardMap.Version,
		Shards:     make(map[int][]string),
		Nodes:      make(map[string]NodeInfo),
		UpdateTime: sm.shardMap.UpdateTime,
	}

	// Copy the shard assignments
	for shardID, nodes := range sm.shardMap.Shards {
		shardMapCopy.Shards[shardID] = append([]string(nil), nodes...)
	}

	// Copy the node info
	for nodeID, nodeInfo := range sm.shardMap.Nodes {
		shardMapCopy.Nodes[nodeID] = nodeInfo
	}

	return shardMapCopy
}

// RebalanceShards redistributes shards across available nodes
func (sm *ShardManager) RebalanceShards(nodes map[string]*NodeInfo) (*ShardMap, error) {
	if sm.placement == nil {
		return nil, fmt.Errorf("no placement strategy configured")
	}

	return sm.placement.RebalanceShards(sm.shardMap, nodes)
}

// PlaceActor determines which node should handle a new actor
func (sm *ShardManager) PlaceActor(actorID string, nodes map[string]*NodeInfo) (string, error) {
	if sm.placement == nil {
		return "", fmt.Errorf("no placement strategy configured")
	}

	return sm.placement.PlaceActor(actorID, sm.shardMap, nodes)
}

// GetActorsInShard returns actors that belong to a specific shard on a specific node
func (sm *ShardManager) GetActorsInShard(shardID int, nodeID string, vmRegistry VMRegistry) []string {
	if vmRegistry == nil {
		return []string{}
	}

	activeVMs := vmRegistry.GetActiveVMs()
	var actors []string

	for actorID := range activeVMs {
		if sm.GetShard(actorID) == shardID {
			actors = append(actors, actorID)
		}
	}

	return actors
}

// ConsistentHashPlacement implements PlacementStrategy using consistent hashing
type ConsistentHashPlacement struct{}

// PlaceActor places an actor using consistent hashing
func (chp *ConsistentHashPlacement) PlaceActor(actorID string, shardMap *ShardMap, nodes map[string]*NodeInfo) (string, error) {
	if len(nodes) == 0 {
		return "", fmt.Errorf("no nodes available for placement")
	}

	// Simple consistent hash placement - in a real implementation,
	// this would use the consistent hash ring
	h := sha256.Sum256([]byte(actorID))
	nodeIndex := binary.BigEndian.Uint32(h[:4]) % uint32(len(nodes))

	i := 0
	for nodeID := range nodes {
		if i == int(nodeIndex) {
			return nodeID, nil
		}
		i++
	}

	// Fallback to first node
	for nodeID := range nodes {
		return nodeID, nil
	}

	return "", fmt.Errorf("failed to place actor")
}

// RebalanceShards rebalances shards across nodes
func (chp *ConsistentHashPlacement) RebalanceShards(currentMap *ShardMap, nodes map[string]*NodeInfo) (*ShardMap, error) {
	// This is a simplified implementation
	// In practice, this would implement sophisticated rebalancing logic
	return currentMap, nil
}
110
cluster/types.go
Normal file
@@ -0,0 +1,110 @@
package cluster

import (
	"time"
)

const (
	// NumShards defines the total number of shards in the cluster
	NumShards = 1024
	// VirtualNodes defines the number of virtual nodes per physical node for consistent hashing
	VirtualNodes = 150
	// Leadership election constants
	LeaderLeaseTimeout = 10 * time.Second // How long a leader lease lasts
	HeartbeatInterval  = 3 * time.Second  // How often leader sends heartbeats
	ElectionTimeout    = 2 * time.Second  // How long to wait for election
)

// NodeStatus represents the health status of a node
type NodeStatus string

const (
	NodeStatusActive   NodeStatus = "active"
	NodeStatusDraining NodeStatus = "draining"
	NodeStatusFailed   NodeStatus = "failed"
)

// NodeInfo represents information about a cluster node
type NodeInfo struct {
	ID        string            `json:"id"`
	Address   string            `json:"address"`
	Port      int               `json:"port"`
	Status    NodeStatus        `json:"status"`
	Capacity  float64           `json:"capacity"` // Maximum load capacity
	Load      float64           `json:"load"`     // Current CPU/memory load
	LastSeen  time.Time         `json:"lastSeen"` // Last heartbeat timestamp
	Timestamp time.Time         `json:"timestamp"`
	Metadata  map[string]string `json:"metadata"`
	IsLeader  bool              `json:"isLeader"`
	VMCount   int               `json:"vmCount"`  // Number of VMs on this node
	ShardIDs  []int             `json:"shardIds"` // Shards assigned to this node
}

// NodeUpdateType represents the type of node update
type NodeUpdateType string

const (
	NodeJoined  NodeUpdateType = "joined"
	NodeLeft    NodeUpdateType = "left"
	NodeUpdated NodeUpdateType = "updated"
)

// NodeUpdate represents a node status update
type NodeUpdate struct {
	Type NodeUpdateType `json:"type"`
	Node *NodeInfo      `json:"node"`
}

// ShardMap represents the distribution of shards across cluster nodes
type ShardMap struct {
	Version    uint64              `json:"version"` // Incremented on each change
	Shards     map[int][]string    `json:"shards"`  // shard ID -> [primary, replica1, replica2]
	Nodes      map[string]NodeInfo `json:"nodes"`   // node ID -> node info
	UpdateTime time.Time           `json:"updateTime"`
}

// ClusterMessage represents inter-node communication
type ClusterMessage struct {
	Type      string      `json:"type"`
	From      string      `json:"from"`
	To        string      `json:"to"`
	Payload   interface{} `json:"payload"`
	Timestamp time.Time   `json:"timestamp"`
}

// RebalanceRequest represents a request to rebalance shards
type RebalanceRequest struct {
	RequestID  string           `json:"requestId"`
	FromNode   string           `json:"fromNode"`
	ToNode     string           `json:"toNode"`
	ShardIDs   []int            `json:"shardIds"`
	Reason     string           `json:"reason"`
	Migrations []ActorMigration `json:"migrations"`
}

// ActorMigration represents the migration of an actor between nodes
type ActorMigration struct {
	ActorID  string                 `json:"actorId"`
	FromNode string                 `json:"fromNode"`
	ToNode   string                 `json:"toNode"`
	ShardID  int                    `json:"shardId"`
	State    map[string]interface{} `json:"state"`
	Version  int64                  `json:"version"`
	Status   string                 `json:"status"` // "pending", "in_progress", "completed", "failed"
}

// LeaderElectionCallbacks defines callbacks for leadership changes
type LeaderElectionCallbacks struct {
	OnBecameLeader func()
	OnLostLeader   func()
	OnNewLeader    func(leaderID string)
}

// LeadershipLease represents a leadership lease in the cluster
type LeadershipLease struct {
	LeaderID  string    `json:"leaderId"`
	Term      uint64    `json:"term"`
	ExpiresAt time.Time `json:"expiresAt"`
	StartedAt time.Time `json:"startedAt"`
}
38
event.go
Normal file
@@ -0,0 +1,38 @@
package aether

import (
	"time"
)

// Event represents a domain event in the system
type Event struct {
	ID        string                 `json:"id"`
	EventType string                 `json:"eventType"`
	ActorID   string                 `json:"actorId"`
	CommandID string                 `json:"commandId,omitempty"` // Correlation ID for command that triggered this event
	Version   int64                  `json:"version"`
	Data      map[string]interface{} `json:"data"`
	Timestamp time.Time              `json:"timestamp"`
}

// ActorSnapshot represents a point-in-time state snapshot
type ActorSnapshot struct {
	ActorID   string                 `json:"actorId"`
	Version   int64                  `json:"version"`
	State     map[string]interface{} `json:"state"`
	Timestamp time.Time              `json:"timestamp"`
}

// EventStore defines the interface for event persistence
type EventStore interface {
	SaveEvent(event *Event) error
	GetEvents(actorID string, fromVersion int64) ([]*Event, error)
	GetLatestVersion(actorID string) (int64, error)
}

// SnapshotStore extends EventStore with snapshot capabilities
type SnapshotStore interface {
	EventStore
	GetLatestSnapshot(actorID string) (*ActorSnapshot, error)
	SaveSnapshot(snapshot *ActorSnapshot) error
}
106
eventbus.go
Normal file
@@ -0,0 +1,106 @@
package aether

import (
	"context"
	"sync"
)

// EventBroadcaster defines the interface for publishing and subscribing to events
type EventBroadcaster interface {
	Subscribe(namespaceID string) <-chan *Event
	Unsubscribe(namespaceID string, ch <-chan *Event)
	Publish(namespaceID string, event *Event)
	Stop()
	SubscriberCount(namespaceID string) int
}

// EventBus broadcasts events to multiple subscribers within a namespace
type EventBus struct {
	subscribers map[string][]chan *Event // namespaceID -> channels
	mutex       sync.RWMutex
	ctx         context.Context
	cancel      context.CancelFunc
}

// NewEventBus creates a new event bus
func NewEventBus() *EventBus {
	ctx, cancel := context.WithCancel(context.Background())
	return &EventBus{
		subscribers: make(map[string][]chan *Event),
		ctx:         ctx,
		cancel:      cancel,
	}
}

// Subscribe creates a new subscription channel for a namespace
func (eb *EventBus) Subscribe(namespaceID string) <-chan *Event {
	eb.mutex.Lock()
	defer eb.mutex.Unlock()

	// Create buffered channel to prevent blocking publishers
	ch := make(chan *Event, 100)
	eb.subscribers[namespaceID] = append(eb.subscribers[namespaceID], ch)

	return ch
}

// Unsubscribe removes a subscription channel
func (eb *EventBus) Unsubscribe(namespaceID string, ch <-chan *Event) {
	eb.mutex.Lock()
	defer eb.mutex.Unlock()

	subs := eb.subscribers[namespaceID]
	for i, subscriber := range subs {
		if subscriber == ch {
			// Remove channel from slice
			eb.subscribers[namespaceID] = append(subs[:i], subs[i+1:]...)
			close(subscriber)
			break
		}
	}

	// Clean up empty namespace entries
	if len(eb.subscribers[namespaceID]) == 0 {
		delete(eb.subscribers, namespaceID)
	}
}

// Publish sends an event to all subscribers of a namespace
func (eb *EventBus) Publish(namespaceID string, event *Event) {
	eb.mutex.RLock()
	defer eb.mutex.RUnlock()

	subscribers := eb.subscribers[namespaceID]
	for _, ch := range subscribers {
		select {
		case ch <- event:
			// Event delivered
		default:
			// Channel full, skip this subscriber (non-blocking)
		}
	}
}

// Stop closes the event bus
func (eb *EventBus) Stop() {
	eb.mutex.Lock()
	defer eb.mutex.Unlock()

	eb.cancel()

	// Close all subscriber channels
	for _, subs := range eb.subscribers {
		for _, ch := range subs {
			close(ch)
		}
	}

	eb.subscribers = make(map[string][]chan *Event)
}

// SubscriberCount returns the number of subscribers for a namespace
func (eb *EventBus) SubscriberCount(namespaceID string) int {
	eb.mutex.RLock()
	defer eb.mutex.RUnlock()
	return len(eb.subscribers[namespaceID])
}
16
go.mod
Normal file
@@ -0,0 +1,16 @@
module git.flowmade.one/flowmade-one/aether

go 1.23

require (
	github.com/google/uuid v1.6.0
	github.com/nats-io/nats.go v1.37.0
)

require (
	github.com/klauspost/compress v1.17.2 // indirect
	github.com/nats-io/nkeys v0.4.7 // indirect
	github.com/nats-io/nuid v1.0.1 // indirect
	golang.org/x/crypto v0.18.0 // indirect
	golang.org/x/sys v0.16.0 // indirect
)
14
go.sum
Normal file
@@ -0,0 +1,14 @@
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/klauspost/compress v1.17.2 h1:RlWWUY/Dr4fL8qk9YG7DTZ7PDgME2V4csBXA8L/ixi4=
github.com/klauspost/compress v1.17.2/go.mod h1:ntbaceVETuRiXiv4DpjP66DpAtAGkEQskQzEyD//IeE=
github.com/nats-io/nats.go v1.37.0 h1:07rauXbVnnJvv1gfIyghFEo6lUcYRY0WXc3x7x0vUxE=
github.com/nats-io/nats.go v1.37.0/go.mod h1:Ubdu4Nh9exXdSz0RVWRFBbRfrbSxOYd26oF0wkWclB8=
github.com/nats-io/nkeys v0.4.7 h1:RwNJbbIdYCoClSDNY7QVKZlyb/wfT6ugvFCiKy6vDvI=
github.com/nats-io/nkeys v0.4.7/go.mod h1:kqXRgRDPlGy7nGaEDMuYzmiJCIAAWDK0IMBtDmGD0nc=
github.com/nats-io/nuid v1.0.1 h1:5iA8DT8V7q8WK2EScv2padNa/rTESc1KdnPw4TC2paw=
github.com/nats-io/nuid v1.0.1/go.mod h1:19wcPz3Ph3q0Jbyiqsd0kePYG7A95tJPxeL+1OSON2c=
golang.org/x/crypto v0.18.0 h1:PGVlW0xEltQnzFZ55hkuX5+KLyrMYhHld1YHO4AKcdc=
golang.org/x/crypto v0.18.0/go.mod h1:R0j02AL6hcrfOiy9T4ZYp/rcWeMxM3L6QYxlOuEG1mg=
golang.org/x/sys v0.16.0 h1:xWw16ngr6ZMtmxDyKyIgsE93KNKz5HKmMa3b8ALHidU=
golang.org/x/sys v0.16.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
47
model/model.go
Normal file
@@ -0,0 +1,47 @@
package model

// EventStorming model types

// Model represents an event storming model
type Model struct {
	ID         string            `json:"id"`
	Name       string            `json:"name"`
	Events     []DomainEvent     `json:"events"`
	Commands   []Command         `json:"commands"`
	Aggregates []Aggregate       `json:"aggregates"`
	Processes  []BusinessProcess `json:"processes"`
}

// DomainEvent represents a domain event definition
type DomainEvent struct {
	ID          string            `json:"id"`
	Name        string            `json:"name"`
	Description string            `json:"description"`
	Data        map[string]string `json:"data"`
}

// Command represents a command definition
type Command struct {
	ID            string            `json:"id"`
	Name          string            `json:"name"`
	Actor         string            `json:"actor"`
	TriggersEvent string            `json:"triggersEvent"`
	Data          map[string]string `json:"data"`
}

// Aggregate represents an aggregate definition
type Aggregate struct {
	ID         string   `json:"id"`
	Name       string   `json:"name"`
	Events     []string `json:"events"`
	Commands   []string `json:"commands"`
	Invariants []string `json:"invariants"`
}

// BusinessProcess represents a business process definition
type BusinessProcess struct {
	ID             string   `json:"id"`
	Name           string   `json:"name"`
	TriggerEvents  []string `json:"triggerEvents"`
	OutputCommands []string `json:"outputCommands"`
}
159
nats_eventbus.go
Normal file
@@ -0,0 +1,159 @@
package aether

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"sync"

	"github.com/google/uuid"
	"github.com/nats-io/nats.go"
)

// NATSEventBus is an EventBus that broadcasts events across all cluster nodes using NATS
type NATSEventBus struct {
	*EventBus                            // Embed base EventBus for local subscriptions
	nc                   *nats.Conn      // NATS connection
	subscriptions        []*nats.Subscription
	namespaceSubscribers map[string]int  // Track number of subscribers per namespace
	nodeID               string          // Unique ID for this node
	mutex                sync.Mutex
	ctx                  context.Context
	cancel               context.CancelFunc
}

// eventMessage is the wire format for events sent over NATS
type eventMessage struct {
	NodeID      string `json:"node_id"`
	NamespaceID string `json:"namespace_id"`
	Event       *Event `json:"event"`
}

// NewNATSEventBus creates a new NATS-backed event bus
func NewNATSEventBus(nc *nats.Conn) (*NATSEventBus, error) {
	ctx, cancel := context.WithCancel(context.Background())

	neb := &NATSEventBus{
		EventBus:             NewEventBus(),
		nc:                   nc,
		nodeID:               uuid.New().String(),
		subscriptions:        make([]*nats.Subscription, 0),
		namespaceSubscribers: make(map[string]int),
		ctx:                  ctx,
		cancel:               cancel,
	}

	return neb, nil
}

// Subscribe creates a local subscription and ensures NATS subscription exists for the namespace
func (neb *NATSEventBus) Subscribe(namespaceID string) <-chan *Event {
	neb.mutex.Lock()
	defer neb.mutex.Unlock()

	// Create local subscription first
	ch := neb.EventBus.Subscribe(namespaceID)

	// Check if this is the first subscriber for this namespace
	count := neb.namespaceSubscribers[namespaceID]
	if count == 0 {
		// First subscriber - create NATS subscription
		subject := fmt.Sprintf("aether.events.%s", namespaceID)

		sub, err := neb.nc.Subscribe(subject, func(msg *nats.Msg) {
			neb.handleNATSEvent(msg)
		})
		if err != nil {
			log.Printf("[NATSEventBus] Failed to subscribe to NATS subject %s: %v", subject, err)
		} else {
			neb.subscriptions = append(neb.subscriptions, sub)
			log.Printf("[NATSEventBus] Node %s subscribed to %s", neb.nodeID, subject)
		}
	}

	neb.namespaceSubscribers[namespaceID] = count + 1

	return ch
}

// Unsubscribe removes a local subscription and cleans up NATS subscription if no more subscribers
func (neb *NATSEventBus) Unsubscribe(namespaceID string, ch <-chan *Event) {
	neb.mutex.Lock()
	defer neb.mutex.Unlock()

	neb.EventBus.Unsubscribe(namespaceID, ch)

	count := neb.namespaceSubscribers[namespaceID]
	if count > 0 {
		count--
		neb.namespaceSubscribers[namespaceID] = count

		if count == 0 {
			delete(neb.namespaceSubscribers, namespaceID)
			log.Printf("[NATSEventBus] No more subscribers for namespace %s on node %s", namespaceID, neb.nodeID)
		}
	}
}

// handleNATSEvent processes events received from NATS
func (neb *NATSEventBus) handleNATSEvent(msg *nats.Msg) {
	var eventMsg eventMessage
	if err := json.Unmarshal(msg.Data, &eventMsg); err != nil {
		log.Printf("[NATSEventBus] Failed to unmarshal event: %v", err)
		return
	}

	// Skip events that originated from this node (already delivered locally)
	if eventMsg.NodeID == neb.nodeID {
		return
	}

	// Forward to local EventBus subscribers
	neb.EventBus.Publish(eventMsg.NamespaceID, eventMsg.Event)
}

// Publish publishes an event both locally and to NATS for cross-node broadcasting
func (neb *NATSEventBus) Publish(namespaceID string, event *Event) {
	// First publish locally
	neb.EventBus.Publish(namespaceID, event)

	// Then publish to NATS for other nodes
	subject := fmt.Sprintf("aether.events.%s", namespaceID)

	eventMsg := eventMessage{
		NodeID:      neb.nodeID,
		NamespaceID: namespaceID,
		Event:       event,
	}

	data, err := json.Marshal(eventMsg)
	if err != nil {
		log.Printf("[NATSEventBus] Failed to marshal event for NATS: %v", err)
		return
	}

	if err := neb.nc.Publish(subject, data); err != nil {
		log.Printf("[NATSEventBus] Failed to publish event to NATS: %v", err)
		return
	}
}

// Stop closes the NATS event bus and all subscriptions
func (neb *NATSEventBus) Stop() {
	neb.mutex.Lock()
	defer neb.mutex.Unlock()

	neb.cancel()

	for _, sub := range neb.subscriptions {
		if err := sub.Unsubscribe(); err != nil {
			log.Printf("[NATSEventBus] Error unsubscribing: %v", err)
		}
	}
	neb.subscriptions = nil

	neb.EventBus.Stop()

	log.Printf("[NATSEventBus] Node %s stopped", neb.nodeID)
}
218
store/jetstream.go
Normal file
@@ -0,0 +1,218 @@
package store

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"

	"git.flowmade.one/flowmade-one/aether"
	"github.com/nats-io/nats.go"
)

// JetStreamEventStore implements EventStore using NATS JetStream for persistence
type JetStreamEventStore struct {
	js         nats.JetStreamContext
	streamName string
}

// NewJetStreamEventStore creates a new JetStream-based event store
func NewJetStreamEventStore(natsConn *nats.Conn, streamName string) (*JetStreamEventStore, error) {
	js, err := natsConn.JetStream()
	if err != nil {
		return nil, fmt.Errorf("failed to get JetStream context: %w", err)
	}

	// Create or update the stream
	stream := &nats.StreamConfig{
		Name:      streamName,
		Subjects:  []string{fmt.Sprintf("%s.events.>", streamName), fmt.Sprintf("%s.snapshots.>", streamName)},
		Storage:   nats.FileStorage,
		Retention: nats.LimitsPolicy,
		MaxAge:    365 * 24 * time.Hour, // Keep events for 1 year
		Replicas:  1,                    // Can be increased for HA
	}

	_, err = js.AddStream(stream)
	if err != nil && !strings.Contains(err.Error(), "already exists") {
		return nil, fmt.Errorf("failed to create stream: %w", err)
	}

	return &JetStreamEventStore{
		js:         js,
		streamName: streamName,
	}, nil
}

// SaveEvent persists an event to JetStream
func (jes *JetStreamEventStore) SaveEvent(event *aether.Event) error {
	// Serialize event to JSON
	data, err := json.Marshal(event)
	if err != nil {
		return fmt.Errorf("failed to marshal event: %w", err)
	}

	// Create subject: stream.events.actorType.actorID
	subject := fmt.Sprintf("%s.events.%s.%s",
		jes.streamName,
		sanitizeSubject(extractActorType(event.ActorID)),
		sanitizeSubject(event.ActorID))

	// Publish with event ID as message ID for deduplication
	_, err = jes.js.Publish(subject, data, nats.MsgId(event.ID))
	if err != nil {
		return fmt.Errorf("failed to publish event to JetStream: %w", err)
	}

	return nil
}

// GetEvents retrieves all events for an actor since a version
func (jes *JetStreamEventStore) GetEvents(actorID string, fromVersion int64) ([]*aether.Event, error) {
	// Create subject filter for this actor
	subject := fmt.Sprintf("%s.events.%s.%s",
		jes.streamName,
		sanitizeSubject(extractActorType(actorID)),
		sanitizeSubject(actorID))

	// Create consumer to read events
	consumer, err := jes.js.PullSubscribe(subject, "")
	if err != nil {
		return nil, fmt.Errorf("failed to create consumer: %w", err)
	}
	defer consumer.Unsubscribe()

	var events []*aether.Event

	// Fetch messages in batches
	for {
		msgs, err := consumer.Fetch(100, nats.MaxWait(time.Second))
		if err != nil {
			if err == nats.ErrTimeout {
				break // No more messages
			}
			return nil, fmt.Errorf("failed to fetch messages: %w", err)
		}

		for _, msg := range msgs {
			var event aether.Event
			if err := json.Unmarshal(msg.Data, &event); err != nil {
				continue // Skip malformed events
			}

			// Filter by version
			if event.Version > fromVersion {
				events = append(events, &event)
			}

			msg.Ack()
		}

		if len(msgs) < 100 {
			break // No more messages
		}
	}

	return events, nil
}

// GetLatestVersion returns the latest version for an actor
func (jes *JetStreamEventStore) GetLatestVersion(actorID string) (int64, error) {
	events, err := jes.GetEvents(actorID, 0)
	if err != nil {
		return 0, err
	}

	if len(events) == 0 {
		return 0, nil
	}

	latestVersion := int64(0)
	for _, event := range events {
		if event.Version > latestVersion {
			latestVersion = event.Version
		}
	}

	return latestVersion, nil
}

// GetLatestSnapshot gets the most recent snapshot for an actor
func (jes *JetStreamEventStore) GetLatestSnapshot(actorID string) (*aether.ActorSnapshot, error) {
	// Create subject for snapshots
	subject := fmt.Sprintf("%s.snapshots.%s.%s",
		jes.streamName,
		sanitizeSubject(extractActorType(actorID)),
		sanitizeSubject(actorID))

	// Try to get the latest snapshot
	consumer, err := jes.js.PullSubscribe(subject, "", nats.DeliverLast())
	if err != nil {
		return nil, fmt.Errorf("failed to create snapshot consumer: %w", err)
	}
	defer consumer.Unsubscribe()

	msgs, err := consumer.Fetch(1, nats.MaxWait(time.Second))
	if err != nil {
		if err == nats.ErrTimeout {
			return nil, fmt.Errorf("no snapshot found for actor %s", actorID)
		}
		return nil, fmt.Errorf("failed to fetch snapshot: %w", err)
	}

	if len(msgs) == 0 {
		return nil, fmt.Errorf("no snapshot found for actor %s", actorID)
	}

	var snapshot aether.ActorSnapshot
	if err := json.Unmarshal(msgs[0].Data, &snapshot); err != nil {
		return nil, fmt.Errorf("failed to unmarshal snapshot: %w", err)
	}

	msgs[0].Ack()
	return &snapshot, nil
}

// SaveSnapshot saves a snapshot of actor state
func (jes *JetStreamEventStore) SaveSnapshot(snapshot *aether.ActorSnapshot) error {
	// Serialize snapshot to JSON
	data, err := json.Marshal(snapshot)
	if err != nil {
		return fmt.Errorf("failed to marshal snapshot: %w", err)
	}

	// Create subject for snapshots
	subject := fmt.Sprintf("%s.snapshots.%s.%s",
		jes.streamName,
		sanitizeSubject(extractActorType(snapshot.ActorID)),
		sanitizeSubject(snapshot.ActorID))

	// Publish snapshot
	_, err = jes.js.Publish(subject, data)
	if err != nil {
		return fmt.Errorf("failed to publish snapshot to JetStream: %w", err)
	}

	return nil
}

// Helper functions

// extractActorType extracts the actor type from an actor ID
func extractActorType(actorID string) string {
	for i, c := range actorID {
		if c == '-' && i > 0 {
			return actorID[:i]
		}
	}
	return "unknown"
}

// sanitizeSubject sanitizes a string for use in NATS subjects
func sanitizeSubject(s string) string {
	s = strings.ReplaceAll(s, " ", "_")
	s = strings.ReplaceAll(s, ".", "_")
	s = strings.ReplaceAll(s, "*", "_")
	s = strings.ReplaceAll(s, ">", "_")
	return s
}
60
store/memory.go
Normal file
@@ -0,0 +1,60 @@
package store

import (
	"git.flowmade.one/flowmade-one/aether"
)

// InMemoryEventStore provides a simple in-memory event store for testing
type InMemoryEventStore struct {
	events map[string][]*aether.Event // actorID -> events
}

// NewInMemoryEventStore creates a new in-memory event store
func NewInMemoryEventStore() *InMemoryEventStore {
	return &InMemoryEventStore{
		events: make(map[string][]*aether.Event),
	}
}

// SaveEvent saves an event to the in-memory store
func (es *InMemoryEventStore) SaveEvent(event *aether.Event) error {
	if _, exists := es.events[event.ActorID]; !exists {
		es.events[event.ActorID] = make([]*aether.Event, 0)
	}
	es.events[event.ActorID] = append(es.events[event.ActorID], event)
	return nil
}

// GetEvents retrieves events for an actor from a specific version
func (es *InMemoryEventStore) GetEvents(actorID string, fromVersion int64) ([]*aether.Event, error) {
	events, exists := es.events[actorID]
	if !exists {
		return []*aether.Event{}, nil
	}

	var filteredEvents []*aether.Event
	for _, event := range events {
		if event.Version >= fromVersion {
			filteredEvents = append(filteredEvents, event)
		}
	}

	return filteredEvents, nil
}

// GetLatestVersion returns the latest version for an actor
func (es *InMemoryEventStore) GetLatestVersion(actorID string) (int64, error) {
	events, exists := es.events[actorID]
	if !exists || len(events) == 0 {
		return 0, nil
	}

	latestVersion := int64(0)
	for _, event := range events {
		if event.Version > latestVersion {
			latestVersion = event.Version
		}
	}

	return latestVersion, nil
}
37
vision.md
Normal file
@@ -0,0 +1,37 @@
# Aether Vision

Distributed actor system with event sourcing for Go, powered by NATS.

## Organization Context

This repo is part of Flowmade. See [organization manifesto](../architecture/manifesto.md) for who we are and what we believe.

## What This Is

Aether is an open-source infrastructure library for building distributed, event-sourced systems in Go. It provides:

- **Event sourcing primitives** - Event, EventStore interface, snapshots
- **Event stores** - In-memory (testing) and JetStream (production)
- **Event bus** - Local and NATS-backed pub/sub with namespace isolation
- **Cluster management** - Node discovery, leader election, shard distribution
- **Namespace isolation** - Logical boundaries for multi-scope deployments
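
A minimal sketch of how the primitives listed above fit together, using the in-memory store and the local `EventBus` from this commit. The actor and event values are illustrative only, and the `store` import path assumes the module's `store/` package:

```go
package main

import (
	"fmt"
	"time"

	"git.flowmade.one/flowmade-one/aether"
	"git.flowmade.one/flowmade-one/aether/store"
)

func main() {
	// Persist an event with the testing store.
	es := store.NewInMemoryEventStore()
	_ = es.SaveEvent(&aether.Event{
		ID:        "evt-1", // illustrative IDs, not from the repo
		EventType: "OrderPlaced",
		ActorID:   "order-42",
		Version:   1,
		Data:      map[string]interface{}{"total": 99.5},
		Timestamp: time.Now(),
	})

	// Broadcast stored events to local subscribers of a namespace.
	bus := aether.NewEventBus()
	defer bus.Stop()
	events := bus.Subscribe("demo-namespace")

	stored, _ := es.GetEvents("order-42", 0)
	for _, e := range stored {
		bus.Publish("demo-namespace", e)
	}

	fmt.Println((<-events).EventType) // OrderPlaced
}
```

Swapping `InMemoryEventStore` for `JetStreamEventStore` keeps the same `EventStore` interface while persisting events to NATS JetStream.
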
## Who This Serves

- **Go developers** building distributed systems
- **Teams** implementing event sourcing and CQRS patterns
- **Projects** needing actor-based concurrency with event persistence

## Goals

1. **Simple event sourcing** - Clear primitives that compose well
2. **NATS-native** - Built for JetStream, not bolted on
3. **Horizontal scaling** - Consistent hashing, shard migration, leader election
4. **Namespace isolation** - Logical boundaries without infrastructure overhead
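
For the NATS-native goal above, the same pub/sub surface can be backed by a NATS connection so events published on one node reach subscribers on every other node. A sketch, assuming a local `nats-server` at the default URL and an illustrative namespace name:

```go
package main

import (
	"log"

	"git.flowmade.one/flowmade-one/aether"
	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL) // placeholder: local nats-server
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	bus, err := aether.NewNATSEventBus(nc)
	if err != nil {
		log.Fatal(err)
	}
	defer bus.Stop()

	// Every node that subscribed to "tenant-a" receives this event;
	// the publishing node also delivers it to its local subscribers.
	events := bus.Subscribe("tenant-a")
	bus.Publish("tenant-a", &aether.Event{
		ID:        "evt-1", // illustrative values
		EventType: "NodeGreeted",
		ActorID:   "greeter-1",
		Version:   1,
	})

	log.Println((<-events).EventType)
}
```
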
## Non-Goals

- Opinionated multi-tenancy (product layer concern)
- Domain-specific abstractions (use the primitives)
- GraphQL/REST API generation (build on top)
- UI components (see iris)