docs: Add VersionConflictError retry pattern examples

Add comprehensive examples demonstrating standard retry patterns for
handling version conflicts during optimistic concurrency control:

- Pattern 1: Simple exponential backoff (recommended for most cases)
- Pattern 2: State reload and merge (deterministic, idempotent updates)
- Pattern 3: Circuit breaker (cascading failure prevention)
- Pattern 4: Jittered backoff (thundering herd prevention)
- Pattern 5: Conflict analysis and monitoring

Includes complete, runnable examples and a guide to choosing the right
pattern for different scenarios. Documents best practices for monitoring
and debugging version conflicts.

Closes #62

Co-Authored-By: Claude Code <noreply@anthropic.com>
2026-01-13 21:26:05 +01:00
parent bcbec9ab94
commit f16a7c6237
2 changed files with 591 additions and 0 deletions
--- a/examples/README.md
+++ b/examples/README.md
@@ -0,0 +1,235 @@
+# Aether Examples
+
+Standard patterns and best practices for building with Aether.
+
+## Version Conflict Retry Patterns
+
+When using optimistic concurrency control with Aether's event store, version conflicts can occur when multiple writers attempt to save events for the same actor. The `VersionConflictError` provides full context about the conflict, enabling intelligent retry strategies.
+
+### Understanding Version Conflicts
+
+A version conflict occurs when:
+- You attempt to save an event with version `N`
+- But the actor already has a version >= `N`
+
+Example:
+```go
+// Actor "order-123" currently has version 5
+// Writer A reads version 5, creates version 6, saves successfully
+// Writer B also read version 5, creates version 6, attempts save
+// -> VersionConflictError: current=6, attempted=6
+```
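+
+The rule above can be made concrete with a tiny in-memory store. This is only a sketch: `memStore`, `conflictError`, and the `int64` version type are hypothetical stand-ins, not Aether's actual implementation.
+
+```go
+package main
+
+import "fmt"
+
+// conflictError is a hypothetical stand-in for aether.VersionConflictError.
+type conflictError struct {
+    ActorID          string
+    CurrentVersion   int64
+    AttemptedVersion int64
+}
+
+func (e *conflictError) Error() string {
+    return fmt.Sprintf("version conflict for %q: current=%d, attempted=%d",
+        e.ActorID, e.CurrentVersion, e.AttemptedVersion)
+}
+
+// memStore tracks only the latest version per actor (illustrative only).
+type memStore struct {
+    latest map[string]int64
+}
+
+// save accepts a version only if it is strictly greater than the current one.
+func (s *memStore) save(actorID string, version int64) error {
+    if current := s.latest[actorID]; version <= current {
+        return &conflictError{ActorID: actorID, CurrentVersion: current, AttemptedVersion: version}
+    }
+    s.latest[actorID] = version
+    return nil
+}
+
+func main() {
+    s := &memStore{latest: map[string]int64{"order-123": 5}}
+    fmt.Println(s.save("order-123", 6)) // writer A: <nil>
+    fmt.Println(s.save("order-123", 6)) // writer B: version conflict for "order-123": current=6, attempted=6
+}
+```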
+
+### Working with VersionConflictError
+
+The `VersionConflictError` provides:
+- `ActorID` - The actor that had the conflict
+- `CurrentVersion` - The actual current version in the store
+- `AttemptedVersion` - The version you tried to save
+
+Example usage:
+```go
+err := eventStore.SaveEvent(event)
+if errors.Is(err, aether.ErrVersionConflict) {
+    var versionErr *aether.VersionConflictError
+    if errors.As(err, &versionErr) {
+        fmt.Printf("Conflict for actor %q: current=%d, attempted=%d\n",
+            versionErr.ActorID, versionErr.CurrentVersion, versionErr.AttemptedVersion)
+        // Retry using the version reported by the store
+        event.Version = versionErr.CurrentVersion + 1
+    }
+}
+```
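+
+For `errors.Is` and `errors.As` to both match, the typed error has to expose the sentinel, typically through `Unwrap`. The sketch below shows one way that wiring could look. `errVersionConflict` and `versionConflictError` are hypothetical stand-ins, and Aether's real types may be built differently.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+)
+
+// Hypothetical stand-ins for aether.ErrVersionConflict and aether.VersionConflictError.
+var errVersionConflict = errors.New("version conflict")
+
+type versionConflictError struct {
+    ActorID          string
+    CurrentVersion   int64
+    AttemptedVersion int64
+}
+
+func (e *versionConflictError) Error() string {
+    return fmt.Sprintf("%v: actor %q current=%d attempted=%d",
+        errVersionConflict, e.ActorID, e.CurrentVersion, e.AttemptedVersion)
+}
+
+// Unwrap lets errors.Is match the sentinel through the typed error.
+func (e *versionConflictError) Unwrap() error { return errVersionConflict }
+
+func main() {
+    // Wrapping with %w keeps both checks working further up the call stack.
+    err := fmt.Errorf("save failed: %w", &versionConflictError{
+        ActorID: "order-123", CurrentVersion: 6, AttemptedVersion: 6,
+    })
+
+    var vErr *versionConflictError
+    fmt.Println(errors.Is(err, errVersionConflict)) // true
+    fmt.Println(errors.As(err, &vErr))              // true
+    fmt.Println(vErr.CurrentVersion + 1)            // 7
+}
+```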
+
+### Recommended Patterns
+
+#### Pattern 1: Simple Exponential Backoff (Recommended for Most Cases)
+
+```go
+const maxRetries = 5
+const baseDelay = 10 * time.Millisecond
+
+for attempt := 0; attempt < maxRetries; attempt++ {
+    // Re-read the latest version on every attempt
+    currentVersion, err := eventStore.GetLatestVersion(actorID)
+    if err != nil {
+        return err
+    }
+
+    event := &aether.Event{
+        ActorID: actorID,
+        Version: currentVersion + 1,
+        // ...
+    }
+
+    err = eventStore.SaveEvent(event)
+    if err == nil {
+        return nil // Success
+    }
+
+    if !errors.Is(err, aether.ErrVersionConflict) {
+        return err // Different error, don't retry
+    }
+
+    // Exponential backoff: 10ms, 20ms, 40ms, 80ms, 160ms
+    time.Sleep(baseDelay << attempt)
+}
+return fmt.Errorf("max retries (%d) exceeded", maxRetries)
+```
+
+**Pros:**
+- Simple to understand and implement
+- Backs off progressively, so repeated retries put less load on the store
+- Good for most scenarios
+
+**Cons:**
+- Can cause thundering herd in high-concurrency scenarios
+- May not work well if conflicts are due to logical issues
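+
+If the retry limit is raised, the delay doubles without bound, so a cap is often useful. The helper below is a minimal sketch of capped exponential backoff. `backoffDelay` and its `limit` parameter are illustrative names, not part of Aether.
+
+```go
+package main
+
+import (
+    "fmt"
+    "time"
+)
+
+// backoffDelay returns base * 2^attempt, capped at limit.
+// Doubling stops as soon as the cap is reached, which also avoids overflow.
+func backoffDelay(base, limit time.Duration, attempt int) time.Duration {
+    d := base
+    for i := 0; i < attempt && d < limit; i++ {
+        d *= 2
+    }
+    if d > limit {
+        d = limit
+    }
+    return d
+}
+
+func main() {
+    // Prints delays of 10ms, 20ms, 40ms, 80ms, 100ms, 100ms.
+    for attempt := 0; attempt < 6; attempt++ {
+        fmt.Println(attempt, backoffDelay(10*time.Millisecond, 100*time.Millisecond, attempt))
+    }
+}
+```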
+
+#### Pattern 2: State Reload and Merge
+
+Use this pattern when you can merge concurrent changes:
+
+```go
+const maxRetries = 3
+
+for attempt := 0; attempt < maxRetries; attempt++ {
+    // Reload current state
+    events, err := eventStore.GetEvents(actorID, 0)
+    if err != nil {
+        return err
+    }
+    aggregate := rebuildFromEvents(events)
+
+    // Apply your update
+    aggregate.Status = "shipped"
+
+    // Attempt save with new version
+    event := &aether.Event{
+        ActorID: actorID,
+        Version: aggregate.Version + 1,
+        Data:    map[string]interface{}{"status": aggregate.Status},
+    }
+
+    err = eventStore.SaveEvent(event)
+    if err == nil {
+        return nil // Success
+    }
+
+    if !errors.Is(err, aether.ErrVersionConflict) {
+        return err
+    }
+
+    // Conflict: loop again to reload state and reapply the update
+}
+return fmt.Errorf("max retries (%d) exceeded", maxRetries)
+```
+
+**Pros:**
+- Deterministic: every attempt rebuilds from the latest stored state, though success is still bounded by `maxRetries`
+- Can merge concurrent updates
+- Good for business logic that's idempotent
+
+**Cons:**
+- More expensive (replaying events each attempt)
+- Only works if updates can be safely retried
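+
+The snippet above calls `rebuildFromEvents` without defining it. One possible shape is a simple fold over the event list, sketched below. The `event` and `order` types are hypothetical and mirror only the fields used in this README, so the real `aether.Event` may differ.
+
+```go
+package main
+
+import "fmt"
+
+// event mirrors only the fields used here; the real aether.Event may differ.
+type event struct {
+    Version int64
+    Data    map[string]interface{}
+}
+
+// order is a hypothetical aggregate with a single status field.
+type order struct {
+    Status  string
+    Version int64
+}
+
+// rebuildFromEvents folds events, oldest first, into the current state.
+func rebuildFromEvents(events []event) order {
+    var o order
+    for _, e := range events {
+        if status, ok := e.Data["status"].(string); ok {
+            o.Status = status
+        }
+        o.Version = e.Version
+    }
+    return o
+}
+
+func main() {
+    o := rebuildFromEvents([]event{
+        {Version: 1, Data: map[string]interface{}{"status": "created"}},
+        {Version: 2, Data: map[string]interface{}{"status": "paid"}},
+    })
+    fmt.Printf("%+v\n", o) // {Status:paid Version:2}
+}
+```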
+
+#### Pattern 3: Circuit Breaker for Cascading Failures
+
+Use when you want to avoid hammering a saturated store:
+
+```go
+type CircuitBreaker struct {
+    state            string // "closed", "open", "half-open"
+    failures         int
+    failureThreshold int
+    lastFailureTime  time.Time
+    cooldownTime     time.Duration
+}
+
+// ... implement circuit breaker logic ...
+
+// Usage:
+if !cb.canAttempt() {
+    return fmt.Errorf("circuit breaker open")
+}
+
+err := eventStore.SaveEvent(event)
+if err == nil {
+    cb.recordSuccess()
+} else if errors.Is(err, aether.ErrVersionConflict) {
+    cb.recordFailure()
+    // Open the breaker once the failure count reaches the threshold
+    if cb.failures >= cb.failureThreshold {
+        cb.open()
+    }
+}
+```
+
+**Pros:**
+- Prevents cascading failures
+- Allows store recovery time
+- Good for distributed systems
+
+**Cons:**
+- More complex implementation
+- May reject valid requests temporarily
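+
+The circuit breaker logic elided in the snippet above could be filled in roughly as follows. This is a minimal sketch under stated assumptions: it is not safe for concurrent use, it lets any number of trial calls through while half-open, and the method bodies are guesses that match the usage shown rather than code from `version_conflict_retry.go`.
+
+```go
+package main
+
+import (
+    "fmt"
+    "time"
+)
+
+type CircuitBreaker struct {
+    state            string // "closed", "open", "half-open"
+    failures         int
+    failureThreshold int
+    lastFailureTime  time.Time
+    cooldownTime     time.Duration
+}
+
+// canAttempt reports whether a call may proceed. An open breaker moves to
+// half-open once the cooldown has elapsed, letting trial calls through.
+func (cb *CircuitBreaker) canAttempt() bool {
+    if cb.state == "open" {
+        if time.Since(cb.lastFailureTime) < cb.cooldownTime {
+            return false
+        }
+        cb.state = "half-open"
+    }
+    return true
+}
+
+// recordSuccess closes the breaker and resets the failure count.
+func (cb *CircuitBreaker) recordSuccess() {
+    cb.state = "closed"
+    cb.failures = 0
+}
+
+// recordFailure counts a failure; a failed trial call reopens the breaker.
+func (cb *CircuitBreaker) recordFailure() {
+    cb.failures++
+    cb.lastFailureTime = time.Now()
+    if cb.state == "half-open" {
+        cb.open()
+    }
+}
+
+func (cb *CircuitBreaker) open() { cb.state = "open" }
+
+func main() {
+    cb := &CircuitBreaker{state: "closed", failureThreshold: 2, cooldownTime: 50 * time.Millisecond}
+    for i := 0; i < 2; i++ {
+        cb.recordFailure()
+        if cb.failures >= cb.failureThreshold {
+            cb.open()
+        }
+    }
+    fmt.Println(cb.canAttempt()) // false: breaker is open
+    time.Sleep(60 * time.Millisecond)
+    fmt.Println(cb.canAttempt()) // true: cooldown elapsed, now half-open
+}
+```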
+
+#### Pattern 4: Jittered Backoff for High Concurrency
+
+Add randomness to prevent thundering herd:
+
+```go
+// attempt and baseDelay come from the retry loop in Pattern 1
+exponentialDelay := baseDelay << attempt
+// Random jitter in [0, exponentialDelay) keeps writers from retrying in lockstep
+jitter := time.Duration(rand.Int63n(int64(exponentialDelay)))
+time.Sleep(exponentialDelay + jitter)
+```
+
+**Pros:**
+- Prevents synchronized retries
+- Good for high-concurrency scenarios
+
+**Cons:**
+- Slightly more complex
+- May increase total retry time
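+
+A common variant, often called "full jitter", picks the whole delay at random below the exponential ceiling instead of adding jitter on top, which shortens the average wait. The sketch below is an alternative to the snippet above, not what it implements. `fullJitter` and `limit` are illustrative names, and both durations are assumed to be positive.
+
+```go
+package main
+
+import (
+    "fmt"
+    "math/rand"
+    "time"
+)
+
+// fullJitter returns a uniform random delay in [0, min(limit, base*2^attempt)).
+// Assumes base > 0 and limit > 0; rand.Int63n panics on a non-positive bound.
+func fullJitter(base, limit time.Duration, attempt int) time.Duration {
+    d := base
+    for i := 0; i < attempt && d < limit; i++ {
+        d *= 2
+    }
+    if d > limit {
+        d = limit
+    }
+    return time.Duration(rand.Int63n(int64(d)))
+}
+
+func main() {
+    // Each line shows a random delay below 10ms, 20ms, 40ms, 80ms, 160ms.
+    for attempt := 0; attempt < 5; attempt++ {
+        fmt.Println(attempt, fullJitter(10*time.Millisecond, time.Second, attempt))
+    }
+}
+```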
+
+### Complete Example
+
+See `version_conflict_retry.go` for complete, runnable examples of all patterns.
+
+### When to Use Each Pattern
+
+| Pattern | Use When | Avoid When |
+|---------|----------|-----------|
+| Exponential Backoff | Default choice for most apps | Store is consistently overloaded |
+| State Reload | Updates can be safely replayed | Event replay is expensive |
+| Circuit Breaker | Store is frequently saturated | You need immediate feedback |
+| Jittered Backoff | Many concurrent writers | Single-threaded app |
+
+### Monitoring Version Conflicts
+
+Log and monitor version conflicts to understand contention patterns:
+
+```go
+var versionErr *aether.VersionConflictError
+if errors.As(err, &versionErr) {
+    log.WithFields(log.Fields{
+        "actor_id":          versionErr.ActorID,
+        "current_version":   versionErr.CurrentVersion,
+        "attempted_version": versionErr.AttemptedVersion,
+        // On a conflict CurrentVersion >= AttemptedVersion, so the gap is never negative
+        "version_gap": versionErr.CurrentVersion - versionErr.AttemptedVersion,
+    }).Warn("Version conflict")
+
+    // Alert if gap is too large (indicates stale read)
+    if versionErr.CurrentVersion-versionErr.AttemptedVersion > 5 {
+        metrics.versionConflictLargeGap.Inc()
+    }
+}
+```
+
+### Best Practices
+
+1. **Always check the error type** - Not all errors are version conflicts
+2. **Use `CurrentVersion` for retries** - Derive the next version from the error or a fresh read instead of guessing
+3. **Set reasonable retry limits** - Prevent infinite loops
+4. **Monitor contention** - Track version conflicts to identify hotspots
+5. **Consider your domain** - Some updates can be safely retried, others cannot
+6. **Test concurrent scenarios** - Version conflicts rarely show up in single-threaded tests, so exercise multiple concurrent writers
+
+### References
+
+- [CLAUDE.md](../CLAUDE.md) - Architecture and event versioning semantics
+- [Event Sourcing Patterns](../vision.md) - Domain-driven design approach