fix(store): Implement version cache invalidation strategy for JetStreamEventStore #130
Reference in New Issue
Block a user
Delete Branch "issue-126-untitled"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Implements version cache invalidation strategy in JetStreamEventStore to ensure consistency when external processes write to the same JetStream stream.
Changes
Implementation Strategy
The solution uses a hybrid approach that satisfies all acceptance criteria:
How It Works
Testing
Closes #126
Code Review: Changes Requested
Thanks for the PR! I found critical issues in the cache invalidation logic.
CRITICAL BUG: Cache not repopulated after invalidation (lines 327-337)
In GetLatestVersion(), when cache is detected as stale, you delete it but never update it with the fresh data just fetched. The cache should be repopulated immediately:
Current code deletes cache: delete(jes.versions[actorID])
Should instead: jes.versions[actorID] = latestVersion
This defeats the cache purpose - next SaveEvent will miss cache again.
RACE CONDITION: Unsynchronized cache update
GetLatestVersion fetches data without lock (line 311), then updates cache with lock (line 330). Between these, SaveEvent could run and update cache, then GetLatestVersion overwrites it with stale data. Subsequent SaveEvent could accept duplicate versions.
TEST INCOMPLETE
Test doesn't verify cache was repopulated. Add second GetLatestVersion call to confirm caching works after invalidation.
Strengths:
These issues must be fixed before merge.
Code Review: Changes Requested
Thanks for the PR! I've found a critical logic issue in the cache invalidation strategy that needs to be addressed.
Issues to fix:
Cache invalidation logic is incomplete in GetLatestVersion() - CRITICAL BUG
File: store/jetstream.go, lines 327-337
Current behavior: When GetLatestVersion detects a stale cache entry (actual > cached), it deletes the cache entry
Problem: The cache entry is deleted but NOT repopulated with the fresh data just fetched
Impact: After cache invalidation, the cache remains empty. On the next SaveEvent call, getLatestVersionLocked will miss the cache and have to fetch from JetStream again. This defeats the purpose of caching and the performance benefit for the single-writer case is lost.
Fix: After computing latestVersion in GetLatestVersion(), always update the cache with the fresh value, not just delete on mismatch. The cache should reflect what we just fetched:
Race condition between GetLatestVersion and SaveEvent
GetLatestVersion does not hold the lock while fetching from JetStream (line 311: GetEvents called outside lock). SaveEvent holds the lock for its entire operation (line 150).
Race scenario: GetLatestVersion reads version=2, gets preempted, SaveEvent runs and writes version=3, then GetLatestVersion resumes and overwrites cache with stale version=2. Subsequent SaveEvent could accept version=3 again, violating version monotonicity.
Fix: Either (a) hold lock while fetching fresh data in GetLatestVersion, OR (b) accept that GetLatestVersion shouldn't modify the cache that SaveEvent relies on, making it read-only.
Test doesn't verify cache behavior after invalidation
TestJetStreamEventStore_CacheInvalidationOnExternalWrite (lines 1328-1427)
The test verifies detection but doesn't verify the cache was repopulated for efficiency. Add a second GetLatestVersion call to confirm cache is properly updated.
Acceptance criteria status:
Strengths:
Please address these issues and I'll re-review.
Code Review: Approved ✓
Excellent work addressing all the feedback! All three critical issues have been resolved correctly.
Issues Fixed:
Cache repopulation (FIXED) - GetLatestVersion now updates the cache with fresh data (line 335:
jes.versions[actorID] = latestVersion) instead of just deleting it. This ensures the cache is immediately usable after detecting an external write, maintaining performance for the single-writer case.Race condition (FIXED) - GetLatestVersion now acquires the mutex lock (line 311-312) and holds it for the entire duration of the fetch operation, preventing SaveEvent from running between the fetch and cache update. This eliminates the risk of SaveEvent updating cache while GetLatestVersion is mid-operation.
Test completeness (FIXED) - The test now includes a second GetLatestVersion call (lines 1393-1400) that verifies the cache was properly repopulated after the invalidation. This confirms efficiency is maintained after external writes.
Quality Assessment:
Design Correctness:
The implementation correctly handles all requirements:
Ready to merge!
Code Review: Approved ✓
Excellent work addressing the feedback! All three critical issues have been resolved correctly.
Issues Fixed:
Cache repopulation (FIXED) - GetLatestVersion now updates the cache with fresh data (line 335:
jes.versions[actorID] = latestVersion) instead of just deleting it. This ensures the cache is immediately usable after detecting an external write, maintaining performance for the single-writer case.Race condition (FIXED) - GetLatestVersion now acquires the mutex lock (line 311-312) and holds it for the entire duration of the fetch operation, preventing SaveEvent from running between the fetch and cache update. This eliminates the risk of SaveEvent updating cache while GetLatestVersion is mid-operation.
Test completeness (FIXED) - The test now includes a second GetLatestVersion call (lines 1393-1400) that verifies the cache was properly repopulated after the invalidation. This confirms efficiency is maintained after external writes.
Quality Assessment:
Design Notes:
The implementation correctly handles the design requirements:
Ready to merge!
Co-Authored-By: Claude Code noreply@anthropic.com