[Issue #39] Handle malformed events during JetStream replay with proper error reporting #41

Merged
HugoNijhuis merged 3 commits from issue-39-malformed-events into main 2026-01-10 17:48:05 +00:00
Owner

Closes #39

Summary

  • Add ReplayError type to capture malformed event details (sequence number, raw data, underlying error)
  • Add ReplayResult type to return both successfully parsed events and any errors encountered
  • Add EventStoreWithErrors interface for stores that can report replay errors
  • Implement GetEventsWithErrors on JetStreamEventStore to give callers visibility into data quality
  • Update GetEvents to maintain backward compatibility (still skips malformed events silently)
  • Add comprehensive unit tests for the new types
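
The types listed above might look roughly like the following. This is a minimal sketch, not the code from the PR: the `Event` stand-in, all field names, and the `GetEventsWithErrors` parameter list are assumptions made for illustration.

```go
package main

import "fmt"

// Event is a stand-in for the project's event type (assumed shape).
type Event struct {
	ID      string
	Version int64
}

// ReplayError describes one malformed event found during replay.
// Field names are assumptions based on the PR summary.
type ReplayError struct {
	SequenceNumber uint64 // stream sequence of the malformed message
	RawData        []byte // undecoded payload, kept for debugging
	Err            error  // underlying decode error
}

// ReplayResult carries both the events that parsed and the errors
// for the ones that did not.
type ReplayResult struct {
	Events []*Event
	Errors []ReplayError
}

// EventStoreWithErrors is implemented by stores that can report
// malformed events. The parameter list here is hypothetical.
type EventStoreWithErrors interface {
	GetEventsWithErrors(actorID string, fromVersion int64) (*ReplayResult, error)
}

func main() {
	result := ReplayResult{
		Events: []*Event{{ID: "evt-1", Version: 1}},
		Errors: []ReplayError{{
			SequenceNumber: 2,
			RawData:        []byte("{not json"),
			Err:            fmt.Errorf("unexpected end of input"),
		}},
	}
	fmt.Printf("parsed=%d malformed=%d\n", len(result.Events), len(result.Errors))
}
```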

Impact

This addresses the silent data loss issue identified in the JetStream store. Callers can now:

  1. Use GetEvents for backward-compatible behavior (silent skip)
  2. Use GetEventsWithErrors to receive information about any malformed events
  3. Decide how to handle corrupted data (log, alert, retry, etc.)
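
As an illustration of the caller side, the sketch below prefers the error-reporting path when a store supports it and falls back to `GetEvents` otherwise. The `EventStore` interface, the method signatures, the two in-memory stores, and the `loadEvents` helper are all hypothetical; only the type and method names mentioned in this PR come from the source.

```go
package main

import (
	"errors"
	"fmt"
)

type Event struct{ ID string }

// Assumed shapes; the real definitions live in the PR.
type ReplayError struct {
	SequenceNumber uint64
	RawData        []byte
	Err            error
}

type ReplayResult struct {
	Events []*Event
	Errors []ReplayError
}

type EventStore interface {
	GetEvents(actorID string, fromVersion int64) ([]*Event, error)
}

type EventStoreWithErrors interface {
	GetEventsWithErrors(actorID string, fromVersion int64) (*ReplayResult, error)
}

// legacyStore only supports the silent-skip path.
type legacyStore struct{ events []*Event }

func (l legacyStore) GetEvents(string, int64) ([]*Event, error) { return l.events, nil }

// fakeStore supports both paths and returns a canned result.
type fakeStore struct{ result ReplayResult }

func (f fakeStore) GetEvents(string, int64) ([]*Event, error) { return f.result.Events, nil }

func (f fakeStore) GetEventsWithErrors(string, int64) (*ReplayResult, error) {
	r := f.result
	return &r, nil
}

// loadEvents returns the parsed events and the number of malformed ones.
// Stores without error reporting always report zero malformed events.
func loadEvents(store EventStore, actorID string) ([]*Event, int, error) {
	if s, ok := store.(EventStoreWithErrors); ok {
		result, err := s.GetEventsWithErrors(actorID, 0)
		if err != nil {
			return nil, 0, err
		}
		for _, re := range result.Errors {
			// A real caller might log, alert, or queue a retry here.
			fmt.Printf("malformed event seq=%d bytes=%d: %v\n",
				re.SequenceNumber, len(re.RawData), re.Err)
		}
		return result.Events, len(result.Errors), nil
	}
	events, err := store.GetEvents(actorID, 0)
	return events, 0, err
}

func main() {
	store := fakeStore{result: ReplayResult{
		Events: []*Event{{ID: "evt-1"}},
		Errors: []ReplayError{{
			SequenceNumber: 2,
			RawData:        []byte("{not json"),
			Err:            errors.New("invalid JSON"),
		}},
	}}
	events, malformed, err := loadEvents(store, "actor-1")
	if err != nil {
		fmt.Println("replay failed:", err)
		return
	}
	fmt.Printf("parsed=%d malformed=%d\n", len(events), malformed)
}
```

The type assertion keeps existing `GetEvents` callers untouched while letting new callers opt in to error visibility.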

Test plan

  • Unit tests for ReplayError type
  • Unit tests for ReplayResult type
  • Compile-time interface check for JetStreamEventStore
  • All existing tests pass
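
A compile-time interface check in Go is usually a blank-identifier assignment like the one sketched below. The stub `JetStreamEventStore`, the empty `ReplayResult`, and the method signature are placeholders assumed for illustration, not the project's real definitions.

```go
package main

import "fmt"

// Placeholder types; the real ones are defined in the PR.
type ReplayResult struct{}

type EventStoreWithErrors interface {
	GetEventsWithErrors(actorID string, fromVersion int64) (*ReplayResult, error)
}

type JetStreamEventStore struct{}

func (s *JetStreamEventStore) GetEventsWithErrors(actorID string, fromVersion int64) (*ReplayResult, error) {
	return &ReplayResult{}, nil
}

// Compile-time check: the build fails if *JetStreamEventStore ever
// stops satisfying EventStoreWithErrors. No value is allocated.
var _ EventStoreWithErrors = (*JetStreamEventStore)(nil)

func main() {
	var store EventStoreWithErrors = &JetStreamEventStore{}
	_, err := store.GetEventsWithErrors("actor-1", 0)
	fmt.Println("error:", err)
}
```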

Generated with Claude Code

HugoNijhuis added 1 commit 2026-01-10 14:33:08 +00:00
Handle malformed events during JetStream replay with proper error reporting
All checks were successful
CI / build (pull_request) Successful in 17s
b630258f60
Add ReplayError and ReplayResult types to capture information about
malformed events encountered during replay. This allows callers to
inspect and handle corrupted data rather than having it silently skipped.

Key changes:
- Add ReplayError type with sequence number, raw data, and underlying error
- Add ReplayResult type containing both successfully parsed events and errors
- Add EventStoreWithErrors interface for stores that can report replay errors
- Implement GetEventsWithErrors on JetStreamEventStore
- Update GetEvents to maintain backward compatibility (still skips malformed)
- Add comprehensive unit tests for the new types

This addresses the issue of silent data loss during event-sourced replay
by giving callers visibility into data quality issues.

Closes #39

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Author
Owner

Code Review - PR #41

Status: Approved

Summary

This PR introduces proper error handling for malformed events during JetStream replay, addressing Issue #39. The implementation is clean, well-documented, and maintains backward compatibility.

Strengths

1. Well-designed API

  • ReplayError type captures all necessary context (sequence number, raw data, underlying error)
  • ReplayResult provides a clean way to return both events and errors
  • EventStoreWithErrors interface allows type-safe detection of stores that support error reporting
  • Compile-time interface check ensures JetStreamEventStore implements the interface

2. Backward Compatibility

  • GetEvents() continues to work as before (silently skips malformed events)
  • New GetEventsWithErrors() provides opt-in visibility into data quality issues
  • No breaking changes to existing API

3. Good Error Handling Pattern

  • Uses Go idioms: Error() and Unwrap() methods
  • Preserves raw data for debugging/recovery
  • Correctly acks malformed messages to prevent redelivery loops
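
To show why the `Error()` and `Unwrap()` pair matters, here is a small sketch of how a caller could inspect a `ReplayError` with the standard `errors.As` and `errors.Is` functions. The field names, the message format, and the `errInvalidJSON` sentinel are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidJSON is a hypothetical sentinel used only in this sketch.
var errInvalidJSON = errors.New("invalid JSON payload")

// ReplayError with assumed field names.
type ReplayError struct {
	SequenceNumber uint64
	RawData        []byte
	Err            error
}

// Error makes *ReplayError satisfy the error interface.
func (e *ReplayError) Error() string {
	return fmt.Sprintf("malformed event at sequence %d: %v", e.SequenceNumber, e.Err)
}

// Unwrap exposes the underlying error to errors.Is and errors.As.
func (e *ReplayError) Unwrap() error { return e.Err }

func main() {
	var err error = &ReplayError{
		SequenceNumber: 42,
		RawData:        []byte("{"),
		Err:            errInvalidJSON,
	}
	// Extra wrapping by callers does not hide the ReplayError.
	wrapped := fmt.Errorf("replay actor-1: %w", err)

	var re *ReplayError
	if errors.As(wrapped, &re) {
		fmt.Printf("seq=%d raw=%q\n", re.SequenceNumber, re.RawData)
	}
	if errors.Is(wrapped, errInvalidJSON) {
		fmt.Println("cause: invalid JSON payload")
	}
}
```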

4. Comprehensive Tests

  • Unit tests for ReplayError and ReplayResult types
  • Edge cases covered (zero sequence, large data, nil errors)
  • All existing tests pass

Minor Observations (non-blocking)

  1. Consider adding a helper method to ReplayResult like TotalCount() or SuccessRate() for monitoring use cases
  2. The RawData field stores the full message content, so very large corrupted messages could consume significant memory. A future enhancement might truncate or hash the data for logging purposes.
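
Both observations could be addressed along the lines of the sketch below. None of this is part of the PR: `TotalCount`, `SuccessRate`, and `rawPreview` are hypothetical helpers, and the struct fields are assumed.

```go
package main

import "fmt"

type Event struct{ ID string }

type ReplayError struct {
	SequenceNumber uint64
	RawData        []byte
	Err            error
}

type ReplayResult struct {
	Events []*Event
	Errors []ReplayError
}

// TotalCount is the number of messages seen, parsed or not.
func (r *ReplayResult) TotalCount() int {
	return len(r.Events) + len(r.Errors)
}

// SuccessRate is the fraction of messages that parsed, in [0, 1].
// An empty replay is treated as fully successful.
func (r *ReplayResult) SuccessRate() float64 {
	total := r.TotalCount()
	if total == 0 {
		return 1
	}
	return float64(len(r.Events)) / float64(total)
}

// rawPreview returns at most max bytes of data for logging.
// It cuts on a byte boundary, so a multi-byte rune may be split.
func rawPreview(data []byte, max int) string {
	if max < 0 {
		max = 0
	}
	if len(data) <= max {
		return string(data)
	}
	return string(data[:max]) + "..."
}

func main() {
	r := &ReplayResult{
		Events: []*Event{{ID: "a"}, {ID: "b"}, {ID: "c"}},
		Errors: []ReplayError{{SequenceNumber: 4, RawData: []byte("{not json at all")}},
	}
	fmt.Printf("total=%d rate=%.2f\n", r.TotalCount(), r.SuccessRate())
	fmt.Println("preview:", rawPreview(r.Errors[0].RawData, 8))
}
```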

Verdict

Solid implementation that solves the silent data loss problem. Ready to merge.

HugoNijhuis force-pushed issue-39-malformed-events from b630258f60 to e77a3a9868 2026-01-10 17:47:49 +00:00 Compare
HugoNijhuis merged commit 484e3ced2e into main 2026-01-10 17:48:05 +00:00
Reference: flowmade-one/aether#41