@de-sh
Last active March 14, 2025 20:51
Data Loss Due to Scheduler Time Skew

Problem Statement

Parseable is experiencing data loss due to a time skew in local-sync. The scheduler operates on minute-based (eventually configurable) intervals and performs flush operations at the end of each interval. However, a timing discrepancy causes late-arriving data to be appended after EOF markers, rendering this data effectively invisible to readers.

Current System Behavior

The scheduler triggers a flush operation at the end of each minute interval.

The flush operation:

  1. Writes all in-memory buffered data to disk
  2. Appends an EOF marker to signal completion

Due to time skew, additional data from the current minute continues to arrive after the EOF marker has been written and gets appended (ref). Reader processes stop reading when they encounter the EOF marker, ignoring any subsequent data.
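
The following is a minimal sketch of this behavior, not Parseable's actual code: `EOF_MARKER`, `flush`, and `read_records` are hypothetical names chosen to illustrate why anything appended after the marker is never seen by readers.

```rust
use std::fs::{File, OpenOptions};
use std::io::{BufRead, BufReader, Result, Write};

const EOF_MARKER: &str = "#EOF";

/// Scheduler-triggered flush: write everything buffered so far, then mark EOF.
fn flush(path: &str, buffered: &[String]) -> Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    for record in buffered {
        writeln!(file, "{record}")?;
    }
    // The EOF marker is written immediately, even though records belonging to
    // the same minute may still arrive and be appended after it.
    writeln!(file, "{EOF_MARKER}")?;
    Ok(())
}

/// Reader: stops at the first EOF marker, so anything appended later is ignored.
fn read_records(path: &str) -> Result<Vec<String>> {
    let reader = BufReader::new(File::open(path)?);
    let mut records = Vec::new();
    for line in reader.lines() {
        let line = line?;
        if line == EOF_MARKER {
            break; // late-arriving records after this point are never read
        }
        records.push(line);
    }
    Ok(records)
}
```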

Root Cause Analysis

The fundamental issue is a race condition between:

  1. Time skew introduced by scheduler
  2. The actual ingestion timeline

This skew creates a scenario where the scheduler prematurely signals completion (via EOF) while logs belonging to the same time window continue to arrive. This happens mainly because the flush path never checks whether the file being flushed belongs to the current minute (ref).

Impact

Data loss: all records written after the EOF marker are effectively lost. Silent failures: the system appears to function correctly after the data loss, so the problem can go unnoticed.

Proposed Solutions

Don't flush/finish a file if the data it contains belongs to the current minute, as solved by is_current (see the sketch below).
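
A minimal sketch of what such a guard could look like, assuming the fix takes the form of an `is_current` check on the file's minute before flushing; the names and the minute-extraction logic here are hypothetical, not Parseable's implementation:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Returns true if the given timestamp (seconds since the epoch) falls in the
/// same minute as "now"; such a file must not be finished/flushed yet.
fn is_current(file_minute_start: u64) -> bool {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before UNIX epoch")
        .as_secs();
    now / 60 == file_minute_start / 60
}

/// Scheduler tick: skip files that still belong to the current minute so the
/// EOF marker is only written once no more data can arrive for that window.
fn maybe_flush(file_minute_start: u64, flush: impl FnOnce()) {
    if is_current(file_minute_start) {
        // Defer: this minute is still open, so flushing now would race with
        // late-arriving records.
        return;
    }
    flush();
}
```

With this guard, the EOF marker is only appended once the file's minute has fully elapsed, closing the window in which late records could land after it.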
