Restore state from Long Term Storage

Naming Convention

Chunk names follow the pattern <segment_name>.E-<epoch>-O-<start_offset>.<GUID>
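
As an illustration, a name in this format could be split into its parts as sketched below. This is not the Pravega implementation: the function and type names are hypothetical, and the sketch assumes that the epoch and start offset are decimal integers and that the GUID contains no dots.

```python
import re
from typing import NamedTuple, Optional


class ChunkName(NamedTuple):
    """Parts of a chunk name (hypothetical helper type)."""
    segment: str
    epoch: int
    start_offset: int
    guid: str


# Assumption: epoch and offset are decimal integers; the GUID has no dots.
# The segment name itself may contain dots, so it is matched greedily.
_CHUNK_RE = re.compile(
    r"^(?P<segment>.+)\.E-(?P<epoch>\d+)-O-(?P<offset>\d+)\.(?P<guid>[^.]+)$"
)


def parse_chunk_name(name: str) -> Optional[ChunkName]:
    """Return the parsed parts, or None if the name does not match."""
    m = _CHUNK_RE.match(name)
    if m is None:
        return None
    return ChunkName(m["segment"], int(m["epoch"]), int(m["offset"]), m["guid"])
```

Names that do not match the pattern return None, so a caller can skip objects in LTS that are not chunks.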

Use case - simple segment

Recovery

Gather all chunks matching the pattern <segment_name>*
Sort them by the epoch and start offset encoded in their names.
Remove zombie chunks using an approach similar to this: https://github.com/pravega/pravega/blob/master/segmentstore/storage/src/main/java/io/pravega/segmentstore/storage/chunklayer/SystemJournal.java#L1058
Using the length of each chunk as reported by LTS and the start offset in its name, figure out which chunks are missing.
Find a subsequence that is complete.
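
The sort, gap-detection, and subsequence steps above can be sketched as follows. This is a minimal sketch, not the actual recovery code: the Chunk type and complete_runs function are hypothetical, and it assumes zombie chunks have already been removed, so that ordering by (epoch, start offset) is also ordering by offset.

```python
from typing import List, NamedTuple, Tuple


class Chunk(NamedTuple):
    """Hypothetical record: epoch and start offset from the name, length from LTS."""
    epoch: int
    start_offset: int
    length: int


def complete_runs(chunks: List[Chunk]) -> Tuple[List[List[Chunk]], List[Tuple[int, int]]]:
    """Group chunks into contiguous runs and report the gaps between runs.

    Assumes zombie chunks were removed beforehand.
    """
    ordered = sorted(chunks, key=lambda c: (c.epoch, c.start_offset))
    runs: List[List[Chunk]] = []
    gaps: List[Tuple[int, int]] = []
    for chunk in ordered:
        if runs:
            last = runs[-1][-1]
            expected = last.start_offset + last.length
            if chunk.start_offset == expected:
                # Chunk continues the current run without a gap.
                runs[-1].append(chunk)
                continue
            if chunk.start_offset > expected:
                # Bytes in [expected, chunk.start_offset) are missing.
                gaps.append((expected, chunk.start_offset))
        # First chunk, or a break in continuity: start a new run.
        runs.append([chunk])
    return runs, gaps
```

Each returned run is a candidate complete subsequence. Which run to trust is not decided by this sketch.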

Problems

Do not know exact start offset

Even in the simplest scenario, the start of the first chunk does not necessarily match the actual start of the segment (the segment could start anywhere within the first chunk).
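
To make the uncertainty concrete, the sketch below returns the range of offsets in which the real segment start may lie, given only the first surviving chunk. The function name is hypothetical, and it assumes a chunk is retained only while at least one of its bytes is still part of the segment.

```python
def start_offset_bounds(first_chunk_start: int, first_chunk_length: int) -> range:
    """Possible values of the segment's true start offset.

    Assumption: the first chunk still holds at least one live byte, so the
    true start is at or after the chunk's start and before its end.
    """
    return range(first_chunk_start, first_chunk_start + first_chunk_length)
```

For a first chunk that starts at offset 1000 and is 500 bytes long, any offset from 1000 to 1499 is a possible segment start. The chunk names and lengths alone cannot narrow this further.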

Dealing with GC

The presence of chunks that are scheduled for GC throws off the recovery procedure, because we don't necessarily know the start offset.
The biggest issue here is that the "complete" subsequence we recovered may consist entirely of deleted chunks.

Use case - simple segment with transaction

Recovery

Recover each segment (the main segment as well as any transactions).

Problems

Do not know exact sequence in which transactions are merged.

Currently we defragment inline, but this may not be the case in the future.
There might be open, unmerged, or aborted transactions.

Do not know offset at which new transactions are merged.

Use case - Replaying system journal

Recovery

Because the system journal depends on sequentially created journals, it cannot tolerate any gaps. Therefore the state of the system segments must be recreated entirely from LTS.

sachin-j-joshi/Restore-state-from-Long-Term-Storage.md
