Chunk names follow the pattern:
<segment_name>.E-<epoch>-O-<start_offset>.<GUID>
- Gather all chunks matching the pattern <segment_name>*
- Sort them by the epoch and start offset encoded in their names
- Remove zombie chunks using an approach similar to this: https://github.com/pravega/pravega/blob/master/segmentstore/storage/src/main/java/io/pravega/segmentstore/storage/chunklayer/SystemJournal.java#L1058
- Using the lengths of the chunks reported by LTS and the start offsets in their names, figure out which chunks are missing.
- Find a subsequence that is complete (has no missing chunks).
- Even in the simplest scenario, the start of the first chunk does not match the actual start of the segment (it could be anywhere in the first chunk).
- The presence of chunks that are scheduled for GC throws off the recovery procedure. We don't necessarily know the start offset.
- The biggest issue here is that the "complete" subsequence we recovered may be entirely deleted.
- Recover each segment (the main segment as well as any transactions)
- Currently we defragment inline, but this may not be the case in the future
- There might be open unmerged or aborted transactions
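
The name parsing, sorting, and gap detection steps above can be sketched roughly as follows. This is a minimal illustration, not the Pravega implementation: it assumes zombie chunks have already been removed, that the LTS listing is available as a mapping from chunk name to length, and that the GUID contains no dots. The names `parse_chunk`, `complete_runs`, and `missing_ranges` are hypothetical helpers introduced only for this example.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Tuple

# Assumed layout: <segment_name>.E-<epoch>-O-<start_offset>.<GUID>
CHUNK_NAME = re.compile(
    r"^(?P<segment>.+)\.E-(?P<epoch>\d+)-O-(?P<offset>\d+)\.(?P<guid>[^.]+)$"
)


class Chunk(NamedTuple):
    # Field order doubles as the sort order: epoch first, then start offset.
    epoch: int
    start_offset: int
    length: int
    name: str


def parse_chunk(name: str, length: int) -> Optional[Chunk]:
    """Extract epoch and start offset from a chunk name; None if it does not match."""
    m = CHUNK_NAME.match(name)
    if m is None:
        return None
    return Chunk(int(m["epoch"]), int(m["offset"]), length, name)


def complete_runs(listing: Dict[str, int]) -> List[List[Chunk]]:
    """Group chunks into runs where each chunk starts exactly where the previous one ends."""
    parsed = (parse_chunk(name, length) for name, length in listing.items())
    chunks = sorted(c for c in parsed if c is not None)
    runs: List[List[Chunk]] = []
    for chunk in chunks:
        if runs and runs[-1][-1].start_offset + runs[-1][-1].length == chunk.start_offset:
            runs[-1].append(chunk)
        else:
            runs.append([chunk])
    return runs


def missing_ranges(runs: List[List[Chunk]]) -> List[Tuple[int, int]]:
    """Return [start, end) offset ranges not covered by any chunk between consecutive runs."""
    gaps = []
    for prev, nxt in zip(runs, runs[1:]):
        prev_end = prev[-1].start_offset + prev[-1].length
        # Overlapping runs are not reported as gaps; only true holes are.
        if nxt[0].start_offset > prev_end:
            gaps.append((prev_end, nxt[0].start_offset))
    return gaps
```

For a listing with chunks at offsets 0 (length 100), 100 (length 50), 200 (length 25), and 225 (length 75), this yields two runs and a single missing range from 150 to 200. The sketch does not address the issues listed above: it cannot tell where the segment actually starts within the first chunk, and it cannot tell whether a complete run consists only of chunks that were already deleted or scheduled for GC.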
Given that the system journal depends on sequentially created journals, it cannot tolerate any gaps. Therefore the state of the system segments must be entirely recreated from the LTS.