Splitting and Reassembling HLS/DASH Audio and Video

You might have experienced the following: you are watching a video in your browser and audio and video both work fine, but when you pass the .m3u8 URL that you found in the page's source to a download tool, the download tool makes a veritable pig's breakfast out of it -- perhaps the video works but the audio doesn't, perhaps the audio is completely desynchronized from the video, perhaps the audio appears to interrupt the video.

The likely culprit is the encoding of each video fragment. An .m3u8 file is actually a playlist -- a list of individual files, called fragments, that should be played in sequence as if they were one big video. Normally, a fragment starts with some metadata (M) and then carries video (V) and audio (A) intermingled:

MMVVVAAVVVAAVVVAA...

However, in this case, each fragment has been encoded to first contain all video frames and then all audio chunks:

MMVVVVVVVVV...AAAAAA...

Our goal is to convert the video from the latter form to the former, making it play back successfully everywhere.

Tools

You will require the following tools:

yt-dlp to take the .m3u8 URL and download the fragments
TSDuck to analyze and pick apart the fragments
FFmpeg to merge it all back together

Step 1: Obtain the Fragments

Run yt-dlp with the --keep-fragments option to keep the fragments:

yt-dlp --keep-fragments https://videos.example.com/video_2973/4404/4404_hls.m3u8

yt-dlp might still try merging the file into 4404_hls [4404_hls].mp4 once it's finished, but you should also find all the fragments in the same directory, named 4404_hls [4404_hls].mp4.part-Frag1 with the final number increasing.
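The later steps need to know how many fragments there are. Here is a minimal sketch for counting them, assuming the fragments sit in the current directory and follow the naming shown above. The function name count_fragments is made up for illustration.

```shell
# Hypothetical helper: count the fragment files yt-dlp left behind.
# $1 is the file name up to, but not including, ".part-Frag".
count_fragments() {
    prefix="$1"
    count=0
    for f in "${prefix}.part-Frag"*; do
        # If nothing matches, the pattern stays literal; -e filters that out.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Calling count_fragments "4404_hls [4404_hls].mp4" prints the number of matching files. Counting is safer than eyeballing a directory listing, because a plain listing sorts Frag10 before Frag2.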

Step 2: Analyze a Fragment

Ask tsp from the TSDuck suite to analyze one of the fragments:

tsp \
    -I file "4404_hls [4404_hls].mp4.part-Frag1" \
    -P analyze \
    -O drop

(-O drop ensures that no MPEG data is output, since we're only interested in the textual analysis results.)

The interesting part is the Services Analysis Report:

===============================================================================
|  SERVICES ANALYSIS REPORT                                                   |
|=============================================================================|
|  Global PID's                                                               |
|  TS packets: 1, PID's: 1 (clear: 1, scrambled: 0)                           |
|-----------------------------------------------------------------------------|
|     PID  Usage                                     Access          Bitrate  |
|   Total  Global PID's ................................. C          137 b/s  |
|   Subt.  Global PSI/SI PID's (0x00-0x1F) .............. C          137 b/s  |
|  0x0000  PAT .......................................... C          137 b/s  |
|=============================================================================|
|  Service: 0x0001 (1), TS: 0x0001 (1)                                        |
|  Service name: (unknown), provider: (unknown)                               |
|  Service type: 0x00 (Undefined)                                             |
|  TS packets: 4,816, PID's: 3 (clear: 3, scrambled: 0)                       |
|  PMT PID: 0x01E0 (480), PCR PID: 0x01E1 (481)                               |
|-----------------------------------------------------------------------------|
|     PID  Usage                                     Access          Bitrate  |
|   Total  Undefined .................................... C      659,385 b/s  |
|  0x01E0  PMT .......................................... C          137 b/s  |
|  0x01E1  AVC video (1920x1080, high profile, level 4.0  C      503,301 b/s  |
|  0x01E2  MPEG-2 AAC Audio (eng) ....................... C      155,947 b/s  |
|          (C=Clear, S=Scrambled, +=Shared)                                   |
===============================================================================

The following PIDs are relevant to us:

in the global PIDs section, the PID 0x0000 (used for PAT, which is metadata)
in the section for Service 0x0001, the PIDs 0x01E0 (used for PMT, which is metadata), 0x01E1 (video) and 0x01E2 (audio)

We can now use tsdump from the TSDuck suite to see if we really have "all the video followed by all the audio" encoding:

tsdump --header "4404_hls [4404_hls].mp4.part-Frag1" > dump.txt

In the resulting dump.txt file, we will see blocks of packet headers looking like this:

* Packet 0
  ---- TS Header ----
  PID: 0x0000 (0), header size: 4, sync: 0x47
  Error: 0, unit start: 1, priority: 0
  Scrambling: 0, continuity counter: 0
  Adaptation field: no (0 bytes), payload: yes (184 bytes)

The file will start with PIDs 0x0000 and 0x01E0 (metadata). Then, if it's actually in the order described previously, it will contain lots of packets with PID 0x01E1 (video) followed by lots of packets with PID 0x01E2 (audio) instead of those two being intermingled.
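Scrolling through thousands of packet headers gets tedious, so here is a rough sketch that collapses the dump into runs of consecutive packets sharing a PID. It assumes that the TS header lines look exactly like the sample above and that no other line in the dump contains the text "PID: 0x". The function name pid_runs is made up for illustration.

```shell
# Hypothetical helper: collapse the packet headers in a tsdump header dump
# into runs of consecutive packets that share a PID.
# Prints one "COUNT PID" line per run, in file order.
pid_runs() {
    grep -o 'PID: 0x[0-9A-Fa-f]*' "$1" | uniq -c | awk '{ print $1, $3 }'
}
```

Run it as pid_runs dump.txt. A fragment with all the video followed by all the audio should produce only a handful of lines, dominated by one long run of 0x01E1 and one long run of 0x01E2. A properly interleaved fragment should produce many short runs that alternate between the two.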

The good news is that even though the video is split into multiple fragments, the PIDs remain the same across the fragments (otherwise, playback in the browser would probably break as well), so we only need to analyze one fragment.

Step 3: Extract Video and Audio from Each Fragment

We will now use tsp to split each fragment into two files: one will contain metadata and video, the other will contain metadata and audio.

This example assumes that the video has 25 fragments; of course, you should check how many fragments your video has.

As per the information gleaned from the above analysis, we will need:

PIDs 0x0000, 0x01E0 and 0x01E1 for the video file
PIDs 0x0000, 0x01E0 and 0x01E2 for the audio file

for fragnum in {1..25}
do
    tsp \
        -I file "4404_hls [4404_hls].mp4.part-Frag${fragnum}" \
        -P filter --pid 0x0000 --pid 0x01E0 --pid 0x01E1 \
        -O file "part${fragnum}.m4v"

    tsp \
        -I file "4404_hls [4404_hls].mp4.part-Frag${fragnum}" \
        -P filter --pid 0x0000 --pid 0x01E0 --pid 0x01E2 \
        -O file "part${fragnum}.m4a"
done

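We now have 25 .m4v files containing only the video of each fragment and 25 .m4a files containing only the audio. Despite the extensions, these are still MPEG transport streams, since tsp only filters packets and does not repackage them; ffmpeg detects the format from the content rather than the file name.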
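Before concatenating, it can be worth checking that the split produced every file. Here is a minimal sketch, assuming the partN.m4v and partN.m4a naming used in the loop above. The function name check_parts is made up for illustration.

```shell
# Hypothetical helper: report split files that are missing or empty.
# $1 is the number of fragments; returns non-zero if anything is wrong.
check_parts() {
    total="$1"
    bad=0
    n=1
    while [ "$n" -le "$total" ]; do
        for ext in m4v m4a; do
            # -s is true only for files that exist and are not empty
            if [ ! -s "part${n}.${ext}" ]; then
                echo "missing or empty: part${n}.${ext}"
                bad=1
            fi
        done
        n=$((n + 1))
    done
    return "$bad"
}
```

Run it as check_parts 25. An empty output file usually means that the PID passed to the filter did not occur in that fragment.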

Step 4: Make One Long Video and One Long Audio

Now, we can get ffmpeg to concatenate the video fragments into a complete video file and do the same for the audio.

ffmpeg's concat mode requires a text file that lists the inputs. We assemble this before calling ffmpeg:

# clean up after a previous run
rm -f "audios.txt" "videos.txt"
for fragnum in {1..25}
do
    echo "file 'part${fragnum}.m4a'" >> "audios.txt"
    echo "file 'part${fragnum}.m4v'" >> "videos.txt"
done

ffmpeg -f concat -i audios.txt -c copy 4404.m4a
ffmpeg -f concat -i videos.txt -c copy 4404.m4v

We now have two files: 4404.m4a with the full audio and 4404.m4v with the full video.
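A quick sanity check at this point is to compare the lengths of the two files using ffprobe, which ships with FFmpeg. The following is a sketch only: the function names and the default tolerance of half a second are arbitrary choices for illustration.

```shell
# Print a media file's container duration in seconds, as reported by ffprobe.
duration_of() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Hypothetical helper: succeed if two durations (in seconds) differ by at
# most the given tolerance (default: 0.5 s, an arbitrary choice).
durations_close() {
    awk -v a="$1" -v b="$2" -v tol="${3:-0.5}" 'BEGIN {
        d = a - b
        if (d < 0) d = -d
        if (d <= tol) exit 0
        exit 1
    }'
}
```

For example:

durations_close "$(duration_of 4404.m4v)" "$(duration_of 4404.m4a)" || echo "audio and video lengths differ"

A large difference suggests that a fragment is missing from one of the two lists.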

Step 5: Merging Video and Audio

Finally, we can unify our two files into one stream with both video and audio:

ffmpeg -i 4404.m4v -i 4404.m4a -c copy 4404.mp4

And that's it -- 4404.mp4 contains both video and audio.
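To confirm this without opening a player, we can ask ffprobe to list the stream types in the result. This is a minimal sketch; the function names are made up for illustration.

```shell
# List the type of every stream in a file, one per line (e.g. "video", "audio").
stream_types() {
    ffprobe -v error -show_entries stream=codec_type -of csv=p=0 "$1"
}

# Hypothetical helper: read stream types on stdin and succeed only if
# both a video and an audio stream are present.
has_video_and_audio() {
    types=$(cat)
    echo "$types" | grep -qx 'video' && echo "$types" | grep -qx 'audio'
}
```

For example:

stream_types 4404.mp4 | has_video_and_audio && echo "both streams present"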

But I'm on Windows

Good news: all three required tools are available for Windows as well -- and if you use PowerShell, loops should remind you a lot of C, C++, C#, Java, JavaScript and the like:

For ($fragnum = 1; $fragnum -le 25; $fragnum += 1) {
    & tsp `
        -I file "4404_hls [4404_hls].mp4.part-Frag$($fragnum)" `
        -P filter --pid 0x0000 --pid 0x01E0 --pid 0x01E1 `
        -O file "part$($fragnum).m4v"

    & tsp `
        -I file "4404_hls [4404_hls].mp4.part-Frag$($fragnum)" `
        -P filter --pid 0x0000 --pid 0x01E0 --pid 0x01E2 `
        -O file "part$($fragnum).m4a"
}

Transforming the other loops into PowerShell is left as an exercise to the reader. ;-)

RavuAlHemio/split-reassemble-hls-dash.md