You might have experienced the following: you are watching a video in your browser and audio and video both work fine, but when you pass the .m3u8 URL that you found in the page's source to a download tool, the download tool makes a veritable pig's breakfast out of it -- perhaps the video works but the audio doesn't, perhaps the audio is completely desynchronized from the video, perhaps the audio appears to interrupt the video.
The likely culprit is the encoding of each video fragment. The .m3u8 file is actually a playlist -- a list of individual files called fragments that should be played in sequence as if they were one big video. Normally, a fragment would start with some metadata (M) and then carry video (V) and audio (A) intermingled:
MMVVVAAVVVAAVVVAA...
However, in this case, the file has been encoded to first contain all video frames and then all audio chunks:
MMVVVVVVVVV...AAAAAA...
Our goal is to convert the video from the latter form to the former, making it play back successfully everywhere.
You will require the following tools:
- yt-dlp to take the .m3u8 URL and download the fragments
- TSDuck to analyze and pick apart the fragments
- FFmpeg to merge it all back together
Run yt-dlp with the --keep-fragments option to keep the fragments:
yt-dlp --keep-fragments https://videos.example.com/video_2973/4404/4404_hls.m3u8
yt-dlp might still try merging the file into 4404_hls [4404_hls].mp4 once it's finished, but you should also find all the fragments in the same directory, named 4404_hls [4404_hls].mp4.part-Frag1 with the final number increasing for each fragment.
Ask tsp from the TSDuck suite to analyze one of the fragments:
tsp \
    -I file "4404_hls [4404_hls].mp4.part-Frag1" \
    -P analyze \
    -O drop
(-O drop ensures that no MPEG data is output, since we're only interested in the textual analysis results.)
The interesting part is the Services Analysis Report:
===============================================================================
| SERVICES ANALYSIS REPORT |
|=============================================================================|
| Global PID's |
| TS packets: 1, PID's: 1 (clear: 1, scrambled: 0) |
|-----------------------------------------------------------------------------|
| PID Usage Access Bitrate |
| Total Global PID's ................................. C 137 b/s |
| Subt. Global PSI/SI PID's (0x00-0x1F) .............. C 137 b/s |
| 0x0000 PAT .......................................... C 137 b/s |
|=============================================================================|
| Service: 0x0001 (1), TS: 0x0001 (1) |
| Service name: (unknown), provider: (unknown) |
| Service type: 0x00 (Undefined) |
| TS packets: 4,816, PID's: 3 (clear: 3, scrambled: 0) |
| PMT PID: 0x01E0 (480), PCR PID: 0x01E1 (481) |
|-----------------------------------------------------------------------------|
| PID Usage Access Bitrate |
| Total Undefined .................................... C 659,385 b/s |
| 0x01E0 PMT .......................................... C 137 b/s |
| 0x01E1 AVC video (1920x1080, high profile, level 4.0 C 503,301 b/s |
| 0x01E2 MPEG-2 AAC Audio (eng) ....................... C 155,947 b/s |
| (C=Clear, S=Scrambled, +=Shared) |
===============================================================================
The following PIDs are relevant to us:
- in the global PIDs section, the PID 0x0000 (used for PAT, which is metadata)
- in the section for Service 0x0001, the PIDs 0x01E0 (used for PMT, which is metadata), 0x01E1 (video) and 0x01E2 (audio)
We can now use tsdump from the TSDuck suite to see if we really have "all the video followed by all the audio" encoding:
tsdump --header "4404_hls [4404_hls].mp4.part-Frag1" > dump.txt
In the resulting dump.txt file, we will see blocks of packet headers looking like this:
* Packet 0
---- TS Header ----
PID: 0x0000 (0), header size: 4, sync: 0x47
Error: 0, unit start: 1, priority: 0
Scrambling: 0, continuity counter: 0
Adaptation field: no (0 bytes), payload: yes (184 bytes)
The file will start with PIDs 0x0000 and 0x01E0 (metadata). Then, if it's actually in the order described previously, it will contain lots of packets with PID 0x01E1 (video) followed by lots of packets with PID 0x01E2 (audio) instead of those two being intermingled.
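Scrolling through thousands of packet headers by eye gets tedious, so here is a minimal sketch that condenses the dump into runs of consecutive packets with the same PID. The function name pid_runs is made up for this example, and it assumes that every packet header in dump.txt contains a line beginning with "PID:" as in the excerpt above.

```shell
# Print one line per run of consecutive packets sharing a PID:
# the PID followed by the number of packets in that run.
# Assumption: each packet header has a line whose first word is "PID:".
pid_runs() {
    awk '
        $1 == "PID:" {
            pid = $2 ""                  # append "" to force string comparison
            if (!started || pid != prev) {
                if (started) print prev, count
                prev = pid
                count = 0
                started = 1
            }
            count++
        }
        END { if (started) print prev, count }
    ' "$1"
}
```

Calling pid_runs dump.txt on a fragment in the problematic order should print only a handful of lines, with one very long run for 0x01E1 followed by one long run for 0x01E2. A properly interleaved fragment would instead produce many short alternating runs.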
The good news is that even though the video is split into multiple fragments, the PIDs remain the same across the fragments (otherwise, playback in the browser would probably stop working as well), so we only need to analyze one fragment.
We will now use tsp to split each fragment into two files: one will contain metadata and video, the other will contain metadata and audio.
This example assumes that the video has 25 fragments; of course, you should check how many fragments your video has.
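If you would rather not count the files by hand, something along these lines may help. It is only a sketch: count_fragments is a hypothetical helper name, and it assumes the fragments all share one prefix followed directly by the fragment number, as in the file names shown earlier.

```shell
# Print how many files start with the given prefix followed by a digit.
# The prefix is quoted, so the square brackets in the file name are taken
# literally; only the trailing [0-9]* is treated as a glob pattern.
count_fragments() {
    prefix=$1
    set -- "$prefix"[0-9]*
    # If nothing matched, the pattern stays as a literal, non-existent name.
    if [ -e "$1" ]; then
        echo "$#"
    else
        echo 0
    fi
}
```

For the example video, count_fragments "4404_hls [4404_hls].mp4.part-Frag" would print the number to use as the upper bound in the loops.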
As per the information gleaned from the above analysis, we will need:
- PIDs 0x0000, 0x01E0 and 0x01E1 for the video file
- PIDs 0x0000, 0x01E0 and 0x01E2 for the audio file
for fragnum in {1..25}
do
    tsp \
        -I file "4404_hls [4404_hls].mp4.part-Frag${fragnum}" \
        -P filter --pid 0x0000 --pid 0x01E0 --pid 0x01E1 \
        -O file "part${fragnum}.m4v"
    tsp \
        -I file "4404_hls [4404_hls].mp4.part-Frag${fragnum}" \
        -P filter --pid 0x0000 --pid 0x01E0 --pid 0x01E2 \
        -O file "part${fragnum}.m4a"
done
We now have 25 .m4v files containing only the video of each fragment and 25 .m4a files containing only the audio.
Now, we can get ffmpeg to concatenate the video fragments into a complete video file and do the same for the audio.
ffmpeg's concat mode requires a file listing the inputs, which we assemble before calling ffmpeg:
# clean up after a previous run
rm -f "audios.txt" "videos.txt"
for fragnum in {1..25}
do
    echo "file 'part${fragnum}.m4a'" >> "audios.txt"
    echo "file 'part${fragnum}.m4v'" >> "videos.txt"
done
ffmpeg -f concat -i audios.txt -c copy 4404.m4a
ffmpeg -f concat -i videos.txt -c copy 4404.m4v
We now have two files: 4404.m4a with the full audio and 4404.m4v with the full video.
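If one of the two concat runs stops partway or the result comes out shorter than expected, a wrong fragment count is a likely suspect. The following sketch checks that every entry in a list file points to an existing, non-empty file. The name check_concat_list is invented for this example, and it assumes the simple one-entry-per-line format written by the loop above and that you run it from the directory containing the list.

```shell
# Return success only if every "file '...'" entry in the given list
# refers to an existing, non-empty file (relative to the current directory).
check_concat_list() {
    missing=0
    while IFS= read -r line; do
        name=${line#"file '"}    # strip the leading: file '
        name=${name%"'"}         # strip the trailing quote
        if [ ! -s "$name" ]; then
            echo "missing or empty: $name" >&2
            missing=$((missing + 1))
        fi
    done < "$1"
    [ "$missing" -eq 0 ]
}
```

Running check_concat_list audios.txt and check_concat_list videos.txt prints any problematic entries to standard error and returns a non-zero exit status if it found some.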
Finally, we can unify our two files into one stream with both video and audio:
ffmpeg -i 4404.m4v -i 4404.m4a -c copy 4404.mp4
And that's it -- 4404.mp4 contains both video and audio.
Good news: all three required tools are available for Windows as well -- and if you use PowerShell, loops should remind you a lot of C, C++, C#, Java, JavaScript and the like:
For ($fragnum = 1; $fragnum -le 25; $fragnum += 1) {
    & tsp `
        -I file "4404_hls [4404_hls].mp4.part-Frag$($fragnum)" `
        -P filter --pid 0x0000 --pid 0x01E0 --pid 0x01E1 `
        -O file "part$($fragnum).m4v"
    & tsp `
        -I file "4404_hls [4404_hls].mp4.part-Frag$($fragnum)" `
        -P filter --pid 0x0000 --pid 0x01E0 --pid 0x01E2 `
        -O file "part$($fragnum).m4a"
}
Transforming the other loops into PowerShell is left as an exercise to the reader. ;-)