@kitsumed
Last active March 4, 2025 21:10
Crafting a WebM file that plays different tracks based on the device/software used to play it.

I recently encountered an intriguing WebM file that played different videos depending on the device or software used. Specifically, the video varied between Firefox, Chromium-based browsers/Electron, and Android.

Curious about how it worked, I searched online but found no relevant information. To investigate further, I examined the file's bytes and saw that it contained multiple tracks, but I couldn't understand how it worked. Out of ideas, I opened the URL in the metadata that credited the original creator, @piousdeer, and reached out to them.

The creator responded and recommended an old Java program called EBML Viewer. They advised me to pay close attention to the TrackType field.

After messing around a bit in the program, I realized my initial guess was right: the file contained multiple tracks, and the only difference between them was the TrackType value. One track was 2 (audio), another was 1 (video), and the remaining two were 3 (complex).

I then tried to craft my own WebM file, starting with the following FFmpeg command: ffmpeg -i video1.mp4 -i video2.mp4 -i video2.mp4 -map 0:a:0 -map 0:v:0 -map 1:v:0 -map 2:v:0 -c:v libvpx-vp9 -c:a libopus -b:a 256k -metadata:s:v:0 "title=Complex 1" -metadata:s:v:1 "title=Complex 2" -metadata:s:v:2 "title=Primary Video" output.webm

This created a WebM file with 4 TrackEntry elements: one audio (TrackType=2) and three video (TrackType=1). Even in this state, Firefox and Android already play different videos from the file.

Note

Note that from my tests, for the file to play a different video on Android, the TrackEntry elements need to be created in the following order: Audio, Video or Complex, Video or Complex, Video.

Since FFmpeg doesn't seem to support setting TrackType to 3, we have to modify the file manually by editing its bytes in the HxD hex editor.

With the help of EBML Viewer, I discovered that TrackType is represented in hex by 83 81, followed by its value (in this case, 01). Here 83 is the TrackType element ID and 81 encodes a data size of one byte.
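To make the three-byte layout concrete, here is a minimal Python sketch that decodes one TrackType element. The function name `decode_track_type` and the `TRACK_TYPE_NAMES` table are illustrative assumptions, not part of PoC.py, and the sketch only handles the one-byte size form (81) seen in this file.

```python
TRACK_TYPE_ID = 0x83  # element ID byte that precedes the size and the value
# Only the three values discussed in this write-up; Matroska defines more.
TRACK_TYPE_NAMES = {1: "video", 2: "audio", 3: "complex"}


def decode_track_type(data, pos=0):
    """Decode a TrackType element (ID, size, value) starting at pos."""
    if data[pos] != TRACK_TYPE_ID:
        raise ValueError("not a TrackType element")
    size_byte = data[pos + 1]
    if not size_byte & 0x80:
        raise ValueError("only one-byte size fields are handled in this sketch")
    size = size_byte & 0x7F  # 0x81 -> the value is 1 byte long
    value = int.from_bytes(data[pos + 2:pos + 2 + size], "big")
    return value, TRACK_TYPE_NAMES.get(value, "other")


print(decode_track_type(bytes.fromhex("838101")))  # (1, 'video')
print(decode_track_type(bytes.fromhex("838103")))  # (3, 'complex')
```

Because the size byte says the value is exactly one byte long, changing 01 to 03 does not change the length of the element.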

For a visual reference, here are the first bytes of a WebM file. In this example, the last three bytes are the ones we're interested in:

1A 45 DF A3 9F 42 86 81 01 42 F7 81 01 42 F2 81 04 42 F3 81 08 42 82 84 77 65
62 6D 42 87 81 04 42 85 81 02 18 53 80 67 01 00 00 00 00 1D 7F AD 11 4D 9B 74
BB 4D BB 8B 53 AB 84 15 49 A9 66 53 AC 81 A1 4D BB 8B 53 AB 84 16 54 AE 6B 53
AC 81 D6 4D BB 8C 53 AB 84 12 54 C3 67 53 AC 82 02 53 4D BB 8D 53 AB 84 1C 53
BB 6B 53 AC 83 1D 7F 4F EC 01 00 00 00 00 00 00 58 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 15 49 A9 66 B0 2A D7 B1 83 0F 42 40 4D 80 8C 4C 61 76 66 36 31 2E 39 2E 31
30 30 57 41 8C 4C 61 76 66 36 31 2E 39 2E 31 30 30 44 89 88 40 C4 4E 80 00 00
00 00 16 54 AE 6B 41 77 AE 01 00 00 00 00 00 00 59 D7 81 01 73 C5 88 0F 52 42
BB B8 7A 09 47 9C 81 00 22 B5 9C 83 75 6E 64 86 86 41 5F 4F 50 55 53 56 AA 83
63 2E A0 56 BB 84 04 C4 B4 00 [83 81] (01) <-- TrackType is 01, so 1. It is a video.
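The rest of the dump follows the same ID, size, value pattern as TrackType. As a rough illustration, the sketch below walks the first 36 bytes of the dump (the EBML header) and prints its fields. The helper names (`vint_length`, `read_element`, `parse_ebml_header`) are made up for this example, and the code is a simplified reader, not a full EBML parser.

```python
# First 36 bytes of the dump above: the EBML header element and its children.
HEADER = bytes.fromhex(
    "1A45DFA3 9F 4286 81 01 42F7 81 01 42F2 81 04 42F3 81 08"
    "4282 84 7765626D 4287 81 04 4285 81 02"
)

HEADER_NAMES = {
    b"\x42\x86": "EBMLVersion",
    b"\x42\xf7": "EBMLReadVersion",
    b"\x42\xf2": "EBMLMaxIDLength",
    b"\x42\xf3": "EBMLMaxSizeLength",
    b"\x42\x82": "DocType",
    b"\x42\x87": "DocTypeVersion",
    b"\x42\x85": "DocTypeReadVersion",
}


def vint_length(first_byte):
    """Length of an EBML variable-size integer, from its leading zero bits."""
    if first_byte == 0:
        raise ValueError("fields longer than 8 bytes are not handled here")
    length = 1
    while not first_byte & (0x80 >> (length - 1)):
        length += 1
    return length


def read_element(data, pos):
    """Return (element ID bytes, data size, offset of the element's data)."""
    id_len = vint_length(data[pos])
    elem_id = data[pos:pos + id_len]  # IDs keep their length-marker bit
    pos += id_len
    size_len = vint_length(data[pos])
    size = data[pos] & (0xFF >> size_len)  # sizes drop the marker bit
    for byte in data[pos + 1:pos + size_len]:
        size = (size << 8) | byte
    return elem_id, size, pos + size_len


def parse_ebml_header(data):
    """Decode the children of the EBML header into a dict."""
    elem_id, size, pos = read_element(data, 0)
    if elem_id != b"\x1a\x45\xdf\xa3":
        raise ValueError("data does not start with an EBML header")
    end = pos + size
    fields = {}
    while pos < end:
        child_id, child_size, pos = read_element(data, pos)
        payload = data[pos:pos + child_size]
        pos += child_size
        name = HEADER_NAMES.get(child_id, child_id.hex())
        if name == "DocType":
            fields[name] = payload.decode("ascii")
        else:
            fields[name] = int.from_bytes(payload, "big")
    return fields


print(parse_ebml_header(HEADER))
```

For these bytes the DocType field decodes to the string webm, which matches the `77 65 62 6D` run visible in the first two rows of the dump.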

To differentiate the two complex videos from the "Primary Video," I instructed FFmpeg to assign them different titles when creating the file.

In HxD, the decoded text section on the right side of the hex view reveals these titles, allowing us to easily identify each TrackEntry. We can see the text "Complex 1," "Complex 2," and "Primary Video" embedded in the metadata.
The first TrackType (83 81) that appears after a TrackEntry title belongs to that specific TrackEntry.

Tip

If you search for the bytes 83 81, the first 4 results should correspond to your 4 TrackEntry elements. From there, you only need to identify the ones with "Complex" in their title.
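The same lookup can be done without a hex editor. The sketch below is a read-only helper that, for each title, reports the offset and value of the first 83 81 that follows it. The function name `track_types_after_titles`, the 1024-byte search window, and the `SAMPLE` bytes are all assumptions made for illustration: `SAMPLE` is a hand-built fragment, not a playable WebM.

```python
def track_types_after_titles(data, titles, window=1024):
    """Map each title to (offset, value) of the first TrackType after it."""
    found = {}
    for title in titles:
        title_pos = data.find(title)
        if title_pos == -1:
            continue  # title not present in the data
        start = title_pos + len(title)
        hit = data.find(b"\x83\x81", start, start + window)
        if hit != -1 and hit + 2 < len(data):
            found[title] = (hit, data[hit + 2])
    return found


# Synthetic fragment: two titles, each followed by a codec ID and a TrackType.
SAMPLE = (
    b"\x53\x6e\x89Complex 1" + b"\x86\x85V_VP9" + b"\x83\x81\x03"
    + b"\x53\x6e\x8dPrimary Video" + b"\x86\x85V_VP9" + b"\x83\x81\x01"
)

for title, (offset, value) in track_types_after_titles(
    SAMPLE, [b"Complex 1", b"Primary Video"]
).items():
    print(title.decode(), "-> TrackType", value, "at offset", offset)
```

On a real file you would pass the result of `open("output.webm", "rb").read()` instead of `SAMPLE`. Keep in mind that the byte pair 83 81 can also occur by coincidence inside other data, so the reported offsets are worth confirming in EBML Viewer before editing anything.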

After identifying the positions of the two TrackType values that belong to the tracks with "Complex" in their title, we need to change their values from 01 to 03. This modification changes their TrackType from video to complex.

If we reopen the edited file in EBML Viewer, we should now see that both tracks with "Complex" in their title have their TrackType set to 3.

From this point, if you open your WebM file in a video player like VLC, you should only see one video track available, even though three video tracks exist in the file.

In the current state of the file, playing it in Firefox, VLC, or a Chromium-based browser/Electron would play the Complex 2 video. Playing it on Android would play the Primary Video. The Complex 1 video would never play.

This is where I started to get confused again. I was satisfied because I now understood how it worked and had managed to get a different video to play on Android devices. However, the file @piousdeer created also played different videos in Firefox/VLC and in Chromium-based browsers/Electron.

After discussing further with @piousdeer, I learned that the Complex 1 video actually had two TrackType values: one set to 03 and the other set to 01. I tried replacing the existing value inside the TrackEntry with a TrackType (83 81), but it didn’t have much effect, except for corrupting the file.
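One general constraint may be relevant here, offered as an assumption rather than a diagnosis of that failed attempt: every EBML master element stores the byte length of its content. Overwriting a value with one of the same length (01 to 03) leaves those lengths valid, while adding bytes to a TrackEntry would not unless the enclosing sizes were updated too. The sketch below decodes the two size fields visible in the dump earlier in this write-up; the helper name `read_size` is made up for this example.

```python
def read_size(data, pos):
    """Decode an EBML size field at pos; return (value, field length)."""
    first = data[pos]
    if first == 0:
        raise ValueError("size fields longer than 8 bytes are not handled")
    length, mask = 1, 0x80
    while not first & mask:  # count leading zero bits to get the length
        length += 1
        mask >>= 1
    value = first & (mask - 1)  # drop the length-marker bit
    for byte in data[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, length


# Bytes copied from the dump: Tracks ID + size, then first TrackEntry ID + size.
TRACKS_START = bytes.fromhex("1654AE6B 4177 AE 0100000000000059")

tracks_size, tracks_len = read_size(TRACKS_START, 4)
entry_size, entry_len = read_size(TRACKS_START, 4 + tracks_len + 1)
print("Tracks content size:", tracks_size)          # 375
print("First TrackEntry content size:", entry_size)  # 89
```

Under this reading, inserting a second three-byte TrackType into that TrackEntry would require its size to become 92 and the Tracks size to become 378, along with the size of any other element that encloses them.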

At this point, I decided to stop my experiments. I was already happy because I understood how it worked and had all the answers, even though I didn’t manage to replicate the final step.

I then wrote a detailed prompt and asked ChatGPT to generate a Python script that runs the FFmpeg command and performs the byte replacement automatically (I edited parts of it). That file is attached to this gist as PoC.py. For the script to work, FFmpeg must be on your PATH or in the same directory as the Python file.

Sample / Example

sample.webm

Credits

Thanks again to @piousdeer for his help, as I don't think I would have learned about the existence of TrackType on my own. I would also like to credit @19wintersp, who ran the same experiment in 2021. I found his blog after my own experiments, when I already knew about TrackType, but it was an interesting read.

'''
Copyright 2025 kitsumed (Med)
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
'''
import os
import subprocess
import sys
import shutil
import re


def find_ffmpeg():
    """Finds the ffmpeg executable in the system PATH or next to this script."""
    ffmpeg_exec = "ffmpeg"
    if shutil.which(ffmpeg_exec):
        return ffmpeg_exec
    script_dir = os.path.dirname(os.path.abspath(__file__))
    # Also try "ffmpeg.exe" so that a local copy is found on Windows
    for name in (ffmpeg_exec, ffmpeg_exec + ".exe"):
        local_path = os.path.join(script_dir, name)
        if os.path.exists(local_path):
            return local_path
    print("Error: FFmpeg not found.")
    sys.exit(1)


def create_empty_video(output_file):
    """Creates an empty video (black screen, very low bitrate)."""
    ffmpeg = find_ffmpeg()
    # Build the ffmpeg command to create a black screen video with very low bitrate
    ffmpeg_cmd = [ffmpeg, "-f", "lavfi", "-t", "1", "-i", "color=c=black:s=1x1:r=1",
                  "-c:v", "libvpx-vp9", "-b:v", "1k", "-an", output_file]  # Very low bitrate video
    subprocess.run(ffmpeg_cmd, check=True)
    print(f"Created empty video: {output_file}")


def concatenate_videos(primary_video, complex_video1, complex_video2, output_file):
    """Muxes the three inputs into one WebM as separate tracks (VP9 video, Opus audio)."""
    ffmpeg = find_ffmpeg()
    # Build the ffmpeg command with one input per video
    ffmpeg_cmd = [ffmpeg, "-y", "-i", primary_video, "-i", complex_video1, "-i", complex_video2]
    # Map audio (taken from the first input only)
    ffmpeg_cmd += ["-map", "0:a:0"]
    # Map video tracks
    ffmpeg_cmd += ["-map", "0:v:0", "-map", "1:v:0", "-map", "2:v:0"]
    ffmpeg_cmd += [
        "-c:v", "libvpx-vp9", "-c:a", "libopus", "-b:a", "256k",
        "-metadata:s:v:0", "title=Complex 1",
        "-metadata:s:v:1", "title=Complex 2",
        "-metadata:s:v:2", "title=Primary Video"
    ]
    ffmpeg_cmd.append(output_file)
    subprocess.run(ffmpeg_cmd, check=True)


def modify_video_track_to_complex(webm_file):
    """Sets TrackType to 3 (complex) for each track whose title contains "Complex "."""
    # Title defined inside the metadata
    complex_title = b"Complex "
    title_pattern = re.compile(re.escape(complex_title), re.IGNORECASE)
    track_type_pattern = b'\x83\x81\x01'  # The TrackType (83 81) and value (01) to replace with 03
    modified_count = 0
    with open(webm_file, "r+b") as f:
        # Get the size of the file
        file_size = os.path.getsize(webm_file)
        # Up to 1MB of bytes or less depending on file size
        max_read_size = min(file_size, 1024 * 1024)
        # Read with max_read_size
        chunk = f.read(max_read_size)
        offset = 0  # Start offset for searching the "Complex " in the file
        while offset < len(chunk):
            # Search for the "Complex " title and the corresponding track type
            matches = [m.start() for m in title_pattern.finditer(chunk[offset:])]
            if not matches:
                break  # No more matches found, exit the loop
            for title_pos in matches:
                # Calculate the absolute position of the match in the file
                title_pos_absolute = offset + title_pos
                # Start searching for the TrackType Video (83 81 01) after the title position
                f.seek(title_pos_absolute)  # Move the file pointer to the position of the title
                chunk_data = f.read(1024)  # Read 1024 bytes starting at the title position
                # Search for the track type marker (83 81 01) within the chunk
                track_pos = chunk_data.find(track_type_pattern)
                if track_pos != -1:
                    # Absolute position of the found track type marker in the file
                    track_pos_absolute = f.tell() - len(chunk_data) + track_pos
                    # Move the file pointer to the value byte of the track type
                    f.seek(track_pos_absolute + 2)  # Move to the '01' byte
                    f.write(b'\x03')  # Replace '01' with '03'
                    modified_count += 1
                    print(f"Modified track at position: {track_pos_absolute}")
            # Update offset to search beyond the current "Complex "
            offset += matches[-1] + len(complex_title)  # Move offset to just after the last match
    if modified_count > 0:
        print(f"Modified {modified_count} video track(s) with 'Complex ' title to a Complex TrackType (3) in {webm_file}")
    else:
        print("No matching video tracks found for modification.")


# Ask for the primary video and complex2 only; complex1 is not requested
# because this script does not add an additional TrackType to complex1
def main():
    # Get input for video files
    primary_video = input("Enter the path for the primary video (Android): ")
    complex_video2 = input("Enter the path for the second complex video (Firefox/VLC/Chromium-based/Electron): ")
    # Get output file path
    output_file = input("Enter the output file name (e.g., output.webm): ")
    # We know complex1 is never going to play, so we create a small "template" file to reduce the final output size
    create_empty_video("empty_complex1_small.webm")
    # Process the videos based on user input
    concatenate_videos(primary_video, "empty_complex1_small.webm", complex_video2, output_file)
    modify_video_track_to_complex(output_file)
    print("Processing complete.")


if __name__ == "__main__":
    main()