
Last active August 27, 2023 13:47
Mixing Recordings

When mixing the tracks, we need to consider that they might have (and probably have) started at different times. If we merged the tracks without taking this into account, we would end up with synchronization issues. In our example, Bob entered the room a good 20s after Alice (a huge gap as far as audio synchronization goes), so naively mixing Alice’s and Bob’s audio tracks together would shift the conversation and end up with one speaking over the other.

To make merging easier, the start time of every track from the same room is measured from the creation of the room itself. Let’s get the start times for all the tracks from this room.

Get Alice's audio start time

$ ffprobe -show_entries format=start_time alice.mka
Input #0, matroska,webm, from 'alice.mka':
  Metadata:
    encoder         : GStreamer matroskamux version
    creation_time   : 2017-06-30T09:03:44.000000Z
  Duration: 00:13:09.36, start: 1.564000, bitrate: 48 kb/s
    Stream #0:0(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
    Metadata:
      title           : Audio
[FORMAT]
start_time=1.564000
[/FORMAT]

Get Alice's video start time

$ ffprobe -show_entries format=start_time alice.mkv
Input #0, matroska,webm, from 'alice.mkv':
  Metadata:
    encoder         : GStreamer matroskamux version
    creation_time   : 2017-06-30T09:03:44.000000Z
  Duration: 00:13:09.33, start: 1.584000, bitrate: 857 kb/s
    Stream #0:0(eng): Video: vp8, yuv420p(progressive), 640x480, SAR 1:1 DAR 4:3, 1k tbr, 1k tbn, 1k tbc (default)
    Metadata:
      title           : Video
[FORMAT]
start_time=1.584000
[/FORMAT]

Get Bob's audio start time

$ ffprobe -show_entries format=start_time bob.mka
Input #0, matroska,webm, from 'bob.mka':
  Metadata:
    encoder         : GStreamer matroskamux version
    creation_time   : 2017-06-30T09:04:03.000000Z
  Duration: 00:12:49.46, start: 20.789000, bitrate: 50 kb/s
    Stream #0:0(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
    Metadata:
      title           : Audio
[FORMAT]
start_time=20.789000
[/FORMAT]

Get Bob's video start time

$ ffprobe -show_entries format=start_time bob.mkv
ffprobe version 3.3.2 Copyright (c) 2007-2017 the FFmpeg developers
  built with Apple LLVM version 8.0.0 (clang-800.0.42.1)
Input #0, matroska,webm, from 'bob.mkv':
  Metadata:
    encoder         : GStreamer matroskamux version
    creation_time   : 2017-06-30T09:04:03.000000Z
  Duration: 00:12:49.42, start: 20.814000, bitrate: 1645 kb/s
    Stream #0:0(eng): Video: vp8, yuv420p(progressive), 640x480, SAR 1:1 DAR 4:3, 1k tbr, 1k tbn, 1k tbc (default)
    Metadata:
      title           : Video
[FORMAT]
start_time=20.814000
[/FORMAT]

Track       start_time (in ms)   creation_time
alice.mka   1564                 2017-06-30T09:03:44.000000Z
alice.mkv   1584                 2017-06-30T09:03:44.000000Z
bob.mka     20789                2017-06-30T09:04:03.000000Z
bob.mkv     20814                2017-06-30T09:04:03.000000Z
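
The start_time values in the table can be pulled out of the ffprobe output mechanically. The following is a minimal sketch, not part of the original guide: the function names start_time_ms and probe_start_ms are made up for illustration, and it assumes ffprobe's default output format (the [FORMAT] block shown above) with start_time expressed in seconds.

```shell
# Read ffprobe output on stdin and print the start_time in whole milliseconds.
# Assumes a line of the form "start_time=<seconds>", as in the [FORMAT] block.
start_time_ms() {
  awk -F= '/^start_time=/ { printf "%d\n", $2 * 1000 + 0.5 }'
}

# Hypothetical wrapper: probe one file and print its start_time in ms.
# "-v error" only silences the banner and stream summary.
probe_start_ms() {
  ffprobe -v error -show_entries format=start_time "$1" | start_time_ms
}
```

With the four recordings in the current directory, running probe_start_ms on each file in turn should give the values listed in the table.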

We can see that the start_time of different media types (audio and video) is not the same for the same participant, as media arrives with a slight offset after the WebRTC negotiation. Offsets in the files’ creation_time translate into an equivalent offset in start_time, so the ~20s that Bob had Alice waiting can be obtained directly from the start_time values. Had the start_time of a track been relative to its own creation_time, we would first have had to get the offset between the creation_time values and then calculate the final offset from it. Since creation_time does not have ms precision, this could lead to synchronization issues.

When merging the different tracks, we’ll use the track with the lowest start_time as the time reference. This is important to keep all tracks in sync. What we will do is:

  • Take the lowest start_time value of all tracks. In our case that’s alice.mka with a start_time of 1564ms.
  • Use that track as reference, by not indicating any offset when mixing tracks. We will reflect that in our track list by offsetting it as 0.
  • Calculate the offset for tracks, by subtracting the reference start_time value from all the others. The following table shows the current values for our tracks, in which alice.mka is the reference value with 1564ms:

Track       Track#   Current value   Reference value   Offset in ms
alice.mka   0        1564            1564              0
alice.mkv   1        1584            1564              20
bob.mka     2        20789           1564              19225
bob.mkv     3        20814           1564              19250
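
The offset calculation in the table can be reproduced with a few lines of shell. This is only a sketch under stated assumptions: track_offsets is a hypothetical helper name, and it expects one "<track> <start_time in ms>" pair per line on stdin.

```shell
# Print "<track> <offset_ms>" for every "<track> <start_ms>" line on stdin,
# using the lowest start_time found as the reference value.
track_offsets() {
  awk '
    {
      name[NR] = $1
      start[NR] = $2 + 0                      # force numeric comparison
      if (NR == 1 || start[NR] < min) min = start[NR]
    }
    END {
      for (i = 1; i <= NR; i++) print name[i], start[i] - min
    }
  '
}

# Usage with the values from the table:
#   printf '%s\n' 'alice.mka 1564' 'alice.mkv 1584' \
#                 'bob.mka 20789' 'bob.mkv 20814' | track_offsets
```

For the four tracks above this prints offsets of 0, 20, 19225 and 19250 ms, matching the last column of the table.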

Anatomy of an ffmpeg command

Before we start mixing the files, we’re going to have a quick overview of the ffmpeg program command line arguments that we are going to use. All commands will have the following structure:

ffmpeg [[infile options] -i infile] {filter options} {[outfile options] outfile}

  • ffmpeg: program name
  • [[infile options] -i infile]: This tells the program what input files to use. You’ll need to add -i for each file that you want to add to the conversion process. [infile options] will only be used for video tracks, to indicate the offset with the -itsoffset flag. The position of a file in the inputs list is important, as we’ll later use this position to reference the track. ffmpeg treats this input list as a zero-based array. References to input tracks have the form [#input], where #input is the position the track has in the input array. For instance, in the list ffmpeg -i alice.mka -i alice.mkv -i bob.mka -i bob.mkv we would use these references:
    • [0] for alice.mka, the first input
    • [1] for alice.mkv, the second input
    • [2] for bob.mka, the third input
    • [3] for bob.mkv, the fourth input
  • {filter options}: this is where we define what to do with the input tracks, whether it is mixing audio, video or both.
  • {[outfile options] outfile}: defines where to store the output of the program.

Mixing audio

Let’s see how we can mix different audio files together, while keeping them in sync. First thing we do is find the offset values for the tracks we are going to mix:

Track Name   Track#   Current value   Reference value   Offset in ms
alice.mka    0        1564            1564              0
bob.mka      1        20789           1564              19225

The complete command to obtain the audio file in webm is

ffmpeg \
  -i alice.mka -i bob.mka \
  -filter_complex \
    "[1]adelay=19225|19225[d]; \
     [0][d]amix=inputs=2" \
  -strict -2 output_audio.webm

Let’s decompose this command so you see where everything comes from

  • -i alice.mka -i bob.mka: These are the input files. We’ve made sure that the first input is the 0-delay track.
  • Define filter and options: This is the tricky part. We’ll dissect this in different substeps
    • -filter_complex: we’ll be performing a filter operation on all tracks passed as input.
    • [#]adelay={delay}|{delay}[t#];: For each track with an offset value > 0, we need to indicate the audio delay in milliseconds for that track
      • [#]: As explained in “Anatomy of an ffmpeg command”, each track is referenced by its position in the inputs array (-i alice.mka -i bob.mka). Since only bob.mka is delayed, there’s only one block, for input [1].
      • adelay={delay}|{delay}: delays both the left and right audio channels by {delay} milliseconds.
      • [t#]: this is a label that we’ll use to reference this filtered track. In the command above, the label is [d].
    • [0][t1]..[tn]amix=inputs={#of-inputs}: Once we have added the appropriate delays for all audio tracks, we configure the filter that’ll perform the actual audio mixing, where tn is the label of the n-th delayed track. In our case there are only two tracks, which is why the command only uses [0][d].
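
The pattern described above generalizes to any number of audio tracks. The sketch below is an illustration, not part of the original guide: build_amix_filter is a hypothetical helper that takes one offset in ms per input, in the same order as the -i flags, and assumes every input is stereo (hence the two adelay values). Inputs with a zero offset are passed to amix directly.

```shell
# Build the -filter_complex string for mixing audio inputs.
# Arguments: one offset in ms per input, in the same order as the -i flags.
build_amix_filter() {
  delays=""   # adelay stages for inputs with a non-zero offset
  refs=""     # labels handed to amix, in input order
  i=0
  for ms in "$@"; do
    if [ "$ms" -gt 0 ]; then
      delays="${delays}[$i]adelay=${ms}|${ms}[t$i];"
      refs="${refs}[t$i]"
    else
      refs="${refs}[$i]"
    fi
    i=$((i + 1))
  done
  printf '%s%samix=inputs=%d\n' "$delays" "$refs" "$#"
}

# Usage with the two tracks from the table:
#   ffmpeg -i alice.mka -i bob.mka \
#     -filter_complex "$(build_amix_filter 0 19225)" \
#     -strict -2 output_audio.webm
```

For the offsets 0 and 19225, the helper produces the same filter graph as the command above, with the label [t1] in place of [d].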

If we want to get the output in mp4 format, we can do so by adding the -c:a aac flag, which sets the audio codec to AAC. It is also convenient to change the filename extension to .mp4 to reflect this change:

ffmpeg -i alice.mka -i bob.mka -filter_complex "[1]adelay=19225|19225[d];[0][d]amix=inputs=2" -c:a aac -strict -2 output_audio.mp4

Mixing video

Now we are going to do the same thing with the video files, also keeping them in sync. First thing we do is find the offset values for the tracks we are going to mix:

Track       Track#   Current value in ms   Reference value in ms   Offset in ms
alice.mkv   0        1584                  1584                    0
bob.mkv     1        20814                 1584                    19230

Encoding in VP8 is achieved with the following command

ffmpeg \
  -i alice.mkv -itsoffset 19.230 -i bob.mkv \
  -filter_complex \
    "[0]pad=iw*2:ih[int]; \
     [int][1]overlay=W/2:0[vid]" \
  -map "[vid]" \
  -c:v libvpx \
  -crf 23 \
  -quality good -cpu-used 3 -an \
  output_video.webm
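
Note that the offsets in the table are in milliseconds, while the -itsoffset value in the command (19.230) is in seconds. A small conversion helper avoids slips here; ms_to_seconds is a hypothetical name used only for this sketch, and it assumes a non-negative whole number of milliseconds.

```shell
# Convert a non-negative offset in whole milliseconds to seconds with
# three decimals, the form used for -itsoffset above.
ms_to_seconds() {
  printf '%d.%03d\n' "$(( $1 / 1000 ))" "$(( $1 % 1000 ))"
}

# Usage: ffmpeg -i alice.mkv -itsoffset "$(ms_to_seconds 19230)" -i bob.mkv ...
```

For Bob's video offset of 19230 ms this yields 19.230, the value passed to -itsoffset.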

Putting it all together

Now the only thing left is to mix both audio and video tracks obtained in the previous steps with this command:

ffmpeg -i output_video.webm -i output_audio.webm  -c:v copy -c:a copy full.webm

-c:v copy -c:a copy prevents ffmpeg from re-encoding the existing tracks, thus saving time.
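
The final command treats both intermediate files as starting at the same instant, although their reference tracks (alice.mka at 1564ms and alice.mkv at 1584ms) start 20ms apart. One possible way to account for this, which is not what the original command does, is to delay the video input by that difference with -itsoffset. The sketch below only prints the resulting command; print_mux_command is a hypothetical helper, and it assumes the video reference does not start before the audio reference.

```shell
# Print a final mux command that delays the video input by
# (video reference start_time - audio reference start_time).
# Arguments: audio reference start_time in ms, video reference start_time in ms.
# Assumes the video reference starts at or after the audio reference.
print_mux_command() {
  audio_ref_ms=$1
  video_ref_ms=$2
  lag_ms=$(( video_ref_ms - audio_ref_ms ))
  lag_s=$(printf '%d.%03d' "$(( lag_ms / 1000 ))" "$(( lag_ms % 1000 ))")
  printf 'ffmpeg -itsoffset %s -i output_video.webm -i output_audio.webm -c:v copy -c:a copy full.webm\n' "$lag_s"
}

# Usage with the reference tracks from this example:
#   print_mux_command 1564 1584
```

With the values from this example the printed command carries -itsoffset 0.020 in front of the video input. When the gap between the first audio and first video track is larger, the same calculation applies.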


Hi, my concern is with the last statement, where the merged audio and merged video files are combined into a single file. The time lag between the first audio file (alice.mka, 1564) and the first video file (alice.mkv, 1584) is not taken into consideration, resulting in a lag of 20ms in this example.
In our case, in normal usage, the gap between audio and video is in the order of seconds, which leaves the entire result out of sync.
Kindly help resolve. Thanks, Sunil, [email protected]


igracia commented Dec 29, 2017

@Intraview-in can't update this gist, but please have a look at the updated version of the guide in this fork
