@j9ac9k
Created July 22, 2020 18:12
Short summary of ffmpeg usage

FFmpeg Walkthrough

basics

ffmpeg is a command line utility that presents a uniform interface for interacting with a wide variety of media types and encodings.

Depending on the ffmpeg distribution, you may also get utilities such as ffprobe (which reports information about a file) and ffplay (which plays back a file). Those tools are critical.

Those tools, by default, will show all the arguments that ffmpeg was compiled with, which can get a little verbose. If you're going to run many ffmpeg commands, I suggest you get used to passing the -hide_banner argument.
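Since nearly every command in this walkthrough passes -hide_banner, one option is to wrap the three tools in small shell functions. This is only a sketch: the names ffq, ffprobeq, and ffplayq are made up for illustration and are not part of ffmpeg.

```shell
# Hypothetical wrappers (names are illustrative, not part of ffmpeg):
# each forwards its arguments unchanged and always adds -hide_banner.
ffq()      { ffmpeg  -hide_banner "$@"; }
ffprobeq() { ffprobe -hide_banner "$@"; }
ffplayq()  { ffplay  -hide_banner "$@"; }

# Usage: ffprobeq -i testSample.wav
```

Dropping these into a shell startup file would make them available in every session; a plain alias would work just as well.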

While you can do just about everything media-file related with ffmpeg, that does not mean you should. I would recommend installing sox, a command line utility that primarily deals with audio files; it's nowhere near as feature complete, but the arguments are a lot easier to remember.

Often, ffmpeg is something of a last resort. We do not like modifying our data in /dat/corpora: while some audio transformations can be done without losing any data, doing so can cause other issues. When feasible, it's best to leave the files alone and have us modify our tools accordingly.

weird encoding

I'm going to start this bit with a rant. While I am perfectly fine with nonsensical names for things, please take care to choose names that are relatively easy to search for. If you name an audio encoding "shorten", please consider the headache you will cause for people googling "shorten audio file" or something along those lines.

Early in Fred's development, I got reports that Fred was unable to read one of the test audio files from the toolkit. The explanation I got was that the file used a NIST encoding. I thought this was odd, as the audio library I use to interact with audio files (libsndfile) does support NIST files. So what gives? What makes this file so special?

ognyan@wfh:~$ ffprobe -hide_banner -i testSample.wav
Input #0, nistsphere, from 'testSample.wav':
  Metadata:
    microphone      : Sennheiser
    recording_site  : SRI
    database_id     : wsj1
    database_version: 1.0
    recording_environment: quiet
    speaker_session_number: 01
    session_utterance_number: 03
    prompt_id       : adapt.03
    utterance_id    : 44aa0103
    speaking_mode   : read-adaptation
    speaker_id      : 44a
    sample_min      : -854
    sample_max      : 683
    sample_checksum : 63835
    recording_date  : 02-Dec-1992
    recording_time  : 12:45:30.00
  Duration: 00:00:05.41, bitrate: 87 kb/s
    Stream #0:0: Audio: shorten, 16000 Hz, 1 channels, s16p

As you can see, ffprobe gives all sorts of good information, but this bottom bit is the bit of interest:

Stream #0:0: Audio: shorten, 16000 Hz, 1 channels, s16p

I had to do a fair amount of googling to discover that shorten is an encoding scheme that was used ages ago. Our toolkit and wave package have support for it (which is why this file works in SView), but Fred does not.
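When only the codec matters, ffprobe can be asked for just that one field instead of the full report. A minimal sketch, assuming ffprobe is on the PATH; audio_codec is a hypothetical helper name and the directory in the usage comment is a placeholder.

```shell
# audio_codec is a hypothetical helper: print only the codec name of the
# first audio stream of a file, e.g. "shorten" or "pcm_s16le".
audio_codec() {
    ffprobe -v error -select_streams a:0 \
        -show_entries stream=codec_name \
        -of default=noprint_wrappers=1:nokey=1 "$1" < /dev/null
}

# Usage: list every .wav under a directory (placeholder path) whose audio
# is shorten-encoded.
# find /path/to/audio -name '*.wav' | while IFS= read -r f; do
#     [ "$(audio_codec "$f")" = "shorten" ] && printf '%s\n' "$f"
# done
```

This only reads the files, so it is safe to run against data that should not be modified.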

I created an issue with libsndfile asking for support; they said they're open to a PR, but given that nobody (but us) uses shorten-encoded audio files, they aren't particularly motivated.

ffmpeg is able to play back this audio file via the ffplay command with ease. A potential future feature I may roll out down the line is to embed ffmpeg into Fred to handle reading of data.

bad-header

I came across this issue almost by accident, as I was added to an email chain about corrupt audio files. Only a handful of the files were corrupt, which struck me as somewhat odd.

First step, on macOS, there is a command line utility, afplay.

ognyan@wfh:~$ afplay ambiance_20200412_21h.wav

This immediately exits with no audio playing. Given this, and the email report saying our tools were not handling this file, clearly there was something wrong.

ognyan@wfh:~$ ffplay -hide_banner ambiance_20200412_21h.wav

Here we can hear the audio, so what gives?

ognyan@wfh:~$ ffprobe -hide_banner ambiance_20200412_21h.wav
[wav @ 0x7fe3a3808200] Estimating duration from bitrate, this may be inaccurate
Input #0, wav, from 'ambiance_20200412_21h.wav':
  Duration: 01:08:55.68, bitrate: 256 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s

So it has this warning about the duration...

A quick google search pointed me to a sox command that can patch it up

ognyan@wfh:~$ sox --ignore-length ambiance_20200412_21h.wav fixed.wav

After this, we can verify we don't get the warning

ognyan@wfh:~$ ffprobe -hide_banner fixed.wav
Input #0, wav, from 'fixed.wav':
  Duration: 01:08:55.68, bitrate: 256 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s

👍
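The "Estimating duration from bitrate" warning suggests ffprobe could not use the length recorded in the header and fell back to arithmetic on the amount of data instead. For uncompressed PCM that arithmetic is simple: bytes per second = sample rate * bits per sample / 8 * channels. A small sketch of the calculation; pcm_seconds is a hypothetical helper, and the byte count below is back-computed from the duration ffprobe printed rather than measured from the real file.

```shell
# pcm_seconds BYTES RATE BITS CHANNELS
# Whole seconds of uncompressed PCM audio held in BYTES bytes of sample data.
pcm_seconds() {
    bytes_per_second=$(( $2 * $3 / 8 * $4 ))
    echo $(( $1 / bytes_per_second ))
}

# 16000 Hz * 16 bit * 1 channel = 32000 bytes/s, i.e. the 256 kb/s ffprobe reports.
# 01:08:55 is 4135 seconds, so about 4135 * 32000 = 132320000 bytes of samples.
pcm_seconds 132320000 16000 16 1    # prints 4135
```

Comparing this figure against the real file size (for example from wc -c) is a quick way to check whether a header's stated length is plausible.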

I have one four channel file, can I get four one channel files?

Saw a message from Grace requesting this late at night. There were some email chains already going around regarding 4 channel audio files, but after asking for the path to the file, the first thing I do is make sure it actually has 4 channels:

ognyan@wfh:~$ ffprobe -hide_banner -i 200629_prePV_SET_Max_Noise_micin.wav
Input #0, wav, from '200629_prePV_SET_Max_Noise_micin.wav':
  Duration: 00:15:36.00, bitrate: 1024 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 4 channels, s16, 1024 kb/s

yup...4 channels... who does that... anyway

Splitting it up is pretty easy with ffmpeg, though I still had to google the command. Before I show you what I did, I want to reference this awesome Super User post:

https://superuser.com/questions/685910/ffmpeg-stereo-channels-into-two-mono-channels

ognyan@wfh:~$ ffmpeg -hide_banner -i 200629_prePV_SET_Max_Noise_micin.wav \
-map_channel 0.0.0 first.wav \
-map_channel 0.0.1 second.wav \
-map_channel 0.0.2 third.wav \
-map_channel 0.0.3 fourth.wav

This generates 4 files, first.wav, second.wav, third.wav and fourth.wav, each file corresponding to which audio channel there was.

The link above showcases a few other ways of handling this as well.
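One of those other ways is the pan filter, which recent ffmpeg releases recommend in place of the now deprecated -map_channel option. A minimal sketch that builds one mono output per channel; split_channels and the <stem>_chN.wav naming are made up for illustration.

```shell
# split_channels INPUT CHANNELS
# Hypothetical helper: write one mono file per input channel using the pan
# filter. Outputs are named <stem>_ch0.wav, <stem>_ch1.wav, ...
split_channels() {
    input=$1
    channels=$2
    stem=${input%.*}
    set -- -hide_banner -i "$input"
    i=0
    while [ "$i" -lt "$channels" ]; do
        # pan=mono|c0=cN uses input channel N as the only output channel
        set -- "$@" -af "pan=mono|c0=c$i" "${stem}_ch$i.wav"
        i=$((i + 1))
    done
    ffmpeg "$@"
}

# Usage: split_channels 200629_prePV_SET_Max_Noise_micin.wav 4
```

Because pan addresses channels by index (c0, c1, ...), this does not depend on how ffmpeg labels the channel layout of a 4 channel wav.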

what is an amr file?

This isn't really a request, but an issue that was brought up about having some files that were in "amr" format.

After getting the path to a file:

ognyan@wfh:~$ ffplay -hide_banner -i 2_26134184.amr

Sure enough, audio plays fine... now Stan did not request this, but let's say we want to convert the file into some kind of usable wav file.

ognyan@wfh:~$ ffmpeg -hide_banner -i 2_26134184.amr file.wav

You can get fancy and specify different sample rates if you want, or convert to some other audio encoding

ognyan@wfh:~$ ffmpeg -hide_banner -i 2_26134184.amr -c:a libopus file.ogg

Here, we said to use the libopus audio codec (which is lossy), stored in an ogg type audio file.
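For the sample rate case mentioned above, the relevant output options are -ar (sample rate), -ac (channel count), and -c:a (codec). A minimal sketch; to_wav16k is a hypothetical helper, and the 16 kHz mono 16-bit target is an assumption chosen to match the other wav files shown earlier.

```shell
# to_wav16k INPUT OUTPUT
# Hypothetical helper: decode anything ffmpeg can read into 16 kHz, mono,
# 16-bit PCM wav (the target format is an assumption, adjust as needed).
to_wav16k() {
    ffmpeg -hide_banner -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "$2"
}

# Usage: to_wav16k 2_26134184.amr file_16k.wav
```

Resampling upward does not add information that was not in the original recording; it only makes the file match what downstream tools expect.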

video

I have a cute video of my cats playing...but I have a bazillion of these videos and I'd like to squish it down some...

ognyan@wfh:~$ ffmpeg -i original.MOV \
-c:v libx265 \
-crf 28 \
-c:a copy \
first.mp4

Let's say I know I'm going to play this on a chromecast, which does have native H.265 (HEVC) decoding capability, but I'm going to want to enable some flags to make the video a bit easier to play back

ognyan@wfh:~$ ffmpeg -i original.MOV \
-c:v libx265 \
-crf 28 \
-level 4.1 \
-tune fastdecode \
-c:a copy \
second.mp4

Now let's say, I have a fair amount of time to encode this video, but I really want to compress the file size.

ognyan@wfh:~$ ffmpeg -i original.MOV \
-c:v libx265 \
-preset veryslow \
-crf 36 \
-level 4.1 \
-tune fastdecode \
-c:a copy \
third.mp4
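With a bazillion videos, the same command can be wrapped in a loop. A minimal sketch using the settings from the first example; compress_movs is a hypothetical helper, and it assumes the sources end in .MOV and that an existing .mp4 next to a source means it was already converted.

```shell
# compress_movs DIR
# Hypothetical helper: re-encode every DIR/*.MOV to H.265 in an .mp4 next
# to it, skipping files that already have an .mp4 counterpart.
compress_movs() {
    dir=$1
    for src in "$dir"/*.MOV; do
        # with no matches the glob stays literal, so skip it
        if [ ! -e "$src" ]; then continue; fi
        dst="${src%.MOV}.mp4"
        if [ -e "$dst" ]; then continue; fi
        ffmpeg -hide_banner -i "$src" -c:v libx265 -crf 28 -c:a copy "$dst"
    done
}

# Usage: compress_movs ~/Movies/cats
```

The originals are left in place, so the results can be spot-checked before anything is deleted.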

Making awful Star Wars DVD Rip Usable

Sorry, don't have the original on hand, so you'll have to bear with my explanations.

I have a Star Wars Theatrical Release DVD from China (from ~20 years ago). While the movie on it certainly worked, there were some very significant problems regarding playback.

  1. Hard-coded black bars. While the video was in a widescreen format, the people who made it decided to force the video into a 4:3 aspect ratio by adding black bars above and below the video. When you played the video on a 16:9 screen (current standard) you would also get black bars on the sides; the end result was a very thick black border all the way around the video, taking up a majority of the pixels on the screen.
  2. The subtitles were in VOB format, which are effectively bitmaps. On high resolution screens they look like rubbish, and for media playback software such as Plex, they make transcoding substantially more difficult.
  3. The video encoding was something ancient (maybe MPEG1?), which is not only horribly inefficient, but a lot of modern video players (VLC) would display many artifacts throughout playback, and skipping forward/backward was problematic as well.

First thing to address was getting rid of those black bars.

ognyan@wfh:~$ ffmpeg -i input.mkv -vf cropdetect=24:16:0 dummy.mkv
...
[Parsed_cropdetect_0 @ 0x3704360] x1:0 x2:639 y1:43 y2:317 w:640 h:272 x:0 y:46 pts:181320 t:181.320000 crop=640:272:0:46
...

Here the bit we care about is 640:272:0:46
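cropdetect prints one such line per analysed frame, and dark scenes can produce outliers, so it can help to pick the value that occurs most often rather than reading a single line. A minimal sketch that parses the log format shown above; most_common_crop is a hypothetical helper name.

```shell
# most_common_crop
# Hypothetical helper: read ffmpeg's cropdetect log on stdin and print the
# crop=W:H:X:Y value that appears most often.
most_common_crop() {
    sed -n 's/.*\(crop=[0-9:]*\).*/\1/p' \
        | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}'
}

# Usage: "-f null -" decodes without writing an output file, and -t 300
# limits the scan to the first five minutes.
# ffmpeg -hide_banner -i input.mkv -t 300 -vf cropdetect=24:16:0 -f null - 2>&1 | most_common_crop
```

The printed value can be pasted directly after -vf in the real encode.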

ognyan@wfh:~$ ffmpeg -i Star\ Wars\ -\ Episode\ VI\ -\ Return\ of\ the\ Jedi.mkv \
-vf crop=704:272:8:104 \
-aspect 704:272 \
-c:v libx264 \
-crf 17 \
-c:a copy \
-profile:v high \
-preset medium \
-tune fastdecode \
-tune film \
-tune grain \
-level 4.1 \
-movflags +faststart \
output.mkv

in this command, I do the following operations:

  • crop the video (to a different value as this is a different video)
  • set the aspect to 704:272, figuring this would guarantee that the video would not be stretched inappropriately in either direction
  • re-encode the video to h264 format using libx264 (the original encoding does not matter)
  • I set the "constant rate factor" (crf) to 17, which should make the output visually indistinguishable from the source material (crf of 0 is technically lossless, but you would get huge output files)
  • -c:a copy means copy the audio and do nothing to it...
  • -profile:v high this is an h264-specific encoding parameter that restricts which encoding features may be used. Lower profiles (baseline, main) are compatible with the largest variety of devices, but virtually all modern phones, tablets, video-players, etc support the high profile setting, so this allows for slightly smaller file sizes as a result
  • -preset medium as with veryslow earlier, the preset trades encoding time for compression: with a slower preset, the quality of the result will be higher for the same file size. This should be set to the slowest setting you can tolerate; fast or veryfast are often used.
  • -tune <parameter> more can be seen about these in the ffmpeg wiki documentation for x264 encoding. Note that when an option is repeated, ffmpeg keeps only the last value, so of the three -tune lines above only grain should actually take effect.
  • -level 4.1 this is another x264 setting. By specifying the level I reduce compatibility of the output video; however, this setting is compatible with virtually all my devices, chromecasts, raspberry pi's etc...
  • -movflags +faststart this setting is only applicable to some output containers (MP4/MOV style files), but the idea here is that the video/audio index is placed at the very beginning of the file (not compatible with all applications, but offers significant benefits such as videos being able to start right away).

While this takes care of the video, I still have problems with the subtitles. While I'm sure there is a way to deal with this in ffmpeg, I elected to use tools such as mkvtoolnix, tesseract (which is usable with ffmpeg) and vobsub2srt.

First, I identify subtitles

ognyan@wfh:~$ mkvinfo some_movie.mkv
| + Track
|  + Track number: 3 (track ID for mkvmerge & mkvextract: 2)
|  + Track UID: 3
|  + Track type: subtitles
|  + Enabled: 1
|  + Default track flag: 1
|  + Forced track flag: 0
|  + Lacing flag: 0
|  + Minimum cache: 0
|  + Maximum block additional ID: 0
|  + Codec ID: S_VOBSUB
|  + Codec decode all: 1
|  + Language: eng
|  + Codec's private data: size 508
| + Track
|  + Track number: 4 (track ID for mkvmerge & mkvextract: 3)
|  + Track UID: 4
|  + Track type: subtitles
|  + Enabled: 1
|  + Default track flag: 0
|  + Forced track flag: 0
|  + Lacing flag: 0
|  + Minimum cache: 0
|  + Maximum block additional ID: 0
|  + Codec ID: S_VOBSUB
|  + Codec decode all: 1
|  + Language: fre
|  + Codec's private data: size 508
| + Track
|  + Track number: 5 (track ID for mkvmerge & mkvextract: 4)
|  + Track UID: 5
|  + Track type: subtitles
|  + Enabled: 1
|  + Default track flag: 0
|  + Forced track flag: 0
|  + Lacing flag: 0
|  + Minimum cache: 0
|  + Maximum block additional ID: 0
|  + Codec ID: S_VOBSUB
|  + Codec decode all: 1
|  + Language: spa
|  + Codec's private data: size 508

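I can see tracks 3, 4, and 5 are subtitle tracks. Note that mkvextract wants the track ID shown in parentheses (2, 3, and 4 here), not the track number. I then extract them

ognyan@wfh:~$ mkvextract tracks \
some_movie.mkv \
2:some_movie.eng.srt \
3:some_movie.fre.srt \
4:some_movie.spa.srt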
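The mkvinfo output lists two numbers per track, and the one mkvextract expects is the "track ID for mkvmerge & mkvextract" in parentheses. To avoid reading them off by hand, the IDs of the subtitle tracks can be pulled out of the mkvinfo text. A minimal sketch that assumes the output format shown above; subtitle_track_ids is a hypothetical helper name.

```shell
# subtitle_track_ids
# Hypothetical helper: read mkvinfo output on stdin and print the
# mkvmerge/mkvextract track ID of every subtitle track, one per line.
subtitle_track_ids() {
    awk '
        /Track number:/ {
            id = ""
            # "mkvextract: " is 12 characters; the digits follow it
            if (match($0, /mkvextract: [0-9]+/))
                id = substr($0, RSTART + 12, RLENGTH - 12)
        }
        /Track type: subtitles/ && id != "" { print id }
    '
}

# Usage: mkvinfo some_movie.mkv | subtitle_track_ids
```

For the listing above this would print 2, 3, and 4.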

Here, I had to manually install a tool that's not part of the homebrew distribution:

ognyan@wfh:~$ brew install --with-all-languages tesseract
brew install --HEAD https://github.com/ruediger/VobSub2SRT/raw/master/packaging/vobsub2srt.rb

I then create my srt subtitles:

ognyan@wfh:~$ vobsub2srt some_movie.eng
vobsub2srt some_movie.fre
vobsub2srt some_movie.spa

ass is considered the most versatile/compatible format, so I use ffmpeg to convert the subtitles from srt to ass

ognyan@wfh:~$ ffmpeg -i some_movie.eng.srt some_movie.eng.ass

I then add the subtitle files back into the video:

ognyan@wfh:~$ ffmpeg -i some_movie.mkv -i some_movie.eng.ass \
-codec copy \
-map 0 \
-map 1 \
-metadata:s:s:0 language=eng \
output.mkv
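The same idea extends to several languages at once. The sketch below is one possible variant, not the command used above: it maps only the video and audio from the movie (so the old bitmap subtitles are dropped), then adds one text subtitle input per language and tags each. mux_subtitles is a hypothetical helper and assumes files named <stem>.<lang>.ass sit next to the movie.

```shell
# mux_subtitles MOVIE OUTPUT LANG...
# Hypothetical helper: copy video and audio from MOVIE, add one
# <stem>.<lang>.ass subtitle track per language, and tag each track.
mux_subtitles() {
    movie=$1
    output=$2
    shift 2
    langs=$*                      # e.g. "eng fre spa"; codes contain no spaces
    stem=${movie%.*}
    set -- -hide_banner -i "$movie"
    for lang in $langs; do
        set -- "$@" -i "$stem.$lang.ass"
    done
    set -- "$@" -map 0:v -map 0:a
    n=0
    for lang in $langs; do
        n=$((n + 1))              # input 0 is the movie, subtitles start at 1
        set -- "$@" -map "$n" -metadata:s:s:"$((n - 1))" "language=$lang"
    done
    ffmpeg "$@" -c copy "$output"
}

# Usage: mux_subtitles some_movie.mkv output.mkv eng fre spa
```

Because the original subtitle streams are not mapped, subtitle stream 0 in the output is the first new track, which keeps the -metadata:s:s:N indices in step with the language list.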

bonus hardware support! (cuda)

ffmpeg has cuda support for video transcoding and some filters... I won't go into specific use-cases but you can generally transcode from most video formats to either h264 or h265... in addition, the cuda support covers some video filters such as scaling.

The hardware-accelerated ffmpeg offers significant performance improvements (testing on some videos, I was transcoding at almost 200x playback speed vs. 2-5x).
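For reference, a GPU transcode with scaling might look like the sketch below. It is not the command behind the numbers above; it assumes an NVIDIA GPU and an ffmpeg build with CUDA and NVENC enabled, and nvenc_transcode and the 1280:720 target size are made up for illustration.

```shell
# nvenc_transcode INPUT OUTPUT
# Hypothetical helper: decode, scale, and encode on the GPU. Assumes an
# NVIDIA GPU and an ffmpeg build with CUDA/NVENC support.
nvenc_transcode() {
    ffmpeg -hide_banner \
        -hwaccel cuda -hwaccel_output_format cuda \
        -i "$1" \
        -vf scale_cuda=1280:720 \
        -c:v hevc_nvenc \
        -c:a copy \
        "$2"
}

# Usage: nvenc_transcode input.mkv output.mkv
```

Keeping the frames in GPU memory (-hwaccel_output_format cuda) is what allows scale_cuda to run without copying each frame back to the CPU; swapping hevc_nvenc for h264_nvenc gives h264 output instead.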
