ffmpeg is a command line utility that provides an API for interacting with a variety of media types/encodings in a uniform fashion.
Depending on your ffmpeg distribution, you may also get utilities such as ffprobe (which provides information about a file) and ffplay (which plays back a file). Those tools are critical.
By default, those tools print the full configuration ffmpeg was compiled with, which can get a little verbose. If you're going to run many ffmpeg commands, I suggest you get used to passing the -hide_banner argument.
While you can often do just about everything media-file related with ffmpeg, that does not mean you should. I would recommend installing sox, a command line utility that primarily deals with audio files; it's nowhere near as feature complete, but its arguments are a lot easier to remember.
Uses of ffmpeg are often sort of last-resort-ish... we do not like modifying our data in /dat/corpora. While some audio transformations can be done without losing any data, doing so can cause other issues. When feasible, it's best to leave the files alone and modify our tools accordingly.
I'm going to start this bit with a rant. While I am perfectly fine with nonsensical names for things, please take care to choose names that are relatively easy to search for. If you name an audio encoding "shorten", please consider the headache you will cause people googling "shorten audio file" or something along those lines.
Early in Fred's development, I got reports that Fred was unable to read one of the test audio files from the toolkit. The explanation I got was that the file was NIST-encoded. I thought this was odd, as the audio library I use to interact with audio files (libsndfile) does support NIST files. So what gives? What makes this file so special?
ognyan@wfh:~$ ffprobe -hide_banner -i testSample.wav
Input #0, nistsphere, from 'testSample.wav':
Metadata:
microphone : Sennheiser
recording_site : SRI
database_id : wsj1
database_version: 1.0
recording_environment: quiet
speaker_session_number: 01
session_utterance_number: 03
prompt_id : adapt.03
utterance_id : 44aa0103
speaking_mode : read-adaptation
speaker_id : 44a
sample_min : -854
sample_max : 683
sample_checksum : 63835
recording_date : 02-Dec-1992
recording_time : 12:45:30.00
Duration: 00:00:05.41, bitrate: 87 kb/s
Stream #0:0: Audio: shorten, 16000 Hz, 1 channels, s16p
As you can see, ffprobe gives all sorts of good information, but the bottom line is the bit of interest:
Stream #0:0: Audio: shorten, 16000 Hz, 1 channels, s16p
It took a fair amount of googling to discover that shorten is an audio compression scheme that was used ages ago. Our toolkit and wave package support it (which is why this file works in SView), but Fred does not.
I opened an issue with libsndfile asking them to support it; they said they're open to a PR, but given that nobody (but us) uses shorten-encoded audio files, they aren't particularly motivated.
ffmpeg is able to play back this audio file via the ffplay command with ease. A potential future feature I may roll out down the line is embedding ffmpeg into Fred to handle reading the data.
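The ffprobe output above shows a nistsphere container wrapping a shorten audio stream. Until Fred can read these, one option is to detect them up front and report a clear error instead of a confusing read failure. Below is a minimal Python sketch of reading a NIST SPHERE header. It assumes the usual layout (a "NIST_1A" line, a header-size line, then "name -type value" fields until "end_head") and assumes shorten-compressed files say so in their sample_coding field (e.g. "pcm,embedded-shorten-v2.00"). read_sphere_header and is_shorten_encoded are hypothetical names; this is not how Fred or libsndfile does it.

```python
def read_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header at the start of a NIST SPHERE file into a dict."""
    lines = data.split(b"\n", 2)
    if len(lines) < 3 or lines[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(lines[1].strip().decode("ascii"))  # usually 1024
    # Only parse the declared header; the (possibly compressed) samples follow it.
    text = data[:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line:
            continue
        name, type_tag, value = line.split(maxsplit=2)
        # Type tags: -i integer, -r real, -sN string of N characters.
        if type_tag == "-i":
            fields[name] = int(value)
        elif type_tag == "-r":
            fields[name] = float(value)
        else:
            fields[name] = value
    return fields


def is_shorten_encoded(fields: dict) -> bool:
    """True if sample_coding mentions shorten (assumed naming convention)."""
    return "shorten" in str(fields.get("sample_coding", ""))
```

Assuming a 1024-byte header, passing the first 1024 bytes of testSample.wav to read_sphere_header and then to is_shorten_encoded should flag the file before it ever reaches libsndfile.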
I came across this issue almost by accident, when I was added to an email chain about corrupt audio files. Only a handful of the files were corrupt, which struck me as somewhat odd.
First step: on macOS, there is a command line utility for playing audio files, afplay.
ognyan@wfh:~$ afplay ambiance_20200412_21h.wav
This immediately exits without playing any audio. Given this, and the email report saying our tools were not handling this file, clearly something was wrong.
ognyan@wfh:~$ ffplay -hide_banner ambiance_20200412_21h.wav
Here we can hear the audio, so what gives?
ognyan@wfh:~$ ffprobe -hide_banner ambiance_20200412_21h.wav
[wav @ 0x7fe3a3808200] Estimating duration from bitrate, this may be inaccurate
Input #0, wav, from 'ambiance_20200412_21h.wav':
Duration: 01:08:55.68, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
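So it has this warning about the duration: ffprobe fell back to estimating the duration from the bitrate, which suggests the length information in the file header could not be trusted...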
A quick Google search pointed me to a sox command that can patch it up:
ognyan@wfh:~$ sox --ignore-length ambiance_20200412_21h.wav fixed.wav
Afterwards, we can verify that the warning is gone:
ognyan@wfh:~$ ffprobe -hide_banner fixed.wav
Input #0, wav, from 'fixed.wav':
Duration: 01:08:55.68, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
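My guess (not verified against the original file) is that the recording software never filled in the size fields of the WAV header, which would explain both afplay giving up and ffprobe estimating the duration from the bitrate. sox --ignore-length sidesteps this by ignoring the header length and writing a fresh file. As an alternative technique, here is a Python sketch that walks the RIFF chunks, compares the declared sizes with the bytes actually present, and can patch them in place. It assumes a plain PCM WAV under 4 GiB whose data chunk runs to the end of the file; scan_wav, report and fix_sizes are hypothetical names. Patching in place modifies the file, so in keeping with the advice above, only run it on a copy, never on /dat/corpora.

```python
import struct


def scan_wav(data: bytes):
    """Return (byte_rate, offset of the data chunk header, declared data size)."""
    riff, _, wave_id = struct.unpack("<4sI4s", data[:12])
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos, byte_rate = 12, None
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack("<4sI", data[pos:pos + 8])
        if chunk_id == b"fmt ":
            # byte_rate is the 4-byte field 8 bytes into the fmt chunk body
            (byte_rate,) = struct.unpack("<I", data[pos + 16:pos + 20])
        elif chunk_id == b"data":
            if byte_rate is None:
                raise ValueError("data chunk found before fmt chunk")
            return byte_rate, pos, size
        pos += 8 + size + (size & 1)  # chunk bodies are padded to even length
    raise ValueError("no data chunk found")


def report(path: str) -> dict:
    """Compare the header's size fields with what the file actually contains."""
    with open(path, "rb") as f:
        data = f.read()
    byte_rate, data_pos, declared = scan_wav(data)
    actual = len(data) - (data_pos + 8)
    return {
        "declared_riff": struct.unpack("<I", data[4:8])[0],
        "actual_riff": len(data) - 8,
        "declared_data": declared,
        "actual_data": actual,
        "seconds": actual / byte_rate,
    }


def fix_sizes(path: str) -> None:
    """Rewrite both size fields in place to match the file length."""
    with open(path, "r+b") as f:
        data = f.read()
        _, data_pos, _ = scan_wav(data)
        f.seek(4)
        f.write(struct.pack("<I", len(data) - 8))
        f.seek(data_pos + 4)
        f.write(struct.pack("<I", len(data) - data_pos - 8))
```

If report shows declared_data differing from actual_data, that would support the broken-header theory; seconds is derived from the bytes present, so it should line up with the duration ffprobe estimated.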
👍
I saw a message from Grace late at night requesting this. There were already some email chains going around regarding 4-channel audio files, but when I asked for the path to the file, the first thing I did was make sure it actually has 4 channels:
ognyan@wfh:~$ ffprobe -hide_banner -i 200629_prePV_SET_Max_Noise_micin.wav
Input #0, wav, from '200629_prePV_SET_Max_Noise_micin.wav':
Duration: 00:15:36.00, bitrate: 1024 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 4 channels, s16, 1024 kb/s
Yup... 4 channels... who does that... anyway.
Splitting it up is pretty easy with ffmpeg, though I still had to google the command. Before I show you what I did, I want to reference this awesome Super User (Stack Exchange) post:
https://superuser.com/questions/685910/ffmpeg-stereo-channels-into-two-mono-channels
ognyan@wfh:~$ ffmpeg -hide_banner -i 200629_prePV_SET_Max_Noise_micin.wav \
-map_channel 0.0.0 first.wav \
-map_channel 0.0.1 second.wav \
-map_channel 0.0.2 third.wav \
-map_channel 0.0.3 fourth.wav
This generates 4 files, first.wav, second.wav, third.wav and fourth.wav, each containing one of the original audio channels (in that order).
The link above showcases a few other ways of handling this as well.
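To make it concrete what -map_channel is doing: a WAV file stores multichannel audio interleaved, one sample per channel per frame. Here is a standard-library Python sketch of the same split for integer PCM WAV files (the only kind the wave module handles). It reads the whole file into memory and copies sample by sample, so it is an illustration rather than a replacement for the ffmpeg command; split_channels is a hypothetical name.

```python
import wave


def split_channels(src: str, out_paths: list) -> None:
    """Write each channel of an interleaved PCM WAV to its own mono file."""
    with wave.open(src, "rb") as w:
        nch, width = w.getnchannels(), w.getsampwidth()
        rate, nframes = w.getframerate(), w.getnframes()
        frames = w.readframes(nframes)
    if len(out_paths) != nch:
        raise ValueError(f"expected {nch} output paths, got {len(out_paths)}")
    frame_size = nch * width
    for ch, out in enumerate(out_paths):
        # Frame i holds ch0, ch1, ..., ch(n-1); pick out this channel's bytes.
        mono = b"".join(
            frames[i + ch * width:i + (ch + 1) * width]
            for i in range(0, len(frames), frame_size)
        )
        with wave.open(out, "wb") as o:
            o.setnchannels(1)
            o.setsampwidth(width)
            o.setframerate(rate)
            o.writeframes(mono)
```

Called as split_channels("200629_prePV_SET_Max_Noise_micin.wav", ["first.wav", "second.wav", "third.wav", "fourth.wav"]), it should produce the same four mono files as the ffmpeg command, just far more slowly.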
This isn't really a request, but an issue was brought up about some files that were in "amr" format.
After getting the path to a file:
ognyan@wfh:~$ ffplay -hide_banner -i 2_26134184.amr
Sure enough, the audio plays fine. Stan did not request this, but let's say we want to convert the file into a usable wav file:
ognyan@wfh:~$ ffmpeg -hide_banner -i 2_26134184.amr file.wav
You can get fancy and specify a different sample rate if you want, or convert to another audio encoding entirely, for example Opus:
ognyan@wfh:~$ ffmpeg -hide_banner -i 2_26134184.amr -c:a libopus file.ogg
Here, we tell ffmpeg to use the libopus audio encoder; Opus audio (a lossy codec) can be stored in an ogg container.
I have a cute video of my cats playing... but I have a bazillion of these videos and I'd like to squish them down some:
ognyan@wfh:~$ ffmpeg -i original.MOV \
-c:v libx265 \
-crf 28 \
-c:a copy \
first.mp4
Let's say I know I'm going to play this on a Chromecast, which has native H.265 (HEVC) decoding capability, but I want to enable some flags to make the video a bit easier to play back:
ognyan@wfh:~$ ffmpeg -i original.MOV \
-c:v libx265 \
-crf 28 \
-level 4.1 \
-tune fastdecode \
-c:a copy \
second.mp4
Now let's say I have a fair amount of time to encode this video, but I really want to shrink the file size:
ognyan@wfh:~$ ffmpeg -i original.MOV \
-c:v libx265 \
-preset veryslow \
-crf 36 \
-level 4.1 \
-tune fastdecode \
-c:a copy \
third.mp4
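Since I have a bazillion of these videos, typing the command per file gets old fast. Here is a minimal Python sketch that builds the same ffmpeg arguments as the three variants above and runs them over a folder. It assumes the clips are named *.MOV and that writing a sibling .mp4 next to each one is fine; hevc_command and compress_folder are hypothetical names, not part of any tool.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def hevc_command(src: Path, dst: Path, crf: int = 28,
                 preset: Optional[str] = None,
                 fast_decode: bool = False) -> List[str]:
    """Build the ffmpeg argv used above: HEVC video via libx265, audio copied."""
    cmd = ["ffmpeg", "-hide_banner", "-i", str(src), "-c:v", "libx265"]
    if preset:
        cmd += ["-preset", preset]
    cmd += ["-crf", str(crf)]
    if fast_decode:
        # same playback-friendly flags as the Chromecast variant
        cmd += ["-level", "4.1", "-tune", "fastdecode"]
    cmd += ["-c:a", "copy", str(dst)]
    return cmd


def compress_folder(folder: str, **options) -> None:
    """Re-encode every .MOV in folder to a sibling .mp4, skipping finished ones."""
    for src in sorted(Path(folder).glob("*.MOV")):
        dst = src.with_suffix(".mp4")
        if dst.exists():
            continue
        subprocess.run(hevc_command(src, dst, **options), check=True)
```

For example, compress_folder("cat_videos", crf=36, preset="veryslow", fast_decode=True) would mirror the third command for every clip in a (hypothetical) cat_videos folder. Skipping existing outputs means an interrupted batch can simply be rerun.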
Sorry, I don't have the original on hand, so you'll have to bear with my explanations.
I have a Star Wars theatrical release DVD from China (from ~20 years ago). While the movie on it certainly worked, there were some very significant problems with playback:
- Hard-coded letterbox bars: while the movie itself was widescreen, the people who made the DVD forced it into a 4:3 frame by adding black bars above and below the picture. When you played the video on a 16:9 screen (the current standard), you would also get black bars on the sides; the end result was a very thick black border all the way around the video, taking up a majority of the pixels on the screen
- The subtitles were in VobSub format, which are effectively bitmaps. On high-resolution screens they look like rubbish, and for media playback software such as Plex, they make transcoding substantially more difficult
- The video encoding was something ancient (maybe MPEG-1?), which is not only horribly inefficient; a lot of modern video players (e.g. VLC) would also display many artifacts throughout playback, and seeking forward/backward was problematic as well.
First thing to address was getting rid of those black bars.
ognyan@wfh:~$ ffmpeg -i input.mkv -vf cropdetect=24:16:0 dummy.mkv
...
[Parsed_cropdetect_0 @ 0x3704360] x1:0 x2:639 y1:43 y2:317 w:640 h:272 x:0 y:46 pts:181320 t:181.320000 crop=640:272:0:46
...
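Here, the bit we care about is 640:272:0:46, which is the crop filter's width:height:x:y: keep a 640x272 region whose top-left corner sits 0 pixels from the left edge and 46 pixels from the top.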
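cropdetect prints a suggestion for every frame it looks at, and dark or fading scenes can produce different values, so eyeballing a single line can mislead. Here is a Python sketch that collects every crop=w:h:x:y suggestion from ffmpeg's log and picks the most frequent one. Two deviations from the command above, both my own choices: it decodes to ffmpeg's null muxer (-f null -) instead of writing dummy.mkv, and it assumes the most common value is the right one, which is a heuristic rather than anything cropdetect guarantees. most_common_crop and detect_crop are hypothetical names.

```python
import re
import subprocess
from collections import Counter

CROP_RE = re.compile(r"crop=(\d+):(\d+):(\d+):(\d+)")


def most_common_crop(log_text: str):
    """Return the most frequent (w, h, x, y) suggested in cropdetect output."""
    crops = Counter(tuple(int(v) for v in m.groups())
                    for m in CROP_RE.finditer(log_text))
    if not crops:
        raise ValueError("no cropdetect output found")
    return crops.most_common(1)[0][0]


def detect_crop(path: str):
    # cropdetect logs to stderr; "-f null -" decodes without writing a file.
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-i", path,
         "-vf", "cropdetect=24:16:0", "-f", "null", "-"],
        capture_output=True, text=True, check=True)
    return most_common_crop(result.stderr)
```

The result can be dropped straight into the filter, e.g. "crop={}:{}:{}:{}".format(*detect_crop("input.mkv")).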
ognyan@wfh:~$ ffmpeg -i Star\ Wars\ -\ Episode\ VI\ -\ Return\ of\ the\ Jedi.mkv \
-vf crop=704:272:8:104 \
-aspect 704:272 \
-c:v libx264 \
-crf 17 \
-c:a copy \
-profile:v high \
-preset medium \
-tune film,fastdecode \
-level 4.1 \
-movflags +faststart \
output.mkv
In this command, I do the following operations:
- crop the video (to a different value, as this is a different video)
- set the aspect to 704:272, figuring this would guarantee that the video would not be stretched inappropriately in either direction
- re-encode the video to h264 format using libx264 (the original encoding does not matter)
- set the "constant rate factor" (crf) to 17, which should make the output visually indistinguishable from the source material (a crf of 0 is technically lossless, but you would get huge output files)
- copy the audio as-is (-c:a copy), without re-encoding it
- set -profile:v high, an h264-specific encoding parameter that restricts which coding features the encoder may use. Lower profiles (such as baseline or main) play on a wider variety of devices, but virtually all modern phones, tablets, video players, etc. support the high profile, which allows for slightly smaller file sizes
- set -preset medium. As discussed earlier, the preset controls how much effort the encoder spends: for a slower preset, the quality of the result will be higher for the same file size. Set this to the slowest setting you can tolerate; fast or veryfast are often used
- set -tune <parameter>; more about these can be found in the ffmpeg wiki documentation for x264 encoding. x264 accepts at most one psychovisual tune (such as film or grain), optionally combined with fastdecode as a comma-separated list, e.g. -tune film,fastdecode
- set -level 4.1, another x264 setting: specifying the level can reduce compatibility with weaker devices, but 4.1 is supported by virtually all my devices (Chromecasts, Raspberry Pis, etc.)
- set -movflags +faststart, which puts the video/audio index at the very beginning of the file so playback can start right away (not all applications benefit). It is an MP4/MOV muxer option, so it does not apply to the .mkv output used here
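Repeating -tune on the command line does not combine the values (as far as I can tell, ffmpeg hands only the last one to libx264). Here is a small Python sketch of a helper that builds the single comma-separated value and rejects combining two psychovisual tunes. The tune lists are taken from x264's help text; x264_tune and tune_args are hypothetical names.

```python
# x264's psychovisual tunes; at most one of these may be used at a time.
PSY_TUNES = {"film", "animation", "grain", "stillimage", "psnr", "ssim"}
# Tunes that may be combined with a psychovisual one.
OTHER_TUNES = {"fastdecode", "zerolatency"}


def x264_tune(*tunes: str) -> str:
    """Combine x264 tunes into the single comma-separated -tune value."""
    unknown = [t for t in tunes if t not in PSY_TUNES | OTHER_TUNES]
    if unknown:
        raise ValueError(f"unknown x264 tune(s): {unknown}")
    psy = [t for t in tunes if t in PSY_TUNES]
    if len(psy) > 1:
        raise ValueError(f"x264 allows only one psychovisual tune, got {psy}")
    return ",".join(tunes)


def tune_args(*tunes: str) -> list:
    # Pass -tune once; repeating the flag does not merge the values.
    return ["-tune", x264_tune(*tunes)] if tunes else []
```

With this, tune_args("film", "fastdecode") yields the pair used in the command above, while asking for both film and grain fails loudly instead of silently keeping one.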
While this takes care of the video, I still had problems with the subtitles. While I'm sure there was a way to deal with this in ffmpeg, I elected to use tools such as MKVToolNix (mkvinfo, mkvextract), tesseract (which is usable with ffmpeg) and vobsub2srt.
First, I identify the subtitle tracks:
ognyan@wfh:~$ mkvinfo some_movie.mkv
| + Track
| + Track number: 3 (track ID for mkvmerge & mkvextract: 2)
| + Track UID: 3
| + Track type: subtitles
| + Enabled: 1
| + Default track flag: 1
| + Forced track flag: 0
| + Lacing flag: 0
| + Minimum cache: 0
| + Maximum block additional ID: 0
| + Codec ID: S_VOBSUB
| + Codec decode all: 1
| + Language: eng
| + Codec's private data: size 508
| + Track
| + Track number: 4 (track ID for mkvmerge & mkvextract: 3)
| + Track UID: 4
| + Track type: subtitles
| + Enabled: 1
| + Default track flag: 0
| + Forced track flag: 0
| + Lacing flag: 0
| + Minimum cache: 0
| + Maximum block additional ID: 0
| + Codec ID: S_VOBSUB
| + Codec decode all: 1
| + Language: fre
| + Codec's private data: size 508
| + Track
| + Track number: 5 (track ID for mkvmerge & mkvextract: 4)
| + Track UID: 5
| + Track type: subtitles
| + Enabled: 1
| + Default track flag: 0
| + Forced track flag: 0
| + Lacing flag: 0
| + Minimum cache: 0
| + Maximum block additional ID: 0
| + Codec ID: S_VOBSUB
| + Codec decode all: 1
| + Language: spa
| + Codec's private data: size 508
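I can see tracks 3, 4, and 5 are subtitle tracks. Note that mkvextract refers to tracks by the track ID shown in parentheses (2, 3, and 4), not by the track number. Since these are VobSub tracks, mkvextract writes each one as an .idx/.sub pair rather than as srt text. I then extract them:
ognyan@wfh:~$ mkvextract tracks \
some_movie.mkv \
2:some_movie.eng.sub \
3:some_movie.fre.sub \
4:some_movie.spa.sub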
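Here, I had to manually install a tool that's not part of the Homebrew distribution (note that current Homebrew no longer supports --with-* options such as --with-all-languages; extra tesseract language data now comes from the separate tesseract-lang formula):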
ognyan@wfh:~$ brew install --with-all-languages tesseract
brew install --HEAD https://github.com/ruediger/VobSub2SRT/raw/master/packaging/vobsub2srt.rb
I then create my srt subtitles:
ognyan@wfh:~$ vobsub2srt some_movie.eng
vobsub2srt some_movie.fre
vobsub2srt some_movie.spa
ass is considered one of the most versatile subtitle formats, so I use ffmpeg to convert the subtitles from srt to ass:
ognyan@wfh:~$ ffmpeg -i some_movie.eng.srt some_movie.eng.ass
I then add the subtitle files back into the video:
ognyan@wfh:~$ ffmpeg -i some_movie.mkv -i some_movie.eng.ass \
-codec copy \
-map 0 \
-map 1 \
-metadata:s:s:0 language=eng \
output.mkv
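The command above adds only the English track. Here is a Python sketch that builds the equivalent command for several languages at once (for example the eng, fre and spa files produced earlier). It assumes each language has already been converted to .ass and that the video file carries no subtitle streams of its own; if it does, the -metadata:s:s:N indices would need to be shifted. mux_subtitles_command is a hypothetical helper.

```python
from typing import Dict, List


def mux_subtitles_command(video: str, subs: Dict[str, str],
                          output: str) -> List[str]:
    """Build an ffmpeg argv that adds one subtitle file per language.

    subs maps a language code as used above (e.g. "eng") to a subtitle path.
    Assumes the video has no subtitle streams, so the added files become
    output subtitle streams 0, 1, 2, ... in insertion order.
    """
    cmd = ["ffmpeg", "-hide_banner", "-i", video]
    for path in subs.values():
        cmd += ["-i", path]
    cmd += ["-codec", "copy", "-map", "0"]
    for index in range(1, len(subs) + 1):
        cmd += ["-map", str(index)]  # input #index is the index-th subtitle file
    for stream, lang in enumerate(subs):
        cmd += [f"-metadata:s:s:{stream}", f"language={lang}"]
    cmd.append(output)
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True).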
ffmpeg has CUDA support for video transcoding and some filters. I won't go into specific use cases, but you can generally transcode from most video formats to either h264 or h265; in addition, the CUDA support covers some video filters, such as scaling.
The hardware-accelerated ffmpeg offers significant performance improvements (in my testing, some videos transcoded at almost 200x playback speed vs. 2-5x).
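As one hedged starting point, here is a Python sketch that builds a GPU transcode command. It assumes an NVIDIA GPU and an ffmpeg build compiled with CUDA/NVENC support. -hwaccel cuda, -hwaccel_output_format cuda, the scale_cuda filter and the h264_nvenc/hevc_nvenc encoders exist in such builds, but availability varies by version and build, and if the GPU can't decode the input format, keeping frames on the GPU may fail. nvenc_command is a hypothetical name.

```python
from typing import List, Optional, Tuple

# Map the codec names used above to ffmpeg's NVENC encoder names.
NVENC_ENCODERS = {"h264": "h264_nvenc", "h265": "hevc_nvenc"}


def nvenc_command(src: str, dst: str, codec: str = "h264",
                  size: Optional[Tuple[int, int]] = None) -> List[str]:
    """Build an ffmpeg argv that decodes, optionally scales, and encodes on the GPU."""
    cmd = ["ffmpeg", "-hide_banner",
           # decode on the GPU and keep the decoded frames in GPU memory
           "-hwaccel", "cuda", "-hwaccel_output_format", "cuda",
           "-i", src]
    if size is not None:
        width, height = size
        cmd += ["-vf", f"scale_cuda={width}:{height}"]  # scaling stays on the GPU
    cmd += ["-c:v", NVENC_ENCODERS[codec], "-c:a", "copy", dst]
    return cmd
```

Run the result with subprocess.run(cmd, check=True); running ffmpeg -encoders on the machine is a quick way to check whether the NVENC encoders are present in that build.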