cue_file = 'file.cue'

d = open(cue_file).read().splitlines()

general = {}
tracks = []
current_file = None

for line in d:
    # Disc-level commands start at column 0.
    if line.startswith('REM GENRE '):
        general['genre'] = ' '.join(line.split(' ')[2:])
    if line.startswith('REM DATE '):
        general['date'] = ' '.join(line.split(' ')[2:])
    if line.startswith('PERFORMER '):
        general['artist'] = ' '.join(line.split(' ')[1:]).replace('"', '')
    if line.startswith('TITLE '):
        general['album'] = ' '.join(line.split(' ')[1:]).replace('"', '')
    if line.startswith('FILE '):
        current_file = ' '.join(line.split(' ')[1:-1]).replace('"', '')

    # Track-level commands are indented in the cue sheet:
    # two spaces before TRACK, four spaces before its fields.
    if line.startswith('  TRACK '):
        track = general.copy()
        track['track'] = int(line.strip().split(' ')[1], 10)
        tracks.append(track)
    if line.startswith('    TITLE '):
        tracks[-1]['title'] = ' '.join(line.strip().split(' ')[1:]).replace('"', '')
    if line.startswith('    PERFORMER '):
        tracks[-1]['artist'] = ' '.join(line.strip().split(' ')[1:]).replace('"', '')
    if line.startswith('    INDEX 01 '):
        t = map(int, ' '.join(line.strip().split(' ')[2:]).replace('"', '').split(':'))
        tracks[-1]['start'] = 60 * t[0] + t[1] + t[2] / 100.0

# Every track except the last one ends where the next one starts.
for i in range(len(tracks)):
    if i != len(tracks) - 1:
        tracks[i]['duration'] = tracks[i + 1]['start'] - tracks[i]['start']

for track in tracks:
    metadata = {
        'artist': track['artist'],
        'title': track['title'],
        'album': track['album'],
        'track': str(track['track']) + '/' + str(len(tracks))
    }
    if 'genre' in track:
        metadata['genre'] = track['genre']
    if 'date' in track:
        metadata['date'] = track['date']

    cmd = 'ffmpeg'
    cmd += ' -b:a 320k'
    cmd += ' -i "%s"' % current_file
    cmd += ' -ss %.2d:%.2d:%.2d' % (track['start'] / 60 / 60, track['start'] / 60 % 60, int(track['start'] % 60))
    if 'duration' in track:
        cmd += ' -t %.2d:%.2d:%.2d' % (track['duration'] / 60 / 60, track['duration'] / 60 % 60, int(track['duration'] % 60))
    cmd += ' ' + ' '.join('-metadata %s="%s"' % (k, v) for (k, v) in metadata.items())
    cmd += ' "%.2d - %s - %s.mp3"' % (track['track'], track['artist'], track['title'])

    print cmd
Great script, as mentioned above, just move row 55 to row 62 and it will be fine.
Thanks
-ab 320k for new ffmpeg versions
cmd += ' ' + ' '.join('-metadata %s="%s"' % (k, v) for (k, v) in metadata.items())
cmd += ' -ab 320k'
cmd += ' "%.2d - %s - %s.mp3"' % (track['track'], track['artist'], track['title'])
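The ordering point above can be sketched in a few lines. This is only an illustration, assuming the command is built as an argument list instead of a string; the helper name `build_ffmpeg_args` and its parameters are made up here and are not part of the gist.

```python
def build_ffmpeg_args(source, output, start, duration=None, bitrate='320k'):
    """Return an ffmpeg argument list for one track (hypothetical helper).

    start and duration are in seconds; ffmpeg accepts plain decimal
    seconds for -ss and -t.
    """
    args = ['ffmpeg', '-i', source, '-ss', '%.2f' % start]
    if duration is not None:
        args += ['-t', '%.2f' % duration]
    # Output options such as -b:a apply to the output file that follows
    # them, so the bitrate goes right before the output name.
    args += ['-b:a', bitrate, output]
    return args
```

A list like this can be handed to `subprocess.call(args)` directly, which also avoids the quoting problems of a shell string when titles contain quotes.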
Something is going on with track length. When I slightly change the script to split into FLAC files, every file reports the length of the original combined file, which makes players print an error at the end of each file. I'm not sure whether this comes from something missing in the script or from the fact that I split the FLAC without re-encoding (-c:a copy), which can cause known issues with FLAC frames... EDIT: yep, it is.
Great! Exactly what I was looking for. Thanks!
Python 3:
Change line 34 to:
t = list(map(int, ' '.join(line.strip().split(' ')[2:]).replace('"', '').split(':')))
Change line 65 to:
print(cmd)
So Python 3 and new ffmpeg version:
cue_file = 'file.cue'

d = open(cue_file).read().splitlines()

general = {}
tracks = []
current_file = None

for line in d:
    # Disc-level commands start at column 0.
    if line.startswith('REM GENRE '):
        general['genre'] = ' '.join(line.split(' ')[2:])
    if line.startswith('REM DATE '):
        general['date'] = ' '.join(line.split(' ')[2:])
    if line.startswith('PERFORMER '):
        general['artist'] = ' '.join(line.split(' ')[1:]).replace('"', '')
    if line.startswith('TITLE '):
        general['album'] = ' '.join(line.split(' ')[1:]).replace('"', '')
    if line.startswith('FILE '):
        current_file = ' '.join(line.split(' ')[1:-1]).replace('"', '')

    # Track-level commands are indented in the cue sheet:
    # two spaces before TRACK, four spaces before its fields.
    if line.startswith('  TRACK '):
        track = general.copy()
        track['track'] = int(line.strip().split(' ')[1], 10)
        tracks.append(track)
    if line.startswith('    TITLE '):
        tracks[-1]['title'] = ' '.join(line.strip().split(' ')[1:]).replace('"', '')
    if line.startswith('    PERFORMER '):
        tracks[-1]['artist'] = ' '.join(line.strip().split(' ')[1:]).replace('"', '')
    if line.startswith('    INDEX 01 '):
        t = list(map(int, ' '.join(line.strip().split(' ')[2:]).replace('"', '').split(':')))
        tracks[-1]['start'] = 60 * t[0] + t[1] + t[2] / 100.0

# Every track except the last one ends where the next one starts.
for i in range(len(tracks)):
    if i != len(tracks) - 1:
        tracks[i]['duration'] = tracks[i + 1]['start'] - tracks[i]['start']

for track in tracks:
    metadata = {
        'artist': track['artist'],
        'title': track['title'],
        'album': track['album'],
        'track': str(track['track']) + '/' + str(len(tracks))
    }
    if 'genre' in track:
        metadata['genre'] = track['genre']
    if 'date' in track:
        metadata['date'] = track['date']

    cmd = 'ffmpeg'
    cmd += ' -i "%s"' % current_file
    cmd += ' -ss %.2d:%.2d:%.2d' % (track['start'] / 60 / 60, track['start'] / 60 % 60, int(track['start'] % 60))
    if 'duration' in track:
        cmd += ' -t %.2d:%.2d:%.2d' % (track['duration'] / 60 / 60, track['duration'] / 60 % 60, int(track['duration'] % 60))
    cmd += ' ' + ' '.join('-metadata %s="%s"' % (k, v) for (k, v) in metadata.items())
    cmd += ' -b:a 320k'
    cmd += ' "%.2d - %s - %s.mp3"' % (track['track'], track['artist'], track['title'])

    print(cmd)
...and I suggest changing line 55:
cmd += ' -b:a 320k'
to:
cmd += ' -c:a copy'
to skip the re-encoding step
Kinda weird that nobody seems to have noticed that the files split with this script have wrong start time and duration. Granted, the errors are all smaller than a second, but still pretty noticeable on the second track onward.
The issue is with lines 57 and 60, which format the seconds as an integer instead of a float with two decimals of precision.
Line 57 should read:
cmd += ' -ss %.2d:%.2d:%05.2f' % (track['start'] / 60 / 60, track['start'] / 60 % 60, track['start'] % 60)
And line 60 should read:
cmd += ' -t %.2d:%.2d:%05.2f' % (track['duration'] / 60 / 60, track['duration'] / 60 % 60, track['duration'] % 60)
With those changes, no more short tracks, nor ones that start too early!
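The same idea can be pulled into a small helper so that -ss and -t share one formatter. This is a sketch only; `format_timestamp` is a made-up name, and it rounds to milliseconds first so the seconds field can never print as 60.

```python
def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm (hypothetical helper)."""
    # Work in whole milliseconds so rounding carries into the
    # minutes and hours fields instead of producing "60.000".
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3600000)
    minutes, rest = divmod(rest, 60000)
    secs, ms = divmod(rest, 1000)
    return '%02d:%02d:%02d.%03d' % (hours, minutes, secs, ms)
```

In the script this would be used as `cmd += ' -ss ' + format_timestamp(track['start'])`, and likewise for the duration.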
This code does a pretty good job. Thanks.
However getting the cuts to the millisecond needs some more work!
The problem is that standard cue file index points are specified in MM:SS:FF format, where FF is a frame count.
ffmpeg, however, wants fractions of a second to make the cuts.
Also, if we want to avoid re-encoding, which is sensible, ffmpeg has to cut at frame boundaries. It is cautious about this, so it adds a couple of frames to ensure nothing is excluded (typically about 0.026 s each for MP3).
If the cue file was written for a CD rather than an MP3 file, which is usual, then each FF is 1/75 s, so converting FF to milliseconds is easy, but the problem with ffmpeg remains.
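For the CD case, the MM:SS:FF conversion could look like the sketch below. It assumes 75 frames per second as described above; the function name `cue_index_to_seconds` is made up for illustration. Note that the script in this gist divides the frame field by 100 instead.

```python
CD_FRAMES_PER_SECOND = 75  # CD cue sheets count 75 frames per second


def cue_index_to_seconds(index):
    """Convert a cue INDEX value 'MM:SS:FF' to seconds (hypothetical helper)."""
    minutes, seconds, frames = (int(part) for part in index.split(':'))
    # float() keeps the division exact under Python 2 as well as Python 3.
    return minutes * 60 + seconds + frames / float(CD_FRAMES_PER_SECOND)
```

For example, an index of 03:45:37 corresponds to 225 s plus 37/75 s, a little under 225.5 s.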
If you want to get this spot on, the frame size in ms needs to be calculated (the typical MP3, Layer III, version 1, has 1152 samples per frame, and the sample rate is commonly 44100 Hz), and all valid audio frames have to be read and written one by one up to the desired duration.
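The frame arithmetic mentioned above works out as follows. This is just the calculation, not a frame-level splitter; the names are made up, and the defaults assume the 1152 samples per frame and 44100 Hz quoted above.

```python
import math

SAMPLES_PER_FRAME = 1152  # MP3 Layer III, version 1 (assumed, see above)
SAMPLE_RATE_HZ = 44100    # common sample rate (assumed, see above)


def frame_duration(samples_per_frame=SAMPLES_PER_FRAME, sample_rate=SAMPLE_RATE_HZ):
    """Return the length of one audio frame in seconds."""
    return samples_per_frame / float(sample_rate)


def frames_covering(seconds, samples_per_frame=SAMPLES_PER_FRAME, sample_rate=SAMPLE_RATE_HZ):
    """Return how many whole frames are needed to cover the given duration."""
    return int(math.ceil(seconds * sample_rate / float(samples_per_frame)))
```

With these defaults one frame lasts about 26.1 ms, which matches the 0.026 s figure quoted earlier, so a cut made on frame boundaries can be off by up to that much.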
Alternatively, mp3DirectCut (free, Windows) will read a cue file and split the audio without re-encoding. It works at the frame level, but I have never checked exactly how accurate it is. There may be better tools; I'd love to know.
Alright, following @holesocks' advice (thanks!), I've forked this gist, see here, and made the following changes.
- Fixed the location of the bitrate parameter.
- Added support for both Python 2 and 3.
- Fixed track start and duration so tracks are not cut short and do not start early (for the usual case of CD images as .flac files, at least).
I've kept the changes to a minimum, so it's easy to compare to the original (and anyone can use it as a base).
I'll probably write an over-engineered version (call ffmpeg, flac-to-flac splits, selectable output format, error checking, etc.) just to exercise my rusty fingers a bit.
Thanks!
ffmpeg works out what output to produce from the filename extension. Your program could split AAC and WAV files too (I don't know about FLAC) with very few changes. Just an idea!
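A rough sketch of that idea: make the extension a parameter and attach the bitrate option only where it makes sense. The `FORMAT_OPTIONS` table and the `output_part` helper are hypothetical and not taken from the gist or from any fork of it.

```python
# Hypothetical per-format output options; anything not listed gets none.
FORMAT_OPTIONS = {
    'mp3': ['-b:a', '320k'],
    'flac': [],
    'wav': [],
}


def output_part(track_number, artist, title, extension='mp3'):
    """Return the output options plus the output filename as a list."""
    filename = '%02d - %s - %s.%s' % (track_number, artist, title, extension)
    # ffmpeg picks the output format from the filename extension.
    return FORMAT_OPTIONS.get(extension, []) + [filename]
```

The returned list would be appended to the end of the ffmpeg arguments, after the input, -ss, -t and metadata options.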
Generally, MP3s are just not designed to be cut at the frame level: for one thing, data can overflow from one frame to the next. ffmpeg probably tidies up the ends as best it can to avoid audible imperfections, but at the expense of a little precision.
According to the hydrogenaud.io specialists, pcutmp3 is the best tool for cutting MP3s, as it deals with overflow and gapless playback. It is a Java program, and it is unclear whether it is still supported, so I didn't test it.
That's me done - cheerio.
@holesocks: that's the idea! The script I have in progress is called "cue_splitter.py", and it lets you select format/codec/bitrate/etc., although personally I will only use it for .flac to .flac splitting (particularly because of your comments regarding frame-level splitting).
Using ffmpeg you can split without re-encoding, but there's a bug in ffmpeg: the split files all end up with the right size but the wrong duration (which tends to confuse some media players). I just resort to "flac 2 flac" with the default compression level (fast enough even on my old CPU), and the files work OK.
I've intentionally kept this gist as close to the original as possible (while fixing the most glaring errors), because maybe other fellows can do like me... and use it for practicing their programming with a simple, but concrete project.
Thanks for your feedback, and greetings from Argentina! :-)
Thank you 🙏
I made this FFmpeg based command line utility https://github.com/jeanslack/FFcuesplitter, it has some interesting options and is flexible enough for most needs, the results seem accurate.
The bitrate option (l. 55) is set too early; it should come just before the output file. Otherwise, great script 👍