
@thsutton
Created September 23, 2013 11:28
Shell script to fix the "unique" identifiers in EPUB files sold by Packt Publishing.
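#!/bin/bash
#
# fixpacktepubs.sh
#
# Process the EPUB files in the current directory and recreate them in ./new
# with the book's ISBN (as a urn:isbn: URN) as the unique identifier. This
# script was written because Packt Publishing sell EPUBs with non-unique
# unique identifiers, and it assumes that the EPUBs being processed are
# structured like Packt's (a top-level mimetype file plus META-INF and OEBPS
# directories). It needs bash (for pushd/popd), unzip and zip.
#
mkdir -p "new"
find . -maxdepth 1 -type f -iname '*.epub' | while IFS= read -r epub; do
    # Determine the "name" of the EPUB.
    name=$(basename "${epub}" .epub)
    newname="../new/${name}.epub"
    # Start from a clean working directory and a clean output file.
    rm -rf "${name}"
    rm -f "new/${name}.epub"
    mkdir -p "${name}"
    pushd "${name}" > /dev/null
    # Unpack the contents of the EPUB file.
    unzip -q "../${epub}" > /dev/null
    # Find the ISBN: take the first "ISBN 978-..." match in a text file and
    # turn it into a urn:isbn: URN without hyphens.
    isbn=$(grep -rhoIE 'ISBN [0-9-]+' . | head -n 1 \
        | sed -Ee 's/ISBN ([0-9-]+)/urn:isbn:\1/' -e 's/-//g')
    if [ -z "${isbn}" ]; then
        echo "No ISBN found in ${epub}; skipping it." >&2
        popd > /dev/null
        continue
    fi
    # Replace the probably-duplicate value set as the unique identifier with
    # the more-likely-to-be-unique ISBN.
    find . -type f -iname '*.opf' -print | while IFS= read -r content; do
        unique=$(grep unique-identifier "${content}" | head -n 1 \
            | sed -Ee "s/.*unique-identifier=['\"]([^'\"]*)['\"].*/\1/")
        sed -i~ -Ee "s#id=\"${unique}\">[^<]*</#id=\"${unique}\">${isbn}</#g" "${content}"
        rm "${content}~"
    done
    # Reconstruct it: mimetype goes first and is stored uncompressed (-0),
    # as EPUB readers expect, followed by the rest of the content.
    zip -X0 "${newname}" mimetype
    zip -rg "${newname}" META-INF -x \*.DS_Store
    zip -rg "${newname}" OEBPS -x \*.DS_Store
    popd > /dev/null
done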
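A minimal, self-contained sketch of what the identifier substitution does, run against a made-up OPF package document and copyright page rather than a real Packt EPUB. The file contents, the BookId id and the ISBN 978-1-234-56789-7 are all invented for illustration. It follows the same idea as the script: pull the first "ISBN ..." match, read the id named by unique-identifier, and rewrite the element carrying that id.

```shell
#!/bin/sh
# Hypothetical demo: apply an identifier substitution like the script's to a
# small, invented OPF snippet so the effect of the grep/sed steps is visible.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Sample package document (invented; not taken from a real Packt EPUB).
cat > "$tmp/content.opf" <<'EOF'
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="BookId">packt-duplicate-id</dc:identifier>
  </metadata>
</package>
EOF

# Sample copyright-page text containing a hyphenated ISBN (also invented).
echo '<p>ISBN 978-1-234-56789-7</p>' > "$tmp/copyright.html"

# Take the first "ISBN nnn-..." match and turn it into a urn:isbn: URN
# with the hyphens removed.
isbn=$(grep -rhoE 'ISBN [0-9-]+' "$tmp" | head -n 1 \
  | sed -Ee 's/ISBN ([0-9-]+)/urn:isbn:\1/' -e 's/-//g')

# Read the id named by unique-identifier="..." on the <package> element.
unique=$(grep -o 'unique-identifier="[^"]*"' "$tmp/content.opf" | head -n 1 \
  | sed -e 's/^unique-identifier="//' -e 's/"$//')

# Rewrite the text of the element carrying that id; [^<]* stops at the
# element's closing tag instead of running to the end of the line.
sed -Ee "s#id=\"${unique}\">[^<]*</#id=\"${unique}\">${isbn}</#" \
  "$tmp/content.opf" > "$tmp/content.opf.new"

# Show the rewritten identifier element.
grep 'dc:identifier' "$tmp/content.opf.new"
```

Running it should print the dc:identifier line with urn:isbn:9781234567897 in place of the duplicate value.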