Johan van der Knijff bitsgalore

bitsgalore / packageEpub.md
Last active August 28, 2020 14:57
How to package an epub file using InfoZip

Background

An EPUB is just a ZIP container, but running a ZIP tool directly on a directory of content documents usually won't produce a valid EPUB. This is because the standard requires that:

  1. The mimetype resource must appear as the first file in the container
  2. The mimetype resource must be uncompressed

To meet these requirements, the files must be added to the ZIP container in a specific order and with specific compression settings. This gist describes how to do this with InfoZip (the default ZIP tool on most Linux systems).

Let's suppose all content files are in a directory called /home/johan/epubPolicyTests/content/epub20_minimal/.
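
The two requirements listed above can also be illustrated without InfoZip. The following is a minimal sketch that uses Python's standard `zipfile` module instead; it is not the method this gist goes on to describe. The function name `package_epub` and the output file name in the usage note are hypothetical, and the sketch only addresses the two `mimetype` rules, so the result should still be checked with an EPUB validator.

```python
import os
import zipfile


def package_epub(content_dir, epub_path):
    """Zip content_dir into epub_path, mimetype first and uncompressed.

    Hypothetical helper for illustration. epub_path should be located
    outside content_dir, otherwise the output file itself gets walked.
    """
    with zipfile.ZipFile(epub_path, "w") as zf:
        # Requirements 1 and 2: mimetype is the first entry and is stored
        # without compression.
        zf.write(os.path.join(content_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        # Everything else is added afterwards, with compression.
        for root, dirs, files in os.walk(content_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                full_path = os.path.join(root, name)
                arc_name = os.path.relpath(full_path, content_dir)
                arc_name = arc_name.replace(os.sep, "/")
                if arc_name == "mimetype":
                    continue  # already written as the first entry
                zf.write(full_path, arc_name,
                         compress_type=zipfile.ZIP_DEFLATED)
```

With the directory mentioned above, a call could look like `package_epub("/home/johan/epubPolicyTests/content/epub20_minimal", "epub20_minimal.epub")`, where the output file name is an assumption.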

<?xml version="1.0"?>
<!--
Schematron jpylyzer schema: verify if JP2 conforms to
KB's profile for access copies (A.K.A. KB_ACCESS_LOSSY_01/07/2014)
Simplified version for Geheugen Van Nederland migration, omits specific requirements for
resolution, colour space, compression ratio, XML box and codestream comment.
Johan van der Knijff, KB / National Library of the Netherlands, 18 March 2014.
-->
<s:schema xmlns:s="http://purl.oclc.org/dsdl/schematron">
bitsgalore / compratio.sh
Created April 8, 2015 16:18
Compute compression ratio for all JP2s in directory tree
#!/bin/bash
# Compute compression ratio of each JP2 in directory tree, report results to CSV file
# Requires:
# - jpylyzer
# - xmllint (part of libxml library)
#
# If you're using Windows you can run this shell script within a Cygwin terminal: http://www.cygwin.com/
#
# Installation directory
bitsgalore / extractlayers.sh
Created April 9, 2015 11:25
For each JP2 image in a directory, generate a derived image that discards a user-defined number of quality layers. Requires the Aware j2kdriver tool.
#!/bin/bash
# Generate derived JP2s that discard user-specified number of quality layers from source JP2s
# Requires:
# - j2kdriver (Aware)
#
# If you're using Windows you can run this shell script within a Cygwin terminal: http://www.cygwin.com/
#
# Installation directory
instDir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
bitsgalore / histogram.plt
Last active September 22, 2016 12:07
Gnuplot histogram example
#!/gnuplot
#
# Histogram of compression ratio distribution
#
set terminal postscript enhanced landscape
set output "histogram.ps"
set size ratio 0.5
set key top right
set xlabel "Compression ratio"
set ylabel "Number of files"
bitsgalore / extensionsKBDM.md
Last active August 29, 2015 14:19
50 most prevalent formats in KB e-Depot by file extension, based on March 2014 count. Use scrollbar at bottom to display remarks column to the right.
Extension Number of files in e-Depot ID(s) Tika Remarks
gif 34499095 - image/gif GIF image
xml 12913388 - application/xml XML (mostly metadata)
jpg 8197415 N/A* JPEG image
sml 7744829 - image/gif GIF image with unusual extension
pdf 7577414 - application/pdf PDF
raw 2045662 - text/plain Text file
tif 715509 - image/tiff TIFF image
oa3 296101 - text/plain Looks like SGML (oases, Kluwer). See also: Publisher Data Formats. Metadata.
bitsgalore / zipfilesKBDM.md
Last active August 29, 2015 14:19
File formats inside ZIP files (based on 22 ZIP files only!)
bitsgalore / softwareReadingRooms.md
Created April 23, 2015 16:43
Rendering of top 50 formats in KB reading rooms
| Category | Rendering software in reading rooms | Formats accessible in reading rooms? |
|:--|:--|:--|
| Image formats | MS Paint, Windows Photoviewer | Yes |
| PDF | Adobe Acrobat | Yes |
| Web formats | Internet Explorer, Google Chrome | Yes |
| Office formats | Microsoft Office | Yes (support for old Office formats presently not clear) |
| Audio | Windows Media Player, VLC Media Player | No (hardware in reading rooms doesn't support audio) |
| Video | Windows Media Player, VLC Media Player | Partially (hardware in reading rooms doesn't support audio) |
| Metadata | Internet Explorer, Notepad, Wordpad | Yes |
| Executables, installers, system files | Not applicable | No |