Command line: look for all LI or TR elements in a document
First, change the DOCTYPE declaration to point to a local copy of the W3C DTD.
The DTD and the entity files it references, stored locally:
> ls *.dtd *.ent
xhtml1-transitional.dtd xhtml-lat1.ent xhtml-special.ent xhtml-symbol.ent
change the DOCTYPE so the system identifier is the local file name:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" dir="ltr">
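The DOCTYPE change can also be scripted rather than made by hand. This is a minimal sketch, assuming the original DOCTYPE uses the W3C-hosted system identifier http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd; the function name localize_dtd is hypothetical, not part of any tool:

```shell
# Rewrite the DOCTYPE system identifier to the local DTD copy.
# Assumption: the original DOCTYPE uses the W3C-hosted URL below.
# localize_dtd is a hypothetical helper name; a .bak backup is kept.
localize_dtd() {
  sed -i.bak 's|"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"|"xhtml1-transitional.dtd"|' "$1"
}
```

For example, `localize_dtd List_of_famous_elephants` would edit the file in place and leave the original as List_of_famous_elephants.bak.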
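Then run the query using the 'Saxon HE' library. The perl steps join the output onto one line, start a new line before each <li, and strip the tags; head -10 keeps the first ten items. For TRs, use //tr in the query and <tr in the second perl step: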
java -cp saxon.jar net.sf.saxon.Query -s List_of_famous_elephants -qs:"declare default element namespace \"http://www.w3.org/1999/xhtml\";//li" | perl -pe "s/\n/ /g" | perl -pe "s|<li|\n<li|g" | perl -pe "s|</?[^>]+>||g" | head -10