@Tomfox91
Created September 10, 2014 13:13
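Quick-and-dirty converter from an HTML table to a Tab-delimited flat file, using sed and bash. It converts the first table in an HTML document to Tab-delimited text, one table row per line, and writes the result to standard output. `tableConverter.sh` expects `trim.sed` and `format.sed` to sit in the same directory. Usage: `./tableConverter.sh < file.html`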
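format.sed

```sed
# Remove whitespace before and after tags (POSIX class, works in GNU and BSD sed)
s![[:space:]]*<!<!g
s!>[[:space:]]*!>!g
# Each <tr> starts a new output line
s!<tr>!\
!g
# Adjacent cells are joined by a Tab (the replacement below is a literal Tab character)
s!</t[dh]><t[dh]>!	!g
# Drop the leading <table> and the line break left by the first <tr>
s!<table>\n!!g
# Drop everything from the first </table> on, so only the first table is kept
s!</table.*!!g
# Strip all remaining tags
s!<[^>]+>!!g
```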
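`format.sed` is the last stage of the pipeline. To see its rules in isolation, the sketch below passes them inline with `-e`, written with the portable `[[:space:]]` class and a Tab produced by bash `$'\t'` quoting, and feeds them one already-joined line of the kind the earlier stages produce. The function name `format_rows` and the sample table are hypothetical.

```shell
#!/bin/bash
# format_rows (hypothetical name): the format.sed rules passed inline with -e.
# Input: one line holding <table>, <tr>, <td>/<th> tags and cell text.
# Rules, in order: trim whitespace around tags, <tr> -> line break,
# "</td><td>" -> Tab, drop leading "<table>" + line break,
# drop everything from "</table" on, strip the remaining tags.
format_rows() {
  sed -E \
    -e 's![[:space:]]*<!<!g' \
    -e 's!>[[:space:]]*!>!g' \
    -e $'s!<tr>!\\\n!g' \
    -e $'s!</t[dh]><t[dh]>!\t!g' \
    -e 's!<table>\n!!g' \
    -e 's!</table.*!!g' \
    -e 's!<[^>]+>!!g'
}

# Hypothetical sample input
printf '%s\n' '<table><tr><th>Name</th> <th>Age</th></tr><tr><td>Ann</td><td>30</td></tr></table>' | format_rows
```

This prints `Name` and `Age` separated by a Tab on the first line and `Ann` and `30` separated by a Tab on the second; the space between the two `<th>` cells is removed by the first rule before the cells are joined.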
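tableConverter.sh

```bash
#!/bin/bash
# Work from the script's own directory so trim.sed and format.sed are found
cd "$(dirname "$0")" || exit 1

# Join the document into one line, start a new line before every
# table/tr/td/th tag, keep only the table lines (trim.sed), join again,
# and turn rows and cells into lines and Tabs (format.sed)
tr -d '\n' \
| sed -E $'s!</?(table|t[rdh])!\\\n&!g' \
| sed -Enf trim.sed \
| tr -d '\n' \
| sed -Ef format.sed
```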
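The first two stages of `tableConverter.sh` only reshape the input: everything is joined onto one line, then a line break is inserted before each opening or closing `table`, `tr`, `td` and `th` tag, so that `trim.sed` can work line by line. The sketch below runs just those two stages on a small hypothetical document; `split_tags` is a made-up name.

```shell
#!/bin/bash
# split_tags (hypothetical name): stages 1 and 2 of tableConverter.sh.
split_tags() {
  # Join all lines, then start a new line before every table/tr/td/th tag
  tr -d '\n' | sed -E $'s!</?(table|t[rdh])!\\\n&!g'
}

# Hypothetical sample input; the <p> paragraph stays on the first line
printf '<p>Intro</p>\n<table border="1">\n<tr><td>Ann</td></tr>\n</table>\n' | split_tags
```

This prints `<p>Intro</p>`, `<table border="1">`, `<tr>`, `<td>Ann`, `</td>`, `</tr>` and `</table>` on separate lines. The pattern also fires on tags such as `<thead>`; that only adds extra line breaks, which the second `tr -d '\n'` removes again.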
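trim.sed

```sed
# Run with sed -n: only lines from a <table> line through the next </table> line are printed
/^<table/, /<\/table>/ {
# Mark table/tr/td/th tags with "#" and drop their attributes; the optional
# whitespace-led group keeps <thead>, <track> etc. from matching as <th>, <tr>
s!<(/?(table|t[rdh]))([[:space:]][^>]*)?>!<#\1>!g
# Delete every unmarked tag
s!<[^#>][^>]*>!!g
# Remove the marks
s!<#!<!g
p
}
```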
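`trim.sed` runs under `sed -n`, so only the lines from one starting with `<table` through the next one containing `</table>` are printed; inside that range it keeps `table`, `tr`, `td` and `th` tags without their attributes and deletes every other tag. The sketch below inlines those rules, using the `([[:space:]][^>]*)?` attribute group so that `<thead>` is not kept as `<th>`, and feeds them hypothetical input already split one tag per line, as the previous stage would. `trim_table` is a hypothetical name.

```shell
#!/bin/bash
# trim_table (hypothetical name): the trim.sed rules passed inline.
trim_table() {
  sed -En '/^<table/,/<\/table>/{
    s!<(/?(table|t[rdh]))([[:space:]][^>]*)?>!<#\1>!g
    s!<[^#>][^>]*>!!g
    s!<#!<!g
    p
  }'
}

# Hypothetical input, already split one tag per line
printf '%s\n' '<p>Intro</p>' '<table border="1">' '<thead>' '<tr>' '<th>Name' '</th>' '</tr>' '</thead>' \
  '<tr>' '<td><b>Ann</b>' '</td>' '</tr>' '</table>' | trim_table
```

The `<p>` line is not printed, `border="1"` and the `<b>` tags disappear, and `<thead>` and `</thead>` leave only empty lines, which the following `tr -d '\n'` discards.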