Tomfox91/bc1d7f19d07d659e0c54, created September 10, 2014 13:13
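Quick-and-dirty converter from an HTML table to a Tab-delimited flat file, written in bash and sed. It reads an HTML document on standard input and writes the first table it contains to standard output as Tab-delimited text. Keep trim.sed and format.sed in the same directory as the script. Usage: `./tableConverter.sh < file.html`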
format.sed
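# Remove blanks (spaces and Tabs) before and after every tag.
s![[:blank:]]*<!<!g
s!>[[:blank:]]*!>!g
# Start a new output line at every row.
s!<tr>!\
!g
# Join adjacent cells with a Tab (the replacement must be a literal Tab).
s!</t[dh]><t[dh]>!	!g
# Drop the opening <table> tag and the newline inserted before the first row.
s!<table>\n!!g
# Drop the first closing </table> and everything after it (first table only).
s!</table.*!!g
# Strip all remaining tags.
s!<[^>]+>!!g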
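To see what format.sed does on its own, the sketch below writes the same substitutions to a temporary file and runs them on one line shaped like the output of the earlier stages (all tags on one line, attributes already removed). It assumes a sed that supports `-E` and `\n` in a regex, as the original script does; the helper name `format_line` and the sample table are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: apply format.sed's substitutions to one already-trimmed line.
set -euo pipefail

tab=$'\t'
script=$(mktemp)
trap 'rm -f "$script"' EXIT

# The same rules as format.sed, one per line; [[:blank:]] matches spaces
# and Tabs, and $tab supplies the literal Tab used as the cell separator.
printf '%s\n' \
  's![[:blank:]]*<!<!g' \
  's!>[[:blank:]]*!>!g' \
  's!<tr>!\' \
  '!g' \
  "s!</t[dh]><t[dh]>!${tab}!g" \
  's!<table>\n!!g' \
  's!</table.*!!g' \
  's!<[^>]+>!!g' > "$script"

# Hypothetical helper: read one line of table markup, print Tab-delimited rows.
format_line() { sed -Ef "$script"; }

printf '%s' '<table> <tr> <td>Name</td> <td>Qty</td> </tr><tr><td>apple</td><td>3</td></tr></table>tail' \
  | format_line
```

This prints two rows, `Name`/`Qty` and `apple`/`3`, each pair separated by a Tab. The trailing `tail` disappears because `s!</table.*!!g` drops everything from the first `</table` onward.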
tableConverter.sh
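#!/bin/bash
# Run from the script's own directory so trim.sed and format.sed are found;
# stdin is already open, so `< file.html` still works.
cd "$(dirname "$0")" || exit 1
# Join the input into one line, break it before every table/tr/td/th tag,
# keep and clean the table lines (trim.sed), rejoin them, then turn rows into
# lines and cells into Tab-separated fields (format.sed).
tr -d '\n' \
| sed -E $'s!</?(table|t[rdh])!\\\n&!g' \
| sed -Enf trim.sed \
| tr -d '\n' \
| sed -Ef format.sed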
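The first two stages of the pipeline can be tried in isolation. This sketch joins the input into one line and then uses the same bash ANSI-C quoted sed command to start a new line before every `table`, `tr`, `td` and `th` tag, opening or closing. The function name `split_on_table_tags` and the sample HTML are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch of the first two stages of tableConverter.sh.
set -euo pipefail

# Hypothetical helper: remove all newlines, then insert a newline before each
# table-related tag. In $'...', \\ becomes a backslash and \n a newline, so
# sed sees a backslash-newline (a literal newline) followed by & (the match).
split_on_table_tags() {
  tr -d '\n' | sed -E $'s!</?(table|t[rdh])!\\\n&!g'
}

printf '<p>Intro</p>\n<table class="x">\n<tr><td>a</td></tr>\n</table>\n' \
  | split_on_table_tags
```

Each table tag now begins its own line (`<table class="x">`, `<tr>`, `<td>a`, `</td>`, and so on), which is what lets trim.sed select the table with a line-range address. Because the pattern only checks prefixes, a `<thead>` tag is split onto its own line too, and trim.sed later rewrites it as `<th>`.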
trim.sed
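# Used with sed -n: print only lines from a <table line through the next </table>.
/^<table/, /<\/table>/ {
# Normalize table/tr/td/th tags (dropping attributes) and mark them with #.
s!<(/?(table|t[rdh]))[^>]*>!<#\1>!g
# Delete every unmarked tag.
s!<[^#>][^>]*>!!g
# Remove the # markers.
s!<#!<!g
p
}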
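The sketch below runs trim.sed's rules on lines like those produced by the splitting stage, to show which lines survive and how the tags are cleaned. It assumes a sed that accepts `-E` (as the original does); the temporary file, the helper name `trim_table_lines` and the sample lines are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: apply trim.sed's rules to already-split lines.
set -euo pipefail

rules=$(mktemp)
trap 'rm -f "$rules"' EXIT
# Quoted heredoc delimiter, so \1 reaches sed unchanged.
cat > "$rules" <<'EOF'
/^<table/, /<\/table>/ {
s!<(/?(table|t[rdh]))[^>]*>!<#\1>!g
s!<[^#>][^>]*>!!g
s!<#!<!g
p
}
EOF

# Hypothetical helper: keep only the table lines, normalized.
trim_table_lines() { sed -Enf "$rules"; }

printf '%s\n' '<p>Intro</p>' '<table class="x">' '<tr>' \
  '<td style="y">a <b>bold</b> word' '</td>' '</tr>' '</table><p>Outro</p>' \
  | trim_table_lines
```

The `<p>Intro</p>` line is never printed, `<table class="x">` becomes `<table>`, and the `<b>` markup inside the cell is removed, leaving `<td>a bold word`. Text that follows `</table>` on the same line (here `Outro`) survives this stage; format.sed removes it with `s!</table.*!!g`.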