@omas
Last active August 29, 2015 14:02
Strip HTML tags
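#!/bin/bash
# Strip HTML tags and all spaces/tabs from the file given as $1.
# Newlines are mapped to "@" so sed can match across lines, so a literal
# "@" in the input also ends up as a line break.
# <script>, <style>, and comment bodies that contain "<" are not matched.
# Entities are decoded after tag removal so an escaped &lt;b&gt; survives as text.
cat "$1" \
| tr -d "\r" \
| tr "\n" "@" \
| sed -e "s/<script[^>]*>[^<]*<\/script>//g" \
| sed -e "s/<style[^>]*>[^<]*<\/style>//g" \
| sed -e "s/<!--[^<]*-->//g" \
| sed -e "s/<[^>]*>//g" \
| sed -e "s/&nbsp;//g" \
| sed -e "s/&gt;/>/g" \
| sed -e "s/&lt;/</g" \
| tr -d " \t" \
| sed -e "s/@@*/@/g" \
| tr "@" "\n"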
omas commented Jun 17, 2014

Removes HTML tags and whitespace.
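
To see what "tags and whitespace" means in practice, here is a minimal sketch of the same pipeline wrapped in a function that reads standard input. The name `strip_html` is hypothetical and not part of the gist. The sketch also assumes two small changes: `tr -d` is used for the whitespace step, and entities are decoded after tags are removed.

```shell
#!/bin/bash
# strip_html: hypothetical helper name (an assumption, not from the gist).
# Reads HTML on stdin and writes text without tags, spaces, or tabs.
strip_html() {
  tr -d '\r' \
  | tr '\n' '@' \
  | sed -e 's/<script[^>]*>[^<]*<\/script>//g' \
        -e 's/<style[^>]*>[^<]*<\/style>//g' \
        -e 's/<!--[^<]*-->//g' \
        -e 's/<[^>]*>//g' \
        -e 's/&nbsp;//g' \
        -e 's/&gt;/>/g' \
        -e 's/&lt;/</g' \
  | tr -d ' \t' \
  | sed -e 's/@@*/@/g' \
  | tr '@' '\n'
}

# Sample input: two paragraphs separated by an HTML comment.
printf '%s\n' '<p>Hello <b>world</b></p>' '<!-- note -->' '<p>1 &lt; 2</p>' \
  | strip_html
# prints:
# Helloworld
# 1<2
```

Note that the space between "Hello" and "world" is removed too, since the pipeline deletes every space and tab, not only those around tags. A file can be passed with `strip_html < page.html`, where `page.html` is a placeholder name.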
