@nicola
Last active June 17, 2017 06:53
Regexp to transform Facebook messages from the archive into JSON
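# Assumes the messages.htm markup this was written against (div.thread, div.message,
# span.user, span.meta, p). @ESCAPE0x1@ is a placeholder for structural quotes:
# literal quotes left in the text are escaped first, then the placeholders become ".
# Output: roughly one { thread: ..., messages: [...] } object per line, with unquoted keys.
cat messages.htm |
  # Transliterate to ASCII, then drop apostrophes and backticks
  iconv -f utf8 -t ascii//TRANSLIT//IGNORE |
  sed "s/['\`]//g" |
  # Join everything into one line so each sed sees whole tags
  tr '\n' ' ' |
  # Start a new line and object per thread; re-emit the first <div class="message">
  sed 's/<div class="thread">\([a-zA-Z0-9,\.\ &#;-]*\)\(<div class="message">\)/\
{ thread: @ESCAPE0x1@\1@ESCAPE0x1@, messages:[\2/g' |
  # Message header: <span class="user"> becomes from, <span class="meta"> becomes date
  sed 's/<div class="message"><div class="message_header"><span class="user">\([a-zA-Z0-9,\.\ &#;-]*\)<\/span>/, {from:@ESCAPE0x1@\1@ESCAPE0x1@,/g' |
  sed 's/<span class="meta">\([a-zA-Z0-9\ ,:+-]*\)<\/span><\/div><\/div>/ date:@ESCAPE0x1@\1@ESCAPE0x1@,/g' |
  # <p>text</p> becomes message:"text"}
  sed 's/<p>/message:@ESCAPE0x1@/g' |
  sed 's/<\/p>/@ESCAPE0x1@}/g' |
  # Replace the </div>s before the footer, and everything after, with ]
  sed 's#</div></div></div><div class="footer">.*#\]#g' |
  # Escape literal quotes, then turn the placeholders into real quotes
  sed 's/"/\\"/g' |
  sed 's/@ESCAPE0x1@/"/g' |
  # Each remaining </div> closes a thread
  sed 's/<\/div>/\]},/g' |
  # Drop the stray comma before the first message of each thread
  sed 's/", messages:\[, {/", messages:\[{/g' > convs.json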
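To check what the per-message steps produce, this sketch runs just those sed expressions, copied unchanged from the pipeline, on a single hypothetical message fragment. The helper name convert_message_fragment and the sample markup are made up for illustration; the sample is shaped to match the regexes, not taken from a real Facebook archive.

```shell
#!/bin/sh
# Hypothetical helper (not part of the gist): applies only the per-message
# sed steps of the pipeline, in the same order, to whatever is on stdin.
convert_message_fragment() {
  sed 's/<div class="message"><div class="message_header"><span class="user">\([a-zA-Z0-9,\.\ &#;-]*\)<\/span>/, {from:@ESCAPE0x1@\1@ESCAPE0x1@,/g' |
    sed 's/<span class="meta">\([a-zA-Z0-9\ ,:+-]*\)<\/span><\/div><\/div>/ date:@ESCAPE0x1@\1@ESCAPE0x1@,/g' |
    # <p> and </p> become the opening and closing of the message field
    sed 's/<p>/message:@ESCAPE0x1@/g' |
    sed 's/<\/p>/@ESCAPE0x1@}/g' |
    # Escape quotes that came from the message text itself...
    sed 's/"/\\"/g' |
    # ...and only then turn the placeholders into real quotes
    sed 's/@ESCAPE0x1@/"/g'
}

# Hypothetical sample, shaped to match the regexes above
sample='<div class="message"><div class="message_header"><span class="user">Alice</span><span class="meta">Monday, June 1, 2015 at 10:00am UTC+01</span></div></div><p>Hi "Bob"</p>'

printf '%s\n' "$sample" | convert_message_fragment
```

It should print , {from:"Alice", date:"Monday, June 1, 2015 at 10:00am UTC+01",message:"Hi \"Bob\""}. The quotes around Bob stay escaped because the quote-escaping step runs before the placeholders are converted, and the leading comma is the one the pipeline's last sed removes when a message is the first in its thread.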