Doing a really big masscan and grabbing headers. I currently have targets in mind for a project, but wanted a way to explore whatever else is active on the same ports. I used this deeply terrible one-liner to split the HTTP banners into tokens and then count token frequency.
fgrep -i http masscan.json | sed 's/[,]$//'\
| jq -s ".[].ports[].service.banner" | sed 's/[";:,<>()]//g'\
| sed "s/[']//g" | sed -E 's/([\\]r|[\\]n)+/ /g'\
| sed 's/[\/=]/ /g' | awk '{ for (i=1; i<=NF; i++) { print $i}}'\
| tr '[:upper:]' '[:lower:]' | grep -E '^.{4,}$'\
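
For comparison, here is a rough sketch of the same tokenize-and-count idea as a small shell function. It is not the one-liner above, just a simplified take under a few assumptions: `count_tokens` is a made-up name, the input is raw banner text on stdin (for example from `jq -r`, so `\r\n` arrives as real line endings rather than escaped text), and the `sort | uniq -c | sort -rn` ranking is my guess at a reasonable way to do the frequency count. It also splits on every character that is not alphanumeric, `.`, `_`, or `-`, which is cruder than the character-by-character cleanup in the original.

```shell
#!/bin/sh
# count_tokens: hypothetical helper, not part of masscan or jq.
# Reads raw banner text on stdin and prints "count token" lines,
# most frequent token first.
count_tokens() {
  tr '[:upper:]' '[:lower:]' |       # fold case so "Server" and "server" match
    tr -c '[:alnum:]._-' '\n' |      # every other character ends a token
    grep -E '^.{4,}$' |              # keep only tokens of 4+ characters
    sort | uniq -c | sort -rn        # count occurrences, highest first
}
```

Something like `printf 'Server: nginx\r\nServer: nginx\r\n' | count_tokens` should list `server` and `nginx` with a count of 2 each. Keeping `.`, `_`, and `-` inside tokens means version strings and hyphenated header names stay in one piece.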