@cmarat
Last active August 29, 2015 14:03
Clean and sort sameAs links from DBpedia to external resources
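# Decompress the DBpedia 3.9 link dumps, keep only owl:sameAs triples, put the
# dbpedia.org resource in subject position (sameAs is symmetric), then sort and recompress.
bzcat downloads.dbpedia.org/3.9/links/*nt.bz2 |
  grep -F '<http://www.w3.org/2002/07/owl#sameAs>' |
  awk '{ if (index($1, "<http://dbpedia.org/") == 1) { print $1 " " $2 " " $3 " ." } else { print $3 " " $2 " " $1 " ." } }' |
  sort |
  bzip2 > sorted_outbound_sameas.nt.bz2 &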
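To see what the filter and swap stages do without downloading the dumps, here is a minimal sketch that wraps the grep | awk | sort part of the pipeline in a function and runs it on three sample triples. The function name `normalize_sameas` and the example.org IRIs are hypothetical, made up for illustration. One deliberate difference from the one-liner: it uses `LC_ALL=C sort` (byte-wise order) so the sample output does not depend on the locale, whereas the original uses a plain `sort`.

```shell
#!/usr/bin/env bash
# Sketch of the filter/normalize/sort stages; bzcat and bzip2 are left out.
set -euo pipefail

# Hypothetical helper: reads N-Triples on stdin, writes sorted sameAs links
# with the dbpedia.org resource as subject.
normalize_sameas() {
  # Keep only owl:sameAs triples (fixed-string match on the predicate IRI).
  grep -F '<http://www.w3.org/2002/07/owl#sameAs>' |
  # If the subject is not a dbpedia.org IRI, swap subject and object.
  awk '{
    if (index($1, "<http://dbpedia.org/") == 1) print $1 " " $2 " " $3 " ."
    else print $3 " " $2 " " $1 " ."
  }' |
  # Byte-wise sort (an assumption here; the gist uses plain `sort`).
  LC_ALL=C sort
}

# Sample input: one outbound link, one inbound link that gets swapped,
# and one non-sameAs triple that gets dropped.
printf '%s\n' \
  '<http://dbpedia.org/resource/Berlin> <http://www.w3.org/2002/07/owl#sameAs> <http://example.org/b/Berlin> .' \
  '<http://example.org/a/Berlin> <http://www.w3.org/2002/07/owl#sameAs> <http://dbpedia.org/resource/Berlin> .' \
  '<http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .' |
  normalize_sameas
```

With this input the sketch prints two lines, both with `<http://dbpedia.org/resource/Berlin>` as subject, the `a` link before the `b` link. Note that the awk step rebuilds each line from fields 1 to 3, which is safe here only because sameAs objects are IRIs without spaces; literals containing spaces would be truncated.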