@timrobertson100
Created September 23, 2014 13:27
1) get tab file (runtime 2 mins):
$ hadoop dfs -getmerge /user/hive/warehouse/tim.db/occurrence_tab occurrence.txt
-> problem #1: a 5.4GB file has just been pulled off hadoop
2) zip the file on the local filesystem (runtime 90 secs):
$ zip local.zip occurrence.txt
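
The two steps above could be wrapped in a small script so that each one is timed separately. This is only a sketch: the function names (fetch_tab, zip_tab) and the "run" argument are made up for illustration, and zip is pointed at occurrence.txt, the file that getmerge writes in step 1.

```shell
#!/bin/sh
# Sketch only: wraps the two steps above so each can be timed on its own.
# The HDFS path and file names are taken from the steps above; the function
# names and the "run" argument are hypothetical.
HDFS_DIR=/user/hive/warehouse/tim.db/occurrence_tab
TAB_FILE=occurrence.txt
ZIP_FILE=local.zip

# Step 1: merge the table's part files into a single local file.
fetch_tab() {
    hadoop dfs -getmerge "$1" "$2"
}

# Step 2: zip the merged file on the local filesystem.
zip_tab() {
    zip -q "$2" "$1"
}

# Only do the work when called with "run", so the functions can also be
# sourced and tried one at a time.
if [ "${1:-}" = "run" ]; then
    start=$(date +%s)
    fetch_tab "$HDFS_DIR" "$TAB_FILE" || exit 1
    echo "getmerge: $(( $(date +%s) - start ))s"

    start=$(date +%s)
    zip_tab "$TAB_FILE" "$ZIP_FILE" || exit 1
    echo "zip: $(( $(date +%s) - start ))s"

    # Show the size of both files for comparison.
    ls -lh "$TAB_FILE" "$ZIP_FILE"
fi
```

Saved as, say, export_tab.sh (a hypothetical name), it would be run with:
$ sh export_tab.sh run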