Last active
November 30, 2023 15:31
Downloading files from HDFS through zipping and JupyterHub
# Check the files are as expected
!hdfs dfs -ls /user/jorrick/my_file_path/
# Setup
!mkdir /tmp/jorrick
# Create the subdirectory and make sure it has the correct permissions
!mkdir -m 700 /tmp/jorrick/my_file_path
!ls -ld /tmp/jorrick/my_file_path
# Copy the files over to local file system
!hdfs dfs -copyToLocal /user/jorrick/my_file_path/ /tmp/jorrick/my_file_path
# Zip everything recursively; -r descends into every subdirectory level
!zip -r /tmp/jorrick/my_file_path.zip /tmp/jorrick/my_file_path
# Verify your zip included the files by checking the file size
!ls -l /tmp/jorrick/my_file_path.zip
# Upload the file back onto HDFS so we can then download it through the JupyterHub interface
!hdfs dfs -put /tmp/jorrick/my_file_path.zip /user/jorrick/notebooks/
# Delete the tmp folder
!rm -rf /tmp/jorrick
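
The zip and verify steps above could also be done from a Python cell, which avoids guessing how many directory levels need to be matched and checks the archive by its file list rather than its size. This is only a minimal sketch, not part of the original gist: the helper name `zip_directory` is hypothetical, it uses only the standard-library `zipfile` and `pathlib` modules, and it assumes the zip file is written outside the directory being archived.

```python
import zipfile
from pathlib import Path


def zip_directory(src_dir, zip_path):
    """Zip every file under src_dir, at any depth, and return the archived names.

    Hypothetical helper for illustration; zip_path must not be inside src_dir.
    """
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store names relative to the parent so the archive unpacks into one folder
                archive.write(path, arcname=path.relative_to(src.parent).as_posix())
    # Re-open the finished archive and list what it actually contains
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

In the notebook this would be called, before the temporary folder is deleted, as `zip_directory("/tmp/jorrick/my_file_path", "/tmp/jorrick/my_file_path.zip")`. Printing the returned list shows exactly which files ended up in the archive.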