Subset a list of files from NCI and transfer them to Pawsey. The directory structure needs to be preserved from NCI to Pawsey. The subset is a few hundred TB and needs to be publicly accessible.
You can't use pshell in the following way.
pshell cp <local file> <remote location>
Instead you have to run pshell interactively: you launch it and then issue commands inside the running program.
You can't use pshell in the following way
pshell put my_local_file.txt /a/b/c/
where /a/b/c/ is some remote directory.
Instead you must run
pshell cd /a/b/c/ && put my_local_file.txt
You can't use pshell in the following way
pshell cd <remote location>
pshell put <local file>
The second invocation does not remember the directory change made by the first, so your local file is put into the root of your project directory.
But you can chain commands together in a single invocation:
pshell cd <remote location> && put <local file>
You can't push a file to a remote directory if the directory doesn't exist.
You CAN make a new directory, but you CAN'T check to see if the directory already exists. And you can't ask pshell to create the parent directories, as mkdir -p does in bash.
pshell mkdir /a/b/c/d && cd /a/b/c/d && put <local file>
will fail if a, b or c do not exist, and will fail if d already exists. But you CAN'T check to see whether they exist or not.
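Without mkdir -p, every level of the remote path needs its own mkdir. A minimal Python sketch of how such a chain could be built is given here; mkdir_chain is a hypothetical helper that only produces the command string and never calls pshell. Since pshell cannot test for existence, the chain is only usable if a failing mkdir on an existing directory is tolerated.

```python
from pathlib import PurePosixPath


def mkdir_chain(remote_dir):
    """Return 'mkdir <level1> && mkdir <level2> && ...' for remote_dir.

    Hypothetical helper: it builds a command string only.
    """
    path = PurePosixPath(remote_dir)
    # parents is ordered deepest-first, so reverse it; entries with an
    # empty name are the root ("/") or the current directory (".").
    parents = [p for p in list(path.parents)[::-1] if p.name]
    levels = parents + [path]
    return " && ".join(f"mkdir {level}" for level in levels)


if __name__ == "__main__":
    print(mkdir_chain("/a/b/c/d"))
```

The printed chain creates /a, then /a/b, then /a/b/c, then /a/b/c/d, in that order.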
You CAN put a top level directory and it WILL preserve the directory structure. However:
a) If running from the NCI, it will try to push the entire directory, which includes all of the archive and not just the subset we require.
b) If we run it from Landgate, we need to subset the data and cache it locally. Either we download the entire subset, which is 100s of TB, or we pull a few files at a time, push them, then pull a few more. This means we need to keep track of all the files that we pulled to make sure we get them all and don't duplicate the downloads. This would be fine if the downloads didn't fail all the time. Because they fail, we need to do a lot of checking and error handling from the command line.
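The bookkeeping for the "few files at a time" approach could look roughly like the Python sketch below. It is only an illustration: pending_files, batches and mark_done are hypothetical helpers, and the done-log file is an assumption (a plain text file with one completed path per line). The actual pull and push steps are left out.

```python
from pathlib import Path


def pending_files(file_list, done_log):
    """Paths listed in file_list that are not yet recorded in done_log."""
    done = set()
    if Path(done_log).exists():
        done = {line.strip() for line in Path(done_log).read_text().splitlines()}
    seen = set()
    pending = []
    for line in Path(file_list).read_text().splitlines():
        name = line.strip()
        # Skip blank lines, finished files, and duplicates in the list.
        if name and name not in done and name not in seen:
            seen.add(name)
            pending.append(name)
    return pending


def batches(items, size):
    """Yield consecutive slices of items with at most size entries each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def mark_done(done_log, name):
    """Append one completed path to the done log."""
    with open(done_log, "a") as log:
        log.write(name + "\n")
```

A driver would loop over batches(pending_files(...), n), pull and push each file, and call mark_done only after the push is known to have succeeded, so that an interrupted run can be restarted without downloading finished files again.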
You can't just publish the top level directory. You must publish each file separately.
pshell publish <root directory>
will fail.
You must
pshell mkdir /a/b/c/d && cd /a/b/c/d && put <local_file> && publish /a/b/c/d/<local_file>
Pawsey implements the following functionality
pshell cp <local file> <remote folder>
AND
pshell mkdir -p
AND
pshell publish -r <root directory>
Pawsey transfers often fail. Because pshell needs to be wrapped in a bash script and doesn't do any sane error handling, if a transfer fails the shell script keeps executing and I can't tell whether the transfer was successful or not.
pshell should use stdout and stderr.
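If pshell did signal failure through its exit status and stderr, a wrapper could detect a failed transfer and retry it instead of carrying on blindly. The Python sketch below shows the idea under that assumption, which does not hold for pshell as described here. run_checked is a hypothetical helper and works on any command given as an argument list.

```python
import subprocess
import sys


def run_checked(argv, retries=3):
    """Run argv, retrying on a non-zero exit status.

    Returns (ok, stderr_text) for the last attempt.
    """
    stderr_text = ""
    for attempt in range(1, retries + 1):
        result = subprocess.run(argv, capture_output=True, text=True)
        stderr_text = result.stderr
        if result.returncode == 0:
            return True, stderr_text
        # Report the failure instead of silently moving on.
        print(f"attempt {attempt} failed: {argv}", file=sys.stderr)
    return False, stderr_text


if __name__ == "__main__":
    # Stand-in command that always fails, in place of a real transfer.
    failing = [sys.executable, "-c", "import sys; sys.exit(2)"]
    print(run_checked(failing, retries=2)[0])
```

Commands that still fail after the last retry can be written to a separate list and re-run later.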
Unzip pshell, monkey patch pshell to wrap the mkdir command in a try: ... except: pass block.
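The shape of such a patch can be sketched in Python as follows. This is not the actual pshell source: FakeClient and its mkdir method are stand-ins invented for the illustration, and ignore_errors is a hypothetical wrapper. Only the try/except pattern carries over.

```python
import functools


def ignore_errors(func):
    """Wrap func so that any exception it raises is swallowed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # For example "directory already exists": carry on regardless.
            return None
    return wrapper


class FakeClient:
    """Stand-in for the pshell code; the real names will differ."""

    def __init__(self):
        self.dirs = set()

    def mkdir(self, path):
        if path in self.dirs:
            raise RuntimeError("already exists")
        self.dirs.add(path)


# The monkey patch: replace the method with its error-tolerant wrapper.
FakeClient.mkdir = ignore_errors(FakeClient.mkdir)
```

Swallowing every exception also hides real failures such as permission errors, so a later put into a directory that was never created will still fail.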
Zip the patched __main__.py and __mf_client__.py files into pshell.zip
Then execute the following command
$ echo '#!/usr/bin/env python' | cat - pshell.zip > pshell
Then run the following script
for str in $(cat ./file_list.txt); do
  # Strip the NCI prefix, build one mkdir per directory level, then cd, put and publish.
  var=$(echo $str \
    | sed 's,/g/data3/fj7/Copernicus/,,g' \
    | awk 'BEGIN{FS="/"; strA=""; strB="";}{ for (i=1;i<NF;i++){ strA=strA" mkdir "strB""$i" && "; strB=strB""$i"/";}}END{print strA, " cd /projects/WACopernicus/"strB, " && put "}' \
    | xargs -I% echo "cd /projects/WACopernicus && " % $str " && publish " $(echo $str | sed 's,/g/data3/fj7/Copernicus/,/projects/WACopernicus/,g') \
    | xargs -I% echo \"%\")
  echo "python pshell" $var
done > bigList.txt
This will generate a file, bigList.txt, which contains the required pshell commands.
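For readers who find the shell pipeline hard to follow, the same command line can be produced per file with a few lines of Python. This is a sketch rather than a drop-in replacement: pshell_command is a hypothetical helper, the two path constants are copied from the script, and the output differs from the shell version only in omitting the trailing slash on the second cd.

```python
from pathlib import PurePosixPath

LOCAL_PREFIX = "/g/data3/fj7/Copernicus/"
REMOTE_ROOT = "/projects/WACopernicus"


def pshell_command(local_path):
    """Build the quoted pshell command chain for one local file."""
    if not local_path.startswith(LOCAL_PREFIX):
        raise ValueError(f"{local_path} is not under {LOCAL_PREFIX}")
    rel = PurePosixPath(local_path[len(LOCAL_PREFIX):])
    steps = [f"cd {REMOTE_ROOT}"]
    # One mkdir per directory level, shallowest first, relative to the root.
    parents = [p for p in list(rel.parents)[::-1] if p.name]
    steps += [f"mkdir {p}" for p in parents]
    steps.append(f"cd {PurePosixPath(REMOTE_ROOT) / rel.parent}")
    steps.append(f"put {local_path}")
    steps.append(f"publish {REMOTE_ROOT}/{rel}")
    return 'python pshell "' + " && ".join(steps) + '"'


if __name__ == "__main__":
    print(pshell_command("/g/data3/fj7/Copernicus/x/y/file.txt"))
```

As with the shell version, the repeated mkdir steps only work against the patched pshell, because every file in a directory tries to create that directory again.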