Scope: Transfer data to NYU's high performance computing (HPC) cluster.
Summary of access nodes on "Greene":
Fully-qualified domain name (FQDN) | Purpose |
---|---|
gdtn.hpc.nyu.edu | Data transfer node (DTN) |
greene.hpc.nyu.edu | Login node |
Summary of data storage on "Greene" (per netID):
Path | Environment Variable | Purpose | Flushed? | Allocation |
---|---|---|---|---|
/archive/$USER | $ARCHIVE | Long-term storage | No | 2TB/20K files |
/home/$USER | $HOME | Small files, code | No | 50GB/30K files |
/scratch/$USER | $SCRATCH | File staging, frequent read/write | Yes. Files unused for sixty (60) days are deleted | 5TB/1M files |
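To get a rough sense of how a directory compares with the allocations above, a minimal sketch using only standard tools can count its regular files and total size. The function name usage_report is hypothetical, and this does not query the cluster's actual quota system; it only walks the directory tree.

```shell
#!/bin/sh
# usage_report DIR: print the number of regular files and the total size (in KB) under DIR.
# Hypothetical helper; it walks the tree and does not consult any quota system.
usage_report() {
    dir=$1
    files=$(find "$dir" -type f | wc -l | tr -d ' ')
    kbytes=$(du -sk "$dir" | cut -f1)
    echo "$dir: $files files, $kbytes KB"
}

# Example: report on the current directory (on Greene you might pass "$SCRATCH" or "$HOME").
usage_report .
```

Running this before a large transfer gives an estimate of whether the destination allocation is likely to be large enough.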
You'll need these:
- An active NetID with permission to access HPC resources.
- VPN client if connecting from off-campus.
- An understanding of absolute vs. relative paths.
Quickly transfer a small number of files or a directory from source to destination:
scp -rv file1 netID@gdtn.hpc.nyu.edu:/home/netID
(When prompted, provide your NetID credentials. A successful transfer will yield: Exit status 0)
Sync the contents of one or more directories from source to destination (recommended for a large number of files):
rsync --archive --compress --progress --exclude=".*" directoryname/ netID@gdtn.hpc.nyu.edu:/scratch/netID/directoryname
or more succinctly:
rsync -azP --exclude=".*" directoryname/ netID@gdtn.hpc.nyu.edu:/scratch/netID/directoryname
(When prompted, provide your NetID credentials.)
rsync notes:
- The trailing / on directoryname/ in your source matters: with it, rsync copies the directory's contents; without it, rsync copies the directory itself into the destination.
- --exclude=".*" excludes dot files. This is optional, but recommended.
- Run the rsync command again when your source directory changes to have those changes reflected in your destination directory.
- rsync can pick up where it left off with the --append option.
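The effect of the trailing slash and of --exclude can be checked locally before touching the cluster. The following is a minimal sketch that assumes rsync is installed on your machine; the temporary directory and the names dest_a and dest_b are made up for illustration.

```shell
#!/bin/sh
# Local demonstration of the trailing slash and --exclude, using throwaway directories.
demo=$(mktemp -d)
mkdir -p "$demo/src/directoryname"
echo "data"   > "$demo/src/directoryname/file1"
echo "hidden" > "$demo/src/directoryname/.hidden"

# With the trailing slash: the CONTENTS of directoryname land directly in dest_a.
rsync -a --exclude=".*" "$demo/src/directoryname/" "$demo/dest_a"

# Without it: directoryname itself is created inside dest_b.
rsync -a --exclude=".*" "$demo/src/directoryname" "$demo/dest_b"

ls -A "$demo/dest_a"                 # file1
ls -A "$demo/dest_b"                 # directoryname
ls -A "$demo/dest_b/directoryname"   # file1 (no .hidden)
```

Adding --dry-run (or -n) to any rsync command lists what would be transferred without copying anything, which is a cheap way to confirm the source and destination paths first.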
Use tar to compress the source directory and push it over SSH to the destination (recommended for fast data transfer and/or archiving):
tar --create --gzip --verbose --file - directoryname | ssh netID@gdtn.hpc.nyu.edu "cat > /archive/netID/tarballname.tgz"
or more succinctly:
tar czvf - directoryname | ssh netID@gdtn.hpc.nyu.edu "cat > /archive/netID/tarballname.tgz"
To unpack the tarball on the destination side:
tar --extract --verbose --gunzip --file tarballname.tgz
or more succinctly:
tar -xvzf tarballname.tgz
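The pack, stream, and unpack steps can be rehearsed locally before sending anything to the cluster. This is a minimal sketch in which cat stands in for the ssh step; the temporary directory and the unpacked directory name are assumptions made for illustration.

```shell
#!/bin/sh
# Local round trip of the tar-over-a-pipe pattern; `cat` stands in for the ssh step.
work=$(mktemp -d)
mkdir -p "$work/directoryname" "$work/unpacked"
echo "hello" > "$work/directoryname/file1"

# Pack: -C makes tar store paths relative to $work rather than the full temp path.
tar -czf - -C "$work" directoryname | cat > "$work/tarballname.tgz"

# List the archive's contents without extracting anything.
tar -tzf "$work/tarballname.tgz"

# Unpack into a separate directory and compare against the source.
tar -xzf "$work/tarballname.tgz" -C "$work/unpacked"
diff -r "$work/directoryname" "$work/unpacked/directoryname" && echo "round trip OK"
```

Listing with -t before extracting is a quick check that a tarball arrived intact and contains the paths you expect.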
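To keep a process running even after exiting your shell or terminal, preface it with nohup (recommended for time-intensive jobs). nohup only protects the single command it prefixes, and when its standard error is a terminal it redirects that output to standard output, so prefixing tar alone would leave ssh unprotected and send tar's verbose listing into the pipe. Wrap the whole pipeline in sh -c instead:
nohup sh -c 'tar -czvf - directoryname | ssh netID@gdtn.hpc.nyu.edu "cat > /scratch/netID/tarballname.tgz"'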
then suspend the command:
Ctrl + z
and send it to background:
bg
To monitor this process, take note of its PID:
ps aux | grep -i ssh
The PID is in the second column of the matching line, e.g.: 67225
then take that PID and feed it to top. On macOS:
top -pid 67225
or, on Linux:
top -p 67225
(you can quit top with: q).
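As an alternative to searching the ps output, the shell can report the PID directly when a job is started in the background with &. The sketch below uses sleep 30 as a stand-in for a long transfer and assumes the real command needs no interactive password prompt (for example, because SSH keys are set up), since a job started with & cannot read a password you type.

```shell
#!/bin/sh
# `sleep 30` stands in for a long-running transfer.
nohup sleep 30 > /dev/null 2>&1 &
pid=$!                      # PID of the most recent background job
echo "started background job with PID $pid"

# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then
    status="running"
else
    status="finished"
fi
echo "job $pid is $status"

# Clean up the stand-in job (omit this for a real transfer).
kill "$pid"
```

The value stored in pid can be passed to top in the same way as a PID found with ps.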
❤️