Scope: Transfer data to NYU's high performance computing (HPC) cluster.
Summary of access nodes on "Greene":
| Fully-qualified domain name (FQDN) | Purpose |
|---|---|
| gdtn.hpc.nyu.edu | Data transfer node (DTN) |
| greene.hpc.nyu.edu | Login node |
Summary of data storage on "Greene" (per netID):
| Path | Environment variable | Purpose | Flushed? | Allocation |
|---|---|---|---|---|
| /archive/$USER | $ARCHIVE | Long-term storage | No | 2TB/20K files |
| /home/$USER | $HOME | Small files, code | No | 50GB/30K files |
| /scratch/$USER | $SCRATCH | File staging, frequent read/write | Yes. Files unused for 60 days are deleted | 5TB/1M files |
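Before a transfer, it can help to compare a directory's file count and size against the allocations above. The following is a minimal sketch, assuming standard `find`, `du`, and `wc`; the function name `dir_usage` and the demo directory are hypothetical and not part of the cluster's tooling:

```shell
# dir_usage DIR: print the number of regular files in DIR and its size in KiB.
dir_usage() {
    dir=$1
    files=$(find "$dir" -type f | wc -l | tr -d ' ')
    kib=$(du -sk "$dir" | cut -f1)
    echo "$files files, $kib KiB"
}

# Demo on a throwaway directory (hypothetical data).
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt" "$demo/c.txt"
usage=$(dir_usage "$demo")
echo "$usage"
rm -rf "$demo"
```

Run `dir_usage directoryname` on the source machine and check both numbers against the row for the path you plan to write to.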
You'll need these:
- An active NetID with permission to access HPC resources.
- VPN client if connecting from off-campus.
- An understanding of absolute vs. relative paths.
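As a quick refresher on the last point, here is a small sketch that resolves a relative path into an absolute one. The directory names are hypothetical and created in a temporary location:

```shell
# Build a throwaway directory tree to experiment with.
workdir=$(mktemp -d)
mkdir -p "$workdir/project/data"
cd "$workdir/project" || exit 1

relative="data"                        # relative: resolved from the current directory
absolute=$(cd "$relative" && pwd -P)   # absolute: starts at / and works from anywhere

echo "relative: $relative"
echo "absolute: $absolute"

# Clean up.
cd / && rm -rf "$workdir"
```

The remote side of the commands in this guide uses absolute paths (for example /home/netID), while the local side is relative to wherever you run the command.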
Quickly transfer a small number of files or a directory from source to destination:
scp -rv file1 [email protected]:/home/netID
(When prompted, provide your NetID credentials. Because of the -v flag, a successful transfer ends its verbose output with: Exit status 0)
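The same exit status is available to the shell after the command finishes, which is handy in scripts. Below is a minimal sketch; the helper name `report_status` is hypothetical, and `true` and `sh -c 'exit 3'` stand in for a real transfer command so the example runs without a network connection:

```shell
# report_status CMD...: run a command and report whether it exited with status 0.
report_status() {
    if "$@"; then
        echo "transfer succeeded (exit status 0)"
    else
        status=$?
        echo "transfer failed (exit status $status)"
        return "$status"
    fi
}

# Stand-ins for a real transfer command.
ok_msg=$(report_status true)
fail_msg=$(report_status sh -c 'exit 3') || true
echo "$ok_msg"
echo "$fail_msg"
```

In practice you would put the full scp command after `report_status` in place of the stand-ins.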
Sync the contents of one or more directories from source to destination (recommended for a large number of files):
rsync --archive --compress --progress --exclude=".*" directoryname/ [email protected]:/scratch/netID/directoryname
or more succinctly:
rsync -azP --exclude=".*" directoryname/ [email protected]:/scratch/netID/directoryname
(when prompted, provide your NetID credentials.)
rsync notes:
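- The trailing / on (directoryname/) in your source matters: with it, rsync copies the contents of the directory; without it, rsync copies the directory itself into the destination.
- --exclude=".*" excludes dot files. This is optional, but recommended.
- Run the rsync command again when your source directory changes to have those changes reflected in your destination directory.
- rsync can pick up where it left off after an interrupted transfer with the --append option.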
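The trailing-slash and exclude behavior can be tried locally before touching the cluster. This is a sketch, assuming rsync is installed on your machine; all paths are hypothetical and live in a temporary directory:

```shell
# Local demonstration of the trailing slash and --exclude (no network needed).
work=$(mktemp -d)
mkdir -p "$work/src/directoryname"
echo "data" > "$work/src/directoryname/file1"
echo "secret" > "$work/src/directoryname/.hidden"

cd "$work/src" || exit 1

# With a trailing slash, the CONTENTS of directoryname land in the destination.
rsync -a --exclude=".*" directoryname/ "$work/with_slash"

# Without it, the directory itself is created inside the destination.
rsync -a --exclude=".*" directoryname "$work/without_slash"

# Show what ended up where; .hidden should appear in neither copy.
find "$work/with_slash" "$work/without_slash" | sort
cd /
```

The listing should show file1 directly under with_slash, and under without_slash/directoryname in the second case. Remove the temporary directory with `rm -rf "$work"` when done.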
Use tar to compress source directory and push it over SSH to destination (recommended for fast data transfer and/or archiving):
tar --create --gzip --verbose --file - directoryname |ssh [email protected] "cat > /archive/netid/tarballname.tar.gz"
or more succinctly:
tar czvf - directoryname | ssh [email protected] "cat > /archive/netid/tarballname.tgz"
To unpack the tar (on the destination side) do:
tar --extract --verbose --gunzip --file tarballname.tgz
or less verbosely:
tar -xvzf tarballname.tgz
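To see how the streaming pipeline fits together without a network connection, here is a local round trip. It is only a sketch: `sh -c "cat > ..."` stands in for the ssh command, and every path is a hypothetical temporary one:

```shell
work=$(mktemp -d)
mkdir -p "$work/src/directoryname" "$work/dest"
echo "hello" > "$work/src/directoryname/file1"

# Source side: write the gzip-compressed archive to stdout (-) and pipe it on.
# The sh -c "cat > ..." part plays the role of the remote ssh command.
( cd "$work/src" && tar -czf - directoryname ) | sh -c "cat > '$work/dest/tarballname.tgz'"

# Destination side: list the archive contents, then unpack it.
tar -tzf "$work/dest/tarballname.tgz"
( cd "$work/dest" && tar -xzf tarballname.tgz )

restored=$(cat "$work/dest/directoryname/file1")
echo "$restored"
```

Listing the archive with `tar -tzf` before unpacking is a cheap way to confirm that the tarball arrived intact.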
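To keep a process running even after exiting your shell or terminal, preface it with: nohup (recommended for time-intensive jobs). nohup only covers the single command it precedes, so wrap the whole pipeline in sh -c to protect both tar and ssh:
nohup sh -c 'tar -czvf - directoryname | ssh [email protected] "cat > /scratch/netid/tarballname.tgz"'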
then, once you have entered your credentials, suspend the command:
Ctrl + z
and send it to background:
bg
To monitor this process, take note of its PID:
ps aux |grep -i ssh
which will list the matching processes; the PID is in the second column, e.g.: 67225
then take that PID, and feed it to top:
top -pid 67225 (macOS) or: top -p 67225 (Linux)
(you can quit top with: q).
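When the command needs no interactive prompt (for example, with key-based SSH authentication, which is an assumption here), you can skip Ctrl + z and bg by starting the job in the background directly and capturing its PID from `$!`. A minimal sketch, using `sleep 30` as a stand-in for the long-running transfer:

```shell
# Start a long-running stand-in job that ignores hangups, logging to a file
# of our choosing instead of the default nohup.out.
log=$(mktemp)
nohup sleep 30 > "$log" 2>&1 &
pid=$!                      # PID of the most recent background job
echo "started job with PID $pid"

# kill -0 sends no signal; it only tests whether the process exists.
if kill -0 "$pid" 2>/dev/null; then
    state="running"
else
    state="not running"
fi
echo "job is $state"

# Clean up the demo job.
kill "$pid"
wait "$pid" 2>/dev/null || true
rm -f "$log"
```

With the PID saved in a variable there is no need to search the ps output; you can pass "$pid" straight to top.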
❤️