NewRiver Decommission

The NewRiver cluster has served as the medium-scale, homogeneous-architecture HPC cluster at Virginia Tech since it entered production in 2015. Predominantly a CPU-compute and big-data oriented cluster at launch, it was extended in 2016 with 40 nodes equipped with dual NVIDIA P100 GPU accelerators.

Tinkercliffs is now ARC's flagship CPU-based HPC cluster, with nearly three times as many nodes as NewRiver, each offering roughly four times the core density.

For GPU-based workloads, Cascades provides 40 nodes equipped with dual NVIDIA V100 GPUs, and Infer provides 16 nodes each equipped with an NVIDIA T4 GPU; Newriver's P100 nodes will be added to Infer. In summer-fall 2021, a new dense-GPU cluster featuring 32 NVIDIA A100 GPUs will also become available.

Timeline for decommission

Newriver has been a highly productive resource for researchers and students. The Spring 2021 semester is the final semester during which the cluster will be available. If you still have computational workflows which need to run on Newriver, use this semester to complete all work.

All of Newriver's storage systems are also available on the Cascades, Dragonstooth, and Huckleberry clusters, so there is no immediate need to migrate data to other locations. However, migrating workloads and data is a natural time to review data organization and create data lifecycle plans, or to reevaluate any plans currently in place. You can find recommendations on how to proceed below.

Here is the scheduled timeline:

  • February 1, 2021 - End of updates, no additional software installations, limited support for jobs/scheduler/nodes.
  • March-April 2021 - P100 GPU nodes are migrated to the Infer cluster
  • May 31, 2021 - End of compute. All jobs must end on or before this date.

Migrating workloads

Depending on the type of workload, the Tinkercliffs, Infer, Cascades and Dragonstooth clusters present viable options for migration.

Newriver is ARC's last remaining PBS-managed cluster. All other clusters use the SLURM scheduler/resource manager, so job scripts and scheduler interaction will be slightly different. This webpage provides some information about migrating from PBS to Slurm on ARC clusters.
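
As a rough sketch of the translation (the directives shown are standard PBS and Slurm options; the allocation name MyAlloc and the queue/partition normal_q reuse examples from elsewhere on this page), a simple job header maps over as follows:

# Newriver (PBS) job header:
#PBS -l nodes=1:ppn=8
#PBS -l walltime=4:00:00
#PBS -q normal_q
#PBS -A MyAlloc
# submit with: qsub myjob.sh     check status with: qstat -u mypid

# Roughly equivalent Slurm header on the destination clusters:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=4:00:00
#SBATCH --partition=normal_q
#SBATCH --account=MyAlloc
# submit with: sbatch myjob.sh   check status with: squeue -u mypid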

The destinations all differ slightly in configuration and features, so here are the key considerations for each option:

Destination: Cascades (CPU or V100 GPU)

  • All data storage (/home, /work, /groups) is the same as on Newriver, so no data migration is needed
  • Recommended to adapt scripts to SLURM, but many simple PBS commands and scripts will work with little or no modification
  • V100 GPUs are deployed similarly to the P100s on Newriver

Destination: Dragonstooth (CPU-only, no IB, no GPU)

  • All data storage (/home, /work, /groups) is the same as on Newriver, so no data migration is needed
  • Recommended to adapt scripts to SLURM, but many simple PBS commands and scripts will work with little or no modification
  • Dragonstooth has no high-speed, low-latency Infiniband network interconnect

Destination: Tinkercliffs (Scalable CPU)

  • /home storage is the same as on Newriver
  • /groups/groupname storage has been cloned to /projects/groupname
  • Infer/Tinkercliffs have a 1TB quota on /work, and /work data from Newriver has not been transferred.
  • This cluster is a SLURM cluster and no backwards compatibility with PBS scripts or commands is provided.
  • Free tier usage is limited to 600,000 system units (normal_q core-hours) per month for each PI.

Destination: Infer (T4 GPU or P100 GPU)

  • /home storage is the same as on Newriver
  • /groups/groupname storage has been cloned to /projects/groupname
  • Infer/Tinkercliffs have a 1TB quota on /work, and /work data from Newriver has not been transferred.
  • This cluster is a SLURM cluster and no backwards compatibility with PBS scripts or commands is provided.
  • This cluster is currently equipped with 16 nodes, each with one NVIDIA T4 GPU, and is where the P100 GPU nodes from Newriver will be reprovisioned (see the GPU job sketch below)
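
For reference, a minimal Slurm script requesting a single GPU on Infer might look like the sketch below. The partition name t4_normal_q is an assumption; check the Infer documentation for the actual partition names.

#!/bin/bash
#SBATCH --account=MyAlloc          # allocation name, reusing the example from this page
#SBATCH --partition=t4_normal_q    # assumed partition name; verify on Infer
#SBATCH --nodes=1
#SBATCH --gres=gpu:1               # request one GPU on the node
#SBATCH --time=4:00:00

nvidia-smi                         # confirm the GPU is visible to the job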

Migrating data off of Newriver

Filesystem                        Size  Used Avail Use% Mounted on
clproto-ha....:/gpfs/work         2.1P  1.3P  799T  62% /work
clproto...    :/gpfs/home        1002T  591T  411T  59% /groups
vt-archiv.... :/gpfs/archive/arc  420T  350T   71T  84% /vtarchive 
qumulo.arc.internal:/home         281T  177T  104T  63% /home

The output above shows the four notable user-facing filesystems available on Newriver.

Environment variables are set on the clusters as convenient shortcuts for storage paths, but may cause some confusion. In particular $WORK is set differently on each cluster. Use these tools to make sure full paths are what you expect:

  • env (prints all environment variables which are currently set)
  • pwd (print current working directory) to validate the full paths to your data.
# On a Newriver login node:
[mypid@nrlogin1 ~]$ env | grep work
WORK=/work/newriver/mypid

# On a Cascades login node:
[mypid@calogin1 ~]$ env | grep work
WORK=/work/cascades/mypid

# On the Infer login node:
[mypid@infer1 ~]$ env | grep work
WORK=/work/mypid

... to Cascades/Dragonstooth/Huckleberry

These four (/work, /groups, /vtarchive, and /home) are all available on the ARC clusters Cascades, Dragonstooth, and Huckleberry, and so the decommissioning of Newriver does not affect the general availability of that data.

... to Tinkercliffs/Infer

The newer ARC clusters, Tinkercliffs and Infer, use the same /home and /vtarchive filesystems as all the other clusters. But Newriver's /work and /groups filesystems are NOT available on these clusters. If you are migrating workloads from Newriver to these clusters and need data from /work or /groups, then that data will need to be transferred.

Generally speaking (a transfer sketch follows this list):

  • if /work/newriver/mypid data is needed on these clusters, then it would be transferred to /work/mypid on Tinkercliffs/Infer
  • if /groups/mygroup data is needed on these clusters, then it would be transferred to /projects/myproject on Tinkercliffs/Infer
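
For example, a push with rsync from a Newriver login node could look like the following. The Tinkercliffs login hostname shown is an assumption, so substitute the hostname you normally connect to; the rsync options recommended later on this page also apply here.

# Run from a Newriver login node. tinkercliffs2.arc.vt.edu is assumed to be a
# Tinkercliffs login node -- substitute the login hostname you normally use.
rsync -avh --progress /work/newriver/mypid/exportdir \
      tinkercliffs2.arc.vt.edu:/work/mypid/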

Quotas on these systems are also different.

                          Newriver               quota   Tinkercliffs/Infer   quota
Personal scratch space    /work/newriver/mypid   14TB    /work/mypid          1TB
Shared permanent storage  /groups/mygroup        10TB    /projects/myproj     25TB

Please be selective about the files you decide to migrate and avoid keeping duplicate copies of data.

Data migration tips

Assess what data you need to move. Small data sets (fewer than 1000 files and less than 10GB total size) should be quick and easy to move, but larger data sets may take some planning (tar, compress, run in background). Your usage information is presented at login when you connect to the system. For example:

data usage:
USER       FILESYS/SET                         DATA (GiB)   QUOTA (GiB) FILES      QUOTA      NOTE
mypid   /home                               467.7        640         -          -

You may want to move all the files you want to keep into a directory for exporting. You can use du -sh /lustre/work/mypid/exportdir to compute the size of a particular directory and all its contents.
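
For example, to check both the total size and the file count of a directory before deciding how to package it (the path reuses the example above):

# Total size of the directory and everything under it
du -sh /lustre/work/mypid/exportdir

# Number of files it contains
find /lustre/work/mypid/exportdir -type f | wc -l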

Reduce consumption by archiving and/or purging data which is no longer needed before initiating a transfer. This can make a migration easier and faster.

If you have a large dataset to move, it will be helpful to package it into one or more tarballs and to compress them. Where possible, separate datasets into chunks which compress to manageable sizes between 1GB and 1TB. Compression rates vary widely with the types of data. Image/video data is often already stored in a well compressed format and may not compress further, while text files often compress by 90% or more.

  • !! DO tar and DO compress data destined for /vtarchive

Example of creating a compressed tarball and some optional flags:

tar --create --xz --file=exportdir.tar.xz /lustre/work/mypid/exportdir

More options:
--bzip2 or --gzip
   may also be used for compression
--sparse
   handle sparse files efficiently
--remove-files
   remove files after adding them to the archive

If you're processing several datasets or chunks at once, please perform the actions in the context of a job running on a compute node so that login node resources remain available for others; a sketch of such a job follows.
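
This minimal PBS batch script is only a sketch: the allocation and queue names reuse the examples from this page, and the paths are illustrative.

#!/bin/bash
#PBS -l nodes=1:ppn=4
#PBS -l walltime=6:00:00
#PBS -q normal_q
#PBS -A MyAlloc

# Package and compress one dataset chunk on a compute node instead of a login
# node. Note that /vtarchive is not mounted on the compute nodes, so copy any
# tarballs destined for /vtarchive from a login node afterwards.
cd /work/newriver/mypid
tar --create --xz --file=exportdir.tar.xz exportdir

Submit the script from a Newriver login node with qsub.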

Choose the destination(s). Make sure that the data you want to move will not exceed the space available (df -h) or any quota restrictions on the destination filesystem. Consider timing (how long might it take; is now a good time to start) and where to execute the transfer from (push vs. pull, login node vs. compute node) and then initiate it when you're ready.
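
For instance, a quick pre-flight check might look like the following; run the df command on the destination cluster and the du command on Newriver (the /projects path is just an example destination).

# On the destination cluster: free space on the target filesystem
df -h /projects

# On Newriver: size of the data you intend to send
du -sh /work/newriver/mypid/exportdir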

Some typical destination options:

  1. Download to local/personal computer. Tools - interact, screen, scp (Linux/Mac), WinSCP, FileZilla (Windows), "&"
  2. Migrate to a different ARC filesystem. Keep quotas and data lifecycle in mind. Tools (command line) - interact, screen, run in background "&", tar, cp, mv, rsync
  3. Archive data to /vtarchive. Tools - interact, screen, "&", tar, cp, mv, rsync. Files sent to the /vtarchive tape system should be large (1GB-1TB) and already compressed for best results. Large quantities of small (<100MB) files are inappropriate for the tape library and can severely hamper its performance and reliability.

Notes regarding these tools:

  • interact - this command can be used to start a job on Newriver and provide you with an interactive shell on a compute node. This may be useful for running tar, compression, and even file transfers without risking overloading the login nodes. The default duration of an interact job is one hour, but this can be increased by passing some options as shown below. One caveat is that /vtarchive is not mounted on the compute nodes.
interact -l walltime=6:00:00 -A MyAlloc -q normal_q
  • rsync - this is a fully featured file transfer tool and has many options you can specify. We recommend the following:
rsync -vv -h -rltoDPSW -e "ssh -c aes128-ctr" newriver2.arc.vt.edu:/groups/yourgroup/dir1 /projects/myproject/destdir
  • scp (secure copy) - any filesystem on any host you can ssh to is a valid source/destination for scp. The standard format is scp sourcefile destination. When specifying a remote host, use a colon and then specify the full path of the file to copy. Example:
Personal-Mac:~ mypid$ scp newriver1.arc.vt.edu:/work/newriver/mypid/exportdir.tar.xz ./
exportdir.tar.xz                                                                                                                                                                        100%   21MB  30.1MB/s   00:00
Personal-Mac:~ mypid$
  • screen - this command can be used to essentially disconnect long-running tasks from the current shell so that you can log off from the cluster without forcing the task to quit.
    • Use "screen" to start a screen session.
    • Use "Ctrl+a,d" to disconnect from a running screen session (it will continue running on its own)
    • Use "screen -r" to resume a screen session
    • Use "exit" to terminate a running screen session.
#Start a screen session
[mypid@nrlogin2 ~]$ screen

#Start a task which will take a while to complete
[mypid@nrlogin2 ~]$ for ii in {1..10}; do echo $ii; sleep 10; done
1
2

# entered Ctrl+A,D to detach
[detached]

[mypid@nrlogin2 ~]$ 

#Check back in on the running screen session a while later
[mypid@nrlogin2 ~]$ screen -r

[mypid@nrlogin2 ~]$ for ii in {1..10}; do echo $ii; sleep 10; done
...
4
5
6
#Still not done, so use Ctrl+A,D to detach again
[detached]
[mypid@nrlogin2 ~]$

#Log off from the system
[mypid@nrlogin2 ~]$ exit
logout
Connection to newriver2.arc.vt.edu closed.

#Log back in a while later
Personal-Mac:~ mypid$ ssh newriver2.arc.vt.edu

#Resume the screen session which is still running
[mypid@nrlogin2 ~]$ screen -r

[mypid@nrlogin2 ~]$ for ii in {1..10}; do echo $ii; sleep 10; done
...
8
9
10
[mypid@nrlogin2 ~]$ 
#Long-running process has completed, so exit to terminate the screen session
[mypid@nrlogin2 ~]$ exit
exit

[screen is terminating]
[mypid@nrlogin2 ~]$