Cascades is a 236-node system capable of tackling the full spectrum of computational workloads, from problems requiring hundreds of compute cores to data-intensive problems requiring large amounts of memory and storage resources. Cascades contains four compute engines designed for distinct workloads.

  • General - Distributed, scalable workloads. With two 16-core Intel Broadwell processors and 128 GB of memory on each node, this 190-node compute engine is suited to traditional HPC jobs and large codes using MPI.
  • Very Large Memory - Graph analytics and very large datasets. With 3 TB (3,072 GB) of memory, four 18-core processors, six 1.8 TB direct-attached SAS hard drives, a 400 GB SAS SSD, and one 2 TB NVMe PCIe flash card, each of these two servers enables analysis of large, highly connected datasets, in-memory database applications, and speedier solution of other large problems.
  • K80 GPU - Data visualization and code acceleration. Each of the four nodes in this compute engine has two NVIDIA K80 ("Kepler") GPUs, 512 GB of memory, and one 2 TB NVMe PCIe flash card.
  • V100 GPU - Extremely fast execution of GPU-enabled codes. There are 40 nodes in this engine, although one is reserved for system maintenance. Each node is equipped with two Intel Skylake Xeon Gold 3.0 GHz CPUs (24 cores per node), 384 GB of memory, and two NVIDIA V100 ("Volta") GPUs. Each V100 is capable of more than 7.8 TeraFLOPS of double-precision performance.

Technical Specifications

  • General: 190 nodes (ca007-ca196); 2 x E5-2683v4 2.1 GHz (Broadwell); 32 cores/node; 128 GB memory, 2400 MHz; local disk: 1.8 TB 10K RPM SAS, 200 GB SSD
  • Very Large Memory: 2 nodes (ca001-ca002); 4 x E7-8867v4 2.4 GHz (Broadwell); 72 cores/node; 3 TB memory, 2400 MHz; local disk: 3.6 TB (2 x 1.8 TB) 10K RPM SAS (RAID 0), 6 x 400 GB SSD (RAID 1)
  • K80 GPU: 4 nodes (ca003-ca006); 2 x E5-2683v4 2.1 GHz (Broadwell); 32 cores/node; 512 GB memory, 2400 MHz; local disk: 3.6 TB (2 x 1.8 TB) 10K RPM SAS (RAID 0), 2 x 400 GB SSD (RAID 1)
  • V100 GPU: 40 nodes (ca197-ca236); 2 x Intel Xeon Gold 6136 3.0 GHz (Skylake); 24 cores/node; 384 GB memory, 2666 MHz; local disk: 2 x 400 GB SSD (RAID 1); 2 x NVIDIA V100 GPUs

  • K80 GPU notes: each node presents 4 CUDA devices. Although each K80 is a single physical card in one PCIe slot, it contains two separate GPU chips, so the two K80s appear as four separate devices to CUDA code. nvidia-smi will show this.
  • All nodes have locally mounted SAS drives and SSDs. /scratch-local (and $TMPDIR) point to the SAS drive and /scratch-ssd points to the SSD on each node. On large memory and GPU nodes, which have multiple drives of each type, the storage across the SSDs is combined in /scratch-ssd (RAID 0) and the SAS drives are mirrored (RAID 1) for redundancy.
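As a sketch of how this local scratch space might be used (the input, output, and program names below are placeholders), a job can stage data onto the node-local drive, compute against the fast local copy, and copy results back before the job ends:

```shell
#!/bin/bash
#SBATCH -t 00-01:00:00
#SBATCH -p normal_q
#SBATCH -A <yourAllocation>

# $TMPDIR points to the node-local SAS drive (/scratch-local)
cp "$HOME/inputs/data.in" "$TMPDIR/"           # stage input to local disk
cd "$TMPDIR"
"$HOME/bin/my_program" data.in > data.out      # placeholder executable
cp data.out "$HOME/results/"                   # copy results back out
```

Node-local scratch is not shared and is typically cleaned after the job, so anything you want to keep must be copied back to your home or project space within the job.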


  • A 100 Gbps InfiniBand interconnect provides low-latency communication between compute nodes for MPI traffic.
  • A 10 Gbps Ethernet interconnect provides high-speed connectivity and access to storage.


Cascades is governed by an allocation manager, meaning that in order to run most jobs, you must be an authorized user of an allocation that has been submitted and approved. For more on allocations, click here.

The Cascades partitions (queues) are:

  • normal_q for production (research) runs.
  • largemem_q for production (research) runs on the large memory nodes.
  • dev_q for short testing, debugging, and interactive sessions. dev_q provides slightly elevated job priority to facilitate code development and job testing prior to production runs.
  • k80_q for runs that require access to the K80 GPU nodes.
  • v100_normal_q for production (research) runs on the V100 nodes.
  • v100_dev_q for short testing, debugging, and interactive sessions on the V100 nodes.

The Cascades partition (queue) settings are:

  • normal_q - Access: ca007-ca196; Max Jobs: 24 per user, 48 per allocation; Max Nodes: 32 per user, 48 per allocation; Max Cores: 1,024 per user, 1,536 per allocation; Max Memory (calculated, not enforced): 4 TB per user, 6 TB per allocation; Max Walltime: 144 hr; Max Core-Hours: 73,728 per user
  • largemem_q - Access: ca001-ca002; Max Jobs: 1 per user; Max Nodes: 1 per user; Max Cores: 72 per user; Max Memory: 3 TB per user; Max Walltime: 144 hr; Max Core-Hours: 10,368 per user
  • dev_q - Access: ca007-ca196; Max Jobs: 1 per user; Max Nodes: 32 per user, 48 per allocation; Max Cores: 1,024 per user, 1,536 per allocation; Max Memory: 4 TB per user, 6 TB per allocation; Max Walltime: 2 hr; Max Core-Hours: 256 per user
  • k80_q - Access: ca003-ca006; Max Jobs: 4 per user, 6 per allocation; Max Nodes: 4 per user; Max Cores: 128 per user; Max Memory: 2 TB per user; Max Walltime: 144 hr; Max Core-Hours: 9,216 per user
  • v100_normal_q - Access: ca197-ca236; Max Jobs: 8 per user, 12 per allocation; Max Nodes: 12 per user, 24 per allocation; Max Cores: 288 per user, 576 per allocation; Max Memory: 4 TB per user, 6 TB per allocation; Max Walltime: 144 hr; Max Core-Hours: 20,736 per user
  • v100_dev_q - Access: ca197-ca236; Max Jobs: 1 per user; Max Nodes: 12 per user, 24 per allocation; Max Cores: 336 per user; Max Memory: 1 TB per user; Max Walltime: 2 hr; Max Core-Hours: 168 per user


  • Shared node access: more than one job can run on a node (Note: This is different from other ARC systems)
  • The micro-architecture on the V100 nodes is newer than (and distinct from) the Broadwell nodes. For best performance and compatibility, programs that are to run on V100 nodes should be compiled on a V100 node. Note that the login nodes are Broadwell nodes, so compilation on a V100 node should be done as part of the batch job, or during an interactive job on a V100 node (see below).
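Following that advice, a batch job can compile and run in the same submission so the binary is built on the Skylake/V100 hardware it will run on. The module versions and source file name below are illustrative (check module avail for the versions on your system):

```shell
#!/bin/bash
#SBATCH -t 00-01:00:00
#SBATCH -p v100_normal_q
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH -A <yourAllocation>

# Compile on the V100 node itself so the binary targets the correct
# micro-architecture, then run it in the same job.
module load gcc cuda/9.0.176
nvcc -o my_gpu_prog my_gpu_prog.cu   # illustrative source file
./my_gpu_prog
```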


For a list of software available on Cascades, as well as a comparison of software available on all ARC systems, click here.

Note that a user will have to load the appropriate module(s) in order to use a given software package on the cluster. The module avail and module spider commands can also be used to find software packages available on a given system.
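For example (the package name searched for below is illustrative):

```shell
module avail               # list modules visible in the current environment
module spider tensorflow   # search the full module tree for a package
module load Anaconda/5.2.0 # load a specific version before using it
```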


Cascades is accessed through a standard SSH terminal session.

Terminal Access

The cluster is accessed via ssh to one of the two login nodes below. Log in using your username (usually Virginia Tech PID) and password. You will need an SSH Client to log in; see here for information on how to obtain and use an SSH Client.


Job Submission

Access to all compute nodes is controlled via the job scheduler. See the Job Submission page here. The basic flags are:

#SBATCH -t dd-hh:mm:ss
#SBATCH [resource request, see Requesting Resources]
#SBATCH -p normal_q (or other partition, see Policies)
#SBATCH -A <yourAllocation> (see Policies)
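Putting these flags together, a minimal batch script might look like the following sketch (the module list and executable name are placeholders):

```shell
#!/bin/bash
#SBATCH -t 00-02:00:00        # walltime: 2 hours
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8   # resource request (see Requesting Resources)
#SBATCH -p normal_q           # partition (see Policies)
#SBATCH -A <yourAllocation>   # allocation (see Policies)

module load <modules your code needs>
./my_program                  # placeholder executable
```

Submit the script with sbatch, e.g. sbatch myjob.sh.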

The Cascades cluster formerly used a different scheduler which accepted #PBS style directives. Configurations were implemented during the transition to Slurm so that most of these directives and commands will continue to work without any modifications. In particular, the following PBS environment variables are populated with values as needed to allow jobs which depend on them to work:

PBS_O_WORKDIR=<job submission directory>
PBS_JOBID=<job number>
PBS_NP=<#cpu-cores allocated to job>
PBS_NODEFILE=<file containing list of the job's nodes>
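So a legacy PBS-style script body such as the following sketch should continue to run unmodified (the MPI program name is a placeholder):

```shell
#!/bin/bash
cd "$PBS_O_WORKDIR"     # return to the directory the job was submitted from
echo "Job $PBS_JOBID running on $PBS_NP cores"
mpirun -np "$PBS_NP" -machinefile "$PBS_NODEFILE" ./my_mpi_prog
```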

Requesting Resources

Cascades compute nodes can be shared by multiple jobs. Resources can be requested by specifying the number of nodes, processes per node (ppn), cores, memory, etc. See example resource requests below:

#Request exclusive access to all resources on 2 nodes 
#SBATCH --nodes=2 
#SBATCH --exclusive

#Request 4 cores (on any number of nodes)
#SBATCH --ntasks=4

#Request 2 nodes with 12 tasks running on each
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12

#Request 12 tasks with 20GB memory per core
#SBATCH --ntasks=12 
#SBATCH --mem-per-cpu=20G

#Request 5 nodes and spread 50 tasks evenly across them
#SBATCH --nodes=5
#SBATCH --ntasks=50
#SBATCH --spread-job

#Request one NVIDIA V100 GPU and 100GB memory
#SBATCH --nodes=1 #(implies --ntasks=1 unless otherwise specified)
#SBATCH --partition=v100_normal_q
#SBATCH --gres=gpu:1
#SBATCH --mem=100G

Finding Information

Check the status of a job after submission:

squeue -u <your username>
Get detailed information about a running job:

scontrol show job <job-number>

Check the status of the cluster's nodes and partitions:

sinfo
Interactive access

You can submit a request for interactive access to a node. Such a request will be handled by the job scheduler, so there may be a wait before you gain access. You can request access to a Broadwell or Skylake node.

A typical command requesting access to a general compute node would be:

interact --partition=dev_q --nodes=1 --ntasks-per-node=6 -A yourallocationname

interact --partition=dev_q --nodes=1 --exclusive -A yourallocationname

A typical command requesting access to a V100 GPU node would be:

interact --partition=v100_dev_q --nodes=1 --ntasks-per-node=12 --gres=gpu:1 -A yourallocationname

interact --partition=v100_dev_q --nodes=1 --exclusive --gres=gpu:2 -A yourallocationname

Once you get access, you can issue commands just as you would on a login node. When you are done with your work, issue a logout command, which will return you to your starting point on a Cascades login node.
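A typical interactive session might look like the following sketch (the commands after interact run on the compute node):

```shell
interact --partition=dev_q --nodes=1 --ntasks-per-node=6 -A yourallocationname
hostname         # confirm you are on a compute node, not a login node
module load gcc  # load whatever your work needs
# ... compile, test, debug ...
logout           # return to the Cascades login node
```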


This shell script provides a template for submission of jobs on Cascades. The comments in the script include notes about how to request resources, load modules, submit MPI jobs, etc.

To utilize this script template, create your own copy and edit as described here.

Deep Learning on Cascades (Updated March 2019)

ARC users can install their own Anaconda/Miniconda and then install TensorFlow or PyTorch after creating a virtual environment on a login node. An example script is shown below:
###### Step 1. Check Anaconda
module load Anaconda/5.2.0
which conda
###### Step 2. Install TensorFlow
# Install on a login node (TensorFlow 1.12, for example)
conda create -n pytf_cc python=3.6
source activate pytf_cc
conda install tensorflow-gpu
###### Step 3. Test the TensorFlow installation
# In an interactive session on a GPU node
source activate pytf_cc
module load gcc cmake
module load cuda/9.0.176
module load cudnn/7.1
python -c 'import tensorflow as tf; print(tf.__version__)'
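To confirm that the installation can actually see a GPU, you can also query the TensorFlow 1.x test API from a GPU node:

```shell
# In an interactive session on a GPU node, with the environment active
source activate pytf_cc
module load cuda/9.0.176 cudnn/7.1
python -c 'import tensorflow as tf; print(tf.test.is_gpu_available())'
```

This prints True when TensorFlow can initialize a CUDA device; if it prints False, check that the job requested a GPU (--gres=gpu:1) and that the CUDA/cuDNN modules are loaded.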