slurm


Introduction:

SLURM is a job scheduler that manages resources on a computer cluster.
It accepts job scripts from users and places them in a queue (which
SLURM calls a "partition"); it determines an appropriate time when a
user job can be run with the necessary memory, processors, and other
requested resources; and it returns to the user the output from a
completed job.

The user defines a job request by writing a SLURM script. This script
begins with a sequence of lines using the prefix #SBATCH, which specify
the requested resources and other information that configures the job.
The script then lists a sequence of commands to be executed, such as
compiling a program, running an executable program, moving files, and
other commands that, theoretically, could have been issued by the
user in an interactive session.

The sbatch command is used to submit a job to SLURM. Assuming the job
script is called "myjob.sh", the user, working from a login node of the
computer cluster, would issue the command:

    sbatch myjob.sh

The submitted job is accepted by SLURM for execution at some later time.
The user can check on the status of all of their jobs:

    squeue -u USERNAME

or a particular job:

    sacct -j JOBID

or cancel a job:

    scancel JOBID

or monitor the status of all available queues:

    sinfo
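
Since sacct and scancel both need a job ID, it can help to capture the ID
at submission time. The following is a minimal sketch, assuming that sbatch
prints its usual confirmation message of the form "Submitted batch job JOBID".
The function name extract_job_id and the sample ID 123456 are made up for
illustration, and the sketch falls back to a sample message when sbatch is
not installed.

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from sbatch's confirmation message,
# which normally has the form "Submitted batch job <JOBID>".
extract_job_id() {
    # The ID is the last whitespace-separated field of the message.
    printf '%s\n' "$1" | awk '{ print $NF }'
}

if command -v sbatch >/dev/null 2>&1; then
    # On a login node: submit the job and capture the confirmation message.
    message=$(sbatch myjob.sh)
else
    # Off the cluster: use a made-up message so the sketch still runs.
    message="Submitted batch job 123456"
fi

jobid=$(extract_job_id "$message")
echo "Job ID: $jobid"

# The ID can then be passed to the commands described above.
if command -v sacct >/dev/null 2>&1; then
    sacct -j "$jobid"
fi
```

The same $jobid value could be given to scancel if the job needs to be
cancelled.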

SLURM defines a number of environment variables to simplify work;
the most commonly used one is $SLURM_SUBMIT_DIR, which identifies
the directory from which the job script was submitted. This allows
a user to indicate that the batch job should move to this directory
at execution time, typically because this is the place where input
files and other data may be conveniently found.
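
As a minimal sketch of this idea, the lines below move to the submission
directory when run under SLURM, and otherwise stay in the current directory
so that the same script can be tried interactively. The fallback behavior and
the placeholder label "interactive" are assumptions made for this
illustration; SLURM_JOB_ID is another variable that SLURM sets for a running
job.

```shell
#!/bin/bash
# Use the submission directory when running under SLURM; otherwise fall back
# to the current directory so the script can be tried interactively.
workdir="${SLURM_SUBMIT_DIR:-$PWD}"
cd "$workdir" || { echo "Cannot change to $workdir" >&2; exit 1; }

# SLURM_JOB_ID holds the ID of the running job; "interactive" is a
# placeholder label chosen for this sketch when the variable is unset.
jobid="${SLURM_JOB_ID:-interactive}"
echo "Job $jobid running in $(pwd)"
```

Quoting the variable protects against directory names that contain spaces.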

Web Site:

The SLURM home page:

https://slurm.schedmd.com/

Usage:

Currently, SLURM is available only on the ARC Huckleberry cluster.
The other systems use PBS.

Examples:

The following batch file illustrates a simple set of SLURM header lines
appropriate to run a job:

    #!/bin/bash
    #
    #SBATCH -J slurm_huckleberry
    #SBATCH -p normal_q
    #SBATCH -N 1
    #SBATCH -n 1
    #SBATCH -t 00:05:00
    #SBATCH --mem=100M
    #
    #  Move to the directory from which the job was submitted.
    #
    cd "$SLURM_SUBMIT_DIR"
    #
    #  module load commands, such as "module load gcc", go here.
    #
    echo "Commands to be executed by your SLURM job go here."
    #
    echo ""
    echo "SLURM_HUCKLEBERRY: Normal end of execution."
    exit 0
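
The short options in the header lines above have longer, more readable
equivalents. The sketch below writes a job script that uses the long forms
to a temporary file and counts its #SBATCH lines. The temporary file and the
variable names are assumptions made for this illustration; the job name,
partition, and resource values are copied from the example above.

```shell
#!/bin/bash
# Write a job script that uses long option names to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
#
#  --job-name corresponds to -J, --partition to -p, --nodes to -N,
#  --ntasks to -n, and --time to -t.
#
#SBATCH --job-name=slurm_huckleberry
#SBATCH --partition=normal_q
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --mem=100M
#
cd "$SLURM_SUBMIT_DIR"
echo "Commands to be executed by your SLURM job go here."
EOF

# Count the header lines as a quick check on what the script requests.
directive_count=$(grep -c '^#SBATCH' "$script")
echo "Wrote $script with $directive_count #SBATCH lines."
```

On the cluster, the generated file could be submitted with sbatch in the
same way as myjob.sh.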

A complete set of files to carry out a similar process is available in
the file slurm.tar.