LAMMPS


Introduction:

LAMMPS is a parallel, classical molecular dynamics code that models an
ensemble of particles in a liquid, solid, or gaseous state. It can model
atomic, polymeric, biological, metallic, granular, and coarse-grained
systems using a variety of force fields and boundary conditions.

LAMMPS runs efficiently on single-processor desktop or laptop machines, but
it is designed for parallel computers, and its performance scales well across
large numbers of processors. It will run on any parallel machine with a C++
compiler and an implementation of the MPI message-passing library, including
distributed- and shared-memory parallel machines and Beowulf-style clusters.

LAMMPS can model systems ranging in size from only a few particles up to
millions or billions.

LAMMPS is a freely available open-source code, distributed under the terms
of the GNU General Public License (GPL).

LAMMPS was originally developed under a US Department of Energy CRADA
(Cooperative Research and Development Agreement) between two DOE labs and
three companies. It is distributed by Sandia National Laboratories.

Web site:


http://lammps.sandia.gov

Reference:

To get full documentation, visit

http://lammps.sandia.gov/doc/Manual.html

Usage:

On any ARC cluster, check the installation details
by typing "module spider lammps".

The ARC clusters provide LAMMPS builds parallelized with MPI, the GPU
package, and the USER-CUDA package, installed as the executables lmp_mpi,
lmp_gpu, and lmp_cuda, respectively.
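
After loading the appropriate LAMMPS module (see below), you can verify
that these executables are on your path with a standard shell check, for
example:

      which lmp_mpi lmp_gpu lmp_cuda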

GPU Acceleration:

Both the GPU and USER-CUDA packages accelerate a LAMMPS calculation using
NVIDIA GPU cards, but they do so in different ways. As a consequence, for a
particular simulation, one package may be faster than the other. We give
guidelines below, but the best way to determine which package is faster for
your input script is to try both of them; a sketch of the two invocation
styles follows the list below.

Differences between the GPU and USER-CUDA packages include:

  • The GPU package accelerates only pair force, neighbor list, and
    PPPM calculations. The USER-CUDA package currently supports a wider
    range of pair styles and can also accelerate many fix styles and
    some compute styles, as well as neighbor list and PPPM calculations.
  • The USER-CUDA package does not support acceleration for minimization.
  • The USER-CUDA package does not support hybrid pair styles.
  • The USER-CUDA package can order atoms in the neighbor list differently
    from run to run, resulting in a different order of force accumulation.
  • The USER-CUDA package has a limit on the number of atom types that can
    be used in a simulation.
  • The GPU package requires neighbor lists to be built on the CPU when
    using exclusion lists or a triclinic simulation box.
  • The GPU package uses more GPU memory than the USER-CUDA package.
    This is generally not a problem since typical runs are
    computation-limited rather than memory-limited.
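
For orientation, here is a minimal sketch of how the same input script is
run under each package, mirroring the command-line switches used in the job
scripts later on this page (-sf selects an accelerated suffix, -pk passes
package options, and -c on enables USER-CUDA); the MPI task and GPU counts
are purely illustrative:

      # GPU package: several MPI tasks may share each GPU
      mpirun -np 8 lmp_gpu -sf gpu -pk gpu 2 < in.lj

      # USER-CUDA package: exactly one MPI task per GPU
      mpirun -np 2 lmp_cuda -c on -sf cuda -pk cuda 2 < in.lj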

Modules:

Before running LAMMPS, the appropriate modules must be loaded. The
particular list of modules depends on the version of LAMMPS being used.
To check this, one can type the command

      module spider lammps
    

to see the list of versions of LAMMPS available on the particular cluster,
and then

      module spider lammps/10Aug15
    

to see the list of modules that must be loaded for that version.

Input File:

LAMMPS executes by reading commands from an input script (a plain text
file), one line at a time. When the input script ends, LAMMPS exits. Each
command causes LAMMPS to take some action. It may set an internal variable,
read in a file, or run a simulation. Most commands have default settings,
which means you only need to use the command if you wish to change the
default.

For a detailed explanation of the LAMMPS commands, see the manual at
http://lammps.sandia.gov/doc/Manual.html.

Examples:

We will consider a single example that can be run with each of the available
versions of LAMMPS. It is recommended that you test this script before you
try to run your own LAMMPS problems.

The example is a Lennard-Jones melt in a 3D box. The Lennard-Jones
potential is cut off at r = 2.5 sigma, where sigma is the distance at
which the interparticle potential is zero. The system contains 32,000
atoms and, by default, is run for 20,000 time steps, with thermodynamic
output every 100 steps.

The example is stored in the file in.lj, and reads as follows:

# 3d Lennard-Jones melt

# box dimensions in lattice cells (20*x by 20*y by 20*z) and number of
# time steps; index variables can be overridden from the command line
variable      x index 1
variable      y index 1
variable      z index 1
variable      t index 20000

variable      xx equal 20*$x
variable      yy equal 20*$y
variable      zz equal 20*$z

units         lj
atom_style    atomic

# fcc lattice at reduced density 0.8442
lattice       fcc 0.8442
region        box block 0 ${xx} 0 ${yy} 0 ${zz}
create_box    1 box
create_atoms  1 box
mass          1 1.0

# initial velocities at reduced temperature 1.44 (random seed 87287)
velocity      all create 1.44 87287 loop geom

# Lennard-Jones potential with a 2.5 sigma cutoff
pair_style    lj/cut 2.5
pair_coeff    1 1 1.0 1.0 2.5

neighbor      0.3 bin
neigh_modify  delay 0 every 20 check no

# constant-energy (NVE) integration, thermodynamic output every
# 100 steps, run for $t time steps
fix           1 all nve
thermo        100
run           $t
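
Because the box dimensions are set through index variables, the problem size
can be scaled from the command line without editing the script. For example
(a hypothetical invocation using the standard -var switch to override the
index variables), doubling the box in each direction gives a 256,000-atom
system:

      mpirun -np 48 lmp_mpi -var x 2 -var y 2 -var z 2 < in.lj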

Running LAMMPS with MPI:

Here is a sample bash script to run the example with lmp_mpi,
using 2 nodes and 48 cores (24 cores per node) on NewRiver:

#!/bin/bash
#PBS -l nodes=2:ppn=24
#PBS -l walltime=00:05:00
#PBS -q open_q
#PBS -W group_list=newriver
#PBS -j oe

cd $PBS_O_WORKDIR

module purge
module load gcc/4.7.2 
module load openmpi/1.6.4 
module load mkl/11.2.3 
module load cuda/7.0.28 
module load fftw/3.3.4 
module load lammps/10Aug15

mpirun -np 48 lmp_mpi < in.lj
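
If this script is saved as, say, lammps_mpi.qsub (a file name chosen here
just for illustration), it can be submitted to the scheduler with:

      qsub lammps_mpi.qsub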

Running LAMMPS with CUDA:

Note that when using the USER-CUDA package, you must use exactly one MPI
task per physical GPU. Here is a sample bash script to run the example
with lmp_cuda on NewRiver three times, using different
configurations of GPUs:

  1. on 1 GPU on 1 node;
  2. on 2 GPUs on 1 node;
  3. on 4 GPUs on 2 nodes.

#!/bin/bash
#PBS -l nodes=2:ppn=2:gpus=2
#PBS -l walltime=00:05:00
#PBS -q open_q
#PBS -W group_list=newriver
#PBS -j oe

cd $PBS_O_WORKDIR

module purge
module load gcc/4.7.2 
module load openmpi/1.6.4 
module load mkl/11.2.3 
module load cuda/7.0.28 
module load fftw/3.3.4 
module load lammps/10Aug15

export CUDA_VISIBLE_DEVICES=1,0

#
#  1 node, 1 MPI task, 1 GPU = 1 MPI task/GPU:
#
mpirun -np 1 lmp_cuda -c on -sf cuda < in.lj
#
#  1 node, 2 MPI tasks, 2 GPUs = 1 MPI task/GPU:
#
mpirun -np 2 lmp_cuda -c on -sf cuda -pk cuda 2 < in.lj
#
#  2 nodes, 4 MPI tasks, 2x2 GPUs = 1 MPI task/GPU:
#
mpirun -np 4 lmp_cuda -c on -sf cuda -pk cuda 2 < in.lj

Running LAMMPS with GPUs:

When running the GPU version of LAMMPS on multiple nodes of NewRiver,
your submission script needs to set CUDA_VISIBLE_DEVICES appropriately.
For example:

      export CUDA_VISIBLE_DEVICES=1,0

Here is a sample bash script to run lmp_gpu on the example
three times:

  1. on 1 GPU on 1 node;
  2. on 2 GPUs on 1 node;
  3. on 4 GPUs on 2 nodes.

#!/bin/bash
#PBS -l nodes=2:ppn=2:gpus=2
#PBS -l walltime=02:05:00
#PBS -q open_q
#PBS -W group_list=newriver
#PBS -j oe

cd $PBS_O_WORKDIR

module purge
module load gcc/4.7.2 
module load openmpi/1.6.4 
module load mkl/11.2.3 
module load cuda/7.0.28 
module load fftw/3.3.4 
module load lammps/10Aug15

export CUDA_VISIBLE_DEVICES=1,0

#
#  1 node, 8 MPI tasks, 1 GPU = 8 MPI tasks / GPU.
#
mpirun -np 8 lmp_gpu -sf gpu < in.lj
#
#  1 node, 12 MPI tasks, 2 GPUs = 6 MPI tasks / GPU.
#
mpirun -np 12 lmp_gpu -sf gpu -pk gpu 2 < in.lj
#
#  2 nodes, 4 MPI tasks, 2x2 GPUs = 1 MPI task / GPU.
#
mpirun -np 4 lmp_gpu -sf gpu -pk gpu 2 < in.lj
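
To confirm that the GPUs are actually in use while one of these jobs is
running, one simple check (assuming you can open a shell on the compute
node and that the NVIDIA driver tools are installed there) is to query the
devices with:

      nvidia-smi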

A complete set of files to carry out a LAMMPS calculation is available in
lammps_example.tar.