LAMMPS

Introduction

LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) is an open-source molecular dynamics software package developed at Sandia National Laboratories. It is widely used for classical molecular dynamics simulations in materials science, chemistry, soft matter, and biological systems. LAMMPS is designed for scalable high-performance computing and supports parallel execution using MPI, OpenMP threading, and GPU acceleration through frameworks such as KOKKOS.

KOKKOS Support in LAMMPS

KOKKOS is an open-source performance portability framework that enables applications to run efficiently on a wide range of hardware architectures, including multi-core CPUs and GPUs. LAMMPS uses KOKKOS as an abstraction layer to support different execution backends (e.g., OpenMP on CPUs or CUDA on NVIDIA GPUs) from a single code base. On ARC systems, all provided LAMMPS builds are KOKKOS-enabled. GPU-enabled LAMMPS modules use KOKKOS with the CUDA backend to offload computation to GPUs. KOKKOS and GPU acceleration are therefore complementary, not mutually exclusive.

OPENMP Support in LAMMPS

LAMMPS also provides an OPENMP package that enables shared-memory parallelism using OpenMP threads within each MPI rank. This approach can reduce MPI communication overhead and is often effective on CPU-only nodes with many cores per socket. Unlike KOKKOS, which provides performance portability across CPUs and GPUs, the OPENMP package is intended only for CPU-based execution. On ARC systems, the OPENMP package can be used with LAMMPS builds that include OpenMP support and is most effective when combined with Slurm’s --cpus-per-task option.

LAMMPS Versions and Availability on ARC

ARC provides multiple LAMMPS builds optimized for different compiler toolchains and hardware architectures. All current ARC builds include KOKKOS support, with selected versions additionally enabled for GPU acceleration and machine-learning interatomic potentials.

LAMMPS is free and open-source software and does not require a license. It is available to all ARC users on supported clusters, and no special group membership is required. For official documentation, tutorials, and source code, see the LAMMPS website: https://www.lammps.org

Available LAMMPS Modules

CPU-only builds:

LAMMPS/22Jul2025-foss-2024a-kokkos
LAMMPS/29Aug2024_update2-foss-2024a-kokkos
LAMMPS/29Aug2024-foss-2023b-kokkos

GPU-enabled builds:

LAMMPS/22Jul2025-foss-2024a-kokkos-CUDA-12.6.0
LAMMPS/29Aug2024-foss-2023b-kokkos-CUDA-12.6.0
LAMMPS/29Aug2024_update2-foss-2024a-kokkos-CUDA-12.6.0

Specialized builds:

LAMMPS/28Oct2024-foss-2023a-kokkos-mace-CUDA-12.1.1

The kokkos-mace build includes support for the MACE machine-learning interatomic potential and is intended for GPU-accelerated workflows using ML-based force fields.

You can view detailed build options and enabled packages for any module with:

module show LAMMPS/<version>
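
To see which LAMMPS versions are currently installed, you can also query the module system directly (the commands below assume the Lmod-based module environment used on ARC clusters):

module avail LAMMPS
module spider LAMMPS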

Choosing Clusters and Partitions

The optimal ARC cluster and partition for a LAMMPS simulation depends on system size, force field choice, and whether GPU acceleration is used.

General guidance:

CPU-only LAMMPS runs

  • Best suited for CPU-focused partitions on ARC clusters

  • Appropriate for smaller systems, testing, and force fields that do not benefit from GPU acceleration

GPU-accelerated LAMMPS runs

  • Best suited for GPU partitions on clusters such as Falcon and Tinkercliffs

  • Recommended for large systems, long simulations, and compute-intensive pair styles

Machine-learning potentials (e.g., MACE)

  • Require GPU-enabled builds

  • Should be run on GPU partitions using the LAMMPS/28Oct2024-foss-2023a-kokkos-mace-CUDA-12.1.1 module

Users are encouraged to perform small scaling tests to determine the most efficient configuration for their specific workload.
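
As a starting point, the Slurm directives below sketch typical resource requests for the two cases. The account and partition names are placeholders, not actual ARC partition names; consult the documentation for your chosen cluster for the correct values.

# CPU-only request (placeholder account and partition names)
#SBATCH --account=<your_allocation>
#SBATCH --partition=<cpu_partition>
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64

# GPU request (placeholder account and partition names)
#SBATCH --account=<your_allocation>
#SBATCH --partition=<gpu_partition>
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1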

Interface

LAMMPS is primarily used via the command line. The main executable is typically named lmp, and is launched with an input script provided via the -in option. Example usage on an ARC cluster:

[mypid@tinkercliffs1 ~]$ module load LAMMPS/22Jul2025-foss-2024a-kokkos
[mypid@tinkercliffs1 ~]$ lmp -help

LAMMPS simulations are usually run in batch mode using the Slurm scheduler. A simple MPI-based run might look like:

mpirun -np $SLURM_NTASKS lmp -in input.in > output.txt

When using mpirun, users are responsible for ensuring that the number of MPI ranks matches the resources requested from Slurm. For GPU-enabled or KOKKOS-accelerated builds, additional runtime flags may be required to select the appropriate execution backend (e.g., CUDA). Users should consult the LAMMPS documentation for package-specific and accelerator-specific options.
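
For reference, a minimal CPU-only batch script might look like the following sketch. The account, partition, walltime, and core counts are illustrative placeholders and should be adjusted to your allocation and cluster.

#!/bin/bash
#SBATCH --job-name=lammps_cpu
#SBATCH --account=<your_allocation>
#SBATCH --partition=<cpu_partition>
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=64
#SBATCH --time=12:00:00

# Load a CPU-only KOKKOS build of LAMMPS
module reset
module load LAMMPS/22Jul2025-foss-2024a-kokkos

# One MPI rank per allocated core
mpirun -np $SLURM_NTASKS lmp -in input.in > output.txt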

Controlling KOKKOS Execution at Runtime

LAMMPS behavior can be controlled at runtime using command-line flags that enable or configure accelerator packages. Common options when using KOKKOS-enabled builds include:

-sf kk

Enables the KOKKOS package for supported LAMMPS styles.

-k on

Activates KOKKOS acceleration.

-k on g <Ngpu>

Enables KOKKOS and specifies the number of GPUs per node.

-pk kokkos neigh full

Example of passing package-specific options to KOKKOS.

A simple example using KOKKOS on CPUs:

mpirun -np $SLURM_NTASKS lmp -sf kk -k on t $SLURM_CPUS_PER_TASK -in input.in 

And on GPU-enabled nodes:

mpirun -np $SLURM_NTASKS lmp -sf kk -k on g 1 -in input.in
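
Putting these pieces together, a GPU job script might look like the sketch below. The account and partition names are placeholders, and the layout of one MPI rank per GPU is only a common starting point (see the scaling notes later on this page).

#!/bin/bash
#SBATCH --job-name=lammps_gpu
#SBATCH --account=<your_allocation>
#SBATCH --partition=<gpu_partition>
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --time=12:00:00

# Load a GPU-enabled (CUDA) KOKKOS build of LAMMPS
module reset
module load LAMMPS/22Jul2025-foss-2024a-kokkos-CUDA-12.6.0

# One MPI rank driving one GPU through the KOKKOS CUDA backend
mpirun -np $SLURM_NTASKS lmp -sf kk -k on g 1 -in input.in > output.txt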

Using the OPENMP Package at Runtime

The OPENMP package is enabled at runtime using the -sf omp flag, along with the -pk omp option to specify the number of OpenMP threads per MPI rank. Key options include:

-sf omp

Enables the OPENMP-accelerated styles.

-pk omp <Nthreads>

Sets the number of OpenMP threads per MPI rank.

The number of OpenMP threads should generally match the value of $SLURM_CPUS_PER_TASK.
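
For example, within a Slurm allocation the thread count can be taken directly from the environment (a minimal sketch; setting OMP_NUM_THREADS as well is optional here, since -pk omp sets the thread count explicitly, but it keeps other threaded libraries consistent):

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun -np $SLURM_NTASKS lmp -sf omp -pk omp $SLURM_CPUS_PER_TASK -in input.in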

Choosing Between KOKKOS and OPENMP

General guidance for ARC systems:

Use KOKKOS:

  • When running on GPU-enabled nodes

  • When targeting performance portability across CPUs and GPUs

  • For large-scale production runs

Use OPENMP:

  • On CPU-only nodes with many cores per node

  • When MPI scaling is limited or communication-heavy

  • For moderate system sizes where shared-memory threading is effective

Users are encouraged to benchmark both approaches for their specific workloads.
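
As a rough sketch of such a comparison, the fragment below runs the same input once with each package inside a single CPU allocation and extracts the timing summary that LAMMPS prints at the end of each run (the log file names are arbitrary, and the loaded build is assumed to include both packages):

# Same input, OPENMP backend vs. KOKKOS on CPU threads
mpirun -np $SLURM_NTASKS lmp -sf omp -pk omp $SLURM_CPUS_PER_TASK -in input.in -log log.omp
mpirun -np $SLURM_NTASKS lmp -sf kk -k on t $SLURM_CPUS_PER_TASK -in input.in -log log.kk

# Compare the wall-clock times reported by LAMMPS
grep "Loop time" log.omp log.kk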

Example: MPI + OpenMP Hybrid Execution

On CPU-only nodes, LAMMPS can be run using a hybrid MPI + OpenMP configuration. For example, within a Slurm allocation:

mpirun -np $SLURM_NTASKS lmp -sf omp -pk omp $SLURM_CPUS_PER_TASK -in input.in > output.txt

In this configuration, $SLURM_NTASKS controls the number of MPI ranks and $SLURM_CPUS_PER_TASK controls the number of OpenMP threads per rank. Users should ensure that the product of MPI ranks and OpenMP threads does not exceed the total number of allocated CPU cores.
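
A complete batch script for such a hybrid run might look like the sketch below. The account and partition names are placeholders, and the 8 ranks x 16 threads layout is chosen only to illustrate that ranks times threads should equal the allocated cores.

#!/bin/bash
#SBATCH --job-name=lammps_hybrid
#SBATCH --account=<your_allocation>
#SBATCH --partition=<cpu_partition>
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=16
#SBATCH --time=12:00:00

module reset
module load LAMMPS/22Jul2025-foss-2024a-kokkos

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# 8 MPI ranks x 16 OpenMP threads = 128 cores on this node
mpirun -np $SLURM_NTASKS lmp -sf omp -pk omp $SLURM_CPUS_PER_TASK -in input.in > output.txt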

Scaling Considerations

LAMMPS performance depends strongly on how CPU cores and GPUs are allocated.

General scaling guidelines:

CPU scaling

  • LAMMPS typically scales well to tens or hundreds of MPI ranks, depending on system size

  • Oversubscribing cores can reduce performance; users should match MPI ranks to available cores

GPU scaling

  • Many LAMMPS workloads perform best with one GPU per node

  • Adding more MPI ranks per GPU does not always improve performance and may degrade it

  • Hybrid MPI + threading or KOKKOS configurations are often more efficient than pure MPI

Because performance is highly problem-dependent, benchmarking is recommended when scaling to larger node counts.
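
One simple way to benchmark is to rerun a short, representative input at several rank counts inside a single allocation and compare the loop times that LAMMPS reports, as in the sketch below (the rank counts are illustrative and must not exceed the number of cores in the allocation):

# Strong-scaling test with a short benchmark input
for ranks in 16 32 64 128; do
    mpirun -np $ranks lmp -in input.in -log log.$ranks
done

# Summarize the timings
grep "Loop time" log.*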

Examples and Tutorials

ARC maintains a collection of example LAMMPS input scripts and job submission templates in a public GitHub repository. These examples demonstrate common workflows, including MPI-based runs, KOKKOS acceleration, and GPU-enabled simulations on ARC clusters. Users are encouraged to review and adapt these examples when developing their own LAMMPS simulations. The ARC LAMMPS examples repository is available at: https://github.com/AdvancedResearchComputing/examples/tree/master/lammps
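
For example, the repository can be cloned and the LAMMPS examples used as a starting point as follows (the directory layout within the repository may change over time):

git clone https://github.com/AdvancedResearchComputing/examples.git
cd examples/lammps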