LS-DYNA
Introduction
LS-DYNA is a general-purpose finite element program capable of simulating complex real-world problems. It is used by the automobile, aerospace, construction, military, manufacturing, and bioengineering industries. LS-DYNA is optimized for shared and distributed memory Unix, Linux, and Windows-based platforms, and it is fully QA'd by LSTC. The code's origins lie in highly nonlinear, transient dynamic finite element analysis using explicit time integration.
Availability
LS-DYNA is available on several ARC systems. Virginia Tech maintains a limited quantity of LS-DYNA network licenses through the university's IT Procurement and Licensing Solutions, which can be used for the SMP, MPP, and Hybrid versions of LS-DYNA. LSTC also develops its own preprocessor, LS-PrePost, which is freely distributed and runs without a license.
License Availability
Recent installations of LS-DYNA on ARC systems provide LSTC's license tools, which can be used to query the server for licenses that have been checked out, see how many are currently available, and kill any "zombified" license checkouts (as happens if LS-DYNA terminates in an unexpected manner).
For the following commands to work, you must have loaded an LS-DYNA module which provides these programs. If it does not provide them, you will get an error like "lstc_qrun: no such file or directory".
Check Number of Licenses Available
1. Load the LS-DYNA module (e.g. module load tinkercliffs-rome/ls-dyna/10.2.0-intel-2019b for v. 10.2 on Tinkercliffs).
2. Set and export the LSTC_LICENSE_SERVER environment variable to the name of the license server you want to check (e.g. ansys.software.vt.edu for the main Virginia Tech LS-DYNA license server).
3. Run the command lstc_qrun -L LS-DYNA to query SMP licenses or lstc_qrun -L MPPDYNA to query MPP licenses.
For example:
$ module load tinkercliffs-rome/ls-dyna/10.2.0-intel-2019b
$ export LSTC_LICENSE_SERVER=ansys.software.vt.edu
$ lstc_qrun -L MPPDYNA
Defaulting to server 1 specified by LSTC_LICENSE_SERVER variable
500 LICENSE(S) AVAILABLE for PROG=MPPDYNA USER=brownm12 HOST=tinkercliffs2 IP=198.82.249.14
$ lstc_qrun -L LS-DYNA
Defaulting to server 1 specified by LSTC_LICENSE_SERVER variable
500 LICENSE(S) AVAILABLE for PROG=LS-DYNA USER=brownm12 HOST=tinkercliffs2 IP=198.82.249.14
Query Licenses Currently Checked Out From License Server
1. Load the LS-DYNA module (e.g. module load tinkercliffs-rome/ls-dyna/10.2.0-intel-2019b for v. 10.2 on Tinkercliffs).
2. Set and export the LSTC_LICENSE_SERVER environment variable to the name of the license server you want to check (e.g. ansys.software.vt.edu for the main Virginia Tech LS-DYNA license server).
3. Run the command lstc_qrun.
For example:
$ module load tinkercliffs-rome/ls-dyna/10.2.0-intel-2019b
$ export LSTC_LICENSE_SERVER=ansys.software.vt.edu
$ lstc_qrun
Defaulting to server 1 specified by LSTC_LICENSE_SERVER variable
Running Programs
User Host Program Started # procs
-----------------------------------------------------------------------------
brownm12 205377@tc154.cm.cluster MPPDYNA Wed Oct 20 10:00 16
No programs queued
Kill a Zombified LS-DYNA License
1. Load the LS-DYNA module (e.g. module load tinkercliffs-rome/ls-dyna/10.2.0-intel-2019b for v. 10.2 on Tinkercliffs).
2. Set and export the LSTC_LICENSE_SERVER environment variable to the name of the license server you want to use (e.g. ansys.software.vt.edu for the main Virginia Tech LS-DYNA license server).
3. Run the command lstc_qrun (see above) and note the "Host" column entry for the program to kill.
4. Run the command lstc_qkill <program to kill>.
For example:
$ module load tinkercliffs-rome/ls-dyna/10.2.0-intel-2019b
$ export LSTC_LICENSE_SERVER=ansys.software.vt.edu
$ lstc_qkill 205377@tc154.cm.cluster
Interface
There are two types of environments in which the LSTC applications can be used on ARC resources:
Graphical interface for LS-PrePost via OnDemand
Command-line interface. You can start LS-DYNA from the command line on systems where it is installed. Note that the command line runs on the login node, so big computations should be submitted as jobs via a traditional job submission; a minimal batch script sketch follows this list.
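For example, rather than running a large simulation on the login node, you can wrap it in a batch script and submit it with sbatch. The following is a minimal sketch for a single-node SMP run; the module name and license server are taken from the examples on this page, while the executable name (ls-dyna) and input file (input.k) are illustrative placeholders that vary by installation:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

module reset
module load tinkercliffs-rome/ls-dyna/10.2.0-intel-2019b
export LSTC_LICENSE_SERVER=ansys.software.vt.edu

# ls-dyna and input.k are placeholders; ncpu sets the SMP thread count
ls-dyna i=input.k ncpu=$SLURM_CPUS_PER_TASK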
Parallel Computing with LS-DYNA
There are three primary modes of obtaining parallelism in LS-DYNA (a launch sketch for each mode follows this list). All of these are also built to take advantage of microarchitecture vectorization instructions like AVX2 and AVX512, and ARC attempts to provide LS-DYNA executables optimized for the local microarchitecture of the system.
SMP: Shared Memory Parallel. Execution is limited to a single node since the threads require shared access to the same memory space.
MPP: Message Passing Parallel. Several or many processes are launched and run as if each is on its own computer with dedicated memory. The discretization of the domain is divided equally (more or less) between the processes (i.e. "domain decomposition") and each process carries out the simulation on its subdomain. Neighboring subdomains affect each other, so processes must pass messages (MPI) to share the necessary data. This mode can scale to a large number of processors across many machines, but the overhead of subdividing the domain and passing messages becomes significant.
Hybrid: MPP combined with SMP.
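As a rough sketch of how each mode is launched (the hybrid executable name is taken from the job script below; the SMP and MPP executable names and the input file are illustrative placeholders, since exact names vary by installation):

# SMP: one process with N threads on a single node (placeholder executable name)
ls-dyna_smp i=input.k ncpu=8

# MPP: many distributed-memory MPI ranks launched with srun (placeholder executable name)
srun --mpi=pmi2 ls-dyna_mpp i=input.k

# Hybrid: MPI ranks that each spawn SMP threads; note the negative ncpu syntax
srun --mpi=pmi2 ls-dyna_hyb_d_R10_2_0_x64_centos65_ifort160_avx2_intelmpi-2018 i=input.k ncpu=-8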
As of October 2021, Virginia Tech's central license pool is for 500 concurrent cores, which can be allocated among all running programs.
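Because a job cannot check out more licenses than are free, it can help to check availability before submitting. A minimal sketch, assuming the lstc_qrun output format shown above (the awk field position is based on that sample output):

$ module load tinkercliffs-rome/ls-dyna/10.2.0-intel-2019b
$ export LSTC_LICENSE_SERVER=ansys.software.vt.edu
$ lstc_qrun -L MPPDYNA | awk '/AVAILABLE/ {print $1}'
500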
Job Submission
Hybrid
To use the LS-DYNA hybrid mode of parallelism, you need to consider how many MPI processes (aka tasks/ranks) you want and how much SMP (shared memory parallelism) to provide to each MPI process. This combination is also constrained by the total number of licenses available when your job starts: ntasks * cpus-per-task must not exceed the number of licenses you can check out. For example, --ntasks=16 with --cpus-per-task=8 requests 16 * 8 = 128 cores from the license pool.
Some scaling tests with example code on Tinkercliffs suggest that the time-to-completion in Hybrid mode does not improve beyond 16 MPP procs, and that scaling beyond 32 MPP procs increases, rather than decreases, the time-to-completion. So we suggest #SBATCH --ntasks=16 or smaller.
Similar tests show that when the number of SMP threads exceeds 8, the time-to-completion shows high variability and diminishing returns, so we suggest #SBATCH --cpus-per-task=8, with 4 and 16 possibly providing comparable performance.
The --cpus-per-task and --ntasks options work together to inform Slurm how many cores to allocate for the job and how to launch the processes when the srun launcher is used. But LS-DYNA also needs to be told how many threads to use; this is accomplished by providing the ncpu=-## option to the LS-DYNA hybrid program, as in the example script below.
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=8

module reset
module load tinkercliffs-rome/ls-dyna/10.2.0-intel-2019b
export LSTC_LICENSE_SERVER=ansys.software.vt.edu

# Launch 4 MPI ranks, each running 8 SMP threads (ncpu=-8)
srun --mpi=pmi2 ls-dyna_hyb_d_R10_2_0_x64_centos65_ifort160_avx2_intelmpi-2018 i=shock02.k ncpu=-$SLURM_CPUS_PER_TASK
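Assuming the script above is saved as shock02_hybrid.sh (a hypothetical name), submit it with:

$ sbatch shock02_hybrid.sh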
Example Scaling Results for Hybrid:
shock02_nt-8_cpt-2: Elapsed time 22 seconds for 47494 cycles using 8 MPP procs and 2 SMP threads
shock02_nt-4_cpt-2: Elapsed time 23 seconds for 47494 cycles using 4 MPP procs and 2 SMP threads
shock02_nt-8_cpt-4: Elapsed time 23 seconds for 47494 cycles using 8 MPP procs and 4 SMP threads
shock02_nt-4_cpt-4: Elapsed time 24 seconds for 47494 cycles using 4 MPP procs and 4 SMP threads
shock02_nt-4_cpt-64: Elapsed time 24 seconds for 7264 cycles using 4 MPP procs and 64 SMP threads
shock02_nt-8_cpt-4: Elapsed time 24 seconds for 47494 cycles using 8 MPP procs and 4 SMP threads
shock02_nt-4_cpt-1: Elapsed time 25 seconds for 47494 cycles using 4 MPP procs and 1 SMP thread
shock02_nt-4_cpt-4: Elapsed time 25 seconds for 47494 cycles using 4 MPP procs and 4 SMP threads
shock02_nt-4_cpt-4: Elapsed time 25 seconds for 47494 cycles using 4 MPP procs and 4 SMP threads
shock02_nt-4_cpt-8: Elapsed time 25 seconds for 47494 cycles using 4 MPP procs and 8 SMP threads
shock02_nt-16_cpt-2: Elapsed time 26 seconds for 47494 cycles using 16 MPP procs and 2 SMP threads
shock02_nt-8_cpt-4: Elapsed time 26 seconds for 47494 cycles using 8 MPP procs and 4 SMP threads
shock02_nt-2_cpt-8: Elapsed time 27 seconds for 47494 cycles using 2 MPP procs and 8 SMP threads
shock02_nt-4_cpt-8: Elapsed time 27 seconds for 47494 cycles using 4 MPP procs and 8 SMP threads
shock02_nt-8_cpt-1: Elapsed time 27 seconds for 47494 cycles using 8 MPP procs and 1 SMP thread
shock02_nt-16_cpt-2: Elapsed time 28 seconds for 47494 cycles using 16 MPP procs and 2 SMP threads
shock02_nt-2_cpt-1: Elapsed time 28 seconds for 47494 cycles using 2 MPP procs and 1 SMP thread
shock02_nt-2_cpt-4: Elapsed time 28 seconds for 47494 cycles using 2 MPP procs and 4 SMP threads
shock02_nt-8_cpt-16: Elapsed time 28 seconds for 47494 cycles using 8 MPP procs and 16 SMP threads
shock02_nt-8_cpt-2: Elapsed time 28 seconds for 47494 cycles using 8 MPP procs and 2 SMP threads
shock02_nt-16_cpt-1: Elapsed time 29 seconds for 47494 cycles using 16 MPP procs and 1 SMP thread
shock02_nt-2_cpt-8: Elapsed time 29 seconds for 47494 cycles using 2 MPP procs and 8 SMP threads
shock02_nt-1_cpt-4: Elapsed time 30 seconds for 47494 cycles using 1 MPP proc and 4 SMP threads
shock02_nt-2_cpt-2: Elapsed time 30 seconds for 47494 cycles using 2 MPP procs and 2 SMP threads
shock02_nt-16_cpt-2: Elapsed time 31 seconds for 47494 cycles using 16 MPP procs and 2 SMP threads
shock02_nt-32_cpt-1: Elapsed time 31 seconds for 47494 cycles using 32 MPP procs and 1 SMP thread
shock02_nt-32_cpt-1: Elapsed time 31 seconds for 47494 cycles using 32 MPP procs and 1 SMP thread
shock02_nt-16_cpt-4: Elapsed time 32 seconds for 47494 cycles using 16 MPP procs and 4 SMP threads
shock02_nt-32_cpt-2: Elapsed time 32 seconds for 47494 cycles using 32 MPP procs and 2 SMP threads
shock02_nt-16_cpt-4: Elapsed time 33 seconds for 47494 cycles using 16 MPP procs and 4 SMP threads
shock02_nt-2_cpt-16: Elapsed time 33 seconds for 47494 cycles using 2 MPP procs and 16 SMP threads
shock02_nt-2_cpt-2: Elapsed time 33 seconds for 47494 cycles using 2 MPP procs and 2 SMP threads
shock02_nt-2_cpt-8: Elapsed time 33 seconds for 47494 cycles using 2 MPP procs and 8 SMP threads
shock02_nt-32_cpt-2: Elapsed time 33 seconds for 47494 cycles using 32 MPP procs and 2 SMP threads
shock02_nt-8_cpt-1: Elapsed time 33 seconds for 47494 cycles using 8 MPP procs and 1 SMP thread
shock02_nt-8_cpt-2: Elapsed time 33 seconds for 47494 cycles using 8 MPP procs and 2 SMP threads
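To generate similar scaling data for your own model, one approach (a sketch; shock02_hybrid.sh refers to the hypothetical script name used above) is to submit the same job script repeatedly with Slurm command-line overrides, which take precedence over the script's #SBATCH directives:

# Sweep MPP proc (ntasks) and SMP thread (cpus-per-task) counts
for nt in 2 4 8 16; do
  for cpt in 1 2 4 8; do
    sbatch --ntasks=$nt --cpus-per-task=$cpt --job-name=shock02_nt-${nt}_cpt-${cpt} shock02_hybrid.sh
  done
done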