Biomed - Biomedical research

Overview

The biomed cluster has 7 nodes, 448 CPU cores, 5 TB of RAM, and 8 NVIDIA A100 GPUs. Biomed hardware is summarized in the table below.

Node Type          CPU                   GPU                   Total
-----------------  --------------------  --------------------  -----
Chip               AMD EPYC 7542 (Rome)  AMD EPYC 7542 (Rome)  -
Architecture       Zen 2                 Zen 2                 -
Slurm features     -                     -                     -
Nodes              6                     1                     7
GPUs               -                     8x NVIDIA A100-80G    8
Cores/Node         64                    64                    -
Memory (GB)/Node   512                   2,048                 -
Total Cores        384                   64                    448
Total Memory (GB)  3,072                 2,048                 5,120
Local Disk         240 GB SSD            240 GB SSD            -
Interconnect       HDR-100 IB            HDR-100 IB            -

Access

The biomed cluster hosts projects that require computational scale but are subject to security restrictions, such as NIST SP 800-171 as required by NIH. Access to the biomed cluster requires approval from the Office of Research’s Division of Scholarly Integrity and Research Compliance, followed by consultation with ARC personnel to set up access and provide instructions for use.

Get Started

Biomed can be accessed via the login node using your VT credentials:

  • biomed1.arc.vt.edu
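For example, a minimal SSH login (yourpid is a placeholder for your VT username):

    ssh yourpid@biomed1.arc.vt.edu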

For testing purposes, all users are allotted an initial 240 core-hours for 90 days in the “personal” allocation. Researchers at the PI level can request resource allocations in the “free” tier (usage fully subsidized by VT) and can allocate 1,000,000 monthly Service Units among their projects.

To create an allocation, log in to the ARC allocation portal at https://coldfront.arc.vt.edu and:

  • select or create a project

  • click the “+ Request Resource Allocation” button

  • choose the “Compute (Free) (Cluster)” allocation type

Usage in excess of 1,000,000 monthly Service Units can be purchased via the ARC Cost Center.

Partitions

Users submit jobs to partitions of the cluster depending on the type of resources needed (for example, CPUs or GPUs). Features are optional restrictions users can specify in their job submission to restrict execution to nodes meeting specific requirements. If a job does not specify the amount of memory requested, the DefMemPerCPU parameter automatically determines the memory for the job based on the number of CPU cores requested. If a GPU job does not specify the number of CPU cores, the DefCpuPerGPU parameter automatically determines the number of CPU cores based on the number of GPUs requested. Jobs are billed against the user’s allocation based on the number of CPU cores, the amount of memory, and the GPU time used. Consult the Slurm configuration below to understand how to specify the parameters for your job.

Partition           normal_q             a100_normal_q
------------------  -------------------  ----------------------------------
Node Type           CPU                  GPU
Features            -                    -
Number of Nodes     6                    1
DefMemPerCPU (MB)   7920                 32112
DefCpuPerGPU        -                    4
TRESBillingWeights  CPU=1.0,Mem=0.0625G  CPU=1.0,Mem=0.0625G,GRES/gpu=100.0
PreemptMode         OFF                  OFF
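For example, a minimal batch script for the GPU partition might look like the sketch below (the allocation name, module, and program names are hypothetical). Because neither --cpus-per-task nor --mem is given, Slurm applies the defaults above: 2 GPUs x DefCpuPerGPU = 8 CPU cores, and 8 cores x 32112 MB of memory.

    #!/bin/bash
    #SBATCH --account=myallocation     # hypothetical allocation name
    #SBATCH --partition=a100_normal_q
    #SBATCH --gres=gpu:2               # 2 of the node's 8 A100s
    #SBATCH --time=1-00:00:00
    # No --cpus-per-task or --mem: DefCpuPerGPU and DefMemPerCPU apply.

    module load CUDA                   # module name may differ on biomed
    ./my_gpu_program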

Quality of Service (QoS)

ARC must balance the needs of individuals with the needs of all to ensure fairness. This is done by providing options which determine the Quality of Service (QoS).

The QoS associated with a job affects it in three key ways: scheduling priority, resource limits, and time limits. Each partition has a default QoS named partitionname_base with a default priority, resource limits, and time limits. Users can optionally select a different QoS to increase or decrease the priority, resource limits, and time limits. The goal is to offer multiple flexible options that adjust to each job’s needs. The long QoS allows jobs to run for an extended period of time (up to 14 days) but reduces the total amount of resources that can be allocated to the job. The short QoS allows jobs to use more resources but reduces the maximum run time to 1 day. ARC staff reserve the right to modify the QoS settings at any time to ensure fair and balanced utilization of resources among all users.

Partition      QoS                       Priority  MaxWall      MaxTRESPerUser              MaxTRESPerAccount            UsageFactor
-------------  ------------------------  --------  -----------  --------------------------  ---------------------------  -----------
normal_q       biomed_normal_base        1000      7-00:00:00   cpu=77,mem=717G             cpu=154,mem=1434G            1
normal_q       biomed_normal_long        500       14-00:00:00  cpu=20,mem=180G             cpu=39,mem=359G              1
normal_q       biomed_normal_short       2000      1-00:00:00   cpu=116,mem=1076G           cpu=231,mem=2151G            2
a100_normal_q  biomed_a100_normal_base   1000      7-00:00:00   cpu=13,mem=410G,gres/gpu=2  cpu=26,mem=820G,gres/gpu=4   1
a100_normal_q  biomed_a100_normal_long   500       14-00:00:00  cpu=4,mem=103G,gres/gpu=1   cpu=7,mem=205G,gres/gpu=1    1
a100_normal_q  biomed_a100_normal_short  2000      1-00:00:00   cpu=20,mem=615G,gres/gpu=3  cpu=39,mem=1229G,gres/gpu=5  2
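To select a non-default QoS from the table, pass --qos at submission time; for example (my_job.sh is a hypothetical script):

    # Extended run under the long QoS (up to 14 days, reduced resource limits):
    sbatch --partition=normal_q --qos=biomed_normal_long --time=10-00:00:00 my_job.sh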

Optimization

The performance of jobs can be greatly enhanced by applying appropriate optimizations. This not only reduces the execution time of jobs but also makes more efficient use of resources for the benefit of all users.

See the tuning guides available from AMD at https://developer.amd.com and Intel at https://www.intel.com/content/www/us/en/developer/.

General principles of optimization:

  • Cache locality really matters: process pinning can make a big difference in performance.

  • Hybrid programming often pays off: one MPI process per L3 cache with 4 threads is a good starting point (see the sketch after this list).

  • Use the appropriate -march flag to optimize compiled code, and the -gencode flag when using the NVCC compiler.
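As an illustration of the pinning and hybrid layout above: the 64 cores per node correspond to two 32-core EPYC 7542 sockets, and Zen 2 groups 4 cores per L3 cache, giving 16 L3 caches per node. One MPI rank per L3 with 4 OpenMP threads therefore fills a node with 16 ranks. A minimal sketch (my_hybrid_app is a hypothetical binary):

    # One MPI rank per L3 cache (CCX), 4 OpenMP threads per rank.
    # 64 cores/node = 16 CCXs of 4 cores, so 16 ranks fill a node.
    export OMP_NUM_THREADS=4
    srun --ntasks-per-node=16 --cpus-per-task=4 --cpu-bind=cores ./my_hybrid_app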

Suggested optimization parameters:

Node Type           CPU            GPU
------------------  -------------  -----------------------------------
CPU arch            Zen 2          Zen 2
Compiler flags      -march=znver2  -march=znver2
GPU arch            -              NVIDIA A100
Compute Capability  -              8.0
NVCC flags          -              -gencode=arch=compute_80,code=sm_80
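As a worked example of these flags, compiling a CPU code for Zen 2 and a CUDA code for the A100 might look like the following (the compiler choice and file names are illustrative, not biomed-specific requirements):

    # CPU code tuned for the EPYC 7542 (Zen 2):
    gcc -O3 -march=znver2 -o my_app my_app.c

    # CUDA code targeting the A100 (compute capability 8.0):
    nvcc -O3 -gencode=arch=compute_80,code=sm_80 -o my_gpu_app my_gpu_app.cu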