TinkerCliffs, ARC’s Flagship Cluster
Overview
TinkerCliffs has 353 nodes with 44,224 CPU cores, 133 TB of RAM, 112 NVIDIA A100 GPUs, and 56 NVIDIA H200 GPUs. The hardware is summarized in the table below.
| Node Type | Base Compute Nodes | Intel Nodes | High Memory Nodes | DGX A100 GPU Nodes | A100 GPU Nodes | H200 GPU Nodes | Total |
|---|---|---|---|---|---|---|---|
| Chip | - | - | - | - | - | - | - |
| Architecture | Zen 2 | Cascade Lake | Zen 2 | Zen 2 | Zen 2 | Emerald Rapids | - |
| Slurm features | amd | intel, avx512 | amd | dgx-A100 | hpe-A100 | - | - |
| Nodes | 308 | 16 | 8 | 10 | 4 | 7 | 353 |
| GPUs | - | - | - | 8x NVIDIA A100-80G | 8x NVIDIA A100-80G | 8x NVIDIA H200-141G | 168 |
| Cores/Node | 128 | 96 | 128 | 128 | 128 | 64 | - |
| Memory (GB)/Node | 256 | 384 | 1,024 | 2,048 | 2,048 | 2,048 | - |
| Total Cores | 39,424 | 1,536 | 1,024 | 1,280 | 512 | 448 | 44,224 |
| Total Memory (GB) | 78,848 | 6,144 | 8,192 | 20,480 | 8,192 | 14,336 | 136,192 |
| Local Disk | 480GB SSD | 3.2TB NVMe | 480GB SSD | 30TB Gen4 NVMe | 11.7TB NVMe | 28TB NVMe | - |
| Interconnect | HDR-100 IB | HDR-100 IB | HDR-100 IB | 8x HDR-200 IB | 4x HDR-200 IB | 8x HDR-200 IB | - |
TinkerCliffs is hosted in the Steger Hall HPC datacenter on the Virginia Tech campus, so it is physically separated from other ARC HPC systems, which are hosted in the AISB Datacenter at the Corporate Research Center (CRC) in Blacksburg.
An IBM ESS GPFS file system provides /projects for group collaboration, and a VAST file system provides /scratch for high-performance input/output (I/O).
Get Started
TinkerCliffs can be accessed via either of its two login nodes using your VT credentials:
- tinkercliffs1.arc.vt.edu
- tinkercliffs2.arc.vt.edu
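For example, you can connect from a terminal with SSH; this is a minimal sketch, and the username below is a placeholder for your own VT PID:

```bash
# Log in to a TinkerCliffs login node with your VT credentials.
# "yourpid" is a placeholder for your VT username.
ssh yourpid@tinkercliffs1.arc.vt.edu
```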
For testing purposes, all users are allotted an initial 240 core-hours for 90 days in the “personal” allocation. Researchers at the PI level can request resource allocations in the “free” tier (usage fully subsidized by VT) and can allocate 1,000,000 monthly Service Units among their projects.
To create an allocation, log in to the ARC allocation portal at https://coldfront.arc.vt.edu and then:
1. Select or create a project.
2. Click the “+ Request Resource Allocation” button.
3. Choose the “Compute (Free) (Cluster)” allocation type.
Usage needs in excess of 1,000,000 monthly Service Units can be purchased via the ARC Cost Center.
Partitions
Users submit jobs to partitions of the cluster depending on the type of resources (CPUs or GPUs) needed. Features are optional restrictions users can add to their job submission to restrict execution to nodes meeting specific requirements. If a user does not specify the amount of memory for a job, the DefMemPerCPU parameter automatically determines it from the number of CPU cores requested. If a user does not specify the number of CPU cores for a GPU job, the DefCpuPerGPU parameter automatically determines it from the number of GPUs requested. Jobs are billed against the user’s allocation based on the number of CPU cores, the amount of memory, and the GPU time they use. Consult the Slurm configuration to understand how to specify these parameters for your job; a minimal example batch script follows the table below.
| Partition | normal_q | preemptable_q | a100_normal_q | a100_preemptable_q | h200_normal_q | h200_preemptable_q |
|---|---|---|---|---|---|---|
| Node Type | Base Compute, Intel, High Memory | Base Compute, Intel, High Memory | DGX A100 GPU, A100 GPU | DGX A100 GPU, A100 GPU | H200 GPU | H200 GPU |
| Features | amd,intel,avx512 | amd,intel,avx512 | hpe-A100,dgx-A100 | hpe-A100,dgx-A100 | - | - |
| Number of Nodes | 332 | 332 | 14 | 14 | 7 | 7 |
| DefMemPerCPU (MB) | 1944 | 1944 | 16056 | 16056 | 32112 | 32112 |
| DefCpuPerGPU | - | - | 8 | 8 | 4 | 4 |
| TRESBillingWeights | CPU=1.0,Mem=0.0625G | - | CPU=1.0,Mem=0.0625G,GRES/gpu=100.0 | - | CPU=1.0,Mem=0.0625G,GRES/gpu=150.0 | - |
| PreemptMode | OFF | ON | OFF | ON | OFF | ON |
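As a concrete illustration of these defaults, here is a minimal batch-script sketch for the a100_normal_q partition; the account name and the command are placeholders, and the comments reflect the DefCpuPerGPU and DefMemPerCPU values listed above:

```bash
#!/bin/bash
# Minimal GPU job sketch for TinkerCliffs (placeholder account and command).
#SBATCH --account=myallocation      # placeholder: your allocation account
#SBATCH --partition=a100_normal_q
#SBATCH --nodes=1
#SBATCH --gres=gpu:1                # 1 A100; DefCpuPerGPU grants 8 cores by default
#SBATCH --time=02:00:00             # well under the 7-day MaxWall of the base QoS

# Memory defaults to DefMemPerCPU x allocated cores unless --mem or --mem-per-cpu is set.
nvidia-smi
```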
Quality of Service (QoS)
The QoS associated with a job affects the job in three key ways: scheduling priority, resource limits, and time limits. Each partition has a default QoS named partitionname_base with a default priority, resource limits, and time limits. Users can optionally select a different QoS to increase or decrease the priority, resource limits, and time limits; the goal is to offer flexible options that adjust to each job’s needs. The long QoS allows a job to run for an extended period of time (up to 14 days) but reduces the total amount of resources that can be allocated to it. The short QoS allows a job to use more resources but reduces the maximum time to 1 day. ARC staff reserve the right to modify the QoS settings at any time to ensure fair and balanced utilization of resources among all users. An example of selecting a QoS follows the table below.
| Partition | QoS | Priority | MaxWall | MaxTRESPerUser | MaxTRESPerAccount |
|---|---|---|---|---|---|
| normal_q | tc_normal_base | 1000 | 7 days | cpu=8397,mem=18276G | cpu=16794,mem=36552G |
| normal_q | tc_normal_long | 500 | 14 days | cpu=2100,mem=4569G | cpu=4199,mem=9138G |
| normal_q | tc_normal_short | 2000 | 1 day | cpu=12596,mem=27414G | cpu=25191,mem=54828G |
| preemptable_q | tc_preemptable_base | 0 | 30 days | cpu=1050,mem=2285G | cpu=2100,mem=4569G |
| a100_normal_q | tc_a100_normal_base | 1000 | 7 days | cpu=359,mem=5642G,gres/gpu=23 | cpu=717,mem=11284G,gres/gpu=45 |
| a100_normal_q | tc_a100_normal_long | 500 | 14 days | cpu=90,mem=1411G,gres/gpu=6 | cpu=180,mem=2821G,gres/gpu=12 |
| a100_normal_q | tc_a100_normal_short | 2000 | 1 day | cpu=538,mem=8463G,gres/gpu=34 | cpu=1076,mem=16926G,gres/gpu=68 |
| a100_preemptable_q | tc_a100_preemptable_base | 0 | 30 days | cpu=45,mem=706G,gres/gpu=3 | cpu=90,mem=1411G,gres/gpu=6 |
| h200_normal_q | tc_h200_normal_base | 1000 | 7 days | cpu=90,mem=2868G,gres/gpu=12 | cpu=180,mem=5735G,gres/gpu=23 |
| h200_normal_q | tc_h200_normal_long | 500 | 14 days | cpu=23,mem=717G,gres/gpu=3 | cpu=45,mem=1434G,gres/gpu=6 |
| h200_normal_q | tc_h200_normal_short | 2000 | 1 day | cpu=135,mem=4301G,gres/gpu=17 | cpu=269,mem=8602G,gres/gpu=34 |
| h200_preemptable_q | tc_h200_preemptable_base | 0 | 30 days | cpu=12,mem=359G,gres/gpu=2 | cpu=23,mem=717G,gres/gpu=3 |
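To run under a QoS other than the partition default, name it explicitly at submission time. A minimal sketch (the script name is a placeholder):

```bash
# Submit to normal_q under the long QoS: up to 14 days, but lower resource caps.
sbatch --partition=normal_q --qos=tc_normal_long --time=10-00:00:00 myjob.sh
```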
Optimization
| Node Type | Base Compute Nodes | Intel Nodes | High Memory Nodes | DGX A100 GPU Nodes | A100 GPU Nodes | H200 GPU Nodes |
|---|---|---|---|---|---|---|
| CPU arch | Zen 2 | Cascade Lake | Zen 2 | Zen 2 | Zen 2 | Emerald Rapids |
| Compiler flags | | | | | | |
| GPU arch | - | - | - | NVIDIA A100 | NVIDIA A100 | NVIDIA H200 |
| Compute Capability | - | - | - | 8.0 | 8.0 | 9.0 |
| NVCC flags | - | - | - | | | |
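The compiler and NVCC flag cells depend on the toolchain you load; the following sketch shows plausible choices for GCC and CUDA based on the CPU architectures and compute capabilities listed in the table (the exact flags and the source file names are assumptions, not ARC-verified settings):

```bash
# Zen 2 hosts (Base Compute, High Memory, and A100/DGX A100 nodes) with GCC:
gcc -O3 -march=znver2 -o app app.c

# Cascade Lake Intel nodes with GCC:
gcc -O3 -march=cascadelake -o app app.c

# CUDA code targeting the A100 (compute capability 8.0) and H200 (9.0) GPUs:
nvcc -gencode arch=compute_80,code=sm_80 -o app_a100 app.cu
nvcc -gencode arch=compute_90,code=sm_90 -o app_h200 app.cu
```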
- See the tuning guides available at https://developer.amd.com and https://www.intel.com/content/www/us/en/developer/.
- Cache locality really matters: process pinning can make a big difference in performance.
- Hybrid programming often pays off: one MPI process per L3 cache with 4 threads is often optimal; a sketch of this layout follows below.
- Use the appropriate -march flag to optimize compiled code and the -gencode flag when using the NVCC compiler.
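The hybrid layout suggested above can be expressed in a batch script as follows; this is a sketch assuming a 128-core Zen 2 node and an MPI+OpenMP binary of your own (the account name and ./my_hybrid_app are placeholders):

```bash
#!/bin/bash
# Hybrid MPI+OpenMP sketch: 32 ranks x 4 threads = 128 cores on one Zen 2 node,
# i.e. one MPI rank per shared L3 cache (4 cores share an L3 on Zen 2).
#SBATCH --account=myallocation      # placeholder: your allocation account
#SBATCH --partition=normal_q
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PLACES=cores
export OMP_PROC_BIND=close          # keep each rank's threads on adjacent cores

# --cpu-bind=cores pins ranks to their cores, preserving L3 cache locality.
srun --cpu-bind=cores ./my_hybrid_app   # placeholder binary
```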