OWL - Water-cooled AMD CPU
OWL came online in 2024 and has 91 nodes, 8,704 CPU cores, and 80 TB RAM.
The compute nodes on OWL are exclusively CPU-based; GPUs will be added in 2025.
Direct water-cooling of the base compute nodes allows them to run at boost speed (3.8GHz) indefinitely, which is 40% higher than the base clock rate. For comparison, Tinkercliffs base compute nodes run at 2.0GHz.
Genoa is the first AMD architecture to feature AVX-512 instructions, which provide 512-bit-wide vectorization (i.e., eight-way FP64 SIMD in each clock cycle). Tinkercliffs base compute nodes support the previous-generation AVX2 instructions, which are 256 bits wide. (A quick way to check these CPU features on a node is shown below.)
12 memory channels per socket (24 per node) provide much higher aggregate memory bandwidth and increased granularity, which should provide substantial speedup for memory-bandwidth-constrained workloads such as finite-element analysis.
DDR5-4800 memory provides a nominal 50% speed increase over DDR4-3200 on Tinkercliffs.
768GB of memory per node provides 8GB of memory per core, compared to 2GB/core on Tinkercliffs.
Three nodes are equipped with very large memory (4TB or 8TB), enabling computational workloads for which we have never had sufficient memory resources.
The large-memory nodes were not available with AMD Genoa processors at the time of acquisition; they are equipped with different processors (detailed below) and are not water-cooled.
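The processor features described above can be verified directly on a node with standard Linux tools; a minimal sketch (no Owl-specific commands assumed):

```bash
# Show CPU model, core counts, and base/boost clock speeds on the current node
lscpu
# List the AVX-512 instruction-set extensions reported by the CPU
lscpu | grep -o 'avx512[a-z_]*' | sort -u
```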
Overview
| | Base Compute Nodes | Large Memory | Huge Memory | Totals |
|---|---|---|---|---|
| Vendor | Lenovo | Lenovo | Lenovo | |
| Chip | AMD EPYC (Genoa) | AMD EPYC (Milan) | AMD EPYC (Milan) | |
| Nodes | 84 | 2 | 1 | 87 |
| Cores/Node | 96 | 128 | 128 | |
| Memory (GiB)/Node | 768 DDR5-4800 | 4,019 DDR4-3200 | 8,038 DDR4-3200 | |
| Local Disk | 2.9TB NVMe | 2.9TB NVMe | 2.9TB NVMe | |
| Interconnect | shared 200Gbps HDR InfiniBand | shared 200Gbps HDR InfiniBand | shared 200Gbps HDR InfiniBand | |
| Total Memory (GiB) | 64,512 | 8,038 | 8,038 | 80,588 |
| Total Cores | 8,064 | 256 | 128 | 8,448 |
| Theoretical Peak | 245.1456 TFLOPS | | | |
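For reference, the theoretical peak figure corresponds to the base compute nodes alone: 8,064 cores × 3.8 GHz × 8 FP64 operations per core per cycle = 245,145.6 GFLOPS = 245.1456 TFLOPS.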
Get Started
Owl can be accessed via SSH to one of the three login nodes (see the example below):
owl1.arc.vt.edu
owl2.arc.vt.edu
owl3.arc.vt.edu
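For example (replace yourpid with your VT username):

```bash
ssh yourpid@owl1.arc.vt.edu
```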
For testing purposes, all users are allotted 240 core-hours each month in the “personal” allocation. Researchers at the PI level are able to request resource allocations in the “free” tier (usage fully subsidized by VT).
To do this:
1. Log in to the ARC allocation portal: https://coldfront.arc.vt.edu
2. Select or create a project
3. Click the “+ Request Resource Allocation” button
4. Choose the “Compute (Free) (Cluster)” allocation type
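Once an allocation has been approved, one way to confirm which Slurm accounts you can submit jobs against is to query your associations from a login node (output will vary by user):

```bash
# List the Slurm accounts (allocations) associated with your username
sacctmgr show associations user=$USER format=Account,User,Partition,QOS
```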
Policies
Limits are set on the scale and quantity of jobs at the user and allocation (Slurm account) levels to help ensure availability of resources to a broad set of researchers and applications. These are the limits applied to free tier usage (note that the terms “cpu” and “core” are used interchangeably here, following Slurm terminology):
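If you want to verify the current limits yourself, the partition and QoS definitions can be inspected with Slurm's own tools from a login node, e.g.:

```bash
# Show the configuration of a partition, including its default QoS
scontrol show partition normal_q
# Show per-user limits and priorities attached to each QoS
sacctmgr show qos format=Name,Priority,MaxWall,MaxTRESPU,MaxJobsPU
```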
Policies for Main Usage Queues/Partitions
The normal_q, largemem_q, and hugemem_q partitions (queues) handle the bulk of utilization on the Owl cluster.
| | normal_q | largemem_q | hugemem_q |
|---|---|---|---|
| Node Type | Base Compute | Large Memory | Huge Memory |
| Number of Nodes | 84 | 2 | 1 |
| MaxRunningJobs (User) | 32 | 2 | 2 |
| MaxSubmitJobs (User) | 32 | 8 | 4 |
| MaxRunningJobs (Allocation) | 64 | 8 | 4 |
| MaxSubmitJobs (Allocation) | 200 | 16 | 8 |
| MaxNodes (User) | 32 | 1 | 1 |
| MaxNodes (Allocation) | 48 | 2 | 1 |
| MaxCPUs (User) | 3,072 | 128 | 512 |
| MaxCPUs (Allocation) | 4,608 | 256 | 768 |
| MaxWallTime | 6 days | 3 days | 6 days |
| Priority (QoS) | 1,000 | 1,000 | 1,000 |
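As an illustration of how these limits translate into a job request, here is a minimal batch header targeting one Large Memory node while staying within the per-user limits above (the account name is a placeholder):

```bash
#!/bin/bash
#SBATCH --partition=largemem_q     # Large Memory nodes (2 nodes, 128 cores each)
#SBATCH --account=your_account_number
#SBATCH --nodes=1                  # MaxNodes (User) for largemem_q is 1
#SBATCH --ntasks-per-node=128      # Use the whole node
#SBATCH --time=3-00:00:00          # MaxWallTime for largemem_q is 3 days
```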
Policies for Development and Alternative Usage Queues/Partitions
The “dev” partitions (queues) overlap the main usage queues above, but jobs in these queues get higher priority to allow more rapid access to resources for testing and development workloads. The tradeoff is that individuals may only run a small number of short jobs in these partitions.
| | dev_q | preemptable_q | interactive_q |
|---|---|---|---|
| Node Type | Base Compute | Base Compute | Base Compute |
| Number of Nodes | 84 | 84 | 4 |
| MaxRunningJobs (User) | 2 | 32 | 2 |
| MaxSubmitJobs (User) | 4 | 100 | 4 |
| MaxRunningJobs (Allocation) | 8 | 64 | 3 |
| MaxSubmitJobs (Allocation) | 16 | 200 | 6 |
| MaxNodes (User) | 32 | 32 | 1 |
| MaxNodes (Allocation) | 48 | 48 | 1 |
| MaxCPUs (User) | 3,072 | 128 | 512 |
| MaxCPUs (Allocation) | 4,608 | 256 | 768 |
| MaxWallTime | 4 hours | - | 4 hours |
| Priority (QoS) | 2,000 | 0 | 1,000 |
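For quick interactive testing on the interactive_q, a shell can be requested with salloc; a minimal sketch (the account name is a placeholder):

```bash
# Request 4 cores on one interactive node for up to 2 hours
salloc --partition=interactive_q --account=your_account_number \
       --nodes=1 --ntasks=1 --cpus-per-task=4 --time=2:00:00
```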
AMD Resources
Compiler Options Quick Reference Guide
If you’re using EasyBuild to install software, loading the EasyBuild module we provide will set environment variables that EasyBuild uses to configure the architecture optimization flags for the Intel and GCC compilers.
| | Genoa (base nodes) | Milan (largemem nodes) |
|---|---|---|
| Intel | | |
| GCC | | |
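The exact flags set by the EasyBuild module are not reproduced here. As a hedged illustration only, the usual GCC architecture targets for these chips are shown below; these are standard GCC options, not necessarily what ARC's EasyBuild configuration uses:

```bash
# Genoa (Zen 4) base compute nodes; -march=znver4 requires GCC 13 or newer
gcc -O3 -march=znver4 -o mycode mycode.c
# Milan (Zen 3) large-memory nodes
gcc -O3 -march=znver3 -o mycode mycode.c
```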
Known Issues
Apptainer may experience issues on the login nodes; use compute nodes instead. The setting user.max_user_namespaces=0 is applied on the login nodes as a mitigation for a CVE. Compute nodes are not affected and do not have this constraint.
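You can verify this setting on whichever node you are logged into:

```bash
# Returns 0 on the login nodes (namespaces disabled) and a nonzero value on compute nodes
sysctl user.max_user_namespaces
```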
Benchmarks
STREAM
HPL
HPCG
High performance conjugate gradient (HPCG) test results.
On Owl, using gcc version 13.2.0 and OpenMPI version 4.1.6 (this is the foss toolchain 2023b, i.e., module load foss/2023b).
Inputs: xdim=208, ydim=208, zdim=312, time=1800.
| num MPI Processes | total memory used (GB) | execution time (s) | execution rate (GFlops/s) |
|---|---|---|---|
| 2 | 19.30 | 1,832.25 | 5.93 |
| 4 | 38.60 | 1,840.99 | 6.50 |
| 8 | 77.20 | 1,835.21 | 8.73 |
| 16 | 154.41 | 1,974.77 | 16.23 |
| 32 | 308.83 | 1,956.86 | 32.75 |
| 64 | 617.65 | 2,001.58 | 64.165 |
On Owl, using gcc version 11.3.1 and MVAPICH2 MPI version 2.3.7 (i.e., module load mvapich2/gcc/64/2.3.7).
Inputs: xdim=208, ydim=208, zdim=312, time=1800.
These data are under revision.
| num MPI Processes | total memory used (GB) | execution time (s) | execution rate (GFlops/s) |
|---|---|---|---|
| 2 | 9.65 | 1,874.33 | 2.51 |
| 4 | 9.65 | 1,935.02 | 1.54 |
| 8 | 9.65 | 1,929.36 | 0.77 |
| 16 | 9.65 | 1,907.02 | 0.39 |
| 32 | 9.65 | 1,891.03 | 0.39 |
| 64 | 9.65 | 1,909.17 | 0.39 |
MPI
A Slurm batch script for running an MPI job using OpenMPI.
OpenMPI
#!/bin/bash
#SBATCH -J hpcg
## Wall time.
#SBATCH --time=2-04:00:00 # 2 days and 4 hours.
### Account. Your account number
#SBATCH --account=your_account_number
### Queue/partition.
#SBATCH --partition=normal_q
### This requests 1 node.
#SBATCH --nodes=1
### Number of MPI ranks; total over all nodes.
#SBATCH --ntasks=2
### This is the number of MPI processes per node, for MPI jobs.
#SBATCH --ntasks-per-node=2
### Number of cores per task. Includes OpenMP,
### i.e., number of OpenMP threads per MPI process.
#SBATCH --cpus-per-task=6
## Might want to run exclusive for timing studies.
## Unless you have a good reason, comment this out;
## can waste resources.
#SBATCH --exclusive
## Slurm output and error files.
#SBATCH -o slurm.openmpi.hpcg.%j.out
#SBATCH -e slurm.openmpi.hpcg.%j.err
## Notify me when done.
#SBATCH --mail-type=ALL # Send email notification at the start and end of the job
#SBATCH --mail-user=your_vt_email # Send email notification to this address
# Load modules.
module load foss/2023b
## Exports.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}   # Match the --cpus-per-task request above.
## Time the job with time.
## For OpenMPI, which we are using here, launch with mpirun.
## The following are variables for the user to specify: mycode, xdim, ydim, zdim, timedim.
time mpirun ${mycode} ${xdim} ${ydim} ${zdim} ${timedim}
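Since mycode, xdim, ydim, zdim, and timedim are left for the user to define, one way to supply them at submission time is via --export; the script name and values below are placeholders:

```bash
sbatch --export=ALL,mycode=./xhpcg,xdim=208,ydim=208,zdim=312,timedim=1800 hpcg_openmpi.sh
```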
MVAPICH2
A Slurm batch script for running an MPI job using MVAPICH2.
#!/bin/bash
#SBATCH -J hpcg-mvap2
## Wall time.
#SBATCH --time=0-02:00:00 # 2 hours
### Account. Your account number
#SBATCH --account=your_account_number
### Queue.
#SBATCH --partition=normal_q
### This requests 1 node.
#SBATCH --nodes=1
### Number of MPI ranks (i.e., processes); total over all nodes.
#SBATCH --ntasks=2
### This is the number of MPI processes per node, for MPI jobs.
#SBATCH --ntasks-per-node=2
### Number of cores per task. Includes OpenMP,
### i.e., number of OpenMP threads per MPI process.
#SBATCH --cpus-per-task=6
## Might want to run exclusive for timing studies.
## Unless you have a good reason, comment this out;
## can waste resources.
#SBATCH --exclusive
## Slurm output and error files.
#SBATCH -o slurm.hpcg.mvapich2.%j.out
#SBATCH -e slurm.hpcg.mvapich2.%j.err
# Load modules.
module load mvapich2/gcc/64/2.3.7
## Exports.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}   # Match the --cpus-per-task request above.
## Time the job with time.
## For MVAPICH2, which we are using here, launch with srun.
## The following are variables for the user to specify: mycode, xdim, ydim, zdim, timedim.
time srun ${mycode} ${xdim} ${ydim} ${zdim} ${timedim}