ARC System Changes - May 2023
Notes and Guidance for May 2023 cluster changes
New SCRATCH filesystem
A new VAST flash-based storage system has been brought online to serve as a scratch storage location and is mounted at /globalscratch. The “scratch” terminology indicates that it is intended for temporary staging of files for actively running jobs. Data will remain in place when a job ends, but if files created or modified during a job need to be retained, they must quickly be transferred to a persistent storage location such as /home or /projects.
90-day age limit enforced
Files older than 90 days (based on modification date) will be automatically purged on a continual basis with no prior notification. We suggest you make every effort to move your files out within 1-2 days of a job's completion in order to avoid data loss.
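You can preview which of your files would meet the purge criterion with find's modification-time test. The demo directory below is illustrative; on the cluster you would point find at your own scratch area, e.g. /globalscratch/$USER.

```shell
# Demo directory standing in for your scratch area.
DIR=$(mktemp -d)
touch -d "100 days ago" "$DIR/old.dat"   # simulate a stale file
touch "$DIR/new.dat"                     # recently modified file

# -mtime +90 matches files last modified more than 90 days ago,
# so this prints only old.dat
find "$DIR" -type f -mtime +90
```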
No quota enforcement
The /globalscratch storage resource has initially been launched with no size-based quota enforcement. However, it has finite capacity and is a shared resource. We expect the age-based limits to prevent it from filling up, but we will be monitoring usage regularly to make sure it remains available to all users.
/fastscratch is deprecated
With the release of /globalscratch, the /fastscratch file system is now deprecated. It has been consistently full since the fall of 2022 despite repeated requests from ARC personnel to individual users to remove data. It will remain temporarily accessible on Tinkercliffs so that people may retrieve any needed data from it, but will be removed no later than the August 2023 maintenance.
Open On Demand
ARC's production instance of Open OnDemand is now running a newer version of the OnDemand software on newer, more robust server hardware.
Singularity modules updated to recent Apptainer release
Existing Singularity installations on ARC systems were subject to a security advisory and needed to be updated. The community in general has moved from the Singularity name to Apptainer, and the new modules reflect this. In addition, this provided an opportunity to get all the containerization installations across ARC systems onto a common version. Some backward compatibility is provided by Apptainer's singularity command wrappers and by a similar module wrapper ARC has provided, so that module load containers/singularity will continue to work. Going forward, we will standardize on the Apptainer naming convention and suggest you update any scripts to do the same.
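In practice, either spelling should load the same runtime. The session sketch below assumes the new module is named containers/apptainer (an illustrative name; check module avail on your system), and version output will vary.

```shell
# Preferred going forward: load the module under its Apptainer name.
# (containers/apptainer is an assumed module name; verify with 'module avail')
module load containers/apptainer
apptainer --version

# Backward-compatible path: the old module name still works, and the
# singularity command is itself a wrapper that invokes Apptainer.
module load containers/singularity
singularity --version
```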
Nvidia drivers and CUDA upgrades
The Nvidia drivers on the a100_normal_q nodes have been upgraded from 470 to 520. CUDA 11.8 has been installed as the default system CUDA on all Tinkercliffs GPU nodes.
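A quick way to confirm what a GPU node is running is with the standard Nvidia tools; the commands below are generic checks, and their output will vary by node.

```shell
# Report the installed driver version (expect the 520 series on a100 nodes):
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# Report the default CUDA toolkit version (expect release 11.8):
nvcc --version
```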
EasyBuild upgraded
The EasyBuild installations have been upgraded to the most recent version. In addition to providing the latest version of the software, this also provides updated EasyConfigs, which should make installations of recent package versions easier.
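For example, the bundled EasyConfigs can be searched and then built with dependency resolution; the package and EasyConfig names below are purely illustrative.

```shell
# Search the bundled EasyConfigs for a package of interest
# (netCDF is an illustrative example):
eb --search netcdf

# Build a specific EasyConfig and any missing dependencies (--robot);
# the filename shown is hypothetical -- use one returned by the search.
eb netCDF-4.9.0-gompi-2022b.eb --robot
```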
Storage expansion for /projects
The IBM ESS system which serves the GPFS /projects filesystem for Tinkercliffs and Infer has been expanded to provide more capacity for this storage type. This expansion should be transparent for users.
Slurm upgraded on Tinkercliffs
Slurm software has been upgraded on Tinkercliffs. This should be transparent to users.
OS and Other updates
For all Tinkercliffs nodes except DGX (which runs on a different OS), a full OS software update has been applied. In addition, Mellanox OFED has been upgraded to 5.7-1.0.2.0.
On Infer, all nodes received a kernel update only (required for Apptainer 1.1.8), along with an upgrade to Mellanox OFED 4.9-6.0.6.0 necessitated by the kernel update.
Get Help
ARC personnel can assist with assessing and performing these steps. The best way to request such help is via a 4Help ticket or by attending ARC office hours.