ARC System Changes - January 2023
Notes and Guidance for January 2023 cluster changes
Changes to /home on Cascades and Dragonstooth
HOME decoupled
While /home has been “universal” across ARC clusters in recent history, the Cascades/Dragonstooth /home is being decoupled from the others starting January 17, 2023.
Prior to the start of the January maintenance, the /home filesystem was universal across the Tinkercliffs, Dragonstooth, Cascades, and Infer clusters. This is because the same network-attached storage system was mounted on /home on all the clusters.
During the maintenance outage, a larger, faster replacement system for this purpose was brought online to serve /home for Tinkercliffs and Infer. Data was synchronized between the old and new systems to make the transition transparent for continued use of those systems.
Since they’re being decommissioned, Cascades and Dragonstooth remain on the previous /home system, where the old data is still intact, and are not connected to the new one.
As a result, any file actions (added files, removed files, changes to files) performed on Tinkercliffs/Infer /home will not be reflected on the Cascades/Dragonstooth /home directories. The converse is true as well: any file actions performed on Cascades/Dragonstooth will not be reflected on Tinkercliffs/Infer.
New policies for /fastscratch
Quota to be implemented on FASTSCRATCH
Starting in January 2023, quota limits on the usage of /fastscratch will be put in place.
All ARC systems down for maintenance
During a maintenance outage in January 2023, the Cascades and Dragonstooth clusters will be decommissioned. This means that jobs will no longer be accepted or started on the compute nodes.
/work and /groups on Cascades and Dragonstooth will also be decommissioned in the following weeks.
The login nodes will remain accessible for a limited time (tentatively, for about 3 weeks or until February 7th) to allow people the opportunity to retrieve data from those systems.
A new backend storage system will come online to host /home directories on all current mainline ARC systems (Tinkercliffs and Infer).
At the time of transition, all data from the previous system will be replicated to the new system. No user action is needed.
The previous storage system for HOME directories will still serve the Cascades and Dragonstooth login nodes while they remain online. Changes on these nodes will not be reflected in /home on Tinkercliffs/Infer, or vice versa.
Rationale
We would prefer to keep the other clusters online until the new resource is available, but these older systems have rapidly become a liability because:
their compute nodes are failing (25% loss at this point) and are no longer supported by manufacturers,
their storage has endured a startling number of component failures and replacements recently,
their provisioning/configuration management/administration systems are defunct, and
their software stacks are outdated (OS kernel, glibc, compilers, libraries, software deployment system).
To reduce the risk of catastrophic failure during operations and to align engineering time and effort toward new systems and services, these clusters are being taken offline.
A new CPU system in the works
As of December 2022, ARC is in the final phases of purchasing a new CPU system to replace these, but this new system is not likely to be available (due to acquisition, engineering, and testing timelines) before Summer 2023.
What is NOT directly affected?
The Tinkercliffs and Infer clusters and storage systems will resume normal operations in their current state after the end of the maintenance. The /projects, /fastscratch, and /home storage on those systems will remain in operation.
Actions you may need to take
The 3-6 weeks after the mid-January maintenance will be available for people to migrate any data they need to keep from those storage systems.
A copy of all the /groups directories was made to /projects when Tinkercliffs was launched in fall 2020, and ARC will not make another bulk copy like this. ARC personnel are available to consult with PIs and labs as needed to assist with archiving older data sets and merging those in active use. We also have a page with information about data transfers.
Clean up data in /groups and /work
The hardware hosting the data which is currently stored in /groups or /work on Cascades is due to be decommissioned. If you need to preserve any files from those locations, please consider the following steps.
Please audit data before moving it
Please avoid making bulk data transfers from /groups or /work until you have thoroughly reviewed the data. ARC systems are not intended for indefinite, permanent storage and keeping old, unused files greatly increases the cost of the filesystems and can cause performance degradation.
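One way to start such a review is to summarize how much data a directory holds and which files have not been modified in a long time. The following is only a minimal sketch: the function name `summarize_tree` and the one-year staleness threshold are assumptions made for illustration, not ARC policy or an ARC-provided tool.

```python
import os
import time
from pathlib import Path


def summarize_tree(root, old_after_days=365):
    """Walk `root`; return total bytes and a list of files not modified recently."""
    cutoff = time.time() - old_after_days * 86400  # 86400 seconds per day
    total_bytes = 0
    old_files = []  # (path, size_bytes) for files older than the cutoff
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.lstat()  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total_bytes += info.st_size
            if info.st_mtime < cutoff:
                old_files.append((str(path), info.st_size))
    # Largest stale files first, since removing them frees the most space
    old_files.sort(key=lambda item: item[1], reverse=True)
    return total_bytes, old_files
```

For example, `total, stale = summarize_tree("/groups/mylab")` (a hypothetical path) returns the total size in bytes and the stale files sorted by size. Modification time is only a rough signal of whether data is still needed, so treat the result as a list of candidates to review rather than a list of files to delete.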
Check to see if your data is already in /projects on Tinkercliffs or in some other storage repository.
Delete any old, duplicate, or unneeded data and files.
Consolidate old results or data so that only the necessary elements are kept.
Package old results or data into larger, more manageable files using tar and/or zip utilities. An ideal file size for archival or transfer across networks is often between 1 GB and 100 GB. Data sets which are smaller than 1 GB or larger than 100 GB will often be more cumbersome to work with.
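The packaging step can be sketched with Python's standard `tarfile` module, which writes the same archive format as the tar utility. This is an illustrative sketch, not an ARC-provided script: the helper names `pack_directory` and `size_advice` are hypothetical, and the thresholds assume decimal gigabytes (10^9 bytes).

```python
import os
import tarfile

GB = 10**9              # assumption: decimal gigabytes
MIN_BYTES = 1 * GB      # lower end of the suggested archive size range
MAX_BYTES = 100 * GB    # upper end of the suggested archive size range


def pack_directory(src_dir, archive_path):
    """Bundle `src_dir` into an uncompressed tar archive; return its size in bytes."""
    # Mode "w" writes a plain tar; "w:gz" would add gzip compression.
    with tarfile.open(archive_path, "w") as tar:
        # arcname stores paths relative to the directory's own name
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))
    return os.path.getsize(archive_path)


def size_advice(n_bytes):
    """Classify an archive size against the 1 GB to 100 GB guideline."""
    if n_bytes < MIN_BYTES:
        return "small: consider bundling more data into this archive"
    if n_bytes > MAX_BYTES:
        return "large: consider splitting into several archives"
    return "ok"
```

A call such as `size_advice(pack_directory("run_2021", "run_2021.tar"))` (hypothetical names) packages one directory and reports whether the result falls inside the suggested range. Writing the archive needs roughly as much free space as the source directory occupies, so check the available space first.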
tar vs. zip
tar can package a directory tree into a single file, while zip utilities compress files. Test your data for compressibility before attempting to zip it. Many modern data formats do not compress well.
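A quick compressibility test might look like the sketch below, which compresses only the beginning of a file and reports the ratio of compressed to original size. The function name `sample_compression_ratio` and the 16 MiB sample size are assumptions for illustration. It uses `zlib`, which implements DEFLATE, the method zip and gzip commonly use.

```python
import zlib


def sample_compression_ratio(path, sample_bytes=16 * 1024 * 1024):
    """Compress the first `sample_bytes` of a file; return compressed/original size."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_bytes)
    if not sample:
        return 1.0  # empty file: nothing to gain from compression
    compressed = zlib.compress(sample, 6)  # level 6 is a common speed/size balance
    return len(compressed) / len(sample)
```

A ratio near 1.0 suggests the file is already compressed or otherwise incompressible, so a plain tar archive is probably the better choice. A ratio well below 1.0 suggests compression is worth the extra time. The head of a file is not always representative of the whole, so sampling a few files of each type gives a more reliable picture.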
Get Help
ARC personnel can assist with assessing and performing these steps. The best way to request such help is via a 4Help ticket or by attending ARC office hours.