Best Practices for Academic Class Allocations
This page outlines general procedures and best practices for faculty who are teaching courses where students will use ARC systems as part of their coursework.
Instructional Allocations support academic classes and are managed by the faculty member or instructor responsible for the course. Instructional allocations are typically smaller, available for shorter time periods (e.g., for the duration of the associated course), and may be limited to a select set of systems.
Ask for a Consultation
ARC has computational scientists available to consult with you on the best practices below and may be able to provide customized advice to help ensure a smooth experience for your class on ARC systems.
They can:
consult on the size and scale of your class's needs
review materials you will send to students to help identify any potential problems ahead of time
attend a session of your class to provide an ARC overview or collaborate on discussing specific details of research computing, HPC, software use, etc.
To request a consultation, put in a help request here, and provide a brief description of what you’re planning to do with the class on ARC systems.
Faculty Teaching Courses Involving AI on ARC Systems
Faculty planning to teach courses in which students will work with AI models on ARC systems are strongly encouraged to submit a help request ahead of the semester to schedule an ARC consultation.
Connecting to ARC Using Open OnDemand (OOD)
We have found that Open OnDemand can reduce some of the "getting started" overhead of using ARC, often simply by making the initial login easier than SSH and, importantly, by giving all students a uniform look and feel.
Depending on what kind of experience you want, there are several routes you could take with OOD.
The "Desktop" interactive app gives you a Linux desktop graphical interface; from there you can open a terminal, load ARC software modules, and run either CLI or graphical apps. We have "bundle" modules for both R and Python which include hundreds of the most common add-ons, which can help minimize the amount of downloading/installing students have to do on their own.
We have Matlab, RStudio and Jupyter notebook apps which skip the whole desktop, Linux, and app loading steps and get you right into a GUI.
With some notice, we can help install software system-wide specifically for your course. This makes it easier for students to get straight to work and also eliminates redundant downloads and installations.
Secure Remote Development and File Management
VSCode + the Remote-SSH add-on provides a great IDE-style experience on remote ARC systems. But avoid the following:
Enabling AI or dev-assist plugins for ARC systems
Executing intensive or long-running programs on the login nodes
Double-check your usage on the login node through a terminal:
loginusage
A PID with %CPU above 50% most likely means VS Code is running something it shouldn't, such as actual code or AI extensions.
If you have issues with VS Code on ARC systems, run the following command on the login node (not through VS Code):
rm -rf $HOME/.vscode-server
Use SSH keys to Simplify Connections for SSH, SCP, SFTP, and VSCode
Use the following tools to create and enable SSH keys:
ssh-keygen, ssh-copy-id (add your public key to the remote authorized_keys file)
ssh-agent, ssh-add
~/.ssh/config (this file should contain your hosts)
For more details, please follow the instructions on this page.
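As a sketch of the steps above (the hostname and username in the config entry are placeholders, not actual ARC values; substitute your own):

```shell
# Generate an ed25519 key pair (add a passphrase in practice; -N "" skips it here)
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
ssh-keygen -t ed25519 -f "$HOME/.ssh/id_ed25519_arc" -N "" -q

# Add a host entry so `ssh arc` works for ssh, scp, sftp, and VS Code alike.
cat >> "$HOME/.ssh/config" <<'EOF'
Host arc
    HostName cluster.arc.vt.edu
    User myVTpid
    IdentityFile ~/.ssh/id_ed25519_arc
EOF
chmod 600 "$HOME/.ssh/config"

# One-time step: copy the public key to the cluster (prompts for your password):
#   ssh-copy-id -i ~/.ssh/id_ed25519_arc.pub arc
```

After the one-time `ssh-copy-id`, subsequent connections use the key instead of a password.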
Running Jobs on ARC Systems
While OnDemand is ideal for many interactive workflows and graphical user interface (GUI) applications, ARC clusters are built on a classical "batch computing" model.
Use Slurm commands such as sbatch, srun, and salloc to request resources.
Use interactive jobs for developing scripts and quick testing, but plan to move to batch jobs in the long run
Memory and CPU are decoupled; request only the CPUs you need and tune memory with --mem=X.
Use the Short QoS for high priority (2x billing); see details here.
Use seff <jobid> to assess utilization of completed jobs and right-size them.
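As a minimal sketch of a batch script for a single-core job, assuming placeholder account and partition names that you would replace with your own:

```shell
#!/bin/bash
#SBATCH --account=<class-allocation>   # your class allocation (placeholder)
#SBATCH --partition=normal_q           # partition name is illustrative
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G                       # memory is tuned separately from CPUs
#SBATCH --time=00:30:00

# Your actual work goes here:
echo "Running on $(hostname)"
```

Submit with `sbatch myjob.sh`, then run `seff <jobid>` after the job completes to see how much of the requested CPU and memory it actually used, and adjust the requests accordingly.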
Virtual Environments with Python on ARC
It's highly recommended to build virtual environments to contain the Python packages you will need for your jobs. Please see this page for more details on how to build one using Miniforge/Miniconda or pip venv: Virtual Environments
Using Virtual Environments and Jupyter Notebooks in Open OnDemand: To use your virtual environment in Jupyter Notebooks or JupyterLab please check this page: Virtual Environments on OOD.
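A minimal pip-based sketch (the environment name is arbitrary; on ARC you would typically load a Python module first):

```shell
# Create the environment once; reuse it in job scripts and notebooks.
python3 -m venv "$HOME/class-env"

# Activate it, then install your course's packages (e.g. `pip install numpy`).
. "$HOME/class-env/bin/activate"
python -c 'import sys; print(sys.prefix)'   # prints a path inside class-env
deactivate
```

In a batch script, activate the environment the same way before running your Python code.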
Some Issues to Avoid
ARC personnel monitor login nodes and idle resources to help make sure the clusters are running efficiently, remain available for the whole community, and that wait times are as short as possible.
Replicated Storage of Large Datasets
ARC clusters provide Central Storage for commonly used, large, open datasets, and can also add new datasets upon request.
Please don't download datasets to $HOME; this fills up students' home directories and is redundant (e.g., 100 students all downloading a 200GB dataset to their homes).
If the dataset should be shared by the whole class, it can be stored in /projects/<class-allocation> or /scratch/<username> (change the permissions to grant access).
When transferring big datasets, please use datatransfer.arc.vt.edu as your destination host, for example: scp -r dirname myVTpid@datatransfer.arc.vt.edu:/projects/mygroup/. This page provides more details about data transfer tools.
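The permission change mentioned above can be sketched with a local stand-in directory (the path is illustrative; on ARC the dataset would live under /projects/<class-allocation>):

```shell
# Local stand-in for a shared dataset directory
DATASET="$HOME/dataset-demo"
mkdir -p "$DATASET/sub"
touch "$DATASET/sub/data.bin"

# g+rX lets group members read files and traverse directories without
# granting write access; on ARC you would also set the group with chgrp.
chmod -R g+rX "$DATASET"
ls -ld "$DATASET"
```

With the group set to your class allocation's group, every enrolled student can read the data from one copy.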
Misuse of Login Nodes
A common mistake for new users is to run code or computations on the login node. Login nodes are the shared entry point to the clusters, so any intensive or long-running programs should be executed as part of a job. If this happens, we may email the student to address the situation and kill any running tasks to relieve the load on the login node. In some cases, a student's code will consume their whole quota and prevent them from being able to log back into the login node.
Wasteful Jobs
Please make sure to request only the needed compute resources and run your job on a compute node as illustrated here.
Please request only the CPUs you need and tune memory with --mem=X.
Scale up to multiple nodes or GPUs only when you know that your code can use them.
Don't request more than one node if your code is serial and can run on only one node. Some workloads consist of running many small instances with differing inputs or parameters and can be parallelized. Here is a starter for parallelization with Slurm and GNU parallel.
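As a sketch of the GNU parallel pattern mentioned above (the program name, input paths, and account are placeholders), a job that requests several cores on one node and fans many small serial runs across them might look like:

```shell
#!/bin/bash
#SBATCH --account=<class-allocation>   # your class allocation (placeholder)
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=01:00:00

module load parallel                   # module name is illustrative

# Run ./myprog once per input file, keeping 8 instances in flight.
# -j matches the CPUs requested above; --joblog records what ran where.
parallel -j "$SLURM_CPUS_PER_TASK" --joblog runs.log \
    ./myprog {} ::: inputs/*.txt
```

This keeps all requested cores busy without asking Slurm for more nodes than a serial program can use.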
GPUs are limited and valuable resources, so please request only the number of GPUs you need and make sure that your job is using them. Demand for these resources is high, so allocated resources sitting idle is very wasteful and also exacerbates wait times for subsequent jobs.
You can check whether you are using a lot of resources on the login node by typing this command while on the login node:
loginusage
From the login node you can monitor your jobs:
seff <jobid> (completed jobs)
showjobusage <jobid> (host-level resource usage)
When you are connected to a compute node, you can use htop, ps, nvidia-smi, or gpumon to monitor your usage. You can also check utilization of all clusters online with the ARC Dashboards.
Waiting for Busy Resources
ARC resources show consistently high utilization year-round. Most jobs start very quickly, but larger jobs or those which require scarce resources may have to wait hours or even days to start.
Avoid making assignments that use ARC systems due at the end of the semester.
Make sure students are aware that they may have to wait for their jobs to start and account for that when planning their work.