Acceptable Use Policy

All users must follow these guidelines to ensure fair, efficient, and reliable high-performance computing service for the community. By using any ARC resources, you agree to comply with this policy. Failure to follow these rules may result in job termination and account suspension.

Resource Efficiency and Job Scaling

Users must request only the compute resources (CPU cores, GPUs, memory) that their jobs actually need. Excessive requests (idle cores, unused GPUs, or oversized memory allocations) waste shared resources and increase queue wait times for all users. In particular:

  • Match resources to workload: Only request the number of CPU cores, amount of memory, and number/type of GPUs that your application can use. Unused resources sit idle and slow down the queue.

  • Avoid over-allocation: If your job runs on fewer cores or less memory than requested, reduce your request. For example, requesting 16 cores when your application uses only 1 makes your job wait longer in the queue and blocks 15 cores that other users could be using.

  • Scale up gradually: Start with smaller test jobs to determine how your application scales, then increase resources only as needed. Test with a few nodes or GPUs first, measure performance (CPU/GPU utilization, memory use), and adjust. Many applications do not scale linearly, so using twice as many cores may not halve the runtime.

  • Monitor usage: Use job profiling tools or logs (e.g. Slurm’s seff, or in-job monitoring tools) to check CPU, GPU, and memory utilization. If you consistently underutilize resources, reduce your requests. Over time, tune your job scripts for efficiency (e.g. optimal MPI ranks per node, appropriate thread counts).
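As a minimal sketch of right-sizing a request, here is a Slurm batch script in which the application name, input file, and resource figures are placeholders for illustration:

    #!/bin/bash
    #SBATCH --job-name=rightsized
    #SBATCH --nodes=1
    #SBATCH --ntasks=1              # one task; raise only for MPI-parallel codes
    #SBATCH --cpus-per-task=4      # matches the 4 threads the application actually uses
    #SBATCH --mem=8G               # measured peak memory plus a small margin
    #SBATCH --time=02:00:00        # a realistic wall time, not the partition maximum

    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    ./my_app input.dat             # placeholder application

After the job completes, run seff with the job ID and compare the reported CPU and memory efficiency against your request; if efficiency is consistently low, shrink the request for the next submission.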

Login Node Usage

The login nodes are not meant for running analysis or parallel jobs. They are shared gateways for all users. Only lightweight tasks are allowed on the login nodes (e.g. job preparation). In particular:

  • Allowed activities: Editing code and text files, compiling software (with limited threads), staging data (transfers in/out), and submitting or monitoring batch jobs. These tasks help prepare and manage jobs on the compute nodes.

  • Prohibited activities: Do not run CPU- or memory-intensive computations, GPU jobs, large I/O tasks, parallel/MPI jobs, or interactive data analysis on login nodes. Even a single core-hungry process or large-memory task on a login node can degrade performance for everyone. Such jobs may be automatically terminated without warning.

Reminder: Use the batch scheduler for all actual computations. If you need an interactive command line on a compute node (e.g. for debugging or visualization), submit an interactive job (interact) or use Open OnDemand. Never treat login nodes as mini compute nodes; any heavy job on a login node violates this policy. Do not abuse Visual Studio Code Remote Development extensions to run programs on the login nodes.
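As a sketch of moving interactive work off the login node using generic Slurm commands (the resource figures are assumptions, and the exact interact options on ARC systems may differ; check the site documentation):

    # Open a time-limited interactive shell on a compute node
    srun --ntasks=1 --cpus-per-task=4 --mem=8G --time=01:00:00 --pty bash

    # ... compile, debug, or test on the compute node, then release it
    exit

Exiting the shell ends the allocation, so the node returns to the pool as soon as you are done.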

Interactive Sessions

Interactive sessions on compute nodes are provided for testing and debugging, but they must be used judiciously:

  • Purpose: Use interactive jobs only when you need to compile software, test commands, debug issues, or perform short-term visualization. These sessions run on compute nodes (not login nodes) and should be set with appropriate time and resource limits.

  • Do not idle: Do not start an interactive session and then leave it unattended. A session that sits idle or underutilized is considered abuse, and nodes held by long-running idle interactive jobs may be terminated at any time. Always end your session when you are done. When using Open OnDemand, make sure to click the “Delete” button to end your session.

  • Session limits: Be aware of any imposed time limits on interactive jobs. Do not exceed them. If you need a longer interactive run for data analysis or visualization, consider using batch jobs.

Idle or abandoned interactive sessions waste resources. The scheduler and ARC staff may automatically cancel sessions that show no user activity, so plan accordingly and save your work frequently.
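A quick way to confirm you have not left a session behind, assuming standard Slurm tooling:

    # List your own running and pending jobs; forgotten interactive sessions show up here
    squeue -u $USER

    # End a session you no longer need (replace <jobid> with the ID shown by squeue)
    scancel <jobid>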

Queue Etiquette and Scheduler Fairness

The cluster scheduler enforces fair-share policies so that all users get equitable access. You must not try to subvert this system. In particular:

  • No priority gaming: Do not submit unnecessary or dummy jobs to hold a place in the queue or to improve your job priority. This includes “frontrunning” tricks like repeatedly submitting and canceling jobs to bump your priority. Such manipulations violate fair-use principles and are prohibited.

  • Limit job spamming: Avoid flooding the queue with many tiny jobs. A huge number of very short jobs creates scheduling overhead and can clog the scheduler. If you have many small independent tasks, use job arrays or bundle tasks into larger jobs where possible (see the job-array sketch below); the scheduler handles fewer, longer jobs more efficiently than many short ones.

  • Trust the scheduler: Do not attempt to reserve resources outside of normal job submission (for example, by artificially occupying nodes). Use the provided partitions/QoS correctly and follow any group or project limits.

Being courteous with submissions helps everyone. Follow any posted queue rules (maximum jobs, time limits, etc.), and ask ARC staff if you need advice on packaging large workloads. Violating scheduler rules or trying to game the system may lead to account suspension.
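For example, a single Slurm job array can replace hundreds of separate short submissions while throttling how many tasks run at once (the task count, file naming, and resource figures here are illustrative):

    #!/bin/bash
    #SBATCH --job-name=manytasks
    #SBATCH --array=1-1000%50      # 1000 tasks, at most 50 running concurrently
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=2G
    #SBATCH --time=00:30:00

    # Each array element processes one input file, selected by its task ID
    ./process input_${SLURM_ARRAY_TASK_ID}.dat

The %50 throttle keeps the array from monopolizing the partition while still letting the scheduler fill gaps efficiently.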

Scratch Storage Policy

Each job has access to a scratch space for temporary data. Scratch is not permanent or backed up. Treat it as a cache or working directory, not a long-term archive. In detail:

  • Short-term only: Files on scratch are automatically purged after a set period (typically 90 days of inactivity). There is no backup of scratch storage, so any files older than the purge threshold will be deleted without recovery. Plan accordingly.

  • No long-term storage: Do not use scratch to keep important or archival data. Attempts to circumvent purging (for example, by renaming files or moving them outside your scratch directory) are not allowed. If you need data beyond your job’s lifetime, copy it to your home directory, group project space, or an archival system.

  • Regular cleanup: Periodically remove unneeded files from scratch. Keep only what is needed for your active job. Remember that all scratch data can disappear after a purge, so copy out valuable results promptly.

In summary, use scratch for intermediate data and computations. Always assume that scratch data may be lost, and maintain your own copies of critical files elsewhere.
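A common job-script pattern, sketched with placeholder paths (the scratch layout, application, and destination directory are assumptions; use your site's actual locations):

    #!/bin/bash
    #SBATCH --time=04:00:00
    #SBATCH --mem=4G

    # Work in a per-job scratch directory (assumed layout; check your site's docs)
    SCRATCH_DIR=/scratch/$USER/$SLURM_JOB_ID
    mkdir -p "$SCRATCH_DIR"
    cd "$SCRATCH_DIR"

    ./simulate > results.out                 # placeholder application writing to scratch

    # Copy results to permanent storage before the purge window expires
    mkdir -p "$HOME/project_results"
    cp results.out "$HOME/project_results/"  # or group project space / an archival system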

Enforcement and Consequences

ARC staff will enforce these policies to protect the cluster and ensure correct utilization of resources. Violations have immediate consequences:

  • Job termination: Any job found violating these rules (excessive resource use, running on login nodes, idle interactive sessions, etc.) may be killed without warning. Re-submission of problematic jobs can lead to repeated termination.

  • Warnings and suspension: Users who repeatedly abuse resources or ignore guidelines will receive formal warnings. Continued misuse will result in suspension of access and accounts. In severe cases, the user’s account may be disabled or permanently revoked.

  • Loss of privilege: Persistent failure to follow this policy can lead to forfeiting all cluster usage privileges. The user’s faculty sponsor or PI may also be notified of any major infractions.

By using the Virginia Tech HPC clusters, you agree to these rules. Thank you for helping to keep our HPC resources fair, efficient, and reliable for everyone.