Facilities, Equipment, and Other Resources (FER) Statement—Long Version
This page provides the longer version of the FER statement.
For some submissions you may be space-limited, and a more concise FER may be desirable. That shorter version is available via the link on the Useful PI links page.
A longer version of the FER is provided here in various forms:
The contents of the documents above are identical; one form may be preferable to another to match document styles. The document contents also appear below.
Virginia Tech Computing Facilities, Equipment, and Operations
Prepared by
Advanced Research Computing, Virginia Tech
03 February 2025
1.0 Introduction
Advanced Research Computing (ARC) advances the use of computing and visualization in Virginia Tech (VT) research by providing a centralized organization that identifies, acquires, maintains, and upgrades or extends compute systems, and that educates, supports, problem-solves, and collaborates with the VT community. As one example, ARC offers generous levels of no-cost resources to students and faculty, which cover the great majority of user needs. The ARC website is https://arc.vt.edu/. Our vision and mission statements are available at https://arc.vt.edu/content/arc_vt_edu/en/about.html. ARC resides in the Division of Information Technology (DoIT). Cybersecurity is implemented through the Information Technology Security Office (ITSO).
2.0 Compute Systems
2.1 Cluster Systems
VT’s main compute clusters are listed in Table 1. All but the last cluster serve the entire VT community; the CUI (controlled unclassified information) cluster is restricted to work with controlled unclassified data. TinkerCliffs (TC) is the flagship cluster, consisting of many-core central processing unit (CPU) nodes and graphics processing unit (GPU), i.e., accelerator, nodes. Owl came online in 2024 and is a CPU-based cluster that includes three large-memory nodes with up to 8 TB of main memory each. In addition to the GPUs on TC, Infer and Falcon are GPU-based; Falcon also came online in 2024. The four clusters serving all of VT provide almost 54,000 cores, 442 GPUs, and over 216 TB of memory. The four main clusters (TC, Infer, Owl, and Falcon) share the same GPFS mount so that codes and data can be accessed from any cluster. All clusters use an InfiniBand (IB) interconnect. An illustrative job-submission sketch appears after Table 1.
Table 1. Summary of ARC compute clusters. Characteristics are given down the left-hand side and characteristic values for different ARC clusters are given in subsequent columns. The table is divided between conventional compute nodes and large memory compute nodes (see left column). “/” separates multiple entries within one cell.
| Machine | TinkerCliffs | Infer | Owl | Falcon | CUI |
|---|---|---|---|---|---|
| Conventional Compute | Mixed: CPU / GPU / GPU | GPU | CPU | GPU | Mixed: CPU / GPU |
| Vendor | Cray / NVIDIA DGX A100 / HPE Apollo 6500 | HPE / Dell | Lenovo | NVIDIA A30 / NVIDIA L40S | HPE / HPE Apollo |
| Num. Nodes | 308 / 10 / 4 | 18 / 40 | 84 | 32 / 20 | 12 / 3 |
| Cores/Node | 128 / 128 / 128 | 32 / 28 | 96 | — / — | 64 / 64 |
| GPUs/Node | — / 8 / 8 | 1 / 2 | — | 4 / 4 | — / 8 |
| Memory (GB)/Node | 256 / 2048 / 2048 | 192 / 512 | 768 | 24 per GPU / 48 per GPU | 512 / 2048 |
| Large Memory Compute | CPU | — | CPU | — | — |
| Vendor | Cray | — | Lenovo | — | — |
| Num. Nodes | 8 | — | 3 | — | — |
| Cores/Node | 128 | — | 128 | — | — |
| Memory (GB)/Node | 1024 | — | 4019 / 8038 / 8038 | — | — |
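As a brief, hedged illustration of how these resources might be exercised, the sketch below submits a small GPU test job from Python. It assumes the Slurm workload manager and a placeholder allocation name ("myproject"); neither is specified in this document, so treat the details as illustrative rather than as ARC's documented procedure.

```python
#!/usr/bin/env python3
"""Illustrative sketch only: submit a small GPU test job on an ARC cluster.

Assumes the Slurm workload manager and a placeholder allocation name
("myproject"); neither is specified in this document.
"""
import subprocess
import textwrap

# Minimal batch script; the account name is a placeholder that a real user
# would replace with the allocation assigned through ARC.
job_script = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --job-name=gpu-smoke-test
    #SBATCH --account=myproject
    #SBATCH --nodes=1
    #SBATCH --gres=gpu:1
    #SBATCH --time=00:10:00
    nvidia-smi   # report the GPU assigned to the job
""")

# Because the four main clusters share the same GPFS mount, the same script
# and data paths can be used whether the job is submitted on TinkerCliffs,
# Infer, Owl, or Falcon.
result = subprocess.run(
    ["sbatch"],            # sbatch reads the job script from standard input
    input=job_script,
    text=True,
    capture_output=True,
    check=True,
)
print(result.stdout.strip())   # e.g., "Submitted batch job 123456"
```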
2.2 Visionarium
ARC’s Visionarium Lab provides an array of visualization resources, including the VisCube, an immersive 10-foot by 10-foot by 10-foot three-dimensional visualization environment with head and device tracking. The ‘HyperCube’ remains the highest-resolution and most accessible immersive visualization lab on campus. The lab is available to graduate and undergraduate students to work with their data and visualization solutions in a hardware and software environment offering interactive 3D graphics and High-Performance Visualization (HPV). A new SuperCube will be operational in 2025, expanded to a 14.7-foot by 14.7-foot by 9.2-foot three-dimensional visualization environment with 47.9 million pixels and 700 ft² of display surface, returning the Visionarium to a world-class immersive visualization facility.
2.3 Access To Resources Beyond VT
ARC resources leverage Virginia Tech’s excellent network connectivity, which provides access to advanced national networks, including ESnet, Internet2, and the Mid-Atlantic Crossroads.
3.0 Upgrades and Maintenance
Approximately every four months, the clusters are taken offline for two to four days for major upgrades. To the extent possible, hardware and software maintenance and expansion, which are continual efforts, are completed while the systems are online to minimize disruptions to users.
4.0 Leadership and Technical Support
4.1 ARC Leadership
Alberto Cano, VP, leads ARC. Other leadership team members are: Matthew Brown, Director of Research Computing Services; Mark Gardner, Network Research Manager; Jeremy Johnson, IT Operations Manager; and Nicholas Polys, Director of Visual Computing.
4.2 Technical Systems Support
The Systems Engineering Team architects, installs, maintains, and upgrades the research network, storage, and compute resources, as well as workload management. They implement and maintain system security practices and respond to alerts from monitoring and logging systems. These efforts include implementing and maintaining supporting infrastructure systems (e.g., cooling and power) and services. The team researches new and emerging technologies, including compute hardware and networks, and interacts with vendors to integrate hardware into ARC systems. They also maintain user accounts and operate user-facing systems such as ColdFront and Open OnDemand.
4.3 Technical User Support
Technical user support occurs on two fronts. First, support is provided for general use of the clusters, user and system problems, and individual user or group needs, including proposal preparation (e.g., consultations). Second, system software and applications are maintained and new software is added based on user requirements. Both fronts are served by five Computational Scientists and four Graduate Research Assistants, who also conduct workshops for users. The group also hosts a bimonthly meeting for all VT students and faculty to describe new developments, bring up new issues for input from the user base, and answer questions. These meetings, among other outlets, guide ARC’s future equipment purchases, operations, and services.
5.0 Cybersecurity
The Information Technology Security Office (ITSO) oversees the security of the cyber infrastructure on campus under the direction of the Chief Information Security Officer.
5.1 InCommon Federation
Virginia Tech participates in the InCommon Federation as an Identity Provider, allowing individuals to use their Virginia Tech credentials to securely authenticate to services provided by InCommon Service Providers. Since InCommon is a federal Trust Framework Provider, Virginia Tech is an approved Credential Service Provider under the FICAM TFS Program.
5.2 Intrusion Detection
The ITSO deploys freeware and commercial intrusion detection sensors to monitor attacks against University computers. The data collected by these sensors is analyzed to identify and respond to cyber-attacks and to analyze malware. Virginia Tech employs a Continuous Monitoring security strategy that allows “hunting” for compromised machines and facilitates remediation of events. Continuous Monitoring is a component of the Center for Internet Security’s (CIS) Controls architecture. The CIS Controls are a subset of the NIST 800-53 Priority 1 controls, NIST CSF, and NIST 800-171. One particularly useful capability is merging intrusion detection system (IDS) data with GIS maps and building floor plans to quickly determine the scope of cyber-attacks against University computers.
5.3 Cyber Security Operations Center
Virginia Tech has implemented a Cyber Security Operations Center (SOC). This convergence of tools, data, and personnel into the SOC allows IT Security personnel to unite data from network sensors, provide analysis, and coordinate needed responses to further protect assets.
5.4 Centralized Logging
The Division of Information Technology provides a Central Log Service (CLS) to university departments for storing and analyzing logs, both for security and operational purposes.
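As a hedged illustration of how a departmental application might forward events to a central log collector, the sketch below uses Python's standard-library syslog handler. The hostname, port, transport, and logger name are hypothetical; the actual CLS ingestion mechanism is defined by the Division of Information Technology and is not specified in this document.

```python
import logging
import logging.handlers

# Hypothetical endpoint: the real CLS hostname, port, and transport are
# defined by the Division of Information Technology, not by this sketch.
CLS_HOST, CLS_PORT = "cls.example.org", 514

handler = logging.handlers.SysLogHandler(address=(CLS_HOST, CLS_PORT))
handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))

logger = logging.getLogger("example-department-app")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# Records forwarded this way can be retained and analyzed centrally for
# both security and operational purposes.
logger.info("user login succeeded for a service account")
```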