Georgia Institute of Technology
Indiana University
Jetstream2 is a hybrid-cloud platform that provides flexible, on-demand, programmable cyberinfrastructure tools ranging from interactive virtual machine services to a variety of infrastructure and orchestration services for research and education. The primary resource is a standard CPU resource consisting of AMD Milan 7713 CPUs with 128 cores per node and 512 GB of RAM per node, connected by 100 Gbps Ethernet to the spine.
Jetstream2 GPU is a hybrid-cloud platform that provides flexible, on-demand, programmable cyberinfrastructure tools ranging from interactive virtual machine services to a variety of infrastructure and orchestration services for research and education. This particular portion of the resource is allocated separately from the primary resource and contains 360 NVIDIA A100 GPUs: 4 GPUs per node, with 128 AMD Milan cores and 512 GB of RAM per node, connected by 100 Gbps Ethernet to the spine.
Jetstream2 LM is a hybrid-cloud platform that provides flexible, on-demand, programmable cyberinfrastructure tools ranging from interactive virtual machine services to a variety of infrastructure and orchestration services for research and education. This particular portion of the resource is allocated separately from the primary resource and contains 32 GPU-ready large-memory compute nodes, each with 1 TB of RAM and AMD Milan 7713 CPUs with 128 cores per node, connected by 100 Gbps Ethernet to the spine.
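Because Jetstream2's virtual machine services are programmable, instances can be provisioned through an OpenStack-compatible API as well as interactively. The following is a minimal sketch using the openstacksdk Python package, assuming a clouds.yaml entry and instance parameters that are placeholders rather than values taken from this description.

    import openstack

    # Credentials come from a clouds.yaml entry; the cloud name is a placeholder.
    conn = openstack.connect(cloud="jetstream2")

    # Image, flavor, and keypair names below are hypothetical examples.
    image = conn.compute.find_image("Featured-Ubuntu22")
    flavor = conn.compute.find_flavor("m3.medium")

    # Launch an instance and wait for it to become active.
    server = conn.compute.create_server(
        name="demo-instance",
        image_id=image.id,
        flavor_id=flavor.id,
        key_name="my-keypair",
    )
    server = conn.compute.wait_for_server(server)
    print(server.name, server.status)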
Institute for Advanced Computational Science at Stony Brook University
Ookami is a computer technology testbed supported by the National Science Foundation under grant OAC 1927880. It provides researchers with access to the A64FX processor developed by Riken and Fujitsu for the Japanese path to exascale computing and deployed in Fugaku, which was the fastest computer in the world until June 2022. Ookami is the first such computer outside of Japan. By focusing on crucial architectural details, the ARM-based, multi-core, 512-bit SIMD-vector processor with ultrahigh-bandwidth memory promises to retain familiar and successful programming models while achieving very high performance for a wide range of applications. While being very power-efficient, it supports a wide range of data types and enables both HPC and big data applications.
The Ookami HPE (formerly Cray) Apollo 80 system has 176 A64FX compute nodes, each with 32 GB of high-bandwidth memory and a 512 GB SSD. This amounts to about 1.5 million node hours per year. A high-performance Lustre filesystem provides about 0.8 PB of storage.
To facilitate users exploring current computer technologies and contrasting performance and programmability with the A64FX, Ookami also includes:
- 1 node with dual-socket AMD Milan (64 cores) and 512 GB memory
- 2 nodes with dual-socket Thunder X2 (64 cores), each with 256 GB memory
- 1 node with dual-socket Intel Skylake (32 cores) with 192 GB memory and 2 NVIDIA V100 GPUs
Johns Hopkins University MARCC
National Center for Atmospheric Research
The Derecho supercomputer is a 19.87-petaflops HPE Cray EX cluster with 2,488 nodes, each with two 64-core AMD EPYC 7763 Milan processors, for a total of 323,712 processor cores. Each node has 256 GB of DDR4 memory. The Derecho nodes are connected by an HPE Slingshot v11 high-speed interconnect in a dragonfly topology. The NSF National Center for Atmospheric Research (NCAR) operates the Derecho system to support Earth system science and related research by researchers at U.S. institutions.
The Derecho-GPU allocated resource is composed of 82 nodes, each with a single-socket AMD Milan processor, 512 GB memory, and 4 NVIDIA A100 Tensor Core GPUs connected by a 600 GB/s NVLink GPU interconnect, for a total of 328 A100 GPUs. Each A100 GPU has 40 GB HBM2 memory. The Derecho-GPU nodes each have four injection ports into Derecho's Slingshot interconnect. The NSF National Center for Atmospheric Research (NCAR) operates the Derecho system to support Earth system science and related research by researchers at U.S. institutions.
National Center for Supercomputing Applications
The Delta CPU resource comprises 124 dual-socket compute nodes for general purpose computation across a broad range of domains able to benefit from the scalar and multi-core performance provided by the CPUs, such as appropriately scaled weather and climate, hydrodynamics, astrophysics, and engineering modeling and simulation, and other domains using algorithms not yet adapted for the GPU. Each Delta CPU node is configured with 2 AMD EPYC 7763 (“Milan”) processors with 64-cores/socket (128-cores/node) at 2.45GHz and 256GB of DDR4-3200 RAM. An 800GB, NVMe solid-state disk is available for use as local scratch space during job execution. All Delta CPU compute nodes are interconnected to each other and to the Delta storage resource by a 100 Gb/sec HPE Slingshot network fabric.
The Delta GPU resource comprises 4 different node configurations intended to support accelerated computation across a broad range of domains such as soft-matter physics, molecular dynamics, replica-exchange molecular dynamics, machine learning, deep learning, natural language processing, textual analysis, visualization, ray tracing, and accelerated analysis of very large in-memory datasets. Delta is designed to support the transition of applications from CPU-only to using the GPU or hybrid CPU-GPU models. Delta GPU resource capacity is predominantly provided by 200 single-socket nodes, each configured with 1 AMD EPYC 7763 (“Milan”) processor with 64-cores/socket (64-cores/node) at 2.55GHz and 256GB of DDR4-3200 RAM. Half of these single-socket GPU nodes (100 nodes) are configured with 4 NVIDIA A100 GPUs with 40GB HBM2 RAM and NVLink (400 total A100 GPUs); the remaining half (100 nodes) are configured with 4 NVIDIA A40 GPUs with 48GB GDDR6 RAM and PCIe 4.0 (400 total A40 GPUs). Rounding out the GPU resource are 6 additional “dense” GPU nodes, containing 8 GPUs each, in a dual-socket CPU configuration (128-cores per node) with 2TB of DDR4-3200 RAM but otherwise configured similarly to the single-socket GPU nodes. Within the “dense” GPU nodes, 5 nodes employ NVIDIA A100 GPUs (40 total A100 GPUs in “dense” configuration) and 1 node employs AMD MI100 GPUs (8 total MI100 GPUs) with 32GB HBM2 RAM. A 1.6TB, NVMe solid-state disk is available for use as local scratch space during job execution on each GPU node type. All Delta GPU compute nodes are interconnected to each other and to the Delta storage resource by a 100 Gb/sec HPE Slingshot network fabric.
Delta provides default storage allocations for all projects. For storage increases please open a ticket with NCSA providing a detailed justification for the size of the request and the duration of the need.
The DeltaAI resource comprises 114 NVIDIA quad Grace Hopper nodes interconnected by HPE's Slingshot interconnect. Each Grace Hopper node consists of four NVIDIA superchips, each with one ARM-based CPU, 128 GB of LPDDR5 RAM, and one H100 GPU with 96GB of HBM. The four superchips are tightly coupled with NVLink and share a unified memory space.
The NCSA Granite Tape Archive is architected using a 19-frame Spectra TFinity tape library outfitted with 20 LTO-9 tape drives to enable a total capacity of over 170PB of accessible and replicated data, of which 3.6PB is currently available for ACCESS allocations. The additional capacity is reserved for other NCSA use, and additional space is still available to expand the archive when needed. The archive operates on Versity's ScoutAM/ScoutFS products giving users a single archive namespace from which to stage data in and out. Access to the Granite system is available directly via Globus and S3 tools.
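Since Granite is reachable with S3 tools, staging data into the archive programmatically can look roughly like the sketch below, which uses the boto3 Python package. The endpoint URL, bucket name, and credentials are placeholders for illustration only; use the values issued with your Granite allocation.

    import boto3

    # Endpoint, bucket, and keys below are placeholders, not real Granite values.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://granite-s3.example.edu",
        aws_access_key_id="YOUR_ACCESS_KEY",
        aws_secret_access_key="YOUR_SECRET_KEY",
    )

    # Stage a file into the archive namespace.
    s3.upload_file("results.tar", "my-archive-bucket", "project/results.tar")

    # Confirm what has been archived.
    for obj in s3.list_objects_v2(Bucket="my-archive-bucket").get("Contents", []):
        print(obj["Key"], obj["Size"])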
Open Science Grid
A virtual HTCondor pool made up of resources from the Open Science Grid
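Work reaches the pool through HTCondor job submission. As a rough sketch, assuming the htcondor Python bindings are available on an OSG access point, a minimal submission might look like the following; the executable, arguments, and resource requests are placeholder values.

    import htcondor

    # A simple job description; executable and requests are placeholders.
    sub = htcondor.Submit({
        "executable": "/bin/echo",
        "arguments": "hello from the pool",
        "output": "job.out",
        "error": "job.err",
        "log": "job.log",
        "request_cpus": "1",
        "request_memory": "1GB",
        "request_disk": "1GB",
    })

    # Submit to the local schedd on the access point and note the cluster id.
    schedd = htcondor.Schedd()
    result = schedd.submit(sub)
    print("Submitted cluster", result.cluster())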
Open Storage Network
The Open Storage Network (OSN) is an NSF-funded cloud storage resource, geographically distributed among several pods. OSN pods are currently hosted at SDSC, NCSA, MGHPCC, RENCI, and Johns Hopkins University. Each OSN pod currently hosts 1PB of storage and is connected to R&E networks at 50 Gbps. OSN storage is allocated in buckets and is accessed using S3 interfaces with tools like rclone, Cyberduck, or the AWS CLI.
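The same S3 interfaces can be used programmatically. The sketch below, mirroring the Granite example above, reads from a bucket with the boto3 Python package; the pod endpoint and bucket name are placeholders, and your OSN allocation specifies the actual pod URL, bucket, and any access keys.

    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # Pod endpoint and bucket are placeholders. Anonymous (unsigned) access is
    # only appropriate for publicly readable buckets; private buckets require
    # the access keys issued with the allocation.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://osn-pod.example.org",
        config=Config(signature_version=UNSIGNED),
    )

    for obj in s3.list_objects_v2(Bucket="example-osn-bucket").get("Contents", []):
        print(obj["Key"])

    s3.download_file("example-osn-bucket", "dataset/part-0000.csv", "part-0000.csv")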
Pittsburgh Supercomputing Center
Anton is a special purpose supercomputer for biomolecular simulation designed and constructed by D. E. Shaw Research (DESRES). PSC's current system is known as Anton 2 and is a successor to the original Anton 1 machine hosted here.
Anton 2, the next-generation Anton supercomputer, is a 128 node system, made available without cost by DESRES for non-commercial research use by US universities and other not-for-profit institutions, and is hosted by PSC with support from the NIH National Institute of General Medical Sciences. It replaced the original Anton 1 system in the fall of 2016.
Anton was designed to dramatically increase the speed of molecular dynamics (MD) simulations compared with the previous state of the art, allowing biomedical researchers to understand the motions and interactions of proteins and other biologically important molecules over much longer time periods than were previously accessible to computational study. The MD research community is using the Anton 2 machine at PSC to investigate important biological phenomena that, due to their intrinsically long time scales, have been outside the reach of even the most powerful general-purpose scientific computers. Application areas include biomolecular energy transformation, ion channel selectivity and gating, drug interactions with proteins and nucleic acids, protein folding, and protein-membrane signaling.
Bridges-2 combines high-performance computing (HPC), high performance artificial intelligence (HPAI), and large-scale data management to support simulation and modeling, data analytics, community data, and complex workflows.
Bridges-2 Extreme Memory (EM) nodes enable memory-intensive genome sequence assembly, graph analytics, in-memory databases, statistics, and other applications that need a large amount of memory and for which distributed-memory implementations are not available. Bridges-2 Extreme Memory (EM) nodes each consist of 4 Intel Xeon Platinum 8260M “Cascade Lake” CPUs, 4TB of DDR4-2933 RAM, and a 7.68TB NVMe SSD. They are connected to Bridges-2's other compute nodes and its Ocean parallel filesystem and archive by two HDR-200 InfiniBand links, providing 400Gbps of bandwidth to read or write data from each EM node.
Bridges-2 combines high-performance computing (HPC), high performance artificial intelligence (HPAI), and large-scale data management to support simulation and modeling, data analytics, community data, and complex workflows.
Bridges-2 Accelerated GPU (GPU) nodes are optimized for scalable artificial intelligence (AI; deep learning). They are also available for accelerated simulation and modeling applications.
Each Bridges-2 GPU node contains 8 NVIDIA Tesla V100-32GB SXM2 GPUs, for aggregate performance of 1Pf/s mixed-precision tensor, 62.4Tf/s fp64, and 125Tf/s fp32, combined with 256GB of HBM2 memory per node to support training large models and big data.
Each NVIDIA Tesla V100-32GB SXM2 has 640 tensor cores that are specifically designed to accelerate deep learning, with peak performance of over 125Tf/s for mixed-precision tensor operations. In addition, 5,120 CUDA cores support broad GPU functionality, with peak floating-point performance of 7.8Tf/s fp64 and 15.7Tf/s fp32. 32GB of HBM2 (high-bandwidth memory) delivers 900 GB/s of memory bandwidth to each GPU. NVLink 2.0 interconnects the GPUs at 50GB/s per link, or 300GB/s per GPU.
Each Bridges-2 GPU node provides a total of 40,960 CUDA cores and 5,120 tensor cores per node. In addition, each node holds 2 Intel Xeon Gold 6248 CPUs; 512GB of DDR4-2933 RAM; and 7.68TB NVMe SSD.
The nodes are connected to Bridges-2's other compute nodes and its Ocean parallel filesystem and archive by two HDR-200 InfiniBand links, providing 400Gbps of bandwidth to enhance scalability of deep learning training.
Bridges-2 combines high-performance computing (HPC), high performance artificial intelligence (HPAI), and large-scale data management to support simulation and modeling, data analytics, community data, and complex workflows.
Bridges-2 Regular Memory (RM) nodes provide extremely powerful general-purpose computing, machine learning and data analytics, AI inferencing, and pre- and post-processing. Each Bridges-2 RM node consists of two AMD EPYC “Rome” 7742 64-core CPUs, 256-512GB of RAM, and a 3.84TB NVMe SSD. 488 Bridges-2 RM nodes have 256GB RAM, and 16 have 512GB RAM for more memory-intensive applications (see also Bridges-2 Extreme Memory nodes, each of which has 4TB of RAM). Bridges-2 RM nodes are connected to other Bridges-2 compute nodes and its Ocean parallel filesystem and archive by HDR-200 InfiniBand.
The Bridges-2 Ocean data management system provides a unified, high-performance filesystem for active project data, archive, and resilience. Ocean consists of two tiers, disk and tape, transparently managed by HPE DMF as a single, highly usable namespace.
Ocean's disk subsystem, for active project data, is a high-performance, internally resilient Lustre parallel filesystem with 15PB of usable capacity, configured to deliver up to 129GB/s and 142GB/s of read and write bandwidth, respectively.
Ocean's tape subsystem, for archive and additional resilience, is a high-performance tape library with 7.2PB of uncompressed capacity (estimated 8.6PB compressed, with compression done transparently in hardware with no performance overhead), configured to deliver 50TB/hour.
Neocortex is a highly innovative resource that targets the acceleration of AI-powered scientific discovery by vastly shortening the time required for deep learning training, fostering greater integration of artificial deep learning with scientific workflows, and providing revolutionary new hardware for the development of more efficient algorithms for artificial intelligence and high performance computing.
The HPE Superdome Flex (SDFlex) features 32 Intel Xeon Platinum 8280L CPUs with 28 cores (56 threads) each, 2.70-4.0 GHz, 38.5 MB cache, 24 TiB RAM, aggregate memory bandwidth of 4.5 TB/s, and 204.6 TB aggregate local storage capacity with 150 GB/s read bandwidth. The SDFlex can provide 1.2 Tb/s to each Cerebras CS-2 system and 1.6 Tb/s from the Bridges-2 filesystems.
Purdue University
Purdue's Anvil cluster built in partnership with Dell and AMD consists of 1,000 nodes with two 64-core AMD EPYC "Milan" processors each and delivers over 1 billion CPU core hours each year, with a peak performance of 5.1 petaflops. Each of these nodes has 256GB of DDR4-3200 memory. A separate set of 32 large memory nodes has 1TB of DDR4-3200 memory each. Anvil's nodes are interconnected with 100 Gbps Mellanox HDR100 InfiniBand.
Anvil also includes 16 GPU nodes, each with four NVIDIA A100 Tensor Core GPUs, providing 1.5 PF of single-precision performance to support machine learning and artificial intelligence applications.
San Diego Supercomputer Center
Expanse is a Dell integrated compute cluster, with AMD Rome processors, 128 cores per node, interconnected with Mellanox HDR InfiniBand in a hybrid fat-tree topology. The compute node section of Expanse has a peak performance of 3.373 PF. Full bisection bandwidth is available at rack level (56 compute nodes) with HDR100 connectivity to each node. HDR200 switches are used at the rack level with 3:1 oversubscription cross-rack. Compute nodes feature 1TB of NVMe storage and 256GB of DRAM per node. The system also features 12PB of Lustre-based performance storage (140GB/s aggregate) and 7PB of Ceph-based object storage.
Expanse is a Dell integrated compute cluster, with AMD Rome processors and NVIDIA V100 GPUs, interconnected with Mellanox HDR InfiniBand in a hybrid fat-tree topology. The GPU component of Expanse features 52 GPU nodes, each containing four NVIDIA V100s (32 GB SXM2) connected via NVLink, and dual 20-core Intel Xeon 6248 CPUs. They feature 1.6TB of NVMe storage and 256GB of DRAM per node. There is HDR100 connectivity to each node. The system also features 12PB of Lustre-based performance storage (140GB/s aggregate) and 7PB of Ceph-based object storage.
5PB of storage on a Lustre-based filesystem.
Voyager is a heterogeneous system designed to support complex deep learning AI workflows. The system features 42 Intel Habana Gaudi training nodes, each with 8 training processors (336 in total). Each training node has 512GB of memory and 6.4TB of node-local NVMe storage. The Gaudi training processors feature specialized hardware units for AI, HBM2, and on-chip high-speed Ethernet. The on-chip Ethernet ports are used in a non-blocking all-to-all network between the processors on a node, and the remaining ports are aggregated into six 400G connections per node that plug into a 400G Arista switch to provide network scale-out. Voyager also has two first-generation inference nodes, each with 8 inference processors (16 in total). In addition to the custom AI hardware, the system also has 36 Intel x86 compute nodes for general-purpose computing and data processing. Voyager features 3PB of storage currently deployed as a Ceph filesystem.
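For orientation, a PyTorch training loop targeting the Gaudi processors typically routes tensors to the "hpu" device through Habana's PyTorch bridge. The sketch below is a minimal, hedged example assuming the habana_frameworks package and its lazy-execution mode are available in the software environment; the model and data are toy placeholders.

    import torch
    import habana_frameworks.torch.core as htcore  # Habana PyTorch bridge (assumed available)

    device = torch.device("hpu")  # Gaudi training processor

    # Toy model and data, for illustration only.
    model = torch.nn.Linear(128, 10).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    x = torch.randn(64, 128).to(device)
    y = torch.randint(0, 10, (64,)).to(device)

    for step in range(10):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        htcore.mark_step()  # flush the lazily accumulated graph to the HPU
        optimizer.step()
        htcore.mark_step()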
Texas A&M University
ACES is a Dell cluster with a rich accelerator testbed consisting of Intel Max GPUs (Graphics Processing Units), Intel FPGAs (Field Programmable Gate Arrays), NVIDIA H100 and A30 GPUs, NEC Vector Engines, NextSilicon co-processors, and Graphcore IPUs (Intelligence Processing Units). The ACES cluster consists of compute nodes using a mix of the following processors:
Intel Xeon 8468 Sapphire Rapids processors
Intel Xeon Ice Lake 8352Y processors
Intel Xeon Cascade Lake 8268 processors
AMD Epyc Rome 7742 processors
The compute nodes are interconnected with NVIDIA NDR200 connections for MPI and access to the Lustre storage. The Intel Optane SSDs and all accelerators (except the Graphcore IPUs and NEC Vector Engines) are accessed using Liqid's composable infrastructure via PCIe (Peripheral Component Interconnect express) Gen4 and Gen5 fabrics.
FASTER (Fostering Accelerated Scientific Transformations, Education and Research) is funded by the NSF MRI program (Award #2019129) and provides a composable high-performance data-analysis and computing instrument. The FASTER system has 180 compute nodes, each with 2 Intel 32-core Ice Lake processors and 256 GB RAM, and includes 240 NVIDIA GPUs (40 A100 and 200 T4 GPUs). Using Liqid's composable technology, all 180 compute nodes have access to the pool of available GPUs, dramatically improving workflow scalability. FASTER has an HDR InfiniBand interconnect and shares access to a 5PB usable high-performance storage system running the Lustre filesystem. Thirty percent of FASTER's computing resources are allocated to researchers nationwide through ACCESS.
Launch is a Dell Linux cluster with 45 compute nodes (8,640 cores) and 2 login nodes. There are 35 compute nodes with 384 GB memory and 10 GPU compute nodes with 768 GB memory and two NVIDIA A30s. The interconnecting fabric uses a single NVIDIA HDR100 InfiniBand switch.
Texas Advanced Computing Center
The Stampede2 Dell/Intel Knights Landing (KNL), Skylake (SKX) System provides the user community access to two Intel Xeon compute technologies.
The system is configured with 4204 Dell KNL compute nodes, each with a stand-alone Intel Xeon Phi Knights Landing bootable processor. Each KNL node includes 68 cores, 16GB MCDRAM, 96GB DDR-4 memory and a 200GB SSD drive.
Stampede2 also includes 1736 Intel Xeon Skylake (SKX) nodes and additional management nodes. Each SKX node includes 48 cores, 192GB DDR-4 memory, and a 200GB SSD.
Allocations awarded on Stampede2 may be used on either or both of the node types.
Compute nodes have access to dedicated Lustre parallel file systems totaling 28PB raw, provided by Cray. An Intel Omni-Path Architecture switch fabric connects the nodes and storage through a fat-tree topology with a point-to-point bandwidth of 100 Gb/s (unidirectional speed). 16 additional login and management servers complete the system. Stampede2 delivers an estimated 18PF of peak performance.
Please see the Stampede2 User Guide for detailed information on the system and how to most effectively use it.
Stampede3 is funded through the National Science Foundation and is designed to serve today's researchers as well as support the research community on an evolutionary path toward many-core processors and accelerated technologies. Stampede3 maintains the familiar programming model for all of today's users, and thus will be broadly useful for traditional simulation users, users performing data-intensive computations, and emerging classes of new users.
TACC's High Performance Computing (HPC) systems are used primarily for scientific computing and, while their disk systems are large, they are unable to store the long-term final data generated on these systems. The Ranch archive system fills this need for high-capacity long-term storage by providing a massive high-performance file system and tape-based backing store designed, implemented, and supported specifically for archival purposes.
Ranch is a Quantum StorNext-based system, with a DDN-provided front-end disk system (30PB raw) and a 5000-slot Quantum Scalar i6000 library for its back-end tape archive.
Ranch is an allocated resource, meaning that Ranch is available only to users with an allocation on one of TACC's computational resources such as Frontera, Stampede3, or Lonestar6. ACCESS PIs will be prompted automatically for the companion storage allocation as part of the proposal submission process and should include a justification of the storage needs in their proposal. The default allocation on Ranch for users is 2TB. To request a shared Ranch project space for your team's use, please submit a TACC Helpdesk ticket.
University of Delaware
Nodes with two AMD EPYC™ 7502 processors (32 cores each), available in the following memory configurations:
48x standard 512 GiB;
32x large-memory 1024 GiB;
11x xlarge-memory 2048 GiB;
1x lg-swap 1024 GiB RAM + 2.73 TiB Intel Optane NVMe swap
3 nodes with two Intel® Xeon® Platinum 8260 processors (24 cores each), 768 GiB RAM, and 4 NVIDIA Tesla V100 32GB GPUs connected via NVLINK™
9 nodes with two AMD EPYC™ 7502 processors (32 cores each), 512 GiB RAM, and a single NVIDIA Tesla T4 GPU
1 node with two AMD EPYC™ 7502 processors (32 cores each), 512 GiB RAM, and a single AMD Radeon Instinct MI50 GPU
DARWIN's Lustre file system is for use with the DARWIN Compute and GPU nodes.
University of Kentucky
Five large memory compute nodes dedicated for XSEDE allocation. Each of these nodes has 40 cores (Broadwell-class Intel(R) Xeon(R) CPU E7-4820 v4 @ 2.00GHz with 4 sockets, 10 cores/socket), 3TB RAM, and 6TB SSD storage drives. The 5 dedicated XSEDE nodes will have exclusive access to approximately 300 TB of network-attached disk storage. All these compute nodes are interconnected through a 100 Gigabit Ethernet (100GbE) backbone, and the cluster login and data transfer nodes will be connected through a 100Gb uplink to Internet2 for external connections.