ACCESS Active Resources

Resource Actions

Request Allocations View Documentation Related Training Related Metrics

CloudBank

Description

CloudBank Classroom is a cloud-hosted, small-scale JupyterHub designed for teaching, built on an open-source interactive computing stack (Python, R, JupyterLab, and related libraries). It provides each learner with a modest allocation of persistent storage, memory, and CPUs as well as seamless authentication through their university’s identity management system, allowing them to log in with existing campus credentials. The environment comes preconfigured with widely used data science tools, enabling students to run code, analyze data, and complete assignments directly in the browser without local setup. Instructors can easily distribute materials, manage assignments, and support learners through integrated tools like nbgitpuller and grading extensions. By removing technical barriers, ensuring equity of access, and offering a consistent, reproducible workflow, the JupyterHub lowers the cost of teaching with data while remaining scalable to support classes of varying sizes.

CloudBank Classroom

CloudBank Cloud Resource Provider

Description

CloudBank (available to early access users) enables researchers to access commercial cloud platforms using ACCESS credits, which are converted to dollars and usable across the following commercial cloud platforms: Amazon Web Services, Google Cloud, IBM Cloud, and Microsoft Azure. After onboarding, the CloudBank portal automatically creates cloud accounts for the platforms listed in your ACCESS transfer request, with the option to add more later. In the CloudBank Portal, you can manage which authorized ACCESS users have permission to your cloud accounts and easily access the cloud vendor platforms using federated login with your ACCESS credentials. As you use cloud services, CloudBank tracks your usage, deducts costs from your allocation, and sends automated spend notifications at key milestones. CloudBank supports a limited number of projects needing access to regulated data through Sherlock services.

CloudBank Cloud Resource Provider

CloudBank Research

Description

CloudBank enables researchers to access commercial cloud platforms through the ACCESS and NAIRR Pilot allocation processes. Awards are made in dollars and usable across the following commercial cloud platforms: Amazon Web Services, Google Cloud, IBM Cloud, and Microsoft Azure. After onboarding, the CloudBank portal automatically creates cloud accounts for the platforms listed in your request, with the option to add more later. In the CloudBank Portal, you can manage which authorized users have permission to your cloud accounts and easily access the cloud vendor platforms using federated login with your ACCESS credentials. CloudBank continuously monitors resource usage, applies charges against your allocation, and delivers automated spend alerts. Regulated data is also supported on CloudBank. Currently, CloudBank is available to early-access users only. CloudBank expects to be available for general ACCESS and NAIRR Pilot users in December.

CloudBank Research

Georgia Institute of Technology

GaTech Hive Cluster

Indiana University

Indiana Jetstream2 CPU

Description

Jetstream2 is a hybrid-cloud platform that provides flexible, on-demand, programmable cyberinfrastructure tools ranging from interactive virtual machine services to a variety of infrastructure and orchestration services for research and education. The primary resource is a standard CPU resource consisting of AMD Milan 7713 CPUs with 128 cores per node and 512gb RAM per node connected by 100gbps ethernet to the spine.

Indiana Jetstream2 CPU

Indiana Jetstream2 GPU

Description

Jetstream2 GPU is a hybrid-cloud platform that provides flexible, on-demand, programmable cyberinfrastructure tools ranging from interactive virtual machine services to a variety of infrastructure and orchestration services for research and education. This particular portion of the resource is allocated separately from the primary resource and contains 360 NVIDIA A100 GPUs -- 4 GPUs per node, 128 AMD Milan cores, and 512gb RAM connected by 100gbps ethernet to the spine.

Indiana Jetstream2 GPU

Indiana Jetstream2 Large Memory

Description

Jetstream2 LM is a hybrid-cloud platform that provides flexible, on-demand, programmable cyberinfrastructure tools ranging from interactive virtual machine services to a variety of infrastructure and orchestration services for research and education. This particular portion of the resource is allocated separately from the primary resource and contains 32 nodes of GPU-ready 1TB RAM compute nodes, AMD Milan 7713 CPUs with 128 cores per node connected by 100gbps ethernet to the spine.

Indiana Jetstream2 Large Memory

Indiana Jetstream2 Storage

Institute for Advanced Computational Science at Stony Brook University

IACS at Stony Brook Ookami

Description

Ookami is a computer technology testbed supported by the National Science Foundation under grant OAC 1927880. It provides researchers with access to the A64FX processor developed by Riken and Fujitsu for the Japanese path to exascale computing and is deployed in the, until June 2022, fastest computer in the world, Fugaku. It is the first such computer outside of Japan. By focusing on crucial architectural details, the ARM-based, multi-core, 512-bit SIMD-vector processor with ultrahigh-bandwidth memory promises to retain familiar and successful programming models while achieving very high performance for a wide range of applications. While being very power-efficient it supports a wide range of data types and enables both HPC and big data applications.

The Ookami HPE (formerly Cray) Apollo 80 system has 176 A64FX compute nodes each with 32GB of high-bandwidth memory and a 512 Gbyte SSD. This amounts to about 1.5M node hours per year. A high-performance Lustre filesystem provides about 0.8 Pbyte storage.

To facilitate users exploring current computer technologies and contrasting performance and programmability with the A64FX, Ookami also includes:

- 1 node with dual socket AMD Milan (64cores) with 512 Gbyte memory
- 2 nodes with dual socket Thunder X2 (64 cores) each with 256 Gbyte memory
- 1 node with dual socket Intel Skylake (32 cores) with 192 Gbyte memory and 2 NVIDIA V100 GPUs

IACS at Stony Brook Ookami

Johns Hopkins University MARCC

Rockfish cluster at Johns Hopkins University - GPU nodes

Rockfish cluster at Johns Hopkins University - Large Memory nodes

Rockfish cluster at Johns Hopkins University - Regular Memory nodes

National Center for Supercomputing Applications

NCSA Delta CPU (Delta CPU)

Description

The Delta CPU resource comprises 124 dual-socket compute nodes for general purpose computation across a broad range of domains able to benefit from the scalar and multi-core performance provided by the CPUs, such as appropriately scaled weather and climate, hydrodynamics, astrophysics, and engineering modeling and simulation, and other domains using algorithms not yet adapted for the GPU. Each Delta CPU node is configured with 2 AMD EPYC 7763 (“Milan”) processors with 64-cores/socket (128-cores/node) at 2.45GHz and 256GB of DDR4-3200 RAM. An 800GB, NVMe solid-state disk is available for use as local scratch space during job execution. All Delta CPU compute nodes are interconnected to each other and to the Delta storage resource by a 100 Gb/sec HPE Slingshot network fabric.

NCSA Delta CPU (Delta CPU)

NCSA Delta GPU (Delta GPU)

Description

The Delta GPU resource comprises 5 different node configurations intended to support accelerated computation across a broad range of domains including traditional simulation and AI/ML work. Delta is designed to support the transition of applications from CPU-only to using the GPU or hybrid CPU-GPU models. Delta GPU resource capacity is predominately provided by 200 single-socket nodes, each configured with 1 AMD EPYC 7763 (“Milan”) processors with 64-cores/socket (64-cores/node) at 2.55GHz and 256GB of DDR4-3200 RAM. Half of these single-socket GPU nodes (100 nodes) are configured with 4 NVIDIA A100 GPUs with 40GB HBM2 RAM and NVLink (400 total A100 GPUs); the remaining half (100 nodes) are configured with 4 NVIDIA A40 GPUs with 48GB GDDR6 RAM and PCIe 4.0 (400 total A40 GPUs). Rounding out the GPU resource is 14 additional “dense” GPU nodes, containing 8 GPUs each, in a dual-socket CPU configuration (128-cores per node) and 2TB of DDR4-3200 RAM but otherwise configured similarly to the single-socket GPU nodes. Within the “dense” GPU nodes, 5 nodes employ NVIDIA A100 GPUs (40 total A100 GPUs in “dense” configuration) and 1 node employs AMD MI100 GPUs (8 total MI100 GPUs) with 32GB HBM2 RAM. A 1.6TB, NVMe solid-state disk is available for use as local scratch space during job execution on each GPU node type. Finally 8 of the dense GPU nodes have NVIDIA H200 GPUs with 141GB of HBM each. All Delta GPU compute nodes are interconnected to each other and to the Delta storage resource by a 200 Gb/sec HPE Slingshot network fabric. One Delta GPU SU is equal to one A100 GPU hour in the standard quad A100 partition. Other node types have charge factors that reflect their relative cost with H200s costing 3x an A100 GPU hour.

NCSA Delta GPU (Delta GPU)

NCSA Delta Storage (Delta Storage)

Description

Delta provides default storage allocations for all projects. For storage increases please open a ticket with NCSA providing a detailed justification for the size of the request and the duration of the need.

NCSA Delta Storage (Delta Storage)

NCSA DeltaAI

Description

The DeltaAI resource comprises 114 NVIDIA quad Grace Hopper nodes interconnected by HPE's Slingshot interconnect. Each Grace Hopper node consists of four NVIDIA super chips with one ARM based CPU, 128 GB of LP-DDR5 RAM and one H100 GPU with 96GB of HBM. The four super chips are tightly coupled with NVLink and share a unified shared memory space.

NCSA DeltaAI

NCSA Granite Tape Archive System

Description

The NCSA Granite Tape Archive is architected using a 19-frame Spectra TFinity tape library outfitted with 20 LTO-9 tape drives to enable a total capacity of over 170PB of accessible and replicated data, of which 3.6PB is currently available for ACCESS allocations. The additional capacity is reserved for other NCSA use, and additional space is still available to expand the archive when needed. The archive operates on Versity's ScoutAM/ScoutFS products giving users a single archive namespace from which to stage data in and out. Access to the Granite system is available directly via Globus and S3 tools.

NCSA Granite Tape Archive System

NSF National Center for Atmospheric Research

Derecho

Description

The Derecho supercomputer is a 19.87-petaflops HPE Cray EX cluster with 2,488 nodes, each with two 64-core AMD EPYC 7763 Milan processors, for a total of 323,712 processor cores. Each node has 256 GB DDR4 memory per node. The Derecho nodes are connected by an HPE Slingshot v11 high-speed interconnect in a dragonfly topology. The NSF National Center for Atmospheric Research (NCAR) operates the Derecho system to support Earth system science and related research by researchers at U.S. institutions.

Derecho

Derecho-GPU

Description

The Derecho-GPU allocated resource is composed of 82 nodes, each with single-socket AMD Milan processors, 512 GB memory, 4 Nvidia A100 Tensor Core GPUs, connected by a 600 GB/s NVlink GPU interconnect, for a total of 328 A100 GPUs. Each A100 GPU has 40 GB HBM2 memory. The Derecho-GPU nodes each have four injection ports into Derecho's Slingshot interconnect. The NSF National Center for Atmospheric Research (NCAR) operates the Derecho system to support Earth system science and related research by researchers at U.S. institutions.

Derecho-GPU

Open Science Grid

Open Science Grid (OSG)

Description

A virtual HTCondor pool made up of resources from the Open Science Grid

Open Science Grid (OSG)

Open Storage Network

Open Storage Network (OSN)

Description

The Open Storage Network (OSN) is an NSF-funded cloud storage resource, geographically distributed among storage pods. OSN is a collaboration between MGHPCC, SDSC, NCSA, Rice, JHU, and RENCI, with a federation of pod-owning sites and contributions from other advanced computing centers. Each OSN pod currently hosts 1.5 PB or more of storage, and is connected to R&E networks between 40 and 100Gbit. OSN storage is allocated in buckets and is accessible using S3 interfaces, including with tools such as Rclone, Cyberduck, and the AWS CLI, or via REST API interfaces.

Open Storage Network (OSN)

OSG Consortium

OSG Open Science Pool

Description

A virtual HTCondor pool made up of resources from the OSG Consortium

OSG Open Science Pool

Pittsburgh Supercomputing Center

PSC Anton 2 Special-Purpose Supercomputer for Molecular Dynamics

Description

Anton is a special purpose supercomputer for biomolecular simulation designed and constructed by D. E. Shaw Research (DESRES). PSC's current system is known as Anton 2 and is a successor to the original Anton 1 machine hosted here.

Anton 2, the next-generation Anton supercomputer, is a 128 node system, made available without cost by DESRES for non-commercial research use by US universities and other not-for-profit institutions, and is hosted by PSC with support from the NIH National Institute of General Medical Sciences. It replaced the original Anton 1 system in the fall of 2016.

Anton was designed to dramatically increase the speed of molecular dynamics (MD) simulations compared with the previous state of the art, allowing biomedical researchers to understand the motions and interactions of proteins and other biologically important molecules over much longer time periods than was previously accessible to computational study. The MD research community is using the Anton 2 machine at PSC to investigate important biological phenomena that due to their intrinsically long time scales have been outside the reach of even the most powerful general-purpose scientific computers. Application areas include biomolecular energy transformation, ion channel selectivity and gating, drug interactions with proteins and nucleic acids, protein folding and protein-membrane signaling.

PSC Anton 2 Special-Purpose Supercomputer for Molecular Dynamics

PSC Brain Image Library (BIL)

Description

The Brain Image Library (BIL) is a national public resource enabling researchers to deposit, analyze, mine, share and interact with large brain image datasets. BIL encompasses the deposition of datasets, the integration of datasets into a searchable web-accessible system, the redistribution of datasets, and a computational enclave to allow researchers to process datasets in-place and share restricted and pre-release datasets. The BIL is operated as a partnership between the Biomedical Applications Group at the Pittsburgh Supercomputing Center and the Center for Biological Imaging at the University of Pittsburgh.

PSC Brain Image Library (BIL)

PSC Bridges-2 Extreme Memory (Bridges-2 EM)

Description

Bridges-2 Extreme Memory (EM) nodes enable memory-intensive genome sequence assembly, graph analytics, in-memory databases, statistics, and other applications that need a large amount of memory and for which distributed-memory implementations are not available. Bridges-2 Extreme Memory (EM) nodes each consist of 4 Intel Xeon Platinum 8260M “Cascade Lake” CPUs, 4TB of DDR4-2933 RAM, 7.68TB NVMe SSD. They are connected to Bridges-2's other compute nodes and its Ocean parallel filesystem and archive by two HDR-200 InfiniBand links, providing 400Gbps of bandwidth to read or write data from each EM node.

PSC Bridges-2 Extreme Memory (Bridges-2 EM)

PSC Bridges-2 GPU (Bridges-2 GPU)

Description

Bridges-2 combines high-performance computing (HPC), high performance artificial intelligence (HPAI), and large-scale data management to support simulation and modeling, data analytics, community data, and complex workflows. Bridges-2 Accelerated GPU (GPU) nodes are optimized for scalable artificial intelligence (AI; deep learning). They are also available for accelerated simulation and modeling applications. Bridges-2 has four types of GPU nodes: 10 HPE Cray 670 h100-80 nodes, with eight H100-SXM5-80GB GPUs each with 80GB of GPU memory and a total of 2TB RAM per node; 24 HPE v100-32 nodes with eight V100 GPUs with NVLink, each with 32GB of GPU memory and a total of 512GB RAM per node;9 v100-16 nodes containing eight V100 GPUs without NVLink, each with 16GB of GPU memory and a total of 192GB RAM per node; and 3 HPE l40s-48 nodes with 8 L40S GPUs without NVLink, each with 48GB of GPU Memory and a total of 1TB RAM per node.

The nodes are connected to Bridges-2's other compute nodes and its Ocean parallel filesystem and archive by two HDR-200 InfiniBand links, providing 400Gbps of bandwidth to enhance scalability of deep learning training.

PSC Bridges-2 GPU (Bridges-2 GPU)

PSC Bridges-2 GPU-AI (Bridges-2 GPU Artificial Intelligence)

Description

Bridges-2 Accelerated GPU (GPU) nodes are optimized for scalable artificial intelligence (AI; deep learning). They are also available for accelerated simulation and modeling applications.

Each Bridges-2 GPU node contains 8 NVIDIA Tesla V100-32GB SXM2 GPUs, for aggregate performance of 1Pf/s mixed-precision tensor, 62.4Tf/s fp64, and 125Tf/s fp32, combined with 256GB of HBM2 memory per node to support training large models and big data.

Each NVIDIA Tesla V100-32GB SXM2 has 640 tensor cores that are specifically designed to accelerate deep learning, with peak performance of over 125Tf/s for mixed-precision tensor operations. In addition, 5,120 CUDA cores support broad GPU functionality, with peak floating-point performance of 7.8Tf/s fp64 and 15.7Tf/s fp32. 32GB of HBM2 (high-bandwidth memory) delivers 900 GB/s of memory bandwidth to each GPU. NVLink 2.0 interconnects the GPUs at 50GB/s per link, or 300GB/s per GPU.

Each Bridges-2 GPU node provides a total of 40,960 CUDA cores and 5,120 tensor cores per node. In addition, each node holds 2 Intel Xeon Gold 6248 CPUs; 512GB of DDR4-2933 RAM; and 7.68TB NVMe SSD.

PSC Bridges-2 GPU-AI (Bridges-2 GPU Artificial Intelligence)

PSC Bridges-2 Regular Memory (Bridges-2 RM)

Description

Bridges-2 Regular Memory (RM) nodes provide extremely powerful general-purpose computing, machine learning and data analytics, AI inferencing, and pre- and post-processing. Each Bridges RM node consists of two AMD EPYC “Rome” 7742 64-core CPUs, 256-512GB of RAM, and 3.84TB NVMe SSD. 488 Bridges-2 RM nodes have 256GB RAM, and 16 have 512GB RAM for more memory-intensive applications (see also Bridges-2 Extreme Memory nodes, each of which has 4TB of RAM). Bridges-2 RM nodes are connected to other Bridges-2 compute nodes and its Ocean parallel filesystem and archive by HDR-200 InfiniBand.

PSC Bridges-2 Regular Memory (Bridges-2 RM)

PSC Bridges-2 Storage (Bridges-2 Ocean)

Description

The Bridges-2 Ocean data management system provides a unified, high-performance filesystem for active project data, archive, and resilience. Ocean consists of two tiers, disk and tape, transparently managed by HPE DMF as a single, highly usable namespace.

Ocean's disk subsystem, for active project data, is a high-performance, internally resilient Lustre parallel filesystem with 15PB of usable capacity, configured to deliver up to 129GB/s and 142GB/s of read and write bandwidth, respectively.

Ocean's tape subsystem, for archive and additional resilience, is a high-performance tape library with 7.2PB of uncompressed capacity (estimated 8.6PB compressed, with compression done transparently in hardware with no performance overhead), configured to deliver 50TB/hour.

PSC Bridges-2 Storage (Bridges-2 Ocean)

PSC Neocortex

Description

Neocortex is a highly innovative advanced computing system ideal for foundation and large language models. Neocortex, which captures promising specialized innovative hardware technologies, is designed to vastly accelerate large deep learning (DL) models and high- performance computing (HPC) research in pursuit of science, discovery, and societal good. Neocortex features two Cerebras CS-2 systems, provisioned by an HPE Superdome Flex HPC server and the Bridges-2 filesystems. Each CS-2 system features a Cerebras WSE-2 (Wafer Scale Engine 2), the largest chip ever built, with 850,000 Sparse Linear Algebra Compute cores, 40 GB SRAM on-chip memory, 20 PB/s aggregate memory bandwidth and 220 Pb/s interconnect bandwidth. The HPE Superdome Flex (SDF) features 32 Intel Xeon Platinum 8280L CPUs with 28 cores (56 threads) each, 2.70-4.0 GHz, 38.5 MB cache, 24 TiB RAM, aggregate memory bandwidth of 4.5 TB/s, and 204.6 TB aggregate local storage capacity with 150 GB/s read bandwidth. The SDF can provide 1.2 Tb/s to each CS-2 system and 1.6 Tb/s from the Bridges-2 filesystems. Jobs are submitted via SLURM. The CS-2 systems can run customized TensorFlow and Pytorch containers, as well as programs written using the Cerebras SDK or the WSE Field Equation API.

PSC Neocortex

PSC Neocortex CS-2

Description

PSC Neocortex CS-2

PSC Neocortex Superdome Flex Server

Description

Neocortex is a highly innovative resource that targets the acceleration of AI-powered scientific discovery by vastly shortening the time required for deep learning training, fostering greater integration of artificial deep learning with scientific workflows, and providing revolutionary new hardware for the development of more efficient algorithms for artificial intelligence and high performance computing.

The HPE Superdome Flex (SDFlex) features 32 Intel Xeon Platinum 8280L CPUs with 28 cores (56 threads) each, 2.70-4.0 GHz, 38.5 MB cache, 24 TiB RAM, aggregate memory bandwidth of 4.5 TB/s, and 204.6 TB aggregate local storage capacity with 150 GB/s read bandwidth. The SDF can provide 1.2 Tb/s to each CS-2 system and 1.6 Tb/s from the Bridges-2 filesystems.

SDF Service units are calculated as chassis hours. Each chassis has 112 cpu cores, so an SDFlex SU = 112 core hours.

PSC Neocortex Superdome Flex Server

Purdue University

Purdue Anvil AI

Description

The Purdue Anvil AI system has 21 nodes each with four NVIDIA 80GB H100 SXM GPUs to support machine learning and artificial intelligence applications.

Purdue Anvil AI

Purdue Anvil CPU

Description

Purdue's Anvil cluster built in partnership with Dell and AMD consists of 1,000 nodes with two 64-core AMD EPYC "Milan" processors each and delivers over 1 billion CPU core hours each year, with a peak performance of 5.1 petaflops. Each of these nodes has 256GB of DDR4-3200 memory. A separate set of 32 large memory nodes has 1TB of DDR4-3200 memory each. Anvil's nodes are interconnected with 100 Gbps Mellanox HDR100 InfiniBand.

Purdue Anvil CPU

Purdue Anvil GPU

Description

16 nodes each with four NVIDIA A100 Tensor Core GPUs providing 1.5 PF of single-precision performance to support machine learning and artificial intelligence applications.

Purdue Anvil GPU

San Diego Supercomputer Center

Prototype National Research Platform

Description

The Prototype National Research Platform (PNRP) is a Category II NSF-funded system integrated into the Nautilus cluster operated jointly by the San Diego Supercomputer Center at UC San Diego, the Massachusetts Green High Performance Computing Center (MGHPCC) and the University of Nebraska–Lincoln (UNL). The system features a novel, extremely low-latency fabric from GigaIO that allows dynamic composition of hardware, including FPGAs, GPUs, and NVMe storage. Each of the three sites (SDSC, UNL, and MGHPCC) includes ~1 PB of usable disk space. The three storage systems function as data origins of the CDN, providing data access anywhere in the country within a round-trip delay of ~10ms via use of network caches at three sites and five Internet2 network colocation facilities. The Nautilus cluster serves the broader National Research Platform (NRP) which is a community-owned research and education platform connecting researchers and educators to foster collaboration, accelerate innovation, and share resources. The PNRP contribution to the cluster comprises of 1) a HPC subsystem at SDSC with 8 HGX A100 servers with 8 80G A100 GPUs, 512G memory, and 1TB NVMe storage per server; 32 Alveo U55C FPGAs available composed nodes via GigaIO fabric; 122TB of FabreX connected NVMe; 2) two FP32 subsystems, one each at UNL and MGHPCC, each with 18 GPU nodes with 8 A10 GPUs, 512G memory, and 8TB NVMe per node; and 3) 8 distributed data caches of 50TB each. In addition, the distributed Kubernetes cluster architecture enables other institutions to incorporate their own resources into the cluster.

Prototype National Research Platform

SDSC Expanse CPU

Description

Expanse is a Dell integrated compute cluster, with AMD Rome processors, 128 cores per node, interconnected with Mellanox HDR InfiniBand in a hybrid fat-tree topology. The compute node section of Expanse has a peak performance of 3.373 PF. Full bisection bandwidth is available at rack level (56 compute nodes) with HDR100 connectivity to each node. HDR200 switches are used at the rack level and 3:1 oversubscription cross-rack. Compute nodes feature 1TB of NVMe storage and 256GB of DRAM per node. The system also features 12PB of Lustre based performance storage (140GB/s aggregate), and 7PB of Ceph based object storage.

SDSC Expanse CPU

SDSC Expanse GPU

Description

Expanse is a Dell integrated compute cluster, with AMD Rome processors, NVIDIA V100 GPUs, interconnected with Mellanox HDR InfiniBand in a hybrid fat-tree topology. The GPU component of Expanse features 52 GPU nodes, each containing four NVIDIA V100s (32 GB SMX2), connected via NVLINK, and dual 20-core Intel Xeon 6248 CPUs. They feature 1.6TB of NVMe storage and 256GB of DRAM per node. There is HDR100 connectivity to each node. The system also features 12PB of Lustre based performance storage (140GB/s aggregate), and 7PB of Ceph based object storage.

SDSC Expanse GPU

SDSC Expanse Projects Storage

Description

5PB of storage on a Lustre based filesystem.

SDSC Expanse Projects Storage

SDSC Voyager Habana Training and Inference Processor based AI System

Description

Voyager is a heterogeneous system designed to support complex deep learning AI workflows. The system features 42 Intel Habana Gaudi training nodes, each with 8 training processors (336 in total). Each training node has 512GB of memory and 6.4TB of node local NVMe storage. The Gaudi training processors feature specialized hardware units for AI, HBM2, and on-chip high-speed Ethernet. The on-chip ethernet ports are used in a non-blocking all-to-all network between processors on a node and the remaining ports are aggregated into 6 400G connections on each node that are plugged into a 400G Arista switch to provide scale out of network. Voyager also has two first-generation inference nodes, each with 8 inference processors (16 in total). In addition to the custom AI hardware, the system also has 36 Intel x86 processors compute nodes for general purpose computing and data processing. Voyager features 3PB of storage currently deployed as a Ceph filesystem.

SDSC Voyager Habana Training and Inference Processor based AI System

Description

SDSC Voyager Habana Training and Inference Processor based AI System

Texas A&M University

TAMU ACES

Description

ACES is a Dell cluster with a rich accelerator testbed consisting of Intel Max GPUs (Graphics Processing Units), Intel FPGAs (Field Programmable Gate Arrays), NVIDIA H100 and A30 GPUs, NEC Vector Engines, NextSilicon co-processors, and Graphcore IPUs (Intelligence Processing Units). Researchers must be based in the US and associated with a US academic research institution.

The ACES cluster consists of compute nodes using a mix of the following processors:
Intel Xeon 8468 Sapphire Rapids processors
Intel Xeon Ice Lake 8352Y processors
Intel Xeon Cascade Lake 8268 processors
AMD Epyc Rome 7742 processors

The compute nodes are interconnected with NVIDIA NDR200 connections for MPI and access to the Lustre storage. The Intel Optane SSDs and all accelerators (except the Graphcore IPUs and NEC Vector Engines) are accessed using Liqid's composable infrustructre via PCIe (Peripheral Component Interconnect express) Gen4 and Gen5 fabrics.

TAMU ACES

TAMU FASTER

Description

FASTER (Fostering Accelerated Scientific Transformations, Education and Research) is funded by the NSF MRI program (Award #2019129) and provides a composable high-performance data-analysis and computing instrument. The FASTER system has 180 compute nodes with 2 Intel 32-core Ice Lake processors and 256 GB RAM, and includes 240 NVIDIA GPUs (40 A100 and 200 T4 GPUs). Using LIQID’s composable technology, all 180 compute nodes have access to the pool of available GPUs, dramatically improving workflow scalability. FASTER will have HDR InfiniBand interconnection and access/share a 5PB usable high-performance storage system running Lustre filesystem. Thirty percent of FASTER’s computing resources are allocated to researchers nationwide through ACCESS. Researchers must be based in the US and associated with a US academic research institution.

TAMU FASTER

TAMU Launch

Description

Launch is a regional computational resource that supports researchers incorporating computational and data-enabled approaches in their scientific workflows at The Texas A&M University System Schools. A portion is offered to the national community. Researchers must be based in the US and associated with a US academic research institution.

Launch is a Dell Linux cluster with 45 compute nodes (8,640 cores) and 2 login nodes. There are 35 compute nodes with 384 GB memory and 10 GPU compute nodes with 768 GB memory and two NVIDIA A30s. The interconnecting fabric uses a single NVIDIA HDR100 InfiniBand switch.

TAMU Launch

Texas Advanced Computing Center

TACC Dell/Intel Knights Landing, Skylake System (Stampede2)

Description

The Stampede2 Dell/Intel Knights Landing (KNL), Skylake (SKX) System provides the user community access to two Intel Xeon compute technologies.

The system is configured with 4204 Dell KNL compute nodes, each with a stand-alone Intel Xeon Phi Knights Landing bootable processor. Each KNL node includes 68 cores, 16GB MCDRAM, 96GB DDR-4 memory and a 200GB SSD drive.

Stampede2 also includes 1736 Intel Xeon Skylake (SKX) nodes and additional management nodes. Each SKX includes 48 cores, 192GB DDR-4 memory, and a 200GB SSD.

Allocations awarded on Stampede2 may be used on either or both of the node types.

Compute nodes have access to dedicated Lustre Parallel file systems totaling 28PB raw, provided by Cray. An Intel Omni-Path Architecture switch fabric connects the nodes and storage through a fat-tree topology with a point to point bandwidth of 100 Gb/s (unidirectional speed). 16 additional login and management servers complete the system. Stampede2 will deliver an estimated 18PF of peak performance.

Please see the Stampede2 User Guide for detailed information on the system and how to most effectively use it.

https://portal.xsede.org/tacc-stampede2

TACC Dell/Intel Knights Landing, Skylake System (Stampede2)

TACC Dell/Intel Sapphire Rapids, Ice Lake, Skylake (Stampede3)

Description

Stampede3 is generously funded through the National Science Foundation and is designed to serve today's researchers as well as support the research community on an evolutionary path toward many-core processors and accelerated technologies. Stampede 3 maintains the familiar programming model for all of today's users, and thus will be broadly useful for traditional simulation users, users performing data intensive computations, and emerging classes of new users.

TACC Dell/Intel Sapphire Rapids, Ice Lake, Skylake (Stampede3)

TACC Long-term tape Archival Storage (Ranch)

Description

TACC's High Performance Computing (HPC) systems are used primarily for scientific computing and while their disk systems are large, they are unable to store the long-term final data generated on these systems. The Ranch archive system fills this need for high capacity long-term storage, by providing a massive high performance file system and tape-based backing store designed, implemented, and supported specifically for archival purposes. <br/><br/>Ranch is a Quantum StorNext-based system, with a DDN- provided front-end disk system (30PB raw), and a 5000 slot Quantum Scalar i6000 library for its back-end tape archive. <br/><br>Ranch is an allocated resource, meaning that Ranch is available only to users with an allocation on one of TACC's computational resources such as Frontera, Stampede3, or Lonestar6. ACCESS PIs will be prompted automatically for the companion storage allocation as part of the proposal submission process and should include a justification of the storage needs in their proposal. The default allocation on Ranch for users is 2TB. To request a shared Ranch project space for your team's use, please submit a TACC Helpdesk ticket.

TACC Long-term tape Archival Storage (Ranch)

Texas Tech University

Texas Tech REPACSS CPU

Description

The REmotely-managed Power Aware Computing Systems and Services (REPACSS) resource is a high-performance computing (HPC) cluster supported by multiple forms of energy developed to support research into advanced data center control for running scalable scientific workflows and data-intensive research in remotely managed settings. The focus of the project is on improvements to data center and infrastructure control to provide adaptability to emergent conditions and ability to adjust workloads to match data center load conditions including the availability and cost of electrical power. The CPU infrastructure comprises 110 AMD EPYC 9754 compute nodes with access to high-speed cluster-wide storage. Each CPU compute node offers 256 cores and 1.5TB of DDR5 memory, supported by local NVMe swap and temporary storage (1.92TB) to support high-speed checkpoint and restore and local ephemeral usage. The cpu nodes are interconnected with the rest of the cluster and with storage by NVIDIA ConnectX-7 network NDR Infiniband adapters running at 200 Gbps per card with two Infiniband cards per node. The Hammerspace storage provides nearly 3PB of combined NVMe and HDD storage, supporting large-scale data throughput. All nodes are controlled and provisioned through high-bandwidth Dell PowerSwitch S5248-ON and S5232-ON Ethernet switches at 25 Gbps per node. The cluster supports intelligent workload placement and adaptive scheduling tools to align computational activity with the goal to match as much of the workload as possible to low-cost energy availability. REPACSS also features advanced remote management capabilities and automation tools to manage scientific workflows that are specifically targeted to be adopted at scale by other resource facilities and industry.

Texas Tech REPACSS CPU

Texas Tech REPACSS GPU

Description

The REmotely-managed Power Aware Computing Systems and Services (REPACSS) resource is a high-performance computing (HPC) cluster supported by multiple forms of energy developed to support research into advanced data center control for running scalable scientific workflows and data-intensive research in remotely managed settings. The focus of the project is on improvements to data center and infrastructure control to provide adaptability to emergent conditions and ability to adjust workloads to match data center load conditions including the availability and cost of electrical power. The GPU nodes feature dual-socket Intel Xeon Gold 6448Y processors, 512GB RAM, and 4 H100 GPUs connected as two H100-NVL pairs per node. The GPU nodes are interconnected with the rest of the cluster and with storage by NVIDIA ConnectX-7 network NDR Infiniband adapters running at 200 Gbps per card with two Infiniband cards per node. The Hammerspace storage provides nearly 3PB of combined NVMe and HDD storage, supporting large-scale data throughput. All nodes are also controlled and provisioned through high-bandwidth Dell PowerSwitch S5248-ON and S5232-ON Ethernet switches at 25 Gbps per node. The cluster supports intelligent workload placement and adaptive scheduling tools to align computational activity with the goal to match as much of the workload as possible to low-cost energy availability. REPACSS also features advanced remote management capabilities and automation tools to manage scientific workflows that are specifically targeted to be adopted at scale by other resource facilities and industry.

Texas Tech REPACSS GPU

Texas Tech REPACSS Storage

Description

Texas Tech REPACSS Storage

University of Delaware

UD DARWIN Compute Nodes (DARWIN)

Description

Nodes with two AMD EPYC™ 7502 processors (32 cores each) with three memory size options:
48x standard 512 GiB;
32x large-memory 1024 GiB;
11x xlarge-memory 2048 GiB;
1x lg-swap 1024 GiB RAM + 2.73 TiB Intel Optane NVMe swap

UD DARWIN Compute Nodes (DARWIN)

UD DARWIN GPU Nodes (DARWIN GPU)

Description

3 nodes with two Intel® Xeon® Platinum 8260 processors (24 cores each), 768 GiB RAM, and 4 NVIDIA Tesla V100 32GB GPUs connected via NVLINK™
9 nodes with two AMD EPYC™ 7502 processors (32 cores each), 512 GiB RAM, and a single NVIDIA Tesla T4 GPU
1 node with two AMD EPYC™ 7502 processors (32 cores each), 512 GiB RAM, and a single AMD Radeon Instinct MI50 GPU

UD DARWIN GPU Nodes (DARWIN GPU)

UD DARWIN Storage (DARWIN Storage)

Description

DARWIN's Lustre file system is for use with the DARWIN Compute and GPU nodes.

UD DARWIN Storage (DARWIN Storage)

University of Kentucky

Kentucky Research Informatics Cloud (KyRIC) Large Memory Nodes

Description

Five large memory compute nodes dedicated for XSEDE allocation. Each of these nodes have 40 cores (Broadwell class and lntel(R) Xeon(R) CPU E7-4820 v4 @ 2.00GHz with 4 sockets, 10 cores/socket), 3TB RAM, and 6TB SSD storage drives. The 5 dedicated XSEDE nodes will have exclusive access to approximately 300 TB of network attached disk storage. All these compute nodes are interconnected through a 100 Gigabit Ethernet (l00GbE) backbone and the cluster login and data transfer nodes will be connected through a 100Gb uplink to lnternet2 for external connections.

Kentucky Research Informatics Cloud (KyRIC) Large Memory Nodes