Texas Tech REPACSS GPU
Resource Type
Compute
Latest Status
pre-production
Description

The REmotely-managed Power Aware Computing Systems and Services (REPACSS) resource is a high-performance computing (HPC) cluster powered by multiple energy sources. It was developed to support research into advanced data center control for running scalable scientific workflows and data-intensive research in remotely managed settings. The project focuses on improving data center and infrastructure control so that the system can adapt to emergent conditions and adjust workloads to match data center load conditions, including the availability and cost of electrical power.

The GPU nodes feature dual-socket Intel Xeon Gold 6448Y processors, 512 GB of RAM, and four H100 GPUs configured as two H100 NVL pairs per node. Each GPU node connects to the rest of the cluster and to storage through two NVIDIA ConnectX-7 NDR InfiniBand adapters running at 200 Gbps per card, for 400 Gbps of aggregate InfiniBand bandwidth per node. The Hammerspace storage system provides nearly 3 PB of combined NVMe and HDD capacity to support large-scale data throughput. All nodes are also controlled and provisioned through high-bandwidth Dell PowerSwitch S5248-ON and S5232-ON Ethernet switches at 25 Gbps per node.

The cluster supports intelligent workload placement and adaptive scheduling tools that aim to run as much of the workload as possible when low-cost energy is available. REPACSS also features advanced remote management capabilities and automation tools for managing scientific workflows, designed specifically for adoption at scale by other resource facilities and by industry.

Features
Is an ACCESS Allocated Production Compute resource
GPU use is the main purpose of this resource
Resource supports community software areas for users to share software with other users
Resource offers discounted job queues where running jobs can be preempted
Unique, innovative or non-traditional compute resource
Resource is allocated by ACCESS
An intuitive, innovative, and interactive interface to remote computing resources
Provides Globus data transfer and data sharing services for local storage
Preemption
NSF ACSS Category 2 Resources
AI tools and support
Organization Name
Texas Tech University
Global Resource ID
repacss-gpu.ttu.access-ci.org