The Cosmos supercomputer is built on the HPE Cray Supercomputing EX2500 platform and incorporates AMD Instinct™ MI300A accelerated processing units (APUs), the HPE Slingshot interconnect, and a flash-based filesystem. Each APU features integrated memory that is shared between its CPU and GPU resources. This memory architecture supports an incremental programming approach, which helps many communities adopt GPUs and eases the process of porting and optimizing a range of applications. Cosmos has 42 nodes, each with 4 APUs in a fully connected network based on AMD's Infinity xGMI (socket-to-socket global memory interface) technology, which provides 768 GB/s aggregate and 256 GB/s peer-to-peer bi-directional bandwidth between APUs. Nodes are interconnected by a high-performance network based on HPE's Slingshot technology, which provides low latency and congestion control. The high-performance VAST filesystem (551 TB usable) uses flash-based storage and provides the high IOPS and bandwidth needed for the anticipated mixed-application workload. Cosmos also has access to 4.9 PB of Ceph capacity storage, which provides excellent I/O performance for most applications and stores persistent project data.
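The node and bandwidth figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes that the 768 GB/s aggregate figure is the sum of one APU's three 256 GB/s peer-to-peer links within a fully connected 4-APU node, which is an interpretation of the description rather than a published specification. All names in the code are hypothetical.

```python
from itertools import combinations

# Figures taken from the description above.
NODES = 42
APUS_PER_NODE = 4
PEER_BW_GBS = 256  # bi-directional bandwidth per APU pair, in GB/s


def node_links(apus_per_node: int = APUS_PER_NODE) -> list[tuple[int, int]]:
    """Return every APU pair in a fully connected node (one link per pair)."""
    return list(combinations(range(apus_per_node), 2))


total_apus = NODES * APUS_PER_NODE
links = node_links()

# Assumption: the aggregate figure is per APU, summed over its peer links.
peers_per_apu = APUS_PER_NODE - 1
aggregate_per_apu = peers_per_apu * PEER_BW_GBS

print(f"Total APUs in the system: {total_apus}")
print(f"xGMI links per node:      {len(links)}")
print(f"Aggregate per APU (GB/s): {aggregate_per_apu}")
```

Under that assumption, the system has 168 APUs in total, each node has 6 APU-to-APU links, and the per-APU aggregate comes to 768 GB/s, matching the figure quoted in the description.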
SDSC Cosmos Innovative Cluster featuring AMD MI300A APUs
Resource Type
Compute
Latest Status
pre-production
Description
User Guide URL
Features
General compute use
GPU use is the main purpose of this resource
Unique, innovative or non-traditional compute resource
Resource is allocated by the Resource Provider
NSF ACSS Category 2 Resources
AI tools and support
Traditional high-performance computing cluster for running batch jobs
Organization Name
San Diego Supercomputer Center
Global Resource ID
cosmos.sdsc.access-ci.org