Voyager is a heterogeneous system designed to support complex deep learning AI workflows. The system features 42 Intel Habana Gaudi training nodes, each with 8 training processors (336 in total). Each training node has 512GB of memory and 6.4TB of node-local NVMe storage. The Gaudi training processors feature specialized hardware units for AI, HBM2 memory, and on-chip high-speed Ethernet. Some of the on-chip Ethernet ports form a non-blocking all-to-all network between the processors on a node; the remaining ports are aggregated into six 400G connections per node, which plug into a 400G Arista switch to provide the scale-out network. Voyager also has two first-generation inference nodes, each with 8 inference processors (16 in total). In addition to the custom AI hardware, the system has 36 Intel x86 compute nodes for general-purpose computing and data processing. Voyager features 3PB of storage, currently deployed as a Ceph filesystem.
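The per-node figures above can be rolled up into system-wide totals. The following is a minimal sketch of that arithmetic only; the `NodeGroup` class and all variable names are hypothetical, the derived totals are not published specifications, and it assumes decimal units (1TB = 1000GB) and that "400G" means 400 Gb/s per connection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """Hypothetical helper: a set of identical nodes."""
    nodes: int
    processors_per_node: int

    @property
    def total_processors(self) -> int:
        return self.nodes * self.processors_per_node


# Figures taken from the description above.
training = NodeGroup(nodes=42, processors_per_node=8)
inference = NodeGroup(nodes=2, processors_per_node=8)

MEMORY_GB_PER_TRAINING_NODE = 512
NVME_GB_PER_TRAINING_NODE = 6400   # 6.4TB, assuming 1TB = 1000GB
UPLINKS_PER_TRAINING_NODE = 6
UPLINK_GBPS = 400                  # assumption: "400G" = 400 Gb/s

# Derived totals (simple products, not published specifications).
total_memory_gb = training.nodes * MEMORY_GB_PER_TRAINING_NODE
total_nvme_gb = training.nodes * NVME_GB_PER_TRAINING_NODE
scale_out_gbps_per_node = UPLINKS_PER_TRAINING_NODE * UPLINK_GBPS

print("Training processors:", training.total_processors)
print("Inference processors:", inference.total_processors)
print("Aggregate training-node memory (GB):", total_memory_gb)
print("Aggregate node-local NVMe (GB):", total_nvme_gb)
print("Scale-out bandwidth per training node (Gb/s):", scale_out_gbps_per_node)
```

The processor counts reproduce the 336 and 16 totals stated in the description; the memory, NVMe, and bandwidth totals are derived under the assumptions listed above.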
SDSC Voyager Habana Training and Inference Processor based AI System
Resource Type
Compute
Latest Status
production
Description
User Guide URL
Features
Unique, innovative or non-traditional compute resource
Resource is allocated by the Resource Provider
Agency supercomputers and advanced architecture systems
NSF ACSS Category 2 Resources
Organization Name
San Diego Supercomputer Center
Global Resource ID
voyager.sdsc.access-ci.org