News Content
Maintenance is scheduled on Expanse from 8 AM (PT) on May 8 through May 9, 2023. Several upgrades will be performed during this maintenance. Details of the upgrades and the process are below:
- The operating system will be upgraded from Rocky Linux 8.5 to Rocky Linux 8.7. The OFED version will be updated to 5.8.1.1. In addition, the firmware on all nodes will be updated.
- The drivers on the GPU nodes will be updated to version 515.65.01 (CUDA 11.7).
- The Lustre client version will be updated to version 2.15.2.
- A new software stack built using Spack 0.17.3 will become available and will be the default (available as modules cpu/0.17.3b or gpu/0.17.3b).
- The original software stack will continue to be available and will work with the new OS and OFED versions. The old software stack can be accessed by using "module load cpu/0.15.4" or "module load gpu/0.15.4".
- We will make the new software environment available in advance of the maintenance and will send out an update once it is available.
- Jobs submitted before the upgrade that do not specify the cpu or gpu module version (0.15.4) must be cancelled and resubmitted after the maintenance so that they pick up the intended environment, since the defaults will change to 0.17.3b.
- Because of the large number of updates involved, we are reserving nodes for multiple days. Jobs that will not fit in the time window before the maintenance will remain pending and will not start before the upgrade.
- After our service nodes are upgraded and the Lustre update is completed, we will return fully upgraded nodes to the queue as they become available to help mitigate the long downtime.
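For users unsure whether an existing job script names a stack version explicitly, the check can be sketched as below. This is a minimal sketch only: the helper name pins_stack_version is hypothetical and is not an SDSC-provided tool, and it assumes the script loads the stack with a plain "module load" line that is not commented out.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an SDSC tool): succeeds if the given job script
# contains an uncommented "module load" line naming cpu/<version> or
# gpu/<version>, for example "module load cpu/0.15.4".
pins_stack_version() {
    grep -Eq '^[[:space:]]*module[[:space:]]+load[[:space:]]+([^#]*[[:space:]])?(cpu|gpu)/[0-9]' "$1"
}
```

A script for which this check fails relies on the default stack, so a job submitted from it before the maintenance is a candidate for cancelling and resubmitting afterwards. For example, `pins_stack_version myjob.sh || echo "myjob.sh uses the default stack"`, where myjob.sh is a placeholder file name.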
Please submit a ticket via ACCESS Support (https://support.access-ci.org) or SDSC's local ticketing system (consult@sdsc.edu) if you have any questions.
Thanks,
SDSC User Services Team
Infrastructure News Type
Outage Full