Submitted by Mahidhar Tatineni on

Dear Expanse User,

The Expanse Lustre filesystem issues have been resolved and the filesystem is back in production use. Thank you for your patience through the long outage. We will continue to monitor the filesystem and follow up as needed. 

Users are also reminded that neither /expanse/lustre/scratch nor /expanse/lustre/projects is backed up, so please keep offsite copies of anything critical stored in those locations.

Thanks

SDSC User Services Staff

------------

Dear Expanse User,

We are continuing to work on the Expanse Lustre filesystem. The initial problem was caused by a hardware failure in one of the metadata server drives. The drive has been replaced, but storage targets still fail to mount because of a software bug being triggered in the Lustre filesystem. Unfortunately, because the metadata server controls the entire filesystem, the /expanse/lustre/scratch and /expanse/lustre/projects directories remain unavailable. We are sorry for the impact this is causing and will keep users posted about any new developments.

Thanks

SDSC User Services Staff

---------

Dear Expanse User,

We are continuing to work on the metadata server problem on the Expanse Lustre filesystem. Unfortunately, the outage will last longer than expected; we will send an update once we have more information.

Thanks 

SDSC User Services Staff

-----------------------------------------

Dear Expanse User,

We are continuing to work on the Lustre filesystem on Expanse. The problem is going to take much longer than anticipated to resolve, and the earliest likely recovery is tomorrow (03/18/2025). We recognize that many Expanse users do not use the Lustre directories, so to allow their jobs to run we will release the reservation. We have placed holds on existing jobs that are clearly using Lustre (e.g., jobs that specify it in the output path or working directory path). However, jobs that use Lustre without requesting it through a constraint will fail. We strongly recommend that all jobs needing Lustre include the following line in their batch script:

#SBATCH --constraint="lustre" 

For more details, please see the "SUBMITTING JOBS USING LUSTRE" subsection of the storage section in our user guide (https://www.sdsc.edu/systems/expanse/user_guide.html#narrow-wysiwyg-10). We also remind users that the home and NFS directories have limited performance and scalability, so please do not submit intensive jobs that need the Lustre filesystem from those directories.
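For illustration, here is a minimal sketch of where the constraint line sits in a Slurm batch script. The job name, resource requests, working directory, and executable are hypothetical placeholders, and site-specific directives such as --account and --partition are omitted; consult the user guide for the values Expanse requires.

```shell
#!/bin/bash
#SBATCH --job-name=lustre_job        # hypothetical job name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=01:00:00
#SBATCH --constraint="lustre"        # declare that this job needs Lustre (recommended in this notice)
# Site-specific directives (e.g. --account, --partition) are intentionally omitted.

# Hypothetical working directory on Lustre scratch; substitute your own path.
# Stop immediately if the directory is not reachable rather than running from home/NFS.
cd "/expanse/lustre/scratch/${USER}/my_run" || exit 1

# Launch the application here, e.g. (hypothetical executable):
# srun ./my_app
```

The script would be submitted with sbatch as usual, for example sbatch lustre_job.sb (the filename is also hypothetical).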

We will post an update on the status of the Lustre issue tomorrow.

Thanks 

SDSC User Services Staff

----------------------------------------

Infrastructure News Type: Outage, Partial
Distribution Options: Email only subscribers