Submitted by Eric Adams on
News Content

The underlying issue has been isolated and the outage is resolved.

__________________________________________________________

 

The Anvil cluster began experiencing issues with Slurm scheduling this past week. Engineers are currently diagnosing the root cause and working to identify a fix.

Scheduling is still enabled at this time.

You may experience periodic Slurm outages during which commands are unable to connect to the Slurm controller. This can cause jobs to take longer than normal and, in some instances, to fail.
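
For scripts that call Slurm commands, one way to ride out brief controller outages is to retry a failed command after a pause. The following is only a minimal sketch, not an official recommendation: the helper name `run_with_retry` and its default values are hypothetical, and it assumes that a command which cannot reach the controller exits with a nonzero status.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=5, delay=30.0):
    """Run cmd (a list of arguments), retrying on a nonzero exit status.

    Hypothetical helper for illustration. Returns the CompletedProcess of
    the first successful run, or of the last failed run if every attempt
    fails.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < attempts:
            # Pause before trying again, to give the controller time to recover.
            time.sleep(delay)
    return result


# Example use with a read-only query of your own jobs:
#   outcome = run_with_retry(["squeue", "--me"], attempts=3, delay=10.0)
#   print(outcome.stdout or outcome.stderr)
```

A retry like this is best suited to read-only queries. Retrying a job submission could create a duplicate job if the first attempt was accepted but its response was lost.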

In addition, Open OnDemand (OOD) relies on Slurm to run applications. When these Slurm issues occur, the menu in OOD may appear empty or nonfunctional.

We will provide another update by 5 PM EST today.

Infrastructure News Type
Outage Partial