ICTS has several data centers across campus, each with its own maintenance policy. In addition, the HPC team carries out maintenance on the firmware, operating system and application software of the HPC nodes.
Most maintenance slots do not require a full shutdown of the data centers, and your submitted jobs should continue running, although access to the cluster may be limited. If we are aware of a full data center shutdown, it will be highlighted on this page, on our blog and on the dashboards. We will also attempt to contact our users well in advance.
Upcoming maintenance tasks for eResearch HPC:
- TBA – Mellanox InfiniBand card firmware patch.
- TBA – Mellanox InfiniBand switch firmware patch.
Recently completed maintenance tasks for eResearch HPC:
- 13 March 2017 – BeeGFS servers and client update. The Mellanox InfiniBand switch firmware patch was not applied.
- 20 February 2017 – Patch and reboot Hex head node.
- 12 February 2017 – BDC power shutdown.
- 18 January 2017 – Moved 8 HPC nodes to alternate rack.
- 21 November 2016 – Operating system, GPU card and InfiniBand card patch.
- 7 July 2016 – High memory nodes moved to a new rack.
- 26 June 2016 – Power maintenance in both data centers.
- 23 June 2016 – Replaced faulty hard drive in srvcnthpc407.
- 13 June 2016 – HAL SLURM scheduler upgraded from 15.08 to 16.05.
- 6 June 2016 – OS patches on HAL and GRID cluster.