
The Hydra cluster will be upgraded as follows:

  • Software
    • Upgrade OS to CentOS 7.6,
    • Change our management tool (from Rocks 6.x to Bright Cluster Manager, aka BCM, 8.2),
    • Switch from SGE to UGE (Univa) version 8.6.6, a commercially supported version of SGE (backward compatible).
  • Hardware
    • Add 16 new nodes (40 cores/380GB each), adding a total of 640 cores,
    • Decommission the oldest nodes (old compute-6* and compute-4-*),
    • Add 1.5PB of high-performance storage (GPFS parallel file system, aka Spectrum Scale),
    • This is in addition to the 500TB of NetApp storage and the 950TB of near-line storage (NAS),
    • The /scratch disk will be moved to the GPFS and increase in size from 100TB to 900TB (450TB for /scratch/genomics + 450TB for /scratch/sao; see the quick check below),
    • Move all the nodes to a 10Gb/s Ethernet connection (was 1Gb/s).
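
For reference, once the upgrade is done you can check the new scratch areas with standard tools; a minimal check, assuming the GPFS filesets are mounted at the paths listed above:

    # show the size and usage of the new scratch areas
    df -h /scratch/genomics /scratch/sao

    # show how much space you are using (your sub-directory name may differ)
    du -sh /scratch/genomics/$USER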

The upgrade is scheduled from 8/26 to 9/2/2019.

Please Note

  • While we are making every effort to set up the new configuration to be as backward compatible as possible, there will be changes.
  • The upgrade will take place from August 26th through September 2nd, 2019.
  • During this time, Hydra will be inaccessible to users, and as of 9am on Monday, August 26th, any remaining running jobs will be killed.
  • We hope to have Hydra back up before September 2nd, but that decision won’t be made until the upgrade work is completed.
  • During the down time, access to files stored on Hydra will be limited, and at times unavailable. Note that none of your files will be deleted.

Plan your use of Hydra accordingly.

(lightbulb) Please read this page carefully, and after the upgrade, do not hesitate to contact us (at SI-HPC-Admin@si.edu) if something no longer works as it should.

  • For biology and genomics software issues and/or incompatibilities that arise, please contact SI-HPC@si.edu;
  • SAO users, please contact Sylvain at hpc@cfa.harvard.edu.

List of Software Changes 

  • Switching from Rocks to BCM
    • BCM locates things differently from Rocks;
      • for example, /opt is no longer used; instead BCM uses /cm,
    • If you use modules, everything should work the same;
      • but if you hardwired locations, things may no longer work;
    • The compute nodes will be renamed as follows:
      • compute-NN-MM,
      • where NN is the logical rack number and MM is the node index, both two-digit numbers;
    • You should use modules as much as possible (see the module example after this list).
  • Job Scheduler
    • We are switching to UGE, or Univa Grid Engine, which
      • is backward compatible with SGE, and
      • offers some additional features;
      • note that the output of some commands will look different, and
      • some commands have different options.
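
Since BCM moves software from /opt to /cm, job scripts that hardwire paths under /opt are the most likely thing to break; the module-based approach sketched below avoids this (the module name used here is only an illustration, use module avail to see what is actually installed):

    # list the modules available after the upgrade
    module avail

    # load a tool through its module instead of hardwiring a path under /opt or /cm
    module load gcc       # illustrative module name
    which gcc             # confirm the module put the expected binary on your PATH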

  • The list of queues and their limits will not change,
    • a few complex values will change (to handle local SSD and GPU usage),
    • (warning) One important change (warning) is that

      the memory reservation value will no longer be specified per slot, but per job:
      the specification  "-pe mthread 10 -l mres=20G,h_data=20G,h_vmem=20G,himem"
      must be changed to "-pe mthread 10 -l mres=200G,h_data=200G,h_vmem=200G,himem"
      to reserve 200GB of memory for this job (see the sample job script below).
  • Compilers
    • The default compiler versions will change, and
    • the corresponding modules will need to be reloaded.
    • For MPI jobs, the compiler/flavor/version combinations will change (see the compiler example below).
  • Local tools
    • The tools accessible with the tools/local module have been split into two groups, and
    • some of the names have been changed or simplified.
    • Use module help tools/local and module help tools/local+ to see what the split is and what the names are.
    • Use module help tools/local-bc to see the correspondence between old and new names.
    • Loading tools/local-bc will
      • load tools/local and tools/local+, and
      • create aliases to be backward compatible (hence the -bc suffix).
    • (lightbulb) Use the new names whenever possible (see the tools/local example below).
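
To illustrate the per-job memory reservation described in the warning above, here is a minimal multi-threaded job script that uses the replacement specification quoted there (the job name and program are placeholders):

    # file myjob.job -- minimal multi-threaded example
    #$ -N myjob
    #$ -cwd -j y -o myjob.log
    #$ -pe mthread 10
    #$ -l mres=200G,h_data=200G,h_vmem=200G,himem
    #
    # load what you need via modules, then run your program (placeholder below)
    module load tools/local
    ./my_program

Submit it with qsub myjob.job; the qsub syntax is unchanged under UGE, although some of the reported output may look different.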

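Since the default compiler versions and the MPI compiler/flavor/version combinations will change, check what your scripts load rather than relying on the old defaults; a sketch, with illustrative module names and versions only:

    # check which compiler versions are installed
    module avail gcc
    module load gcc/8.2        # illustrative version
    gcc --version

    # for MPI jobs, load the combination that matches your compiler
    module avail openmpi
    module load openmpi        # illustrative; pick the matching compiler/flavor/version
    mpicc --version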

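The module commands mentioned above can be used as follows to explore the tools/local split and the backward-compatible aliases:

    # see how the local tools were split and what the new names are
    module help tools/local
    module help tools/local+

    # see the old-name/new-name correspondence
    module help tools/local-bc

    # backward-compatible setup: loads tools/local and tools/local+ and defines the old-name aliases
    module load tools/local-bc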

Last updated SGK.

