The Hydra cluster will be upgraded as follows:
- Software
  - Upgrade the OS to CentOS 7.6,
  - Change our management tool (from Rocks 6.x to Bright Cluster Manager, aka BCM 8.2),
  - Switch from SGE to UGE (Univa Grid Engine) version 8.6.6, a commercially supported version of SGE (backward compatible).
- Hardware
  - Add 16 new nodes (40 cores/380GB each) for a total of 640 new cores,
  - Decommission the oldest nodes (old compute-6* and compute-4-*),
  - Add 1.5PB of high-performance storage (GPFS parallel file system, aka Spectrum Scale),
    - this is in addition to the 500TB of NetApp and the 950TB of near-line storage (NAS),
  - The /scratch disk will be moved to GPFS and increased in size from 100TB to 900TB (450TB + 450TB for /scratch/genomics and /scratch/sao),
  - Move all the nodes to a 10Gb/s Ethernet connection (was 1Gb/s).
Upgrade schedule is 8/26 to 9/2/2019
Please Note
- While we are making every effort to keep the new configuration as backward compatible as possible, there will be changes.
- The upgrade will take place from August 26th through September 2nd, 2019.
- During this time, Hydra will be inaccessible to users, and any jobs still running at 9am on Monday, August 26th will be killed.
- We hope to have Hydra back up before September 2nd, but that decision won’t be made until the upgrade work is completed.
- During the downtime, access to files stored on Hydra will be limited and at times unavailable. Note that none of your files will be deleted.
Plan your use of Hydra accordingly.
Please read this page carefully and, after the upgrade, do not hesitate to contact us (at SI-HPC-Admin@si.edu) if something no longer works as it should.
- For biology and genomics software issues and/or incompatibilities that arise, please contact SI-HPC@si.edu,
- SAO users please contact Sylvain at hpc@cfa.harvard.edu.
List of Software Changes
- Implementation of user account management using LDAP
  - The procedure to change your password will change:
    - you no longer use passwd on the login node and head node; instead,
    - use the Self Service Password (SSP) page, listed at https://hydra-5.si.edu;
    - email SI-HPC-Admin@si.edu only if this fails.
- Switching from Rocks to BCM
  - BCM locates things differently from Rocks;
    - for example, /opt is no longer used; instead, BCM uses /cm.
  - If you use modules, everything should work the same,
  - but if you hardwired locations, things may no longer work;
  - The compute nodes will be renamed compute-NN-MM,
    - i.e. NN, the logical rack number, and MM, the node index, are both two-digit numbers;
  - You should use modules as much as possible.
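Since hardwired /opt paths and old node names are the most likely things to break, it may help to scan your job scripts before resubmitting them. The following is a minimal sketch, not a tool provided on Hydra: the function name check_job_scripts is hypothetical, and it simply flags every line that contains /opt/ or a compute-node name for manual review (any node name in a pre-upgrade script uses the old naming).

```shell
#!/bin/sh
# Hypothetical helper (not installed on Hydra): list lines in job scripts that
# may break after the switch from Rocks to BCM.
#   /opt/      -> BCM uses /cm instead; prefer loading a module
#   compute-N  -> nodes are renamed compute-NN-MM, so old names need review
# Prints matching lines as file:line:text and returns 1 if any were found.
check_job_scripts() {
    found=0
    for f in "$@"; do
        # -n adds line numbers, -H adds the file name even for a single file
        if grep -nH -e '/opt/' -e 'compute-[0-9]' "$f"; then
            found=1
        fi
    done
    return "$found"
}
```

For example, after sourcing the function, check_job_scripts myjob.job lists the lines to review; a return status of 0 means nothing was flagged.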
- Job Scheduler
  - We are switching to UGE, or Univa Grid Engine, which:
    - is backward compatible with SGE, and
    - offers some additional features;
  - note that the output of some commands will look different, and some commands have different options;
  - the list of queues and their limits will not change;
  - a few complex values will change (to support the use of local SSDs and GPUs).
One important change is that the memory reservation value will no longer be specified per slot, but per job. For example, the specification "-pe mthread 10 -l mres=20G,h_data=20G,h_vmem=20G,himem" must be changed to "-pe mthread 10 -l mres=200G,h_data=200G,h_vmem=200G,himem" to reserve 200GB of memory for this job (10 slots × 20GB per slot).
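To convert an existing request, multiply the old per-slot value by the number of slots. A minimal sketch of that arithmetic in shell; per_job_mem is a hypothetical helper, and it assumes whole-gigabyte values written with a G suffix:

```shell
#!/bin/sh
# Hypothetical helper: convert an old per-slot memory request into the
# per-job value expected by UGE. Whole gigabytes only (shell integer math).
# Usage: per_job_mem SLOTS PER_SLOT_GB   (prints e.g. 200G)
per_job_mem() {
    echo "$(( $1 * $2 ))G"
}

# The example above: 10 slots at 20G per slot becomes 200G per job.
mem=$(per_job_mem 10 20)
printf '%s\n' "-pe mthread 10 -l mres=$mem,h_data=$mem,h_vmem=$mem,himem"
```

In a job script, the same request goes on embedded #$ directive lines, i.e. #$ -pe mthread 10 followed by #$ -l mres=200G,h_data=200G,h_vmem=200G,himem.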
- Compilers
  - The default compiler versions will change, and the compilers had to be reloaded.
  - For MPI jobs, the compiler/flavor/version combinations will change.
- Local tools
  - The tools accessible with the tools/local module have been split into two groups, and some of the names have been changed or simplified:
    - use module help tools/local and module help tools/local+ to see how the tools are split and what the names are;
    - use module help tools/local-bc to see the correspondence between old and new names.
  - Loading tools/local-bc will
    - load tools/local and tools/local+, and
    - create aliases to be backward compatible (-bc);
  - use the new names whenever possible.
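In a job script, the recommendations above might look like the following minimal sketch. The function name load_local_tools is hypothetical, and the guard assumes the module command is only defined where modules have been initialized (as on Hydra); pass bc only while your scripts still rely on the old tool names.

```shell
#!/bin/sh
# Hypothetical job-script helper for the post-upgrade local tools.
# Usage: load_local_tools       -> new names (preferred)
#        load_local_tools bc    -> backward-compatible names
load_local_tools() {
    # `module` is defined by the modules package; stop if it is not available.
    if ! command -v module >/dev/null 2>&1; then
        echo "load_local_tools: module command not available" >&2
        return 1
    fi
    if [ "${1:-}" = "bc" ]; then
        # Loads tools/local and tools/local+ plus backward-compatible aliases.
        module load tools/local-bc
    else
        # Preferred: load both groups and use the new tool names.
        module load tools/local tools/local+
    fi
}
```

Loading the two new modules directly, rather than tools/local-bc, avoids depending on the backward-compatibility aliases.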
Last updated SGK.