• March 22, 2024 - upgrade to Linux Rocky 8.9 and add 15 new compute nodes.
    • Schedule

      On Monday April 22, 2024 at 9am EDT, all running and queued jobs will be killed and access to Hydra's login nodes will be disabled. During the upgrade period you will have no access to the files stored on Hydra; this means that the Globus collections on /pool/scratch and /store will also be unavailable. Please plan accordingly.

    Hydra and your files will be available again once the upgrade is completed, by Friday May 3 at 9am. If the upgrade is completed sooner than anticipated, we will make the system available earlier.

    • Hardware Changes

      We will add 15 new compute nodes (2 nodes with 192 CPUs and 1.5TB of memory, 12 nodes with 128 CPUs and 1.0TB of memory, and 1 node with 4 NVIDIA L40S GPUs, 48GB each). The cluster will grow to more than 5,900 CPUs and over 44TB of aggregate memory, distributed over 78 nodes, including 3 GPU nodes (one quad-GPU and two dual-GPU, for 8 GPUs in total).

    • Software Changes

      We will update Hydra's OS from CentOS 7.9 to Rocky 8.9 to support the new compute nodes and the latest software offerings.

      Rocky 8 is the successor to CentOS 7. Both Linux distributions are based on Red Hat Enterprise Linux and share many similarities. We will also upgrade various packages to the most recent versions, including the job scheduler (i.e., the Grid Engine), and many of the modules. We will no longer support old versions of some software packages. 

      We will update the documentation on the Wiki and update the "2024 Cluster Upgrade to Hydra-7" page with details on what has changed including new module versions. We plan to offer a workshop or "office hours" to help users to transition.

      As always, we are striving to make this transition as smooth as possible, while taking advantage of the opportunities, and managing the challenges, of moving to a new version of the OS. Stay tuned for more details.

November 27, 2023

  • We have completed the FY23 hardware purchases and started work on upgrading the cluster's OS from CentOS 7 to Rocky 8.
  • We will add 15 new compute nodes, including a quad GPU server, and retire the oldest compute nodes.
  • The 2024 Upgrade page details the planned changes and will document them once they are completed.
  • We are in the process of testing Rocky 8 and are currently aiming to transition Hydra to this OS in the late-January to early-February 2024 time frame.
    • We anticipate the cluster being shut down for about 10 days for this work during which time no jobs will be running and there will be no access to the files stored on Hydra. 

September 27, 2023

  • Software updates
    • The latest version of Python available from Anaconda, namely version 3.11, has been installed. It can be accessed via the tools/python/3.11 module.
      • The default version, tools/python, remains Anaconda's distribution version 3.8. 
    • The latest version of IDL (8.9.0) has been installed, along with its latest license manager. IDL is accessed via the idl module.
      • The default version remains 8.8.1; you can access the more recent versions via the idl/8.8.2 or idl/8.9.0 modules. Note that the oldest versions, i.e., 8.6, will no longer work after Nov 30, 2023.
    • The latest versions of the MATLAB runtime have been installed on Hydra: R2022b, R2023a and R2023b. They are accessible via the matlab/2022b and matlab/2023[ab] modules.
      • The default version remains 2021b.
    • The latest version of Julia has been installed on Hydra, namely version 1.9.3, via the tools/julia/1.9.3 module.
      • The default version remains 1.6.3.
    • The latest NVIDIA compilers, formerly PGI, have been installed: versions 23.5 and 23.7, and are accessed via the nvidia/23.5 and nvidia/23.7 modules.
      • Note that the MPI flavors shipped with these new versions are not working on Hydra.
      • The default version remains 21.9, the newer available versions are 22.[1239] and 23.[357], with full MPI support up to 23.3.
    • We are unable to install the latest Intel compilers, since Intel only releases new versions for Rocky 8, and no longer for CentOS 7.
      • The latest Intel compiler versions are 2022.[12]; the default version remains 2021.4.
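      The packages above are all exposed through the cluster's module system, so picking a non-default version follows the usual Environment Modules workflow. A minimal sketch, using the IDL module names listed above (these commands assume you are logged into Hydra, where the module environment is available):

```shell
# See which versions of a package are installed, e.g. IDL
module avail idl

# Load a specific version instead of the default
module load idl/8.9.0

# Check what is currently loaded
module list

# Switch back to the default version if needed
module unload idl/8.9.0
module load idl
```

      The same pattern applies to the tools/python, matlab, tools/julia, nvidia and intel modules mentioned above.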
  • Upgrade of Hydra to Rocky 8
    • We will update Hydra's OS from CentOS 7 to Rocky 8, to fully support the latest hardware and software offerings.
      • The new compute nodes we are in the process of purchasing (see below) require Rocky 8 as well as various new software packages. 
    • Note that the transition to Rocky 8 might not be as transparent as previous CentOS upgrades. We will strive to make it as smooth as possible.
    • We do not have yet any estimate of when we will transition to Rocky 8, so stay tuned.
    • As usual, we will give at least four weeks notice for any scheduled downtime.
  • Hardware updates
    • We plan to refresh 14 compute nodes with high end servers fitted with the latest AMD processors (Zen4), with 128 or 192 cores per server and 1 or 1.5TB of memory each (12 & 2 respectively).
    • We also plan on adding one GPU server with four A100 GPUs.
    • We will retire our oldest compute nodes (compute-43-xx). We thank Deron Burba, SI CIO, for contributing additional funding to make these purchases possible.
    • We have recently evaluated various storage options and anticipate expanding our storage on Hydra next year, adding more /scratch space and if possible adding a somewhat smaller but very fast different storage system. 
  • Personnel changes.
    • After more than 7 years at the Office of Research Computing (ORC), Rebecca Dikow will leave ORC and move on to her next endeavor.
    • The ORC is putting in place a transition plan to avoid any disruption this might cause. Please use si-hpc@si.edu or si-hpc-admin@si.edu to communicate with us, instead of emailing Rebecca directly about Hydra-related issues or questions.
  • November 15, 2022
    • The migration of the 'disks' /home and /share/apps to a hybrid aggregate (a more performant set of disks that combines SSDs and HDDs) was completed this past weekend. This will speed up access to the files stored under /home and /share/apps.
      • As a result, the total size of /home is slightly smaller, while the sizes of /data/sao and /data/genomics have been increased.
      • Quotas have not been changed.
      • If /home fills up too quickly, we may have to reduce the quota on /home.
      • Users who want to use /data need to request access (up to 2TB of un-scrubbed space); non-SAO users should contact Rebecca, SAO users should contact Sylvain.
    • Please refer to Disk Space and Usage on the Wiki's Reference pages for what disks to use and what to store where.

  • November 7, 2022
    • 21 new compute nodes were added to Hydra as compute-65-xx (Dell R6515: AMD EPYC CPUs, 64 cores, 512GB memory).
    • All compute nodes with an AMD EPYC CPU have their "cpu_arch" set to "zen".
    • All the old compute-81-xx nodes (Dell R815) have been retired. 
  • May 9, 2022
    • The latest versions of the NVIDIA and Intel compilers (22.1, 22.2, 22.3 & 2022.1, 2022.2 respectively) have been installed.
    • The latest versions of IDL (8.8.2) and the MATLAB runtime (R2022a) have been installed.
    • The required modules are available; the default versions have not yet been changed.
    • The examples have yet to be expanded to the new versions (the required changes should be obvious).
    • We plan to change the default versions some time in early June.
  • January 6 2022

    • Since we have increased the cluster capacity, the maximum number of CPUs (slots) a user can use concurrently has been increased from 640 to 840.

  • December 17 2021

    • Eight new compute nodes have been added to Hydra, bringing the totals to 5,408 CPUs across 98 nodes and 42TB of memory.

    • We have noticed a read performance problem on the GPFS and are working with the vendor to resolve it as soon as possible.

  • December 2, 2021
    • The look of the status pages has been updated, and the URL can now take up to 3 arguments.
    • New hardware (8 servers and 56 GPFS disks) has been delivered and will be deployed soon.
  • November 29, 2021
    • The default version for 7 modules has been updated as announced in the Nov 22 update (see below).

  • Nov 22 2021
    • The documentation has been updated and reorganized to reflect the most recent changes.
    • New versions of the compilers and several tools have been installed.
  • Sep 25 2021
    • IDL 8.8.1 is available, use: module load idl/8.8.1

    • Loading idl/8.8 will still load 8.8.0 for a little while. Also, the IDL licensing method has changed; you will now see the message:

      License: 100554-5516875-BUF
      License expires 30-Nov-2021.

      which is normal (similar to SAO/CfA/CF's installation).

  • Sep 14 2021
    • The next major upgrade of the SI/HPC cluster is completed.

    • We have made every effort to set up the new configuration as backward compatible as possible, although a few things have changed. 

    • Please look at the 2021 Cluster Upgrade page for details.
  • Jun 24 2021
    • The next major upgrade of the SI/HPC cluster, Hydra, will take place from August 30th through September 14th, 2021.

      • During the upgrade, Hydra will be inaccessible to users, and

      • as of 9am EDT on Monday August 30th any running jobs will be killed and any queued jobs will be deleted.

      While we are making every effort to set up the new configuration as backward compatible as possible, there will be changes. We hope to have Hydra back up before September 8th, but that decision won’t be made until the upgrade work is completed. During the downtime, access to files stored on Hydra will be limited, and at times unavailable, although none of your files will be deleted.

    • Please look at the 2021 Cluster Upgrade page for additional details.
  • Sep 15 2020
    • Scrubbing on /scratch has resumed: files older than 180 days are scrubbed.
    • The GPFS software is in the process of being upgraded from v4 to v5, on a rolling basis and transparently to the users.
    • IDL v8.8[.0] is available.
  • Apr 3 2020 - Hydra DOI
    • We have setup a DOI for Hydra (https://doi.org/10.25572/SIHPC)
      You are now able and encouraged to cite Hydra whenever research has benefited from its use and add a DOI link to that citation or acknowledgment.
  • Mar 24 2020 - Hydra status while teleworking
    • Hydra remains up and running.
    • We will address problems that require on-site staff as fast as possible.
    • We will answer people's questions and requests as promptly as possible.
    • Access to Hydra via VPN:
      • users are asked to limit the strain on the institutional VPN resources whenever appropriate.
    • Hydra can be accessed without VPN:
      • use the "Hydra" link under "IT Tools" (or use RDP) at telework.si.edu.
      • SAO users can use login.cfa.harvard.edu to ssh to Hydra.
      • Access to the self serve password page is now working.
      • How to use Dropbox or Firefox Send (ffsend) to copy files to/from Hydra is documented on the Wiki.
    • The scrubbing policy has been modified as follows:
      • the scrubber will run on /pool/sao and /pool/genomics as usual, but
      • the scrubbed content will not be deleted for at least 21 days, and
      • we will accept requests to preserve what was scrubbed (beyond 21 days) as long as needed. To get your files restored, follow the usual instructions.
    • Users are asked to remain in contact via their SI email.
  • February 27, 2020 - cpu_arch resource and IDL 8.7.3
    • We have added a new resource, called cpu_arch, to allow users to direct jobs to nodes with CPUs of a specific architecture (or list of architectures).
      If you run jobs/codes that can only run on specific types of processors, look at the new section CPU Architecture under the Available Queues page.
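      Requesting cpu_arch uses the Grid Engine's standard -l resource syntax. A minimal sketch, assuming a job that must run on the AMD EPYC ("zen") nodes; the script name my_job.job is a placeholder:

```shell
# In a job script, request a specific CPU architecture
# via an embedded Grid Engine directive:
#$ -l cpu_arch=zen

# Or equivalently, pass the resource request at submission time:
qsub -l cpu_arch=zen my_job.job
```

      See the CPU Architecture section of the Available Queues page for the list of valid cpu_arch values.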

    • IDL version 8.7.3 has been installed on Hydra, and is accessible via the idl/8.7.3 module. The idl/8.7 module now points to idl/8.7.3.

  • January 14, 2020 - Increased total slot limit
    • The total number of slots (CPUs) a user can use concurrently has been increased from 512 to 640.

Last updated SGK

