...

  • The cluster, known as Hydra, is made of
    1. two login nodes,
    2. one front-end node,
    3. a queue manager/scheduler (the UNIVA Grid Engine or UGE), and
    4. a slew of compute nodes.

...

  • From either login node you submit and monitor your job(s) via the queue manager/scheduler.
  • The queue manager/scheduler is the Grid Engine, simply GE (aka SGE for Sun Grid Engine) or UGE.

  • The Grid Engine runs on the front-end node (hydra-5.si.edu), hence the front-end node should not be used as a login node.
    • There is no reason for users to ever have to log on to hydra-5.
  • All the nodes (login, front-end and compute nodes) are interconnected
    • via Ethernet (at 10Gbps, aka 10GbE), and
    • via InfiniBand (at 40Gbps or higher, aka IB).

  • The disks are mounted off 3 types of dedicated devices:
    1. a NetApp filer for /home, /data, and /pool (via 10GbE),
    2. a GPFS for /scratch (via IB), and
    3. a NAS for /store (via 10GbE), a near-line storage only available on some nodes.

The following figure is a schematic representation of the cluster:

...

The cluster is supported by the following individuals:

  • DJ Ding (DingDJ@si.edu), the system administrator (at OCIO, Herndon, VA). As the sys-admin, he is responsible for keeping the cluster operational and secure.
  • Rebecca Dikow (DikowR@si.edu) provides Bioinformatics and Genomics support (Data Science Lab/OCIO, Washington, D.C.). She is the primary support person for Bioinformatics and Genomics at SI.
  • Matthew Kweskin (KweskinM@si.edu) - NMNH/L.A.B., IT specialist (Washington, D.C.).
  • Sylvain Korzennik (hpc@cfa.harvard.edu), an astronomer at SAO (Cambridge, MA) with 20+ years of HPC experience.
    As the SAO's HPC analyst, his primary role is to support SAO scientists.
    He is also responsible for configuring and tuning the cluster, its queuing configuration, general Unix support, validation and documentation. He is the primary support person for astronomers at SAO.

Support is also provided by other OCIO staff members (networking, etc.).

...

  • replies to these messages are by default broadcast to the entire list; and
  • you will need to set up a password on this listserv the first time you use it (look in the upper right, under "Options").

...

Last updated   SGK.