  1. Introduction
  2. The Command module
    1. Private module files
    2. Available Module Files
  3. Compilers
  4. How to build and run MPI programs
  5. How to build and run multi-threaded programs
  6. Libraries
  7. Packages/Tools
  8. BioInformatics/Genomics Software

1. Introduction

Hydra is a Linux cluster running CentOS 7.6; its software installation is managed with Bright Cluster Manager (BCM, or CM).

Like any Linux machine, your Un*x environment on Hydra can be configured to your liking.

How to configure a Un*x environment is beyond the scope of this set of documentation.


A set of configuration files located in your home directory sets up your Un*x environment; a default set of these files is provided when a new account is created:

Login shell

  Bash shell (bash)   Bourne shell (sh)   C-shell (csh or tcsh)   Action
  .bash_profile       .profile            .cshrc                  read & executed at startup to configure your environment
  .login              .login              .login                  read & executed at startup, next, but only by a login shell
  .emacs              .emacs              .emacs                  configures the emacs editor
  .bash_logout        .logout             .logout                 read & executed when logging out (login shell)

Note

qsub'ed job scripts are not started as a login shell; hence, unless you fully understand the idiosyncrasies of the bash shell startup rules, we recommend that you use the Bourne shell (sh) or the C-shell (csh), not the bash shell (bash), in the job scripts you submit.
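As a minimal sketch of this recommendation, a job script can ask Grid Engine to run it with /bin/sh through the standard qsub option -S, given on an embedded `#$` line. We assume Hydra's Grid Engine honors -S as usual; the job name demo is hypothetical:

```shell
#!/bin/sh
# Sketch of a Bourne-shell (sh) job script for qsub.
# Lines starting with "#$" are embedded qsub options (Grid Engine syntax);
# /bin/sh itself treats them as ordinary comments.
#$ -S /bin/sh
#$ -N demo
#$ -cwd
# -S selects the shell that runs this script, -N names the job (hypothetical
# name), -cwd runs it in the directory it was submitted from.
# Only sh syntax is used below, since sh is what runs the script.
started_on=$(uname -n)
echo "job started on $started_on at $(date)"
```

Submit it with qsub, e.g. `% qsub demo.job` (the file name is also hypothetical).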


The command module is now available on Hydra:

  • Instead of editing your .bash_profile file (or .profile or .cshrc) to configure your PATH (as well as MANPATH, LD_LIBRARY_PATH, etc.), you should use the command module, as explained below.


Email forwarding:

  • Email sent on the cluster is delivered to the head node (which you should not use).
  • To give you access to these emails (like job notifications), a ~/.forward file was created with your "canonical/home" email address, e.g.:

      % echo DoeJ@si.edu > ~/.forward

  • While you are welcome to edit this file and change the forwarding email:
    • do not delete it, and
    • use an email address you will read.
  • Note that, per SD931, we must communicate with users via their work email, not a private one.
  • Your "on file" email (used for things like password resets and the HPCC-L listserv) will remain your "canonical/home" email (the one that ends in .edu).


The plan/project files:

  • the contents of the files ~/.plan and ~/.project are displayed by the finger command (see man finger);
  • feel free to put relevant/pertinent information in them
  • try 
      % finger hpc

2. The Command module

(lightbulb) Instead of having to edit your ~/.bash_profile file (or ~/.profile or ~/.cshrc) to configure your PATH (and MANPATH and LD_LIBRARY_PATH, etc), use the command module.

  • For example, to use the GNU compiler version 4.9.2, one simply needs to execute
    % module load gcc/4.9.2
  • The command module uses module files, which list what needs to be done to your Un*x environment to use a given tool, including the location of that tool.
  • You can load or unload a module, or switch to a different version, without having to edit a file or source a configuration file.
  • Also, if the location of a tool changes, one only has to edit the corresponding module file.
    • The change is transparent to the users and to any script or job that uses that tool via the command module.
  • There is no need to type long paths that include version numbers.
  • Module files also check for conflicts: you cannot use different versions of the same tool simultaneously.
  • The command module works the same way whether you use the bash or the csh shell,
    • so there is no need to explain what to do for each shell.
  • Module files can also be used by perl, cmake or python.
  • Users can augment the module files we offer by writing their own (module files are written in TCL, see man modulefile).
  • To learn about the command module, read the module man page:
     % man module

(star) The key ways to use module are:

    module help               - show help on the command module itself
    module load   XXX         - load the module XXX
    module unload XXX         - unload the module XXX
    module switch XX/YYY      - switch to module XX/YYY
    module list               - list which modules are currently loaded
    module avail              - show which modules are available
    module -t avail           - show which modules are available, one per line
    module whatis             - show one line information about all the available modules
    module whatis ZZZ         - show one line information about the module ZZZ
    module help   ZZZ         - show the help information on the module ZZZ
    module show   ZZZ         - show what loading the module ZZZ does to your Unix environment.
  • You can easily find out what software is available by "grep'ing" the output of module, e.g. (csh syntax):
      % ( module -t avail ) |& grep bioinformatics
  • You can access the documentation built into a module file with:
      % module help gcc/4.9.2
  • You can view how a module file will change your Un*x environment with:
      % module show gcc/4.9.2
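The grep examples on this page use csh syntax, where `|&` pipes both standard output and standard error. For sh or bash users, here is a minimal sketch of the same search, assuming (as that `|&` implies) that module prints its avail list on standard error. The function name mod_search is hypothetical:

```shell
# mod_search PATTERN: list the available module files matching PATTERN
# (case-insensitive). "module -t avail" prints one module per line on
# standard error, so 2>&1 sends it into the pipe, like csh's "|&".
mod_search() {
    module -t avail 2>&1 | grep -i -- "$1"
}
```

Define it in your shell (for instance in ~/.bash_profile), then run, e.g., `mod_search bioinformatics`.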

(warning) The command module implements the concept of conflict, i.e., you cannot load gcc/4.9.1 and gcc/4.9.2 simultaneously:

  • to change versions, use switch, e.g.:
    % module load gcc/4.9.1
    [do your stuff]
    % module switch gcc/4.9.2

  • to use a conflicting tool (e.g., a different compiler), unload the current module first:
    % module load gcc/4.9.1
    [do your stuff that needs gcc ver 4.9.1] 
    % module unload gcc/4.9.1
    % module load pgi/15.1
    [do the stuff that needs PGI ver 15.1]

The command module also implements the concept of a default version:

  • module load pgi is equivalent to module load pgi/15.1 if version 15.1 is set as the default pgi module.

(lightbulb) Loading/unloading modules may set your MANPATH variable in a way that prevents you from accessing the default man page locations.

  Loading the module tools/manpath will solve this.

2.a Private Module Files

Users can write their own private module files to configure/modify their Unix environment by loading or unloading a private module.

This page describes how to do it, with examples.

FYI:

  • The purpose of a module file is to simplify how you configure/modify your Unix environment to run or access a specific set of tools/applications/etc.
  • A module file also allows users to change their configuration without worrying about which shell is being used (bash or csh).
  • Module files to access general purpose or supported tools/applications are written and maintained by the HPC support team.

2.b Available Module Files

For a current list of all available module files on Hydra, see here (or as plain text here).


(lightbulb) In the bioinformatics and tools categories, the list of module files does not include every installed version of each software package. For these sections, run module avail on Hydra to see every available version.

3. Compilers

We support the following three different compilers:

  1. The GNU compilers (gcc, g++, gfortran)
  2. The Intel compilers (icc, icpc, ifort)
  3. The PGI compilers (pgcc, pgf77, pgf90, pgfortran)

MPI is available for each compiler.

To access a compiler, use the corresponding module: [this needs to be updated]


                  GNU                      Intel                   PGI
  compiler        module load gcc          module load intel       module load pgi
  compiler+MPI    module load gcc/mpich    module load intel/mpi   module load pgi/mvapich
  versions        4.4.7 (default)          2015.1                  14.10
  available       4.9.1, 4.9.2             2015.5 (default)        15.1 (default), 15.9
                  5.3.0                    2016.0, 2016.3          16.4, 16.10
                  6.1.0                    2017.1, 2017.10         17.1, 17.10
                  7.3.0                    2018.1                  18.1

Note

  • The default values and the list of available values might change before this documentation page is updated.
  • To check which versions are available, use something like (csh syntax)
    % ( module -t avail ) |& grep gcc
  • In most cases, you cannot mix and match compilers, their respective libraries, and the associated run-time environments;
    doing so may lead to unpredictable results.


4. How to Build and Run MPI Programs on Hydra

[this needs to be updated]


  GNU:
    module load:                 gcc/openmpi
    compilers:                   mpicc, mpic++, mpif77, mpif90
    submit:                      qsub -pe orte N
    machines list file:          $PE_HOSTFILE
    run (in qsub job file):      mpirun -np $NSLOTS ./code
    display machines list file:  echo using $NSLOTS slots on:
                                 cat $PE_HOSTFILE
    IB support:                  yes (via OMPI_MCA_btl)
    examples and notes:          /home/hpc/examples/mpi/gcc
                                 including mpich, mvapich & openmpi examples

  Intel:
    module load:                 intel/mpi
    compilers:                   mpiicc, mpiicpc, mpiifort
    submit:                      qsub -pe orte N
    machines list file:          $PE_HOSTFILE
    run (in qsub job file):      mpirun -np $NSLOTS ./code
    display machines list file:  echo using $NSLOTS slots on:
                                 cat $PE_HOSTFILE
    IB support:                  yes (via I_MPI_FABRICS)
    examples and notes:          /home/hpc/examples/mpi/intel
                                 Do not use mpicc or mpif90 with the Intel compiler.

  PGI:
    module load:                 pgi/mvapich
    compilers:                   mpicc, mpic++, mpif77, mpif90
    submit:                      qsub -pe mpich N
    machines list file:          $TMPDIR/machines
    run (in qsub job file):      mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./code
    display machines list file:  echo using $NSLOTS slots on:
                                 sort $TMPDIR/machines | uniq -c
    IB support:                  yes (via module selection)
    examples and notes:          /home/hpc/examples/pgi
                                 including mpich, mvapich & openmpi examples

Note

  • The value of N is the number of CPUs (slots, computing elements, etc.) you want your code to use;
    • it can also be specified as "N-M", meaning at least N and at most M CPUs (slots, ...);
    • the job script should use the environment variable NSLOTS (via $NSLOTS) to access the assigned number of CPUs (slots);
      that number should not be hardwired;
    • the list of nodes set aside for your MPI job is compiled by Grid Engine (GE) and passed to the job script via a machines list file,
      which is either
      • $PE_HOSTFILE (ORTE), or
      • $TMPDIR/machines (MPICH).

  • The string ./code is to be replaced by the name of your MPI executable.
  • Whether you build or run an MPI program, you must first load the corresponding module.
  • There are several implementations of MPI: mpich and openmpi (aka ORTE), plus mvapich, the InfiniBand (IB) implementation of mpich.
  • This can be confusing, so look at the examples.
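To check how Grid Engine spread your slots across nodes, a small sh sketch can print and total the slot counts listed in $PE_HOSTFILE. It assumes the usual Grid Engine layout of that file (host name, slot count, queue, processor range; one node per line). The function name summarize_hostfile is hypothetical, and the MPICH $TMPDIR/machines file is still best summarized with sort $TMPDIR/machines | uniq -c:

```shell
# summarize_hostfile FILE: print "host slots" for each node, then the total.
# Assumes the Grid Engine PE_HOSTFILE layout: host slots queue processors.
summarize_hostfile() {
    awk '{ printf "%s %d\n", $1, $2; total += $2 }
         END { printf "total %d\n", total }' "$1"
}
```

In an ORTE job script, `echo using $NSLOTS slots on:` followed by `summarize_hostfile "$PE_HOSTFILE"` should report a total equal to $NSLOTS.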

(lightbulb) You can find more information under Submitting Distributed Parallel Jobs with Explicit Message Passing.

5. How to Build and Run Multi-Threaded Programs

  • You can build and run multi-threaded programs on the cluster:
  • you can either write, or use, a program that starts separate threads to parallelize tasks, or
  • you can use the compilers to produce multi-threaded code, using OpenMP directives, known as pragmas.
    A pragma is a directive that the compiler ignores by default (in Fortran it looks like a comment), but parses when invoked with the appropriate flag.

  • A multi-threaded code (application) must run on a single compute node and usually uses a shared memory model;
    • The total cumulative available memory of a multi-threaded code is thus limited to the largest amount of memory available on any compute node;
      by contrast MPI code uses a distributed memory model, and can thus access a much larger cumulative amount of memory.
    • Similarly, the total number of threads (CPUs) available to a multi-threaded code is limited to the largest number of CPUs available on any compute node;
      by contrast MPI code uses a distributed model, and can thus make use of a much larger total number of CPUs.

  • To submit a job that will use a multi-threaded application, you must
    • request a number of CPUs via the qsub option -pe mthread N, where N is the number of CPUs
    • The number of requested CPUs can also be specified as N-M, meaning at least N and at most M CPUs
    • The more CPUs you request, the fewer nodes will have that many CPUs, or that many free CPUs, and the longer your job may wait to start;
    • The maximum number of available CPUs (in the regular queues) is 64, and drops to 40 or 24 for some special nodes.

  • The job script should use the environment variable NSLOTS (via $NSLOTS) to access the allocated number of CPUs (slots), that number should not be hardwired.

OpenMP

  • The following compiler flags enable OpenMP pragmas in your code:

    Compiler   Flag
    GNU        -fopenmp
    Intel      -qopenmp   (-openmp is deprecated)
    PGI        -mp

    How to parallelize a code using OpenMP directives is beyond the scope of this set of documentation.

  • OpenMP code uses the environment variable OMP_NUM_THREADS to specify the number of threads; it should thus be set to $NSLOTS:

    C-shell (csh) syntax:       setenv OMP_NUM_THREADS $NSLOTS
    Bourne shell (sh) syntax:   export OMP_NUM_THREADS=$NSLOTS
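Putting the pieces of this section together, here is a sketch of a Bourne-shell job script for an OpenMP program. The executable name omp_code and the slot range 4-8 are hypothetical. The fallback to one thread when NSLOTS is unset is an added convenience so the script can also be tried outside of a job:

```shell
#!/bin/sh
# Sketch of a multi-threaded (OpenMP) job script; "omp_code" is hypothetical.
#$ -S /bin/sh
#$ -pe mthread 4-8
#$ -cwd
# Grid Engine sets NSLOTS to the number of CPUs actually granted (4 to 8 here);
# use it rather than a hardwired thread count. Default to 1 outside a job.
OMP_NUM_THREADS=${NSLOTS:-1}
export OMP_NUM_THREADS
echo "running with $OMP_NUM_THREADS threads on $(uname -n)"
# Uncomment to run the OpenMP executable, built with -fopenmp (GNU),
# -qopenmp (Intel) or -mp (PGI):
# ./omp_code
```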

You can find more information under Submitting Parallel Jobs.

6. Libraries

The following libraries are available:

  Library          Description                    Where to find examples
  BLAS & LAPACK    Linear Algebra libraries       ~hpc/examples/lapack
  MKL              Intel's Math Kernel Library    ~hpc/examples/lapack/intel
  GSL              GNU Scientific Library         ~hpc/examples/gsl

7. Packages/Tools

8. BioInformatics/Genomics Software


Last updated    SGK
