
  1. Introduction
  2. MPI, or Distributed Parallel Jobs with Explicit Message Passing
    1. ORTE or OpenMPI
    2. MPICH or MVAPICH
  3. Multi-threaded, or OpenMP, Parallel Jobs
    1. Multi-threaded job
    2. OpenMP job
  4. Examples

1. Introduction

  • A parallel job is a job that uses more than one CPU.
  • Since the cluster is a shared resource, it is the GE that allocates CPUs, known as slots in GE jargon, to jobs; hence a parallel job must request a set of slots when it is submitted.

  • A parallel job must request a parallel environment and a number of slots using the -pe <name> N specification to qsub, where

    • <name> is either mpich, orte or mthread (see below);

    • the value of N is the number of requested slots and can be specified as N-M, where N is the minimum and M the maximum number of slots requested;
    • that option can be an embedded directive (see the sketch at the end of this section).

    • The job script accesses the number of allocated slots via an environment variable (NSLOTS) and,

    • for MPI jobs, gets the list of compute nodes via a so-called machines file.

  • There are two types of parallel jobs:
    1. MPI or distributed jobs: the CPUs can be distributed over multiple compute nodes.

      PROs: There is conceptually no limit on how many CPUs can be used, so
      the cumulative amount of CPUs and memory a job can use can get quite large;
      the GE can find (a lot of) unused CPUs on a busy cluster by finding them on different nodes.
      CONs: Each CPU must be assumed to be on a separate compute node, and
      thus must communicate with the other CPUs to exchange information (aka message passing);
      programming can get more complicated, and the inter-process communication can become a bottleneck.
    2. Multi-threaded jobs: all the CPUs must be on the same compute node.

      PROs: All CPUs can share a common memory space,
      inter-process communication can be very efficient (being local), and
      programming can be simpler.
      CONs: A job can only use as many CPUs as there are on the largest compute node, and
      can get them only if they are not in use by someone else.
  • How do I know which type of parallel job to submit?
    The author of the software will in most cases specify if and how the application can be parallelized:
    • some analyses are parallelized by submitting a slew of independent serial jobs,
      • in which case using a job array may be the best approach;
    • some analyses use explicit message passing (MPI); while
    • some analyses use a programming model that can use multiple threads.

 

  • We currently do not support hybrid parallel jobs: jobs that would use both multi-threaded and distributed models.
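
As a minimal sketch of the -pe request as an embedded directive (the job name demo, the log file name, and the 4-16 slot range are arbitrary choices for illustration):

   #!/bin/sh
   #$ -S /bin/sh
   #$ -cwd -j y -N demo -o demo.log
   #$ -pe mthread 4-16
   #
   # the GE sets NSLOTS to the number of slots actually allocated (here between 4 and 16)
   echo this job was allocated $NSLOTS slots

Assuming it was saved as demo.job, it would be submitted with % qsub demo.job; the GE then grants it between 4 and 16 slots, depending on availability.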

2. MPI, or Distributed Parallel Jobs with Explicit Message Passing

  • An MPI job runs code that uses an explicit message passing programming scheme known as MPI.
  • There are two distinct implementations of the MPI protocol:
    • MPICH and
    • ORTE;
  • OpenMPI is an ORTE implementation of MPI;
  • MVAPICH is an MPICH implementation that explicitly uses InfiniBand as its transport fabric (faster message passing).
  • OpenMPI is not OpenMP:
    • OpenMPI is the ORTE implementation of MPI;
    • OpenMP is an API for multi-platform shared-memory parallel programming.

 

The following grid of modules, corresponding to combinations of compiler & MPI implementation, is available on Hydra:

                   ORTE                   MPICH              MVAPICH
   GNU             gcc/openmpi            gcc/mpich          gcc/mvapich
   GNU gcc 4.4.7   gcc/4.4/openmpi        gcc/4.4/mpich      gcc/4.4/mvapich
   GNU gcc 4.9.1   gcc/4.9/openmpi        gcc/4.9/mpich      gcc/4.9/mvapich
   GNU gcc 4.9.2   gcc/4.9/openmpi-4.9.2  n/a                n/a

   Intel           intel/mpi              n/a                n/a
   Intel v15.x     intel/15/mpi           n/a                n/a
   Intel v16.x     intel/16/mpi           n/a                n/a

   PGI             pgi/openmpi            pgi/mpich          pgi/mvapich
   PGI v14.x       pgi/14/openmpi         pgi/14/mpich       pgi/14/mvapich
   PGI v15.x       pgi/15/openmpi         pgi/15/mpich       pgi/15/mvapich
   PGI v15.9       pgi/15/openmpi-15.9    pgi/15/mpich-15.9  pgi/15/mvapich-15.9

In fact, there are more version-specific modules available; check with

   % ( module -t avail ) | & grep pi

for a complete list, then use

   % module whatis <module-name>

or

   % module help <module-name>

where <module-name> is one of the listed modules, to get more specific information.
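
For example, using a module from the grid above:

   % module whatis gcc/4.9/openmpi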

2.a ORTE or OpenMPI

The following example shows how to write an ORTE/OpenMPI job script:

Example of an ORTE/OpenMPI job script, using Bourne shell syntax
#!/bin/sh
#
#$ -S /bin/sh
#$ -cwd -j y -N hello -o hello.log
#$ -pe orte 72
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS distributed over:
cat $PE_HOSTFILE
#
# load gcc's compiler and openmpi
module load gcc/4.9/openmpi-4.9.2
#
# run the program hello
mpirun -np $NSLOTS ./hello
#
echo = `date` job $JOB_NAME done

This example will

  • request 72 slots (CPUs),
  • show the content of the machines file (i.e. the distribution of compute nodes),
  • load the OpenMPI module for gcc version 4.9.2, and
  • run the program hello.

It assumes that the program hello was built using gcc v4.9.2.
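
A sketch of the build-and-submit steps (the file names hello.c and hello.job are hypothetical; mpicc is the C compiler wrapper supplied by OpenMPI):

   % module load gcc/4.9/openmpi-4.9.2
   % mpicc -o hello hello.c
   % qsub hello.job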

2.b MPICH or MVAPICH

The following example shows how to write a MVAPICH job script:

Example of an MVAPICH job script, using C-shell syntax
#!/bin/csh
#
#$ -cwd -j y -N hello -o hello.log
#$ -pe mpich 72
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo using $NSLOTS slots on:
sort $TMPDIR/machines | uniq -c
#
# load PGI's mvapich
module load pgi/mvapich
#
# run the program hello
mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./hello
#
echo = `date` job $JOB_NAME done

This example will

  • request 72 slots (CPUs),
  • show the content of the machines file (i.e. the distribution of compute nodes),
    • using sort & uniq to produce a compact list, in a "no_of_CPUs hostname" format,
  • load the MVAPICH module for the PGI compiler, and
  • run the program hello.

It assumes that the program hello was built using the PGI compiler and the MVAPICH library/module, to enable the IB as the transport fabric.

You could replace MVAPICH by MPICH if you do not want to use the IB.
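
Similarly, a sketch of how hello might be built for this case (hello.c is a hypothetical source file name; mpicc is the compiler wrapper supplied by the MVAPICH module):

   % module load pgi/mvapich
   % mpicc -o hello hello.c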

2.c Notes

  • The command mpirun is aliased by the module file to use the full path of the correct version for each case.
  • Do not use a full path specification to a version of mpirun: using a wrong version of mpirun will produce unpredictable results.
    You can check which version corresponds to a module with either
       % module show <module-name>
    or, if you use the C-shell,
       % alias mpirun
    or, if you use the Bourne shell,
       % declare -f mpirun

  • The following is a simple trick to get the MPI code's full path and use it on the mpirun line:

    Bourne shell syntax
    # load the module
    module load tools/mpidemo
    # get the full path
    bin=`which demo_mpi`
    # show what will be executed
    declare -f mpirun
    echo mpirun -np $NSLOTS $bin
    # run it
    mpirun -np $NSLOTS $bin

    C-shell syntax
    # load the module
    module load tools/mpidemo
    # get the full path
    set bin = `which demo_mpi`
    # show what will be executed
    alias mpirun
    echo mpirun -np $NSLOTS $bin
    # run it
    mpirun -np $NSLOTS $bin

    This example assumes that the module tools/mpidemo sets things up to run the MPI code demo_mpi, and avoids having to use a full path anywhere.

  • In most cases use mvapich and not mpich.

  • One can query the technical implementation details of MPI for each module,

    since each MPI-enabling module implements a slightly different version of MPI.

    • You can query the precise details of each implementation as follows:

      Command                          To
      % module show <module-file>      Show how the module changes your Un*x environment.
                                       All the modules set the same env variables: MPILIB, MPIINC, MPIBIN,
                                       plus either OPENMPI, MPICH, or MVAPICH,
                                       and set the alias mpirun to use the corresponding full path.
      % module help <module-file>      Show details on the module, and
                                       how to retrieve the details of the specific build.

      Depending on the MPI implementation (ORTE or MPICH), use either

      % module load <module-file>      Show precise details of an ORTE implementation
      % ompi_info [-all]               (as shown by module help <module-file>)

      or

      % module load <module-file>      Show precise details of an MPICH/MVAPICH implementation
      % mpirun -info                   (as shown by module help <module-file>)
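
For example, to inspect the ORTE build loaded by the gcc/4.9/openmpi-4.9.2 module used earlier (any ORTE module from the grid in section 2 would do; ompi_info with no arguments prints a summary):

   % module load gcc/4.9/openmpi-4.9.2
   % ompi_info | head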



3. Multi-threaded, or OpenMP, Parallel Jobs

A multi-threaded job is a job that will make use of more than one CPU but needs all the CPUs to be on the same compute node.

3.a Multi-threaded job

The following example shows how to write a multi-threaded script:

Example of a Multi-threaded job script, using Bourne shell syntax
#!/bin/sh
#
#$ -S /bin/sh
#$ -cwd -j y -N demo -o demo.log
#$ -pe mthread 32
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS
#
# load the demo (fictional) module
module load tools/demo
#
# convert the generic parameter file, gen-params, to a specific file
# where the number of thread are inserted where the string MTHREADS is found
sed "s/MTHREADS/$NSLOTS/" gen-params > all-params
#
# run the demo program specifying all-params as the parameter file
demo -p all-params
#
echo = `date` job $JOB_NAME done

This example will run the tool demo using 32 CPUs (slots). The script

  • loads the tools/demo module,
  • parses the file gen-params and replaces every occurrence of the string MTHREADS by the allocated number of slots (via $NSLOTS),
    • using the stream editor sed (man sed),
  • saves the result to a file called all-params, and
  • runs the tool demo with the parameter file all-params.
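
As an illustration of the sed step, assume gen-params contained a single (hypothetical) line; with 32 allocated slots the substitution yields:

   % cat gen-params
   num_threads = MTHREADS
   % sed "s/MTHREADS/32/" gen-params
   num_threads = 32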

3.b OpenMP job

The following example shows how to write an OpenMP job script:

Example of an OpenMP job script, using C-shell syntax
#!/bin/csh
#
#$ -cwd -j y -N hellomp -o hellomp.log
#$ -pe mthread 32
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS
#
# load the pgi module
module load pgi
#
# set the variable OMP_NUM_THREADS to the content of NSLOTS
# this tells OpenMP applications how many threads/slots/CPUs to use
setenv OMP_NUM_THREADS $NSLOTS 
#
# run the hellomp OpenMP program, build w/ PGI
./hellomp
#
echo = `date` job $JOB_NAME done

This example will run the program hellomp, which was compiled with the PGI compiler, using 32 threads (CPUs/slots). The script

  • loads the pgi module,
  • sets OMP_NUM_THREADS to the content of NSLOTS to specify the number of threads, and
  • runs the program hellomp.
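
For reference, a sketch of how hellomp might have been built (hellomp.c is a hypothetical source file name; -mp is the PGI compiler flag that enables OpenMP):

   % module load pgi
   % pgcc -mp -o hellomp hellomp.c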

4. Examples

You can find examples of simple/trivial test cases, with source code, Makefile, job file and log file, under /home/hpc/examples.

(list of examples still missing)


Last Updated SGK.
