
  1. Introduction
  2. MPI, or Distributed Parallel Jobs with Explicit Message Passing

    1. ORTE or OpenMPI

    2. MPICH or MVAPICH

  3. Multi-threaded, or OpenMP, Parallel Jobs

    1. Multi-threaded job

    2. OpenMP job

  4. Examples

1. Introduction

  • A parallel job is a job that uses more than one CPU.
  • Since the cluster is a shared resource, it is the GE that allocates CPUs, known as slots in GE jargon, to jobs; hence a parallel job must request a set of slots when it is submitted.

  • A parallel job must request a parallel environment and a number of slots using the -pe <name> N specification to qsub, where

    • <name> is either mpich, orte or mthread (see below);

    • the value of N is the number of requested slots and can be specified as N-M, where N is the minimum and M the maximum number of slots requested;
    • that option can be an embedded directive (see the submission examples at the end of this introduction).

    • The job script accesses the number of allocated slots via an environment variable (NSLOTS) and,

    • for MPI jobs, gets the list of compute nodes via a so-called machines file.

  • There are two types of parallel jobs:
    1. MPI or distributed jobs: the CPUs can be distributed over multiple compute nodes.

      PROs: there is conceptually no limit on how many CPUs can be used,
      the cumulative amount of CPUs and memory a job can use can get quite large, and
      the GE can find (a lot of) unused CPUs on a busy cluster by finding them on different nodes.
      CONs: each CPU should be assumed to be on a separate compute node and
      thus must communicate with the other CPUs to exchange information (aka message passing);
      programming can get more complicated and the inter-process communication can become a bottleneck.
    2. Multi-threaded jobs: all the CPUs must be on the same compute node.

      PROs: all CPUs can share a common memory space,
      inter-process communication can be very efficient (being local), and
      programming can be simpler.
      CONs: can only use as many CPUs as there are on the largest compute node, and
      can get them only if they are not in use by someone else.
  • How do I know which type of parallel job to submit?
    The author of the software will in most cases specify if and how the application can be parallelized:
    • some analyses are parallelized by submitting a slew of independent serial jobs,
      • in which case using a job array may be the best approach;
    • some analyses use explicit message passing (MPI); while
    • some analyses use a programming model that can use multiple threads.

  • We currently do not support hybrid parallel jobs: jobs that would use both multi-threaded and distributed models.
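
For illustration, here is how such jobs might be submitted from the command line (the script names hello.job and serial.job are hypothetical):

   # request exactly 72 slots in the orte parallel environment
   % qsub -pe orte 72 hello.job

   # request between 24 and 48 slots in the mthread parallel environment
   % qsub -pe mthread 24-48 hello.job

   # submit a slew of independent serial jobs as a job array
   % qsub -t 1-100 serial.job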

2. MPI, or Distributed Parallel Jobs with Explicit Message Passing

  • An MPI job runs code that uses an explicit message-passing programming scheme known as MPI.
  • There are two distinct implementations of the MPI protocol:
    • MPICH and
    • ORTE;
  • OpenMPI is an ORTE implementation of MPI;
  • MVAPICH is an MPICH implementation that explicitly uses InfiniBand (IB) as the transport fabric (faster message passing).
  • OpenMPI is not OpenMP
    • OpenMPI is the ORTE implementation of MPI;
    • OpenMP is an API for multi-platform shared-memory parallel programming.

 

The following grid of modules, each corresponding to a combination of compiler & MPI implementation, is available on Hydra:

                 ORTE                   MPICH              MVAPICH
 GNU             gcc/openmpi            gcc/mpich          gcc/mvapich
 GNU gcc 4.4.7   gcc/4.4/openmpi        gcc/4.4/mpich      gcc/4.4/mvapich
 GNU gcc 4.9.1   gcc/4.9/openmpi        gcc/4.9/mpich      gcc/4.9/mvapich
 GNU gcc 4.9.2   gcc/4.9/openmpi-4.9.2  n/a                n/a

 Intel           intel/mpi              n/a                n/a
 Intel v15.x     intel/15/mpi           n/a                n/a
 Intel v16.x     intel/16/mpi           n/a                n/a

 PGI             pgi/openmpi            pgi/mpich          pgi/mvapich
 PGI v14.x       pgi/14/openmpi         pgi/14/mpich       pgi/14/mvapich
 PGI v15.x       pgi/15/openmpi         pgi/15/mpich       pgi/15/mvapich
 PGI v15.9       pgi/15/openmpi-15.9    pgi/15/mpich-15.9  pgi/15/mvapich-15.9

In fact, there are more version-specific modules available; check with

   % ( module -t avail ) |& grep pi

for a complete list, then use

   % module whatis <module-name>

or

   % module help <module-name>

where <module-name> is one of the listed modules, to get more specific information.
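
For example, to get a one-line description of one of the modules listed above:

   % module whatis gcc/4.9/openmpi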

2.a ORTE or OpenMPI

The following example shows how to write an ORTE/OpenMPI job script:

Example of an ORTE/OpenMPI job script, using Bourne shell syntax
#!/bin/sh
#
#$ -S /bin/sh
#$ -cwd -j y -N hello -o hello.log
#$ -pe orte 72
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS distributed over:
cat $PE_HOSTFILE
#
# load gcc's compiler and openmpi
module load gcc/4.9/openmpi-4.9.2
#
# run the program hello
mpirun -np $NSLOTS ./hello
#
echo = `date` job $JOB_NAME done

This example will

  • show the content of the machine file (i.e. the distribution of compute nodes)
  • load the OpenMPI module for gcc version 4.9.2, and
  • run the program hello,
  • requesting 72 slots (CPUs).

It assumes that the program hello was built using gcc v4.9.2.
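
As a sketch, and assuming hello is built from a single (hypothetical) source file hello.c, it could be compiled with the same module and the MPI compiler wrapper mpicc:

   % module load gcc/4.9/openmpi-4.9.2
   % mpicc -o hello hello.c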

2.b MPICH or MVAPICH

The following example shows how to write a MVAPICH job script:

Example of an MVAPICH job script, using C-shell syntax
#!/bin/csh
#
#$ -cwd -j y -N hello -o hello.log
#$ -pe mpich 72
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo using $NSLOTS slots on:
sort $TMPDIR/machines | uniq -c
#
# load PGI's mvapich
module load pgi/mvapich
#
# run the program hello
mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./hello
#
echo = `date` job $JOB_NAME done

This example will

  • show the content of the machine file (i.e. the distribution of compute nodes),
    •  using sort & uniq to produce a compact list, in a "no_of_CPUs hostname" format.
  • load the MVAPICH module for the PGI compiler, and
  • run the program hello,
  • requesting 72 slots (CPUs).
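
For illustration, with hypothetical compute node names, that compact list might look like:

     36 compute-1-2
     36 compute-1-3

i.e., 36 CPUs on each of two nodes.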

It assumes that the program hello was built using the PGI compiler and the MVAPICH library/module to enable the IB as the transport fabric.

You could replace MVAPICH by MPICH if you do not want to use the IB.
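
As a sketch, and again assuming a hypothetical source file hello.c, such a build might look like:

   % module load pgi/mvapich
   % mpicc -o hello hello.c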

3. Multi-threaded, or OpenMP, Parallel Jobs

A multi-threaded job is a job that will make use of more than one CPU but needs all the CPUs to be on the same compute node.

3.a Multi-threaded job

The following example shows how to write a multi-threaded script:

Example of a Multi-threaded job script, using Bourne shell syntax
#!/bin/sh
#
#$ -S /bin/sh
#$ -cwd -j y -N demo -o demo.log
#$ -pe mthread 32
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS
#
# load the demo (fictional) module
module load tools/demo
#
# convert the generic parameter file, gen-params, to a specific file
# where the number of threads is inserted where the string MTHREADS is found
sed "s/MTHREADS/$NSLOTS/" gen-params > all-params
#
# run the demo program specifying all-params as the parameter file
demo -p all-params
#
echo = `date` job $JOB_NAME done

This example will run the tool "demo" using 32 CPUs (slots). The script

  • loads the tools/demo module,
  • parses the file gen-params and replaces every occurrence of the string MTHREADS by the allocated number of slots (via $NSLOTS),
    • using the stream editor sed (man sed),
  • saves the result to a file called all-params,
  • runs the tool demo with the parameter file all-params.
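
For illustration, if gen-params contained a hypothetical line like

   threads = MTHREADS

the corresponding line in all-params, for a 32-slot job, would read

   threads = 32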

3.b OpenMP job

The following example shows how to write an OpenMP job script:

Example of an OpenMP job script, using C-shell syntax
#!/bin/csh
#
#$ -cwd -j y -N hellomp -o hellomp.log
#$ -pe mthread 32
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS
#
# load the pgi module
module load pgi
#
# set the variable OMP_NUM_THREADS to the content of NSLOTS
# this tells OpenMP applications how many threads/slots/CPUs to use
setenv OMP_NUM_THREADS $NSLOTS 
#
# run the hellomp OpenMP program, built w/ PGI
./hellomp
#
echo = `date` job $JOB_NAME done

This example will run the program hellomp, which was compiled with the PGI compiler, using 32 threads (CPUs/slots). The script

  • loads the pgi module,
  • sets OMP_NUM_THREADS to the content of NSLOTS to specify the number of threads, and
  • runs the program hellomp.
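
Note that setenv is C-shell syntax; in a Bourne shell (sh) script the equivalent line would be:

   export OMP_NUM_THREADS=$NSLOTS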

4. Examples

You can find examples of simple/trivial test cases, with source code, Makefile, job file, and log file, under /home/hpc/examples.

(list of examples still missing)


Last Updated SGK.
