  1. Introduction
  2. MPI, or Distributed Parallel Jobs with Explicit Message Passing

    1. ORTE or OpenMPI

    2. MPICH or MVAPICH

  3. Multi-threaded, or OpenMP, Parallel Jobs

    1. Multi-threaded job

    2. OpenMP job

  4. Hybrid Jobs

1. Introduction

    1. MPI or distributed jobs: the CPUs can be distributed over multiple compute nodes.

      PROs: there is conceptually no limit on how many CPUs can be used, so the
      cumulative amount of CPUs and memory a job can use can get quite large.
      The GE can find (a lot of) unused CPUs on a busy machine by finding them on different nodes.
      CONs: each CPU is assumed to be on a separate compute node, and thus each process
      must communicate with the other processes to exchange information (aka message passing).
      Programming can get more complicated and the inter-process communication can become a bottleneck.

    2. Multi-threaded jobs: all the CPUs must be on the same compute node.

      PROs: all CPUs can share a common memory space,
      inter-process communication can be very efficient (being local), and
      programming can be simpler.
      CONs: a job can only use as many CPUs as there are on the largest compute node, and
      can get them only if they are not in use by someone else.

    3. Hybrid jobs: the CPUs are distributed, but with the same number of CPUs on each compute node.

      PROs: the CPUs on the same node can share a common memory space,
      while not all CPUs are on the same compute node,
      hence the total number of CPUs is not limited to the number of CPUs on the largest compute node.
      CONs: the code must mix inter-process communication (like MPI) with shared memory and multi-threading (like OpenMP).
      This can be tricky, but some problems can greatly benefit from this model
      (the corresponding PE requests are summarized in the sketch after this list).
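
As a quick reference, here are the parallel environment (PE) requests used in the job scripts later on this page for each of these three models; the slot counts are simply the values used in the examples:

   #$ -pe orte 72       # distributed/MPI job (OpenMPI): 72 slots, spread over several nodes
   #$ -pe mpich 72      # distributed/MPI job (MPICH or MVAPICH): 72 slots
   #$ -pe mthread 32    # multi-threaded or OpenMP job: 32 slots, all on one node
   #$ -pe h8 64         # hybrid job: 64 slots, as 8 CPUs/node on 8 different nodes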


2. MPI, or Distributed Parallel Jobs with Explicit Message Passing


The following grid of modules, corresponding to a combination of compiler & implementation, is available on Hydra:


                    ORTE                    MPICH                MVAPICH
  GNU               gcc/openmpi             gcc/mpich            gcc/mvapich
  GNU gcc 4.4.7     gcc/4.4/openmpi         gcc/4.4/mpich        gcc/4.4/mvapich
  GNU gcc 4.9.1     gcc/4.9/openmpi         gcc/4.9/mpich        gcc/4.9/mvapich
  GNU gcc 4.9.2     gcc/4.9/openmpi-4.9.2   n/a                  n/a

  Intel             intel/mpi               n/a                  n/a
  Intel v15.x       intel/15/mpi            n/a                  n/a
  Intel v16.x       intel/16/mpi            n/a                  n/a

  PGI               pgi/openmpi             pgi/mpich            pgi/mvapich
  PGI v14.x         pgi/14/openmpi          pgi/14/mpich         pgi/14/mvapich
  PGI v15.x         pgi/15/openmpi          pgi/15/mpich         pgi/15/mvapich
  PGI v15.9         pgi/15/openmpi-15.9     pgi/15/mpich-15.9    pgi/15/mvapich-15.9

In fact, there are more version-specific modules available; check with

   % ( module -t avail ) | & grep pi

for a complete list. You can also use

   % module whatis <module-name>

or

   % module help <module-name>

where <module-name> is one of the listed modules, to get more specific information.

2.a ORTE or OpenMPI

The following example shows how to write an ORTE/OpenMPI job script:

#!/bin/sh
#
#$ -S /bin/sh
#$ -cwd -j y -N hello -o hello.log
#$ -pe orte 72
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS distributed over:
cat $PE_HOSTFILE
#
# load gcc's compiler and openmpi
module load gcc/4.9/openmpi-4.9.2
#
# run the program hello
mpirun -np $NSLOTS ./hello
#
echo = `date` job $JOB_NAME done

This example will run the MPI program hello on 72 CPUs (slots), which the GE can distribute over several compute nodes.

It assumes that the program hello was built using gcc v4.9.2 and the corresponding OpenMPI library (hence the module gcc/4.9/openmpi-4.9.2).
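
For reference, here is a minimal sketch of how such a program might be built, assuming a hypothetical source file hello.c (use mpif90 instead of mpicc for a Fortran source); loading the module provides the matching compiler wrappers:

   % module load gcc/4.9/openmpi-4.9.2
   % mpicc -o hello hello.c      # hello.c is a placeholder source file name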

2.b MPICH or MVAPICH

The following example shows how to write an MVAPICH job script:

#!/bin/csh
#
#$ -cwd -j y -N hello -o hello.log
#$ -pe mpich 72
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo using $NSLOTS slots on:
sort $TMPDIR/machines | uniq -c
#
# load PGI's mvapich
module load pgi/mvapich
#
# run the program hello
mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./hello
#
echo = `date` job $JOB_NAME done

This example will run the MPI program hello on 72 CPUs (slots).

It assumes that the program hello was built using the PGI compiler and the MVAPICH library/module, which enables the IB (InfiniBand) as the transport fabric.

You could replace MVAPICH by MPICH if you do not want to use the IB.
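
Similarly, a minimal sketch of how this version of hello might be built, again assuming a hypothetical source file hello.c:

   % module load pgi/mvapich
   % mpicc -o hello hello.c      # mpicc here wraps the PGI compiler; hello.c is a placeholder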

2.c Notes

3. Multi-threaded, or OpenMP, Parallel Jobs

A multi-threaded job is a job that will make use of more than one CPU but needs all the CPUs to be on the same compute node.

3.a Multi-threaded jobs

The following example shows how to write a multi-threaded script:

#!/bin/sh
#
#$ -S /bin/sh
#$ -cwd -j y -N demo -o demo.log
#$ -pe mthread 32
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS
#
# load the demo (fictional) module
module load tools/demo
#
# convert the generic parameter file, gen-params, to a specific file
# where the number of threads is inserted wherever the string MTHREADS is found
sed "s/MTHREADS/$NSLOTS/" gen-params > all-params
#
# run the demo program specifying all-params as the parameter file
demo -p all-params
#
echo = `date` job $JOB_NAME done

This example will run the tool demo using 32 CPUs (slots). The script uses sed to replace the string MTHREADS in the generic parameter file gen-params with the number of allocated slots ($NSLOTS), writes the result to all-params, and then runs demo with that parameter file.
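
For illustration, if gen-params contained a hypothetical line such as "nthreads = MTHREADS", the sed substitution would transform it as follows:

   % echo "nthreads = MTHREADS" | sed "s/MTHREADS/32/"
   nthreads = 32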

3.b OpenMP jobs

The following example shows how to write an OpenMP job script:

#!/bin/csh
#
#$ -cwd -j y -N hellomp -o hellomp.log
#$ -pe mthread 32
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS
#
# load the pgi module
module load pgi
#
# set the variable OMP_NUM_THREADS to the content of NSLOTS
# this tells OpenMP applications how many threads/slots/CPUs to use
setenv OMP_NUM_THREADS $NSLOTS 
#
# run the hellomp OpenMP program, built w/ PGI
./hellomp
#
echo = `date` job $JOB_NAME done

This example will run the program hellomp, which was compiled with the PGI compiler, using 32 threads (CPUs/slots). The script sets OMP_NUM_THREADS to the value of NSLOTS, so the OpenMP run-time starts as many threads as slots were allocated.
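
For reference, a minimal sketch of how such an OpenMP program might be built with the PGI compiler, assuming a hypothetical Fortran source file hellomp.f90 (the -mp flag enables OpenMP):

   % module load pgi
   % pgfortran -mp -o hellomp hellomp.f90   # hellomp.f90 is a placeholder source file name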

4. Hybrid Jobs

How to Run a Hybrid Job

To run a job as a hybrid job you need to

  1. request one of the hybrid PEs (parallel environments)
  2. source a configuration file that is created at run-time ($TMPDIR/set-hybrid-config)
  3. run a code written for a hybrid PE.

Examples

Examples of hybrid jobs are under /home/hpc/examples/hybrid.

(warning) The key lines in the job file examples are:

csh syntax:

   if (-e $TMPDIR/set-hybrid-config) then
     source $TMPDIR/set-hybrid-config
   endif

sh syntax:

   if [ -e $TMPDIR/set-hybrid-config ]
   then
     source $TMPDIR/set-hybrid-config
   fi

When the job runs, the hybrid PE set-up writes lines like these to the job log:

--------------- hybrid_start 1.0/1 ---------------
hybrid_start: remember to 'source $TMPDIR/set-hybrid-config' to properly setup your env
--------------------------------------------------
....
PE=h8 NSLOTS=4 OMP_NUM_THREADS=8

Available PEs

You can use one of the following hybrid PEs:

Name   No of CPUs/node   Example      Means
h2            2          -pe h2 64    request 64 slots, distributed as 2 CPUs/node on 32 different nodes
h4            4          -pe h4 64    request 64 slots, as 4 CPUs/node on 16 different nodes
h8            8          -pe h8 64    request 64 slots, as 8 CPUs/node on 8 different nodes
h12          12          -pe h12 48   request 48 slots, as 12 CPUs/node on 4 different nodes
h16          16          -pe h16 64   request 64 slots, as 16 CPUs/node on 4 different nodes

etc., up to h32 by increments of 4: h2, h4, h8, h12, h16, h20, h24, h28 and h32.

Note

(warning) You specify M and N, as in -pe hM N, where N = K x M: the job gets N slots in total, spread as M CPUs/node over K different nodes (for example, -pe h8 64 gives 64 slots as 8 CPUs/node on 8 nodes).

More Details/Explanations

A simple job script, using C-shell syntax, to run an OpenMPI/OpenMP hybrid code called hybrid:

#!/bin/csh
#
#$ -q mThC.q -pe h8 64
#$ -N hybrid -o hybrid.log -cwd -j y
#
echo $JOB_NAME started `date` on $HOSTNAME in $QUEUE jobID=$JOB_ID
#
if (-e $TMPDIR/set-hybrid-config) then
  source $TMPDIR/set-hybrid-config
endif
#
module load gcc/4.4/openmpi
mpirun -np $NSLOTS -hostfile $HOSTFILE ./hybrid
#
echo `date` $JOB_NAME done.


This job will produce as output:

--------------- hybrid_start 1.0/1 ---------------
hybrid_start: remember to 'source $TMPDIR/set-hybrid-config' to properly setup your env
--------------------------------------------------
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
hybrid started Thu Aug 18 13:32:05 EDT 2016 on compute-1-4.local in mThC.q jobID=6374206
PE=h8 NSLOTS=8 OMP_NUM_THREADS=8
hello from iRank=  1, iSize=  8, hostname=compute-1-4.local
hello from iRank=  1, iSize=  8, hostname=compute-1-4.local
hello from iRank=  1, iSize=  8, hostname=compute-1-4.local
hello from iRank=  1, iSize=  8, hostname=compute-1-4.local
hello from iRank=  1, iSize=  8, hostname=compute-1-4.local
hello from iRank=  1, iSize=  8, hostname=compute-1-4.local
hello from iRank=  1, iSize=  8, hostname=compute-1-4.local
hello from iRank=  1, iSize=  8, hostname=compute-1-4.local
hello from iRank=  5, iSize=  8, hostname=compute-3-11.local
hello from iRank=  5, iSize=  8, hostname=compute-3-11.local
hello from iRank=  5, iSize=  8, hostname=compute-3-11.local
hello from iRank=  5, iSize=  8, hostname=compute-3-11.local
hello from iRank=  5, iSize=  8, hostname=compute-3-11.local
hello from iRank=  5, iSize=  8, hostname=compute-3-11.local
hello from iRank=  5, iSize=  8, hostname=compute-3-11.local
hello from iRank=  5, iSize=  8, hostname=compute-3-11.local
hello from iRank=  2, iSize=  8, hostname=compute-1-5.local
hello from iRank=  2, iSize=  8, hostname=compute-1-5.local
hello from iRank=  2, iSize=  8, hostname=compute-1-5.local
hello from iRank=  2, iSize=  8, hostname=compute-1-5.local
hello from iRank=  2, iSize=  8, hostname=compute-1-5.local
hello from iRank=  2, iSize=  8, hostname=compute-1-5.local
hello from iRank=  2, iSize=  8, hostname=compute-1-5.local
hello from iRank=  2, iSize=  8, hostname=compute-1-5.local
hello from iRank=  3, iSize=  8, hostname=compute-1-6.local
hello from iRank=  3, iSize=  8, hostname=compute-1-6.local
hello from iRank=  3, iSize=  8, hostname=compute-1-6.local
hello from iRank=  3, iSize=  8, hostname=compute-1-6.local
hello from iRank=  3, iSize=  8, hostname=compute-1-6.local
hello from iRank=  3, iSize=  8, hostname=compute-1-6.local
hello from iRank=  3, iSize=  8, hostname=compute-1-6.local
hello from iRank=  3, iSize=  8, hostname=compute-1-6.local
hello from iRank=  4, iSize=  8, hostname=compute-2-15.local
hello from iRank=  4, iSize=  8, hostname=compute-2-15.local
hello from iRank=  4, iSize=  8, hostname=compute-2-15.local
hello from iRank=  4, iSize=  8, hostname=compute-2-15.local
hello from iRank=  4, iSize=  8, hostname=compute-2-15.local
hello from iRank=  4, iSize=  8, hostname=compute-2-15.local
hello from iRank=  4, iSize=  8, hostname=compute-2-15.local
hello from iRank=  4, iSize=  8, hostname=compute-2-15.local
hello from iRank=  6, iSize=  8, hostname=compute-3-3.local
hello from iRank=  6, iSize=  8, hostname=compute-3-3.local
hello from iRank=  6, iSize=  8, hostname=compute-3-3.local
hello from iRank=  6, iSize=  8, hostname=compute-3-3.local
hello from iRank=  6, iSize=  8, hostname=compute-3-3.local
hello from iRank=  6, iSize=  8, hostname=compute-3-3.local
hello from iRank=  6, iSize=  8, hostname=compute-3-3.local
hello from iRank=  6, iSize=  8, hostname=compute-3-3.local
hello from iRank=  7, iSize=  8, hostname=compute-3-4.local
hello from iRank=  7, iSize=  8, hostname=compute-3-4.local
hello from iRank=  7, iSize=  8, hostname=compute-3-4.local
hello from iRank=  7, iSize=  8, hostname=compute-3-4.local
hello from iRank=  7, iSize=  8, hostname=compute-3-4.local
hello from iRank=  7, iSize=  8, hostname=compute-3-4.local
hello from iRank=  7, iSize=  8, hostname=compute-3-4.local
hello from iRank=  7, iSize=  8, hostname=compute-3-4.local
hello from iRank=  0, iSize=  8, hostname=compute-1-3.local
hello from iRank=  0, iSize=  8, hostname=compute-1-3.local
hello from iRank=  0, iSize=  8, hostname=compute-1-3.local
hello from iRank=  0, iSize=  8, hostname=compute-1-3.local
hello from iRank=  0, iSize=  8, hostname=compute-1-3.local
hello from iRank=  0, iSize=  8, hostname=compute-1-3.local
hello from iRank=  0, iSize=  8, hostname=compute-1-3.local
hello from iRank=  0, iSize=  8, hostname=compute-1-3.local
Thu Aug 18 13:32:10 EDT 2016 hybrid done.


(tick) The program hybrid corresponds to the following trivial F90 code:

program hello
  !
  include 'mpif.h'
  !
  integer iErr, iRank, iSize
  integer mpiComm, msgTag
  !
  character*40 hostname
  call HOSTNM(hostname)
  !
  mpiComm = MPI_COMM_WORLD
  msgTag  = 0
  !
  call MPI_INIT(iErr)
  call MPI_COMM_RANK(mpiComm, iRank, iErr)
  call MPI_COMM_SIZE(mpiComm, iSize, iErr)
  !
  !$omp parallel
  print 9000, 'hello from iRank=',iRank, &
       ', iSize=', iSize, &
       ', hostname=', trim(hostname)
  !$omp end parallel
  !
  call MPI_FINALIZE(iErr)
9000 format(a,i3,a,i3,a,a)
!
end program hello


(lightbulb) The configuration file ($TMPDIR/set-hybrid-config) does the following: it resets NSLOTS to the number of MPI processes to start (one per node), sets OMP_NUM_THREADS to the number of CPUs per node (the M of the hM PE), and defines HOSTFILE, the machine file passed to mpirun, as illustrated in the output above (a -pe h8 64 request results in NSLOTS=8 and OMP_NUM_THREADS=8).

(plus) You can find more examples in ~hpc/examples/hybrid, where this example is built and run for different compilers (GNU, Intel, PGI), using either MPI or OpenMPI, and using the sh or csh syntax.



Last Updated   SGK.