1. Introduction
- A parallel job is a job that uses more than one CPU.
- Because the cluster is a shared resource, it is the GE (the Grid Engine, i.e. the job scheduler) that allocates CPUs, known as slots in GE jargon, to jobs; hence a parallel job must request a set of slots (i.e. CPUs) when it is submitted.
- A parallel job requests a parallel environment and a number of slots (CPUs) using the `-pe <name> N` specification to `qsub`:
  - the `<name>` is either `mpich`, `orte` or `mthread` (see below);
  - the value of `N` is the number of requested slots (CPUs) and can be specified as `N-M`, where `N` is the minimum and `M` the maximum number of slots requested;
  - that option can be an embedded directive.
- The job script accesses the number of assigned slots via an environment variable (`NSLOTS`) and, for MPI jobs, gets the list of compute nodes via a so-called machines file.
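As a minimal sketch of these ideas (the job name `demo-pe` and the 4-8 slot range are made up for illustration), a job script can carry the `-pe` request as an embedded directive and read `NSLOTS`; outside the GE the variable is unset, so the sketch defaults it to 1 so it runs anywhere:

```shell
#!/bin/sh
# Hypothetical minimal job script: embedded directives request
# between 4 and 8 slots of the mthread parallel environment.
#$ -S /bin/sh
#$ -cwd -j y -N demo-pe -o demo-pe.log
#$ -pe mthread 4-8
#
# Under the GE, NSLOTS holds the number of slots actually granted;
# default to 1 so this sketch also runs outside the scheduler.
echo "allocated slots: ${NSLOTS:-1}"
```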
- There are two types of parallel jobs:
  - MPI, or distributed, jobs: the CPUs can be distributed over multiple compute nodes.
    - PROs: there is conceptually no limit on how many CPUs can be used, so the cumulative amount of CPUs and memory a job uses can get quite large; the GE can find (a lot of) unused CPUs on a busy machine by finding them on different nodes.
    - CONs: each CPU should be assumed to be on a separate compute node and thus must communicate with the other CPUs to exchange information (aka message passing); programming can get more complicated and the inter-process communication can become a bottleneck.
  - Multi-threaded jobs: all the CPUs must be on the same compute node.
    - PROs: all CPUs share a common memory space, inter-process communication can be very efficient (being local), and programming is often simpler.
    - CONs: you can only use as many CPUs as there are on the largest compute node, and you can get them only if they are not in use by someone else.
- How do I know which type of parallel job to submit?
  The author of the software will in most cases specify whether the application can be parallelized and how:
  - some analyses are parallelized by submitting a slew of independent serial jobs, where using a job array may be the best approach;
  - some analyses use explicit message passing (MPI); while
  - some analyses use a programming model that can use multiple threads.
- We currently do not support hybrid parallel jobs: jobs that would use both the multi-threaded and the distributed models.
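For the "slew of independent serial jobs" case, a job array lets one submission cover many tasks. A minimal sketch (the job name and task count are made up; each task would select its own input via `SGE_TASK_ID`):

```shell
#!/bin/sh
# Hypothetical job-array sketch: one submission, 10 independent tasks.
#$ -S /bin/sh
#$ -cwd -j y -N demo-array -o demo-array.$TASK_ID.log
#$ -t 1-10
#
# The GE gives each task a distinct SGE_TASK_ID (1..10); default to 1
# so this sketch also runs outside the scheduler.
echo "processing input file input.${SGE_TASK_ID:-1}"
```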
2. MPI, or Distributed Parallel Jobs with Explicit Message Passing
- An MPI job runs code that uses an explicit message passing programming scheme known as MPI.
- There are two distinct implementations of the MPI protocol: `MPICH` and `ORTE`;
  - `OpenMPI` is an `ORTE` implementation of MPI;
  - `MVAPICH` is an `MPICH` implementation that explicitly uses InfiniBand as the transport fabric (faster message passing).
- Note: `OpenMPI` is not `OpenMP`:
  - `OpenMPI` is the `ORTE` implementation of MPI;
  - `OpenMP` is an API for multi-platform shared-memory parallel programming.
The following grid of modules, each corresponding to a combination of compiler & MPI implementation, is available on Hydra:
| Compiler | ORTE | MPICH | MVAPICH |
|---|---|---|---|
| GNU | gcc/openmpi | gcc/mpich | gcc/mvapich |
| GNU gcc 4.4.7 | gcc/4.4/openmpi | gcc/4.4/mpich | gcc/4.4/mvapich |
| GNU gcc 4.9.1 | gcc/4.9/openmpi | gcc/4.9/mpich | gcc/4.9/mvapich |
| GNU gcc 4.9.2 | gcc/4.9/openmpi-4.9.2 | n/a | n/a |
| Intel | intel/mpi | n/a | n/a |
| Intel v15.x | intel/15/mpi | n/a | n/a |
| Intel v16.x | intel/16/mpi | n/a | n/a |
| PGI | pgi/openmpi | pgi/mpich | pgi/mvapich |
| PGI v14.x | pgi/14/openmpi | pgi/14/mpich | pgi/14/mvapich |
| PGI v15.x | pgi/15/openmpi | pgi/15/mpich | pgi/15/mvapich |
| PGI v15.9 | pgi/15/openmpi-15.9 | pgi/15/mpich-15.9 | pgi/15/mvapich-15.9 |
In fact, there are more version-specific modules available; check with

`% ( module -t avail ) |& grep pi`

for a complete list, then use

`% module whatis <module-name>`

or

`% module help <module-name>`

where `<module-name>` is one of the listed modules, to get more specific information.
2.a ORTE or OpenMPI
The following example shows how to write an ORTE/OpenMPI job script:
```shell
#!/bin/sh
#
#$ -S /bin/sh
#$ -cwd -j y -N hello -o hello.log
#$ -pe orte 72
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS distributed over:
cat $PE_HOSTFILE
#
# load gcc's compiler and openmpi
module load gcc/4.9/openmpi-4.9.2
#
# run the program hello
mpirun -np $NSLOTS ./hello
#
echo = `date` job $JOB_NAME done
```
This example will load the OpenMPI module for gcc 4.9.2 and run the program `hello`, requesting 72 slots (CPUs).
It assumes that the program `hello` was built using gcc 4.9.2.
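The `$PE_HOSTFILE` printed by the ORTE job script lists one line per allocated node, with the number of slots granted on that node in the second field; the sum of those fields equals `NSLOTS`. A sketch with made-up node names and queue instances:

```shell
# Hypothetical PE_HOSTFILE contents: host, slots, queue instance, range.
printf 'compute-1 36 all.q@compute-1 <NULL>\ncompute-2 36 all.q@compute-2 <NULL>\n' > pe_hostfile.demo
# Summing the second column recovers the total slot count (NSLOTS).
awk '{ total += $2 } END { print "total slots:", total }' pe_hostfile.demo
```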
2.b MPICH or MVAPICH
The following example shows how to write a MPICH/MVAPICH job script:
```csh
#!/bin/csh
#
#$ -cwd -j y -N hello -o hello.log
#$ -pe mpich 72
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo using $NSLOTS slots on:
sort $TMPDIR/machines | uniq -c
#
# load PGI's mvapich
module load pgi/mvapich
#
# run the program hello
mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./hello
#
echo = `date` job $JOB_NAME done
```
This example will load the MVAPICH module for the PGI compiler and run the program `hello`, requesting 72 slots (CPUs).
It assumes that the program `hello` was built using the PGI compiler and the MVAPICH library/module (to enable InfiniBand as the transport fabric).
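The `sort $TMPDIR/machines | uniq -c` line in the MPICH job script condenses the machines file, which lists one allocated slot per line, into per-node slot counts. A sketch with made-up node names:

```shell
# Hypothetical machines file: one line per allocated slot.
printf 'node-12\nnode-12\nnode-12\nnode-7\nnode-7\n' > machines.demo
# Collapse repeated hostnames into "count hostname" pairs,
# as done in the job script above.
sort machines.demo | uniq -c
```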
3. Multi-threaded, or OpenMP, Parallel Jobs
A multi-threaded job is a job that will make use of more than one CPU but needs all the CPUs to be on the same compute node.
3.a Multi-threaded job
The following example shows how to write a multi-threaded script:
```shell
#!/bin/sh
#
#$ -S /bin/sh
#$ -cwd -j y -N demo -o demo.log
#$ -pe mthread 32
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS
#
# load the demo (fictional) module
module load tools/demo
#
# convert the generic parameter file, gen-params, to a specific file
# where the number of threads is inserted where the string MTHREADS is found
sed "s/MTHREADS/$NSLOTS/" gen-params > all-params
#
# run the demo program specifying all-params as the parameter file
demo -p all-params
#
echo = `date` job $JOB_NAME done
```
This example will run a fictional demo using 32 CPUs (slots). The script
- loads the fictional tools/demo module,
- parses the file gen-params and replaces every occurrence of the string MTHREADS with the allocated number of slots (via $NSLOTS), using the stream editor (sed),
- saves the result to a file called all-params,
- runs the program demo, specifying all-params as the parameter file.
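The sed substitution in the steps above can be sketched outside the scheduler (the parameter file contents are made up, and NSLOTS is set by hand since the GE is not involved):

```shell
# Fake a generic parameter file and a slot count (set by the GE in a real job).
echo "nthreads = MTHREADS" > gen-params.demo
NSLOTS=32
# Replace the MTHREADS placeholder with the allocated slot count.
sed "s/MTHREADS/$NSLOTS/" gen-params.demo > all-params.demo
cat all-params.demo
```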
3.b OpenMP job
The following example shows how to write an OpenMP job script:
```csh
#!/bin/csh
#
#$ -cwd -j y -N hellomp -o hellomp.log
#$ -pe mthread 32
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS
#
# load the pgi module
module load pgi
#
# set the variable OMP_NUM_THREADS to the content of NSLOTS
# this tells OpenMP applications how many threads/slots/CPUs to use
setenv OMP_NUM_THREADS $NSLOTS
#
# run the hellomp OpenMP program, built w/ PGI
./hellomp
#
echo = `date` job $JOB_NAME done
```
This example will run the program `hellomp`, which was compiled with the PGI compiler, using 32 threads (CPUs/slots). The script
- loads the pgi module,
- sets `OMP_NUM_THREADS` to the content of `NSLOTS` to specify the number of threads,
- runs the program `hellomp`.
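The script above uses csh syntax (`setenv`); under /bin/sh the equivalent is an `export`. A sketch with NSLOTS set by hand, since the GE is not involved:

```shell
# In a real job NSLOTS is set by the GE; set it by hand for this sketch.
NSLOTS=32
# Bourne-shell equivalent of csh's "setenv OMP_NUM_THREADS $NSLOTS"
export OMP_NUM_THREADS=$NSLOTS
echo "OpenMP threads: $OMP_NUM_THREADS"
```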
Examples
You can find examples of simple/trivial test cases, with source code, Makefile, job file and log file, under /home/hpc/examples.