
1. Introduction

 

  • A parallel job is a job that uses more than one CPU.
  • Because the cluster is a shared resource, the GE (the Grid Engine, i.e. the job scheduler) allocates CPUs, known as slots in GE jargon, to jobs; hence a parallel job must request a set of slots (i.e. CPUs) when it is submitted.

  • A parallel job requests a parallel environment and a number of slots (CPUs) using the -pe <name> N specification to qsub:

    • the <name> is either mpich, orte or mthread (see below);

    • the value of N is the number of requested slots (CPUs) and can be specified as N-M, where N is the minimum and M the maximum number of slots requested;
    • that option can be an embedded directive.

    • The job script accesses the number of assigned slots via an environment variable (NSLOTS) and, for MPI jobs, gets the list of compute nodes via a so-called machines file.

  • There are two types of parallel jobs:
  1. MPI or distributed jobs: the CPUs can be distributed over multiple compute nodes.
    PROs: there is conceptually no limit on how many CPUs can be used, so the cumulative amount of CPUs and memory a job can use can get quite large; the GE can find (a lot of) unused CPUs on a busy cluster by looking for them on different nodes.
    CONs: each CPU must assume it is on a separate compute node and thus must communicate with the other CPUs to exchange information (aka message passing); programming can get more complicated and the inter-process communication can become a bottleneck.

  2. Multi-threaded jobs: all the CPUs must be on the same compute node.
    PROs: all CPUs can share a common memory space, inter-process communication can be very efficient (being local) and programming is often simpler;
    CONs: you can only use as many CPUs as there are on the largest compute node, and you can get them only if they are not in use by someone else.

  • How do I know which type of parallel job to submit?
    The author of the software will in most cases specify if and how the application can be parallelized:
    • Some analyses are parallelized by submitting a slew of independent serial jobs, where using a job array may be the best approach;
    • some analyses use explicit message passing (MPI); while
    • some analyses use a programming model that can use multiple threads.

 

  • We currently do not support hybrid parallel jobs: jobs that would use both multi-threaded and distributed models.
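The -pe request can be given either on the qsub command line or as an embedded directive. A minimal sketch, assuming a hypothetical job script named hello.job:

```shell
# on the command line: request between 40 and 72 slots in the orte PE
qsub -pe orte 40-72 hello.job

# or, equivalently, as an embedded directive inside hello.job itself:
#$ -pe orte 40-72
```

When a range N-M is given, the GE may assign any number of slots between the two bounds, so the job script should use $NSLOTS rather than a hard-coded count.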

2. MPI, or Distributed Parallel Jobs with Explicit Message Passing

  • An MPI job runs code that uses an explicit message passing programming scheme known as MPI.
  • There are two distinct implementations of the MPI protocol:
    • MPICH and
    • ORTE;
  • OpenMPI is an ORTE implementation of MPI;
  • MVAPICH is an MPICH implementation that explicitly uses InfiniBand as the transport fabric (faster message passing).
  • OpenMPI is not OpenMP
    • OpenMPI is the ORTE implementation of MPI;
    • OpenMP is an API for multi-platform shared-memory parallel programming.

 

The following grid of modules, each corresponding to a combination of compiler & MPI implementation, is available on Hydra:

                  ORTE                   MPICH               MVAPICH
 GNU              gcc/openmpi            gcc/mpich           gcc/mvapich
 GNU gcc 4.4.7    gcc/4.4/openmpi        gcc/4.4/mpich       gcc/4.4/mvapich
 GNU gcc 4.9.1    gcc/4.9/openmpi        gcc/4.9/mpich       gcc/4.9/mvapich
 GNU gcc 4.9.2    gcc/4.9/openmpi-4.9.2  n/a                 n/a

 Intel            intel/mpi              n/a                 n/a
 Intel v15.x      intel/15/mpi           n/a                 n/a
 Intel v16.x      intel/16/mpi           n/a                 n/a

 PGI              pgi/openmpi            pgi/mpich           pgi/mvapich
 PGI v14.x        pgi/14/openmpi         pgi/14/mpich        pgi/14/mvapich
 PGI v15.x        pgi/15/openmpi         pgi/15/mpich        pgi/15/mvapich
 PGI v15.9        pgi/15/openmpi-15.9    pgi/15/mpich-15.9   pgi/15/mvapich-15.9

In fact, there are more version-specific modules available; check with

   % ( module -t avail ) | & grep pi

for a complete list, then use

   % module whatis <module-name>

or

   % module help <module-name>

where <module-name> is one of the listed modules, to get more specific information.

2.a ORTE or OpenMPI

The following example shows how to write an ORTE/OpenMPI job script:

Example of a ORTE/OpenMPI job script, using Bourne shell syntax
#!/bin/sh
#
#$ -S /bin/sh
#$ -cwd -j y -N hello -o hello.log
#$ -pe orte 72
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS distributed over:
cat $PE_HOSTFILE
#
# load gcc's compiler and openmpi
module load gcc/4.9/openmpi-4.9.2
#
# run the program hello
mpirun -np $NSLOTS ./hello
#
echo = `date` job $JOB_NAME done

This example will

  • request 72 slots (CPUs),
  • show the content of the machines file (i.e. the distribution over the compute nodes),
  • load the OpenMPI module for gcc 4.9.2, and
  • run the program hello.

It assumes that the program hello was built using gcc 4.9.2.
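For completeness, such a binary could be built with the matching toolchain, using OpenMPI's compiler wrapper mpicc (hello.c is a hypothetical source file; the exact build steps depend on your code):

```shell
# load the same compiler/MPI combination the job script loads
module load gcc/4.9/openmpi-4.9.2

# compile the hypothetical MPI source hello.c with OpenMPI's wrapper
mpicc -o hello hello.c
```

Building and running with the same module avoids mixing incompatible MPI libraries at run time.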

2.b MPICH or MVAPICH

The following example shows how to write a MPICH/MVAPICH job script:

Example of a MVAPICH job script, using C-shell syntax
#!/bin/csh
#
#$ -cwd -j y -N hello -o hello.log
#$ -pe mpich 72
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo using $NSLOTS slots on:
sort $TMPDIR/machines | uniq -c
#
# load PGI's mvapich
module load pgi/mvapich
#
# run the program hello
mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./hello
#
echo = `date` job $JOB_NAME done

This example will

  • request 72 slots (CPUs),
  • show the content of the machines file, using sort & uniq to produce the compact format "nCpu hostname",
  • load the MVAPICH module for the PGI compiler, and
  • run the program hello.

It assumes that the program hello was built using the PGI compiler and the MVAPICH library/module (to enable the InfiniBand transport fabric).
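The machines file holds one line per allocated slot, so piping it through sort and uniq -c collapses it to one "nCpu hostname" line per node. A minimal sketch, using hypothetical node names and a stand-in file:

```shell
# simulate a machines file as GE writes it: one line per allocated slot
printf 'node-1\nnode-1\nnode-2\nnode-1\nnode-2\n' > machines.demo

# collapse to one "nCpu hostname" line per node, as the job script above does
sort machines.demo | uniq -c
```

With the stand-in content above this prints 3 node-1 followed by 2 node-2, i.e. three slots on the first node and two on the second.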

3. Multi-threaded, or OpenMP, Parallel Jobs

A multi-threaded job is a job that will make use of more than one CPU but needs all the CPUs to be on the same compute node.

3.a Multi-threaded job

The following example shows how to write a multi-threaded script:

Example of a multi-threaded job script, using the Bourne shell syntax
#!/bin/sh
#
#$ -S /bin/sh
#$ -cwd -j y -N demo -o demo.log
#$ -pe mthread 32
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS
#
# load the demo (fictional) module
module load tools/demo
#
# convert the generic parameter file, gen-params, to a specific one,
# inserting the number of threads where the string MTHREADS is found
sed "s/MTHREADS/$NSLOTS/" gen-params > all-params
#
# run the demo program specifying all-params as the parameter file
demo -p all-params
#
echo = `date` job $JOB_NAME done

This example will run a fictional demo using 32 CPUs (slots). The script

  • loads the fictional tools/demo module,
  • parses the file gen-params and replaces every occurrence of the string MTHREADS with the allocated number of slots (via $NSLOTS), using the stream editor (man sed),
  • saves the result to a file called all-params,
  • runs the program demo and specify to use the parameter file all-params.
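The sed substitution can be tried outside GE by setting NSLOTS by hand; the gen-params content below is a hypothetical one-liner:

```shell
# stand-in for the generic parameter file
printf 'nThreads = MTHREADS\n' > gen-params

# outside GE, NSLOTS is not defined, so set it by hand for the demo
NSLOTS=32

# replace every occurrence of MTHREADS with the slot count
sed "s/MTHREADS/$NSLOTS/" gen-params > all-params
cat all-params
```

This prints nThreads = 32, i.e. the specific parameter file the program would then read.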

3.b OpenMP job

The following example shows how to write an OpenMP job script:

Example of an OpenMP job script, using C-shell syntax
#!/bin/csh
#
#$ -cwd -j y -N hellomp -o hellomp.log
#$ -pe mthread 32
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS
#
# load the pgi module
module load pgi
#
# set the variable OMP_NUM_THREADS to the content of NSLOTS
# this tells OpenMP applications how many threads/slots/CPUs to use
setenv OMP_NUM_THREADS $NSLOTS 
#
# run the hellomp OpenMP program, built with PGI
./hellomp
#
echo = `date` job $JOB_NAME done

This example will run the program hellomp, that was compiled with the PGI compiler, using 32 threads (CPUs/slots). The script

  • loads the pgi module,
  • sets OMP_NUM_THREADS to the content of NSLOTS to specify the number of threads, and
  • runs the program hellomp.
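Note that setenv is C-shell syntax; in a Bourne-shell job script the equivalent is export. A minimal sketch, with NSLOTS set by hand since only GE defines it:

```shell
# stand-in for the value GE would place in NSLOTS
NSLOTS=32

# Bourne-shell equivalent of the C-shell "setenv OMP_NUM_THREADS $NSLOTS"
export OMP_NUM_THREADS=$NSLOTS

echo $OMP_NUM_THREADS
```

OpenMP programs read OMP_NUM_THREADS at start-up, so setting it from NSLOTS keeps the thread count in step with the slots the GE actually granted.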

4. Examples

You can find examples of simple/trivial test cases, with source code, Makefile, job file and log file, under /home/hpc/examples.

 
