1. Introduction
- A parallel job is a job that uses more than one CPU.
Because the cluster is a shared resource, it is the GE (the Grid Engine, i.e., the job scheduler) that allocates CPUs, known as slots in GE jargon, to jobs; hence a parallel job must request a set of slots (i.e., CPUs) when it is submitted.
A parallel job requests a parallel environment and a number of slots (CPUs) using the
-pe <name> N
specification to qsub:
- the <name> is either mpich, orte or mthread (see below);
- the value of N is the number of requested slots (CPUs) and can be specified as N-M, where N is the minimum and M the maximum number of slots requested; that option can be an embedded directive.
The job script accesses the number of assigned slots via an environment variable (NSLOTS) and, for MPI jobs, gets the list of compute nodes via a so-called machines file.
- There are two types of parallel jobs:
- MPI or distributed jobs: the CPUs can be distributed over multiple compute nodes.
PROs: there is conceptually no limit on how many CPUs can be used, so the cumulative number of CPUs and amount of memory a job can use can get quite large; the GE can find (a lot of) unused CPUs on a busy machine by locating them on different nodes.
CONs: each CPU should assume it is on a separate compute node and thus must communicate with the other CPUs to exchange information (aka message passing); programming can get more complicated and the inter-process communication can become a bottleneck.
- Multi-threaded jobs: all the CPUs must be on the same compute node.
PROs: all CPUs can share a common memory space, inter-process communication can be very efficient (being local), and programming is often simpler.
CONs: you can only use as many CPUs as there are on the largest compute node, and you can get them only if they are not in use by someone else.
- How do I know which type of parallel job to submit?
The author of the software will in most cases specify if and how the application can be parallelized:
- some analyses are parallelized by submitting a slew of independent serial jobs, where using a job array may be the best approach (see the sketch after this list);
- some analyses use explicit message passing (MPI); while
- some analyses use a programming model that can run multiple threads.
- We currently do not support hybrid parallel jobs: jobs that would use both multi-threaded and distributed models.
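As an illustration of the job-array approach mentioned above (a hypothetical sketch; the script name serial_analysis, the input file naming, and the task range are placeholders), each task of the array runs the same serial computation on a different input, selected via the GE variable $SGE_TASK_ID:
#$ -t 1-100
#$ -cwd -j y
./serial_analysis input.$SGE_TASK_ID
Each of the 100 tasks is an independent serial job, so no -pe specification is needed.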
2. MPI, or Distributed Parallel Jobs with Explicit Message Passing
- An MPI job runs code that uses an explicit message passing programming scheme known as MPI.
- There are two distinct implementations of the MPI protocol:
- MPICH and
- ORTE;
- OpenMPI is an ORTE implementation of MPI;
- MVAPICH is an MPICH implementation that explicitly uses InfiniBand as the transport fabric (faster message passing).
- OpenMPI is not OpenMP:
- OpenMPI is the ORTE implementation of MPI, while
- OpenMP is a multi-threading (shared memory) programming model (see section 3 below).
The following grid of modules, each corresponding to a combination of compiler & MPI implementation, is available on Hydra:
| Compiler | ORTE | MPICH | MVAPICH |
|---|---|---|---|
| GNU | gcc/openmpi | gcc/mpich | gcc/mvapich |
| GNU gcc 4.4.7 | gcc/4.4/openmpi | gcc/4.4/mpich | gcc/4.4/mvapich |
| GNU gcc 4.9.1 | gcc/4.9/openmpi | gcc/4.9/mpich | gcc/4.9/mvapich |
| GNU gcc 4.9.2 | gcc/4.9/openmpi-4.9.2 | n/a | n/a |
| Intel | intel/mpi | n/a | n/a |
| Intel v15.x | intel/15/mpi | n/a | n/a |
| Intel v16.x | intel/16/mpi | n/a | n/a |
| PGI | pgi/openmpi | pgi/mpich | pgi/mvapich |
| PGI v14.x | pgi/14/openmpi | pgi/14/mpich | pgi/14/mvapich |
| PGI v15.x | pgi/15/openmpi | pgi/15/mpich | pgi/15/mvapich |
| PGI v15.x | pgi/15/openmpi-15.9 | pgi/15/mpich-15.9 | pgi/15/mvapich-15.9 |
In fact, there are more version-specific modules available; check with
% ( module -t avail ) |& grep pi
for a complete list, then use
% module whatis <module-name>
or
% module help <module-name>
where <module-name>
is one of the listed modules, to get more specific information.
2.a ORTE or OpenMPI
The following example shows how to write an ORTE/OpenMPI job script
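A minimal sketch of such a script follows (the program name hello_mpi, the requested slot count, and the gcc/openmpi module choice are placeholders; adapt them to your case):
#!/bin/csh
#
# embedded GE directives: job name, merged log file, run from current dir,
# and a request for 16 slots in the orte parallel environment
#$ -N hello-orte
#$ -o hello-orte.log -cwd -j y
#$ -pe orte 16
#
echo the job was allocated $NSLOTS slots
# load the matching compiler/MPI module and start the MPI program
module load gcc/openmpi
mpirun -np $NSLOTS ./hello_mpi
echo mpirun is done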
2.b MPICH or MVAPICH
The following example shows how to write an MPICH/MVAPICH job script
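A minimal sketch (again, the program name hello_mpi and the module choice are placeholders; the machines file location shown assumes that the mpich parallel environment writes it under $TMPDIR):
#!/bin/csh
#
#$ -N hello-mpich
#$ -o hello-mpich.log -cwd -j y
#$ -pe mpich 16
#
echo the job was allocated $NSLOTS slots
# load the matching compiler/MPI module and start the MPI program,
# passing the machines file assumed to be written by the PE under $TMPDIR
module load gcc/mpich
mpirun -machinefile $TMPDIR/machines -np $NSLOTS ./hello_mpi
echo mpirun is done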
3. Multi-threaded, or OpenMP, Parallel Jobs
note: OpenMP is not OpenMPI
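For reference, a minimal sketch of a multi-threaded job script (the program name my_openmp_prog is a placeholder; OMP_NUM_THREADS applies to OpenMP programs, other multi-threaded applications have their own way of specifying the number of threads):
#!/bin/csh
#
# request 8 slots on a single compute node via the mthread parallel environment
#$ -N hello-mthread
#$ -o hello-mthread.log -cwd -j y
#$ -pe mthread 8
#
# tell the OpenMP runtime to use as many threads as slots were granted
setenv OMP_NUM_THREADS $NSLOTS
./my_openmp_prog
echo done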
(more to come)