
1. Introduction

  • A parallel job is a job that uses more than one CPU.
  • Because the cluster is a shared resource, it is the GE (the Grid Engine, i.e., the job scheduler) that allocates CPUs, known as slots in GE jargon, to jobs; hence a parallel job must request a set of slots (i.e., CPUs) when it is submitted.

  • A parallel job requests a parallel environment and a number of slots (CPUs) using the -pe <name> N specification to qsub:

    • the <name> is either mpich, orte or mthread (see below);

    • the value of N is the number of requested slots (CPUs) and can be specified as N-M, where N is the minimum and M the maximum number of slots requested;
    • that option can also be passed as an embedded directive.

    • The job script accesses the number of assigned slots via an environment variable (NSLOTS) and, for MPI jobs, gets the list of compute nodes via a so-called machines file (see the sketch after this list).

  • There are two types of parallel jobs:
  1. MPI or distributed jobs: the CPUs can be distributed over multiple compute nodes.
    PROs: there is conceptually no limit on how many CPUs can be used, so the cumulative number of CPUs and amount of memory a job uses can get quite large; the GE can find (a lot of) unused CPUs on a busy cluster by gathering them from different nodes.
    CONs: each CPU should be assumed to be on a separate compute node and thus must communicate with the other CPUs to exchange information (aka message passing); programming can get more complicated, and the inter-process communication can become a bottleneck.

  2. Multi-threaded jobs: all the CPUs must be on the same compute node.
    PROs: all CPUs can share a common memory space, inter-process communication can be very efficient (being local), and programming is often simpler;
    CONs: you can only use as many CPUs as there are on the largest compute node, and you can get them only if they are not in use by someone else.

  • How do I know which type of parallel job to submit to?
    The author of the software will in most cases specify if and how the application can be parallelized:
    • some analyses are parallelized by submitting a slew of independent serial jobs, where using a job array may be the best approach;
    • some analyses use explicit message passing (MPI); while
    • some analyses use a programming model that can use multiple threads.
  • We currently do not support hybrid parallel jobs: jobs that would use both multi-threaded and distributed models.
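For example, a request for 4 to 8 slots could be submitted as follows; the script name (my_job.job) and the choice of the mthread parallel environment are placeholders for this sketch:

   % qsub -pe mthread 4-8 my_job.job

and, once the job runs, the script can check how many slots it was actually granted:

   echo this job was granted $NSLOTS slots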

2. MPI, or Distributed Parallel Jobs with Explicit Message Passing

  • An MPI job runs code that uses an explicit message passing programming scheme known as MPI (Message Passing Interface).
  • There are two distinct implementations of the MPI protocol:
    • MPICH and
    • ORTE;
  • OpenMPI is an ORTE implementation of MPI;
  • MVAPICH is an MPICH implementation that explicitly uses InfiniBand as the transport fabric (faster message passing).
  • OpenMPI is not OpenMP:
    • OpenMPI is the ORTE implementation of MPI;
    • OpenMP is a shared-memory multi-threading programming model (see section 3 below).

The following grid of modules, each corresponding to a combination of compiler & MPI implementation, is available on Hydra:

                  ORTE                   MPICH              MVAPICH
GNU               gcc/openmpi            gcc/mpich          gcc/mvapich
GNU gcc 4.4.7     gcc/4.4/openmpi        gcc/4.4/mpich      gcc/4.4/mvapich
GNU gcc 4.9.1     gcc/4.9/openmpi        gcc/4.9/mpich      gcc/4.9/mvapich
GNU gcc 4.9.2     gcc/4.9/openmpi-4.9.2  n/a                n/a

Intel             intel/mpi              n/a                n/a
Intel v15.x       intel/15/mpi           n/a                n/a
Intel v16.x       intel/16/mpi           n/a                n/a

PGI               pgi/openmpi            pgi/mpich          pgi/mvapich
PGI v14.x         pgi/14/openmpi         pgi/14/mpich       pgi/14/mvapich
PGI v15.x         pgi/15/openmpi         pgi/15/mpich       pgi/15/mvapich
PGI v15.x         pgi/15/openmpi-15.9    pgi/15/mpich-15.9  pgi/15/mvapich-15.9

In fact, more version-specific modules are available; check with

   % ( module -t avail ) |& grep pi

for a complete list, then use

   % module whatis <module-name>

or

   % module help <module-name>

where <module-name> is one of the listed modules, to get more specific information.
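For instance, to make one of these combinations available in your session or job script, load the corresponding module (the one shown here is just one entry from the grid above):

   % module load gcc/4.4/openmpi
   % which mpirun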

2.a ORTE or OpenMPI

The following example shows how to write an ORTE/OpenMPI job script:
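This is a minimal sketch, assuming the gcc/openmpi module from the grid above and a hypothetical executable mycode; the job and file names are placeholders, and under the orte parallel environment an OpenMPI built with GE support obtains the list of hosts directly from the GE:

   #!/bin/csh
   #
   # hypothetical ORTE/OpenMPI job script: the job name, log file
   # and executable (mycode) are placeholders
   #$ -S /bin/csh -cwd -j y
   #$ -N my_mpi_job -o my_mpi_job.log
   #$ -pe orte 16
   #
   echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID
   module load gcc/openmpi
   # OpenMPI gets the host list from the GE, so mpirun
   # only needs the number of slots
   mpirun -np $NSLOTS ./mycode
   echo = `date` job $JOB_NAME done.

Such a script would be submitted with something like:

   % qsub my_mpi_job.job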

2.b MPICH or MVAPICH

The following example shows how to write an MPICH/MVAPICH job script:
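Again a minimal sketch, assuming the gcc/mpich module and a hypothetical executable mycode; the machines file location shown ($TMPDIR/machines) is the usual GE convention for the mpich parallel environment, so verify it matches your cluster's configuration:

   #!/bin/csh
   #
   # hypothetical MPICH/MVAPICH job script: the job name, log file
   # and executable (mycode) are placeholders
   #$ -S /bin/csh -cwd -j y
   #$ -N my_mpich_job -o my_mpich_job.log
   #$ -pe mpich 16
   #
   echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID
   module load gcc/mpich
   # under the mpich PE the GE writes the machines file (the list
   # of assigned compute nodes) to $TMPDIR/machines
   mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./mycode
   echo = `date` job $JOB_NAME done.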

3. Multi-threaded, or OpenMP, Parallel Jobs

Note: OpenMP is not OpenMPI (see section 2).

(more to come)
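In the meantime, here is a minimal sketch of a multi-threaded job script, assuming a hypothetical OpenMP program mycode; the mthread parallel environment keeps all the granted slots on one compute node:

   #!/bin/csh
   #
   # hypothetical multi-threaded job script: the job name, log file
   # and executable (mycode, an OpenMP program) are placeholders
   #$ -S /bin/csh -cwd -j y
   #$ -N my_mt_job -o my_mt_job.log
   #$ -pe mthread 8
   #
   # tell the OpenMP run-time to use as many threads as slots granted
   setenv OMP_NUM_THREADS $NSLOTS
   ./mycode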

 
