
1. Introduction

 

  • A parallel job is a job that uses more than one CPU.
  • Because the cluster is a shared resource, the GE (the Grid Engine, i.e. the job scheduler) allocates CPUs, known as slots in GE jargon, to jobs; hence a parallel job must request a set of slots (i.e. CPUs) when it is submitted.

  • A parallel job requests a parallel environment and a number of slots (CPUs) using the -pe <name> N specification to qsub:

    • the <name> is either mpich, orte or mthread (see below);

    • the value of N is the number of requested slots (CPUs) and can be specified as N-M, where N is the minimum and M the maximum number of slots requested;
    • that option can be an embedded directive.

    • The job script accesses the number of assigned slots via an environment variable (NSLOTS) and, for MPI jobs, gets the list of compute nodes via a so-called machines file.

  • There are two types of parallel jobs:
  1. MPI or distributed jobs: the CPUs can be distributed over multiple compute nodes.
    PROs: there is conceptually no limit on how many CPUs can be used, so the cumulative amount of CPUs and memory a job can use can get quite large; the GE can find (a lot of) unused CPUs on a busy cluster by looking for them on different nodes.
    CONs: each CPU must assume it is on a separate compute node and thus must communicate with the other CPUs to exchange information (aka message passing); programming can get more complicated and the inter-process communication can become a bottleneck.

  2. Multi-threaded jobs: all the CPUs must be on the same compute node.
    PROs: all CPUs can share a common memory space, inter-process communication can be very efficient (being local) and programming is often simpler;
    CONs: you can only use as many CPUs as there are on the largest compute node, and you can get them only if they are not in use by someone else.

  • How do I know which type of parallel job to submit?
    The author of the software will in most cases specify if and how the application can be parallelized:
    • Some analyses are parallelized by submitting a slew of independent serial jobs, where using a job array may be the best approach;
    • some analyses use explicit message passing (MPI); while
    • some analyses use a programming model that can use multiple threads.

 

  • We currently do not support hybrid parallel jobs: jobs that would use both multi-threaded and distributed models.
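The -pe request can be given either on the qsub command line or as an embedded directive. A minimal sketch, assuming a hypothetical job script named hello.job:

```shell
# on the command line: request between 40 and 72 slots in the orte PE
qsub -pe orte 40-72 hello.job

# or, equivalently, as an embedded directive inside hello.job itself:
#$ -pe orte 40-72
```

When a range N-M is given, the GE may assign any number of slots between the two bounds, so the job script should use $NSLOTS rather than a hard-coded count.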

2. MPI, or Distributed Parallel Jobs with Explicit Message Passing

  • An MPI job runs code that uses an explicit message passing programming scheme known as MPI.
  • There are two distinct implementations of the MPI protocol:
    • MPICH and
    • ORTE;
  • OpenMPI is an ORTE implementation of MPI;
  • MVAPICH is an MPICH implementation that explicitly uses InfiniBand as the transport fabric (faster message passing).
  • OpenMPI is not OpenMP
    • OpenMPI is the ORTE implementation of MPI;
    • OpenMP is an API for multi-platform shared-memory parallel programming.

 

The following grid of modules, each corresponding to a combination of compiler & MPI implementation, is available on Hydra:

                  ORTE                   MPICH               MVAPICH
 GNU              gcc/openmpi            gcc/mpich           gcc/mvapich
 GNU gcc 4.4.7    gcc/4.4/openmpi        gcc/4.4/mpich       gcc/4.4/mvapich
 GNU gcc 4.9.1    gcc/4.9/openmpi        gcc/4.9/mpich       gcc/4.9/mvapich
 GNU gcc 4.9.2    gcc/4.9/openmpi-4.9.2  n/a                 n/a

 Intel            intel/mpi              n/a                 n/a
 Intel v15.x      intel/15/mpi           n/a                 n/a
 Intel v16.x      intel/16/mpi           n/a                 n/a

 PGI              pgi/openmpi            pgi/mpich           pgi/mvapich
 PGI v14.x        pgi/14/openmpi         pgi/14/mpich        pgi/14/mvapich
 PGI v15.x        pgi/15/openmpi         pgi/15/mpich        pgi/15/mvapich
 PGI v15.9        pgi/15/openmpi-15.9    pgi/15/mpich-15.9   pgi/15/mvapich-15.9

In fact, there are more version-specific modules available; check with

   % ( module -t avail ) | & grep pi

for a complete list, then use

   % module whatis <module-name>

or

   % module help <module-name>

where <module-name> is one of the listed modules, to get more specific information.

2.a ORTE or OpenMPI

The following example shows how to write an ORTE/OpenMPI job script:

Example of a ORTE/OpenMPI job script, using Bourne shell syntax
#!/bin/sh
#
#$ -S /bin/sh
#$ -cwd -j y -N hello -o hello.log
#$ -pe orte 72
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS distributed over:
cat $PE_HOSTFILE
#
# load gcc's compiler and openmpi
module load gcc/4.9/openmpi-4.9.2
#
# run the program hello
mpirun -np $NSLOTS ./hello
#
echo = `date` job $JOB_NAME done

This example will

  • request 72 slots (CPUs),
  • show the content of the machines file (i.e. the distribution over the compute nodes),
  • load the OpenMPI module for gcc 4.9.2, and
  • run the program hello.

It assumes that the program hello was built using gcc 4.9.2.
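For completeness, such a binary could be built with the matching toolchain, using OpenMPI's compiler wrapper mpicc (hello.c is a hypothetical source file; the exact build steps depend on your code):

```shell
# load the same compiler/MPI combination the job script loads
module load gcc/4.9/openmpi-4.9.2

# compile the hypothetical MPI source hello.c with OpenMPI's wrapper
mpicc -o hello hello.c
```

Building and running with the same module avoids mixing incompatible MPI libraries at run time.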

2.b MPICH or MVAPICH

The following example shows how to write a MPICH/MVAPICH job script:

Example of a MVAPICH job script, using C-shell syntax
#!/bin/csh
#
#$ -cwd -j y -N hello -o hello.log
#$ -pe mpich 72
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo using $NSLOTS slots on:
sort $TMPDIR/machines | uniq -c
#
# load PGI's mvapich
module load pgi/mvapich
#
# run the program hello
mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./hello
#
echo = `date` job $JOB_NAME done

This example will

  • request 72 slots (CPUs),
  • show the content of the machines file, using sort & uniq to produce the compact format "nCpu hostname",
  • load the MVAPICH module for the PGI compiler, and
  • run the program hello.

It assumes that the program hello was built using the PGI compiler and the MVAPICH library/module (to enable the InfiniBand transport fabric).
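The machines file holds one line per allocated slot, so piping it through sort and uniq -c collapses it to one "nCpu hostname" line per node. A minimal sketch, using hypothetical node names and a stand-in file:

```shell
# simulate a machines file as GE writes it: one line per allocated slot
printf 'node-1\nnode-1\nnode-2\nnode-1\nnode-2\n' > machines.demo

# collapse to one "nCpu hostname" line per node, as the job script above does
sort machines.demo | uniq -c
```

With the stand-in content above this prints 3 node-1 followed by 2 node-2, i.e. three slots on the first node and two on the second.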

3. Multi-threaded, or OpenMP, Parallel Jobs

A multi-threaded job is a job that will make use of more than one CPU but needs all the CPUs to be on the same compute node.

3.a Multi-threaded job

The following example shows how to write a multi-threaded script:

Example of a multi-threaded job script, using the Bourne shell syntax
#!/bin/sh
#
#$ -S /bin/sh
#$ -cwd -j y -N demo -o demo.log
#$ -pe mthread 32
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS
#
# load the demo (fictional) module
module load tools/demo
#
# convert the generic parameter file, gen-params, to a specific one,
# inserting the number of threads where the string MTHREADS is found
sed "s/MTHREADS/$NSLOTS/" gen-params > all-params
#
# run the demo program specifying all-params as the parameter file
demo -p all-params
#
echo = `date` job $JOB_NAME done

This example will run a fictional demo using 32 CPUs (slots). The script

  • loads the fictional tools/demo module,
  • parses the file gen-params and replaces every occurrence of the string MTHREADS with the allocated number of slots (via $NSLOTS), using the stream editor (man sed),
  • saves the result to a file called all-params,
  • runs the program demo and specify to use the parameter file all-params.
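The sed substitution can be tried outside GE by setting NSLOTS by hand; the gen-params content below is a hypothetical one-liner:

```shell
# stand-in for the generic parameter file
printf 'nThreads = MTHREADS\n' > gen-params

# outside GE, NSLOTS is not defined, so set it by hand for the demo
NSLOTS=32

# replace every occurrence of MTHREADS with the slot count
sed "s/MTHREADS/$NSLOTS/" gen-params > all-params
cat all-params
```

This prints nThreads = 32, i.e. the specific parameter file the program would then read.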

3.b OpenMP job

The following example shows how to write an OpenMP job script:

Example of an OpenMP job script, using C-shell syntax
#!/bin/csh
#
#$ -cwd -j y -N hellomp -o hellomp.log
#$ -pe mthread 32
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS
#
# load the pgi module
module load pgi
#
# set the variable OMP_NUM_THREADS to the content of NSLOTS
# this tells OpenMP applications how many threads/slots/CPUs to use
setenv OMP_NUM_THREADS $NSLOTS 
#
# run the hellomp OpenMP program, built with PGI
./hellomp
#
echo = `date` job $JOB_NAME done

This example will run the program hellomp, that was compiled with the PGI compiler, using 32 threads (CPUs/slots). The script

  • loads the pgi module,
  • sets OMP_NUM_THREADS to the content of NSLOTS to specify the number of threads, and
  • runs the program hellomp.
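Note that setenv is C-shell syntax; in a Bourne-shell job script the equivalent is export. A minimal sketch, with NSLOTS set by hand since only GE defines it:

```shell
# stand-in for the value GE would place in NSLOTS
NSLOTS=32

# Bourne-shell equivalent of the C-shell "setenv OMP_NUM_THREADS $NSLOTS"
export OMP_NUM_THREADS=$NSLOTS

echo $OMP_NUM_THREADS
```

OpenMP programs read OMP_NUM_THREADS at start-up, so setting it from NSLOTS keeps the thread count in step with the slots the GE actually granted.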

4. Examples

You can find examples of simple/trivial test cases, with source code, Makefile, job file and log file, under /home/hpc/examples.

 
