  1. Introduction
  2. Example of a Job Script
  3. How to Convert a Task ID to a More Useful Set of Parameters

  4. How to Consolidate Small Jobs into Fewer Larger Jobs Using Job Arrays
  5. Rules for Submitting Job Arrays that use Parallel Environments (like MPI) 

1. Introduction

...

The syntax for the -t flag is -t n[-m[:s]], namely:

-t 1-20         run 20 tasks, with task IDs ranging from 1 to 20
-t 10-30        run 21 tasks, with task IDs ranging from 10 to 30
-t 50-140:10    run 10 tasks, with task IDs ranging from 50 to 140 by steps of 10 (50, 60, ..., 140)
-t 20           run one task, with task ID 20
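
To preview which task IDs a given range produces before submitting, the n[-m[:s]] syntax can be expanded locally. This is a minimal sketch; expand_task_range is a hypothetical helper (not an SGE command), and it assumes the seq utility is available.

```shell
#!/bin/sh
# expand_task_range: hypothetical helper, not part of SGE.
# Prints one task ID per line for a range written as n[-m[:s]].
expand_task_range() {
    range=$1
    first=${range%%-*}          # text before the first '-'
    last=$first                 # a single value means one task
    step=1                      # default step size
    case $range in
        *-*)
            rest=${range#*-}    # text after the first '-'
            last=${rest%%:*}    # text before the ':' (if any)
            case $rest in
                *:*) step=${rest#*:} ;;   # text after the ':'
            esac
            ;;
    esac
    seq "$first" "$step" "$last"
}

# List the task IDs for a range, then count them
expand_task_range 50-140:10
expand_task_range 50-140:10 | wc -l
```

Counting the lines gives the number of tasks that qsub would start for that range.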


Each instantiation of the job will have access to the following four environment variables:

SGE_TASK_ID          unique ID for the specific task
SGE_TASK_FIRST       ID of the first task, as specified with qsub
SGE_TASK_LAST        ID of the last task, as specified with qsub
SGE_TASK_STEPSIZE    task ID step size, as specified with qsub
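
As an illustration of how these four variables relate, a task can work out its position within the array. This is a sketch only; task_position is a hypothetical helper, and the defaults of 1 are an assumption added so the snippet also runs outside of SGE.

```shell
#!/bin/sh
# task_position: hypothetical helper, not part of SGE.
# Prints the position of the current task within its job array.
task_position() {
    # Defaults of 1 are assumptions so the sketch runs outside SGE
    id=${SGE_TASK_ID:-1}
    first=${SGE_TASK_FIRST:-1}
    last=${SGE_TASK_LAST:-1}
    step=${SGE_TASK_STEPSIZE:-1}
    k=$(( (id - first) / step + 1 ))      # position of this task
    n=$(( (last - first) / step + 1 ))    # total number of tasks
    echo "task $id is number $k of $n"
}

task_position
```

For a job array submitted with -t 50-140:10, the task with ID 70 would be number 3 of 10.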


You can also limit the number of concurrent tasks with the -tc flag, for example:

...

  • By default, SGE makes a local copy of each job script on the compute nodes it runs on.
  • Parallel job arrays should avoid this to prevent a race condition: for a small fraction of the tasks, the scheduler starts the script before it has been copied, so those tasks fail to start.

  • The output of qstat -j 9616234 will show something like this:
    error reason  11:          03/24/2016 11:09:11 [10464:63260]: unable to find job file "/opt/gridengine/default/spool/compute-2-2/job_scripts/94399389416234"
    and the SGE reporting file will list:
    job never ran -> schedule it again
  • The output of qstat -f -explain E | grep QERROR  will show something like this:
    queue mThC.q marked QERROR as result of job 9616234's failure at host compute-1-2.local 
    leaving a queue entry in Error state.

...

  1. Do not use embedded directives (sigh).

  2. Write a script (sh, csh, perl, python, etc.) with the required steps, as for a job script.
  3. Make that script executable (chmod +x); you can use the #! mechanism (aka shebang) to specify the interpreter.
  4. Write a file with the qsub command and all the options that you would otherwise put as embedded directives.
  5. Pass the -b y option to qsub and specify the full path of the script to execute.
  6. Source that file to submit the parallel job array.
  7. (warning) Do not modify the executable script file while the job array is running.

...

The following job script with embedded directives must be replaced by two files, a qsub_XXX.sou and a XXX.sh:


demo.job
#--------
#$ -q mThC.q
#$ -pe orte 20
#$ -l mres=4G,h_data=4G,h_vmem=4G
#$ -cwd -j y -N demo -o demo.$TASK_ID.log
#$ -t 1-100
#-------
module load some/thing
#-------
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + taskID=$SGE_TASK_ID
echo + NSLOTS=$NSLOTS distributed over:
cat $PE_HOSTFILE
#
mpirun -np $NSLOTS crunch -i input.$SGE_TASK_ID -o output.$SGE_TASK_ID
#
echo = `date` job $JOB_NAME done



qsub_demo.sou
qsub \
 -q mThC.q \
 -pe orte 20 \
 -l mres=4G,h_data=4G,h_vmem=4G \
 -cwd -j y -N demo -o 'demo.$TASK_ID.log' \
 -t 1-100 \
 -b y $PWD/demo.sh

(warning) Make sure there are no spaces after the trailing '\' on each line.
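
A stray blank after a continuation backslash is hard to spot by eye, so it can help to check the file before sourcing it. The sketch below uses grep for that; check_continuations is a hypothetical helper, not something provided by SGE.

```shell
#!/bin/sh
# check_continuations: hypothetical helper, not part of SGE.
# Lists (with line numbers) any line where a backslash is followed
# by trailing whitespace, and returns non-zero if one is found.
check_continuations() {
    if grep -n '\\[[:space:]]\{1,\}$' "$1"; then
        echo "found whitespace after a continuation backslash in $1" >&2
        return 1
    fi
    return 0
}

# Example use (assumes qsub_demo.sou is in the current directory):
#   check_continuations qsub_demo.sou && . ./qsub_demo.sou
```

The example only sources the file when the check passes, so a broken continuation does not produce a truncated qsub command.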


demo.sh
#!/bin/sh
# any embedded directives here will be ignored
#-------
source /etc/profile.d/modules.sh
module load some/thing
#-------
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + taskID=$SGE_TASK_ID
echo + NSLOTS=$NSLOTS distributed over:
cat $PE_HOSTFILE
#
mpirun -np $NSLOTS crunch -i input.$SGE_TASK_ID -o output.$SGE_TASK_ID
#
echo = `date` job $JOB_NAME done

(lightbulb) This can be any type of executable script, but if you use:

  • #!/bin/sh or #!/bin/bash, you'll need source /etc/profile.d/modules.sh to access modules,
  • #!/bin/sh -l or #!/bin/bash -l, module will be defined, but the script will read ~/.profile,
  • #!/bin/csh or #!/bin/tcsh, module will be defined,
  • #!/bin/csh -f or #!/bin/tcsh -f, you'll need source /etc/profile.d/modules.csh to access modules.
    (wink) Don't you love Linux? (It's all in the man pages, tho).
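
For sh or bash scripts, one way to cope with both cases is to source the modules initialization file only when module is not already defined. This is a sketch under that assumption; ensure_module is a hypothetical helper, and it does not cover csh or tcsh scripts.

```shell
#!/bin/sh
# ensure_module: hypothetical helper for sh/bash scripts only.
# Sources the modules init file unless `module` is already defined.
ensure_module() {
    # Default path is the one quoted above; pass another path as $1
    init=${1:-/etc/profile.d/modules.sh}
    if command -v module >/dev/null 2>&1; then
        return 0                      # module is already available
    fi
    if [ -r "$init" ]; then
        . "$init"                     # defines module in this shell
    else
        echo "cannot read $init" >&2
        return 1
    fi
}

# Example use at the top of the executable script:
#   ensure_module && module load some/thing
```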

Before submitting the job array, make sure the script is executable:

...

You can edit the qsub_demo.sou to submit more tasks, but do not modify the executable script file while the job array is running.

 

...

Last updated SGK.