Job Arrays

A job array is specified by adding a task range to qsub via the -t flag:

% qsub -t 1-100 model.job

Your job-array NNNNNN.1-100:1 ("model.job") has been submitted

The scheduler will start 100 tasks, each running the job script file model.job, and will pass to each task its identifier (a number between 1 and 100) via an environment variable.

The syntax for the -t flag is -t n[-m[:s]], where n is the first task ID, m the last task ID, and s the step size. For example:

`-t 1-20`	run 20 tasks, with task IDs ranging from 1 to 20
`-t 10-30`	run 21 tasks, with task IDs ranging from 10 to 30
`-t 50-140:10`	run 10 tasks, with task IDs ranging from 50 to 140 in steps of 10 (50, 60, ..., 140)
`-t 20`	run 1 task, with task ID 20
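
The range syntax above can be checked with a small sketch. The function `expand_range` is a hypothetical helper, not part of Grid Engine, and it is written in POSIX sh rather than csh so it can be tried in any shell outside the scheduler. It only mimics the n[-m[:s]] expansion described above:

```shell
#!/bin/sh
# Hypothetical helper (not a Grid Engine command): expand a -t range of
# the form n[-m[:s]] into the task IDs it describes, one per line.
expand_range() {
  spec=$1
  first=${spec%%-*}            # text before the first '-', or the whole spec
  last=$first                  # "-t 20" means a single task with ID 20
  step=1                       # default step size
  case $spec in
    *-*)
      rest=${spec#*-}          # "m" or "m:s"
      last=${rest%%:*}
      case $rest in
        *:*) step=${rest#*:} ;;
      esac
      ;;
  esac
  i=$first
  while [ "$i" -le "$last" ]; do
    echo "$i"
    i=$((i + step))
  done
}

# List the task IDs for one of the examples in the table.
expand_range 50-140:10
```

Piping the output through `wc -l` gives the number of tasks the range would start.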

Each instantiation of the job will have access to the following four environment variables:

`SGE_TASK_ID`	unique ID for the specific task
`SGE_TASK_FIRST`	ID of the first task, as specified with `qsub`
`SGE_TASK_LAST`	ID of the last task, as specified with `qsub`
`SGE_TASK_STEPSIZE`	task ID step size, as specified with `qsub`
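
As an illustration of how these four variables fit together, the sketch below computes a task's position in the array and the total number of tasks. It is written in POSIX sh rather than csh so it can be tried outside the scheduler. The function names `task_index` and `task_count` are hypothetical, and the default values are assumptions that stand in for a `-t 50-140:10` submission when the script is not run by the scheduler:

```shell
#!/bin/sh
# Assumed defaults, used only when the variables are not already set
# (for example when trying this outside the scheduler). They mimic
# task 70 of a "-t 50-140:10" submission.
: "${SGE_TASK_ID:=70}"
: "${SGE_TASK_FIRST:=50}"
: "${SGE_TASK_LAST:=140}"
: "${SGE_TASK_STEPSIZE:=10}"

# Hypothetical helper: 1-based position of task ID $1 within the array.
task_index() {
  echo $(( ($1 - SGE_TASK_FIRST) / SGE_TASK_STEPSIZE + 1 ))
}

# Hypothetical helper: total number of tasks in the array.
task_count() {
  echo $(( (SGE_TASK_LAST - SGE_TASK_FIRST) / SGE_TASK_STEPSIZE + 1 ))
}

echo "task ID $SGE_TASK_ID is task $(task_index "$SGE_TASK_ID") of $(task_count)"
```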

You can also limit the number of concurrent tasks with the -tc flag, for example:

% qsub -t 1-1000 -tc 100 model.job

This requests 1,000 tasks, with no more than 100 running at the same time.

Example of a Job Script

The following example shows how to submit a job array, using embedded directives:

Example of job script to submit a job array

#!/bin/csh
#
#$ -N model-1k -cwd -j y -o model-$TASK_ID.log
#$ -t 1-1000 -tc 100
#
echo + `date` $JOB_NAME started on $HOSTNAME in $QUEUE with jobID=$JOB_ID and taskID=$SGE_TASK_ID
#
set TID = $SGE_TASK_ID
./model -id $TID
#
echo = `date` $JOB_NAME for taskID=$SGE_TASK_ID done.

This example requests 1,000 runs of the model, with task IDs ranging from 1 to 1000, limited to 100 running at the same time.
It assumes that the model computation is run with the command ./model -id N, where N is a number between 1 and 1,000.
This example also shows how to use the pseudo variable TASK_ID (not SGE_TASK_ID; yes, I agree this is confusing) to give the log file of each task a different name: the output of task 123 will be model-123.log in the current working directory.

Examples of How to Convert a Task ID to a More Useful Set of Parameters
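
One common way to do this, sketched here under assumptions, is to keep one line of parameters per task in a text file and let each task read the line whose number matches its task ID. The function `params_for_task`, the file layout, and the parameter names (temperature, pressure) are all hypothetical. The sketch is written in POSIX sh rather than csh so it can be tried outside the scheduler, where the task ID falls back to an assumed value of 2:

```shell
#!/bin/sh
# Hypothetical helper: print line number $1 of parameter file $2.
params_for_task() {
  sed -n "${1}p" "$2"
}

# Demo outside the scheduler: build a small parameter file with one line
# per task ("<temperature> <pressure>", both made up for illustration).
tmp=$(mktemp)
printf '%s\n' '300 1.0' '350 1.5' '400 2.0' > "$tmp"

# Use the real task ID when the scheduler provides it, else assume task 2.
line=$(params_for_task "${SGE_TASK_ID:-2}" "$tmp")
temperature=${line%% *}      # first field
pressure=${line#* }          # second field
echo "temperature=$temperature pressure=$pressure"

rm -f "$tmp"
```

With this layout, a job array submitted with -t 1-N needs a parameter file with at least N lines.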

Example of How to Consolidate Small Jobs into Fewer Larger Jobs When Using Job Arrays

There is some overhead each time the Grid Engine (GE) starts a job (or task). If you need to run, say, 5,000 similar computations, each taking 3 minutes, submitting a 5,000-task job array is convenient but inefficient: the system will spend 25 to 50% of its time starting and keeping track of a slew of small jobs. The following script illustrates a simple trick to consolidate such computations: the job array is submitted with a step size, and each task processes the whole block of consecutive IDs that starts at its own task ID.

Example of job array consolidation wrapper script

#!/bin/csh
#
# [ add embedded directives here ]
#
# simple wrapper to consolidate using the step size
#
@ i = $SGE_TASK_ID
@ last = $i + $SGE_TASK_STEPSIZE - 1
if ($last > $SGE_TASK_LAST) @ last = $SGE_TASK_LAST
#
echo processing execute.sh from $i to $last
while ($i <= $last)
  ./execute.sh $i
  @ i++
end
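
The wrapper only consolidates work if the array is submitted with a step size. As a sketch, assuming the wrapper is saved as wrapper.job (a hypothetical name) and 20 computations per task are wanted, `qsub -t 1-5000:20 wrapper.job` would start 250 tasks with IDs 1, 21, 41, ..., 4981, each calling ./execute.sh 20 times. The snippet below reproduces the wrapper's arithmetic in POSIX sh (rather than csh, so it can be tried outside the scheduler) to show which block each task would process. The function `chunk_bounds` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper mirroring the wrapper's arithmetic: given a task ID,
# the step size and the last task ID, print the first and last index that
# the task would pass to ./execute.sh.
chunk_bounds() {
  id=$1
  step=$2
  last_id=$3
  end=$((id + step - 1))
  # The final block is cut short when the range is not a multiple of the step.
  if [ "$end" -gt "$last_id" ]; then
    end=$last_id
  fi
  echo "$id $end"
}

# Dry run for an assumed "-t 1-50:20" submission (task IDs 1, 21 and 41).
for id in 1 21 41; do
  chunk_bounds "$id" 20 50
done
```

The last task handles the shorter block from 41 to 50, which is the case the wrapper guards against when it compares $last with $SGE_TASK_LAST.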

Last updated 08 Jan 2016 SGK.
