Job Arrays

A job array is specified by adding a task range to qsub via the -t flag:

% qsub -t 1-100 model.job

Your job-array NNNNNN.1-100:1 ("model.job") has been submitted

The scheduler will start 100 jobs (tasks), each running the job script model.job, and will pass each one a task identifier (a number between 1 and 100) via an environment variable.

The syntax for the -t flag is -t n[-m[:s]], namely:

`-t 1-20`	run 20 tasks, with task IDs ranging from 1 to 20
`-t 10-30`	run 21 tasks, with task IDs ranging from 10 to 30
`-t 50-140:10`	run 10 tasks, with task IDs ranging from 50 to 140 by step of 10 (50, 60, ..., 140)
`-t 20`	run 1 task, with task ID 20
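To make the n[-m[:s]] rules concrete, here is a small bash sketch that expands a range specification into the list of task IDs it describes. The function name task_ids is hypothetical and only mimics the rules listed above; it is not part of Grid Engine. It assumes, as the step-size example suggests, that when m - n is not a multiple of s the last task ID is the largest one not exceeding m.

```bash
#!/bin/bash
# Hypothetical helper: expand a -t specification of the form n[-m[:s]]
# into the task IDs it describes, one per line.
task_ids() {
  local spec=$1 range step first last
  # Split off the optional ":s" step size (default 1).
  if [[ $spec == *:* ]]; then
    range=${spec%%:*}
    step=${spec#*:}
  else
    range=$spec
    step=1
  fi
  # Split "n-m" into first and last; a bare "n" means a single task.
  first=${range%%-*}
  if [[ $range == *-* ]]; then
    last=${range#*-}
  else
    last=$first
  fi
  # Count up from first by step, never going past last.
  seq "$first" "$step" "$last"
}

task_ids 50-140:10 | tr '\n' ' '; echo   # prints: 50 60 70 80 90 100 110 120 130 140
task_ids 1-1000:20 | tail -n 1           # prints: 981
```

The second call shows why the last task ID can be smaller than m: with a step of 20, the IDs are 1, 21, ..., 981.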

Each instantiation of the job will have access to the following four environment variables:

`SGE_TASK_ID`	unique ID for the specific task
`SGE_TASK_FIRST`	ID of the first task, as specified with `qsub`
`SGE_TASK_LAST`	ID of the last task, as specified with `qsub`
`SGE_TASK_STEPSIZE`	task ID step size, as specified with `qsub`
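Because SGE_TASK_ID advances by SGE_TASK_STEPSIZE, a task's position in the array and the total number of tasks can be derived from these four variables with integer arithmetic. The bash sketch below does that; the function name task_position is hypothetical, and the fallback value of 1 used when a variable is not set (for example when trying the script outside the scheduler) is an assumption for illustration only.

```bash
#!/bin/bash
# Hypothetical helper: print "INDEX NTASKS" for the current task,
# computed from the four variables Grid Engine sets for each task.
task_position() {
  # Fallback of 1 when unset is an illustrative assumption.
  local first=${SGE_TASK_FIRST:-1} last=${SGE_TASK_LAST:-1}
  local step=${SGE_TASK_STEPSIZE:-1} tid=${SGE_TASK_ID:-1}
  # 1-based position of this task within the array.
  local index=$(( (tid - first) / step + 1 ))
  # Total number of tasks; integer division drops a final partial step.
  local ntasks=$(( (last - first) / step + 1 ))
  echo "$index $ntasks"
}

# Task 60 of "-t 50-140:10" is the 2nd of 10 tasks.
SGE_TASK_FIRST=50 SGE_TASK_LAST=140 SGE_TASK_STEPSIZE=10 SGE_TASK_ID=60 \
  task_position   # prints: 2 10
```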

You can also limit the number of concurrent tasks with the -tc flag, for example:

% qsub -t 1-1000 -tc 100 model.job

will request to run 1,000 tasks, with no more than 100 running at the same time.

Example of a Job Script

The following example shows how to submit a job array using embedded directives:

Example of job script to submit a job array

```
# /bin/csh
#
#$ -N model-1k -cwd -j y -o model-$TASK_ID.log
#$ -t 1-1000 -tc 100
#
echo + `date` $JOB_NAME started on $HOSTNAME in $QUEUE with jobID=$JOB_ID and taskID=$SGE_TASK_ID
#
set TID = $SGE_TASK_ID
./model -id $TID
#
echo = `date` $JOB_NAME for taskID=$SGE_TASK_ID done.
```

This example requests 1,000 model runs, with task IDs ranging from 1 to 1,000, but no more than 100 running at the same time.
It assumes that the model computation is run with the command ./model -id N, where N is a number between 1 and 1,000.
This example also shows how to use the pseudo-variable TASK_ID (not SGE_TASK_ID; yes, I agree this is confusing) to give each task's log file a different name: the output of task 123 will go to model-123.log in the current working directory.

Examples of How to Convert a Task ID to a More Useful Set of Parameters

In most cases, when starting a computation you will need to convert a simple task identifier to a slew of parameters.

You can put that conversion into your code, but you may not want to, or may not be able to because you are using an existing tool.

Here are a few suggestions (Un*x tricks, using C-shell syntax) on how to do it:

You can use a separate input file for each job (task):
if your code reads from stdin (standard input), you can do something like this:
```
@ i = $SGE_TASK_ID
./domodel < input.$i
```
You just need to prepare as many input.NNN files as cases you want to run, from input.1 to input.500 for example.
If you prefer to call them input.001 to input.500, you can use awk to reformat $i as follows:
```
@ i = $SGE_TASK_ID
set I = `echo $i | awk '{printf "%3.3d", $1}'`
./domodel < input.$I
```
where "%3.3d" is the trick that converts the integer $i into a 3-character string $I, with leading zeros if needed.
The Bourne shell (sh or bash) equivalent is:
```
i=$SGE_TASK_ID
I=`echo $i | awk '{printf "%3.3d", $1}'`
./domodel < input.$I
```
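The input.NNN files themselves can be prepared with a short loop. The bash sketch below is one way to do it, not a required procedure: make_inputs is a hypothetical helper, it uses printf's %03d format (equivalent to awk's "%3.3d" above) to build the zero-padded names, and the one-line content it writes is a placeholder for your real model input.

```bash
#!/bin/bash
# Hypothetical helper: create input.001 ... input.NNN for NNN cases.
# The one-line contents are placeholders for your real model input.
make_inputs() {
  local ncases=$1 i I
  for (( i = 1; i <= ncases; i++ )); do
    I=$(printf '%03d' "$i")     # zero-pad: 7 -> 007, same as awk's "%3.3d"
    echo "case $i" > "input.$I"
  done
}
```

Running make_inputs 500 in the job's working directory creates input.001 through input.500. For more than 999 cases, a wider format such as %04d (and the matching awk "%4.4d") keeps all names the same width.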
You can use a single text file that lists a slew of parameters, and extract one line from it with awk:
```
@ i = $SGE_TASK_ID
set P = (`awk "NR==$i" parameters-list.txt`)
./compute $P
```
This example extracts from the file parameters-list.txt the line whose number is stored in the variable $i, and sets the variable $P to hold that one line.
You just have to create such a file, with for example 500 lines, to run 500 cases of compute, each using the parameters listed on the corresponding line.
The Bourne shell (sh or bash) equivalent is:
```
i=$SGE_TASK_ID
P=`awk "NR==$i" parameters-list.txt`
./compute $P
```
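The file parameters-list.txt can also be generated rather than typed by hand. As a sketch, assuming ./compute takes two hypothetical parameters (called alpha and size here), nested loops write one combination per line, so line $i holds the parameters for task $i.

```bash
#!/bin/bash
# Write one parameter set per line; "alpha" and "size" are hypothetical
# stand-ins for whatever ./compute actually expects.
make_param_list() {
  local alpha size
  for alpha in 0.1 0.2 0.5; do
    for size in 100 200; do
      echo "$alpha $size"
    done
  done > parameters-list.txt
}

make_param_list
wc -l < parameters-list.txt        # 6 lines, so submit with -t 1-6
awk "NR==4" parameters-list.txt    # prints: 0.2 200
```

The task range passed to -t then simply matches the number of lines in the file.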
You can write a tool (a small program or script, here called mytool) that does the conversion, and just run it to get the parameters:
```
@ i = $SGE_TASK_ID
set P = (`./mytool $i`)
```
or, for Bourne shell (sh or bash) aficionados:
```
i=$SGE_TASK_ID
P=`./mytool $i`
```
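As an illustration of what mytool could do, here is a bash sketch that decodes a task ID into a pair of hypothetical parameter indices laid out on a 3 x 4 grid (12 tasks), using integer division and remainder. The grid sizes NA and NB and the "-a A -b B" output format are assumptions; it is written as a function so it can be tried directly, but the same body works in a standalone script reading its argument from $1.

```bash
#!/bin/bash
# mytool (hypothetical): map a 1-based task ID onto a grid of
# NA values of parameter A times NB values of parameter B.
NA=3
NB=4
mytool() {
  local id=$1
  if (( id < 1 || id > NA * NB )); then
    echo "mytool: task ID $id out of range 1-$(( NA * NB ))" >&2
    return 1
  fi
  local k=$(( id - 1 ))          # switch to a 0-based index
  local a=$(( k / NB + 1 ))      # parameter A index, 1..NA
  local b=$(( k % NB + 1 ))      # parameter B index, 1..NB
  echo "-a $a -b $b"
}

mytool 1    # prints: -a 1 -b 1
mytool 6    # prints: -a 2 -b 2
mytool 12   # prints: -a 3 -b 4
```

Checking the range up front means a mismatch between the -t range and the grid size fails loudly instead of silently running a wrong case.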

Example of How to Consolidate Small Jobs into Fewer, Larger Jobs When Using Job Arrays

There is some overhead each time Grid Engine (GE) starts a job (or task). So if you need to compute, let's say, 5,000 similar tasks, each taking 3 minutes, it may be convenient to submit a 5,000-task job array, but it will be inefficient: the system will spend 25 to 50% of its time starting and keeping track of a slew of small jobs. The following script illustrates a simple trick to consolidate such computations:

Example of a job array consolidation wrapper script, domodel.job

```
# /bin/csh
#
# simple wrapper to consolidate using the step size
#
#$ -N model-1k20 -cwd -j y -o model-$TASK_ID-by-20.log
#$ -t 1-1000:20
#
echo + `date` $JOB_NAME started on $HOSTNAME in $QUEUE with jobID=$JOB_ID
#
@ iFr = $SGE_TASK_ID
@ iTo = $iFr + $SGE_TASK_STEPSIZE - 1
if ($iTo > $SGE_TASK_LAST) @ iTo = $SGE_TASK_LAST
#
echo running model.csh for taskIDs $iFr to $iTo
@ i = $iFr
while ($i <= $iTo)
  ./model.csh $i >& model-$i.log
  @ i++
end
#
echo = `date` $JOB_NAME for taskIDs $iFr to $iTo done.
```

Each task of this wrapper, which I call domodel.job, runs 20 models via the csh script model.csh, so instead of running 1,000 three-minute-long jobs, it runs 50 one-hour-long jobs.

The script model.csh is simply:

Script model.csh called by domodel.job

```
#!/bin/csh
#
set TID = $1
echo + `date` model.csh started for taskID=$TID
#
./model -id $TID
#
echo = `date` model.csh for taskID=$TID done.
```

But it can be as complex as you need it to be.

The file model.csh must be made executable with the command

% chmod +x model.csh

You can use a bash script, or any other valid Linux command, in place of ./model.csh.
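For Bourne shell (bash) users, the wrapper domodel.job itself can be sketched in bash as well. This is an illustrative translation, not a tested production script: the #$ -S /bin/bash directive asks Grid Engine to run it with bash, the chunk_bounds function name is hypothetical and computes the same iFr/iTo range as the csh version (capping the last chunk at SGE_TASK_LAST), and the loop only runs when SGE_TASK_ID holds a number, i.e. when started as a task of a job array.

```bash
#!/bin/bash
#$ -S /bin/bash
#$ -N model-1k20 -cwd -j y -o model-$TASK_ID-by-20.log
#$ -t 1-1000:20
#
# Print "FROM TO": the range of model IDs handled by one task.
chunk_bounds() {
  local from=$1 step=$2 last=$3
  local to=$(( from + step - 1 ))
  (( to > last )) && to=$last    # the final chunk may be shorter
  echo "$from $to"
}

# Only run the models when started as a task of a job array.
if [[ ${SGE_TASK_ID:-} =~ ^[0-9]+$ ]]; then
  echo "+ $(date) $JOB_NAME started on $HOSTNAME in $QUEUE with jobID=$JOB_ID"
  read -r iFr iTo < <(chunk_bounds "$SGE_TASK_ID" "$SGE_TASK_STEPSIZE" "$SGE_TASK_LAST")
  echo "running model.csh for taskIDs $iFr to $iTo"
  for (( i = iFr; i <= iTo; i++ )); do
    ./model.csh "$i" > "model-$i.log" 2>&1
  done
  echo "= $(date) $JOB_NAME for taskIDs $iFr to $iTo done."
fi
```

With -t 1-1000:20 the cap never triggers (the last task covers 981 to 1000), but with a range such as 1-1010:20 the last task, 1001, would run models 1001 to 1010 only.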

Last updated 08 Jan 2016 SGK.
