A job array is specified by adding a task range to qsub via the -t flag:
% qsub -t 1-100 model.job
Your job-array NNNNNN.1-100:1 ("model.job") has been submitted
The scheduler will start 100 tasks, each running the job script file model.job, and pass to each task its identifier (a number between 1 and 100) via an environment variable.
The syntax for the -t flag is -t n[-m[:s]], namely:
-t 1-20 | run 20 tasks, with task IDs ranging from 1 to 20 |
-t 10-30 | run 21 tasks, with task IDs ranging from 10 to 30 |
-t 50-140:10 | run 10 tasks, with task IDs ranging from 50 to 140 by step of 10 (50, 60, ..., 140) |
-t 20 | run 1 task, with task ID 20 |
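To check a range before submitting it, the n[-m[:s]] syntax can be expanded by hand. The following is a minimal Bourne shell sketch; task_ids is a hypothetical helper written for illustration, not a Grid Engine command, and it assumes a well-formed range.

```shell
#!/bin/sh
# task_ids: hypothetical helper (not part of Grid Engine) that prints
# the task IDs a range of the form n[-m[:s]] stands for.
task_ids() {
    spec=$1
    first=${spec%%-*}            # text before the first "-" (or the whole spec)
    last=$first                  # a bare "n" means a single task
    step=1                       # default step when ":s" is absent
    case $spec in
        *-*)
            rest=${spec#*-}      # "m" or "m:s"
            last=${rest%%:*}
            case $rest in
                *:*) step=${rest#*:} ;;
            esac
            ;;
    esac
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "$i"
        i=$((i + step))
    done
}

# Count the tasks in the third example of the table above.
task_ids 50-140:10 | wc -l
```

Piping the output through wc -l gives the number of tasks the scheduler would create for that range.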
Each instantiation of the job will have access to the following four environment variables:
SGE_TASK_ID | unique ID for the specific task |
SGE_TASK_FIRST | ID of the first task, as specified with qsub |
SGE_TASK_LAST | ID of the last task, as specified with qsub |
SGE_TASK_STEPSIZE | task ID step size, as specified with qsub |
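As an illustration of how these four variables relate, the sketch below (Bourne shell) computes the position of a task within the array and the total number of tasks. The fallback values are assumptions that mimic -t 50-140:10 so the script can be tried outside the scheduler; inside a job array the scheduler sets the variables itself.

```shell
#!/bin/sh
# Fallback values (assumptions, mimicking "-t 50-140:10") used only
# when the script is run by hand, outside of a job array.
: "${SGE_TASK_ID:=70}"
: "${SGE_TASK_FIRST:=50}"
: "${SGE_TASK_LAST:=140}"
: "${SGE_TASK_STEPSIZE:=10}"

# Position of this task in the array (1-based) and total number of tasks.
index=$(( (SGE_TASK_ID - SGE_TASK_FIRST) / SGE_TASK_STEPSIZE + 1 ))
total=$(( (SGE_TASK_LAST - SGE_TASK_FIRST) / SGE_TASK_STEPSIZE + 1 ))

echo "task $SGE_TASK_ID is number $index of $total"
```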
You can also limit the number of concurrent tasks with the -tc flag, for example:
% qsub -t 1-1000 -tc 100 model.job
will request to run 1,000 tasks, with no more than 100 running at the same time.
Example of a Job Script
The following example shows how to submit a job array using embedded directives:
    # /bin/csh
    #
    #$ -N model-1k -cwd -j y -o model-$TASK_ID.log
    #$ -t 1-1000 -tc 100
    #
    echo + `date` $JOB_NAME started on $HOSTNAME in $QUEUE with jobID=$JOB_ID and taskID=$SGE_TASK_ID
    #
    set TID = $SGE_TASK_ID
    ./model -id $TID
    #
    echo = `date` $JOB_NAME for taskID=$SGE_TASK_ID done.
- This example requests to run 1,000 models, using a task ID ranging from 1 to 1,000, but limited to 100 running at the same time.
- It assumes that the model computation is run with the command ./model -id N, where N is a number between 1 and 1,000.
- This example also shows how to use the pseudo-variable TASK_ID (not SGE_TASK_ID; yes, I agree this is confusing) to give the log file of each task a different name: the output of task 123 will be written to model-123.log in the current working directory.
Examples of How to Convert a Task ID to a More Useful Set of Parameters
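One common approach, sketched here in Bourne shell under stated assumptions, is to keep one parameter set per line in a text file and let each task read the line whose number equals its task ID. The file name params.txt, its two columns, and the names temp and pres are all hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Fallback so the sketch can be tried outside the scheduler (assumption);
# inside a job array the scheduler sets SGE_TASK_ID.
: "${SGE_TASK_ID:=2}"

# Hypothetical parameter table, one task per line: temperature, pressure.
# In a real job this file would already exist; it is created here only
# so that the sketch runs on its own.
cat > params.txt <<'EOF'
300 1.0
350 1.5
400 2.0
EOF

# Pick the line whose number matches the task ID and split it into fields.
line=$(sed -n "${SGE_TASK_ID}p" params.txt)
set -- $line
temp=$1
pres=$2

echo "task $SGE_TASK_ID: temp=$temp pres=$pres"
```

With such a file, the array would be submitted with a range that matches the number of lines, here -t 1-3.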
Example of How to Consolidate Small Jobs into Fewer Larger Jobs When Using Job Arrays
There is some overhead each time the GE starts a job (or task). If you need to compute, say, 5,000 similar tasks, each taking 3 minutes, it may be convenient to submit a 5,000-task job array, but it will be inefficient: the system will spend 25 to 50% of its time starting and keeping track of a slew of small jobs. The following script illustrates a simple trick to consolidate such computations: submit the array with a step size, and let each task loop over all the IDs in its stride.
    # /bin/csh
    # [ add embedded directives here ]
    #
    # simple wrapper to consolidate using the step size
    #
    @ i = $SGE_TASK_ID
    @ last = $i + $SGE_TASK_STEPSIZE - 1
    if ($last > $SGE_TASK_LAST) @ last = $SGE_TASK_LAST
    #
    echo processing execute.sh from $i to $last
    while ($i <= $last)
      ./execute.sh $i
      @ i++
    end
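For the 5,000-computation case above, the wrapper could be submitted with a step size, for instance -t 1-5000:20 (the value 20 is only an illustrative choice). The Bourne shell sketch below is a dry run, not a job script: it repeats the wrapper's arithmetic for every task ID of such an array to confirm that the chunks cover every computation exactly once.

```shell
#!/bin/sh
# Dry run of the chunking done by the wrapper for "-t 1-5000:20".
# The three values below are illustrative, not taken from a real submission.
first=1
last=5000
step=20

ntasks=0    # number of array tasks the scheduler would start
nruns=0     # number of execute.sh invocations across all tasks

id=$first
while [ "$id" -le "$last" ]; do
    end=$((id + step - 1))
    # Clamp the final chunk to the last task ID, as the wrapper does.
    if [ "$end" -gt "$last" ]; then
        end=$last
    fi
    ntasks=$((ntasks + 1))
    nruns=$((nruns + end - id + 1))
    id=$((id + step))
done

echo "$ntasks tasks cover $nruns runs"
```

With these values the array has 250 tasks, each performing 20 computations, or about one hour per task at 3 minutes per computation, so the start-up overhead is paid 250 times instead of 5,000 times.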
Last updated SGK.