A job array is specified by adding a task range to qsub via the -t flag:
% qsub -t 1-100 model.job
Your job-array NNNNNN.1-100:1 ("model.job") has been submitted
The scheduler will start 100 tasks, each running the job script file model.job, and pass each one a task identifier (a number between 1 and 100) via an environment variable.
The syntax for the -t flag is -t n[-m[:s]], namely:
-t 1-20 | run 20 tasks, with task IDs ranging from 1 to 20 |
-t 10-30 | run 21 tasks, with task IDs ranging from 10 to 30 |
-t 50-140:10 | run 10 tasks, with task IDs ranging from 50 to 140 in steps of 10 (50, 60, ..., 140) |
-t 20 | run 1 task, with task ID 20 |
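To check which task IDs a given range produces before submitting, the n[-m[:s]] syntax can be expanded by hand. The following is a minimal sketch in POSIX shell; expand_range is a hypothetical helper written for illustration and is not part of Grid Engine.

```shell
#!/bin/sh
# Hypothetical helper (not a Grid Engine command): list the task IDs
# that a "-t n[-m[:s]]" specification would expand to, one per line.
expand_range() {
    spec=$1
    # Split "n-m:s" into n and "m:s"; a bare "n" means a single task.
    case $spec in
        *-*) first=${spec%%-*}; rest=${spec#*-} ;;
        *)   first=$spec; rest=$spec ;;
    esac
    # Split "m:s" into m and s; the step defaults to 1.
    case $rest in
        *:*) last=${rest%%:*}; step=${rest#*:} ;;
        *)   last=$rest; step=1 ;;
    esac
    id=$first
    while [ "$id" -le "$last" ]; do
        echo "$id"
        id=$((id + step))
    done
}

# List the IDs for the third row of the table above.
expand_range 50-140:10
```

Piping the output through wc -l gives the number of tasks the scheduler would start for that range.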
Each instantiation of the job will have access to the following four environment variables:
SGE_TASK_ID | unique ID for the specific task |
SGE_TASK_FIRST | ID of the first task, as specified with qsub |
SGE_TASK_LAST | ID of the last task, as specified with qsub |
SGE_TASK_STEPSIZE | task ID step size, as specified with qsub |
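As an illustration of how these four variables fit together, the sketch below works out where a task sits within its array. It is written in POSIX shell; task_position is a hypothetical helper, and the fallback values of 1 are an assumption that lets the script run by hand outside the scheduler.

```shell
#!/bin/sh
# Hypothetical helper: given a task ID, the first and last IDs, and the
# step size, print the task's position in the array (counting from 1)
# followed by the total number of tasks.
task_position() {
    id=$1 first=$2 last=$3 step=$4
    index=$(( (id - first) / step + 1 ))
    count=$(( (last - first) / step + 1 ))
    echo "$index $count"
}

# Inside a job script the values come from the environment; the ":-1"
# defaults are an assumption used only when the variables are unset.
task_position "${SGE_TASK_ID:-1}" "${SGE_TASK_FIRST:-1}" \
              "${SGE_TASK_LAST:-1}" "${SGE_TASK_STEPSIZE:-1}"
```

For a job submitted with -t 50-140:10, the task with ID 70 would report position 3 of 10.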
You can also limit the number of concurrent tasks with the -tc flag, for example:
% qsub -t 1-1000 -tc 100 model.job
requests 1,000 tasks, with no more than 100 running at the same time.
Example of a Job Script
The following example shows how to submit a job array, using embedded directives:
# /bin/csh
#
#$ -N model-1k -cwd -j y -o model-$TASK_ID.log
#$ -t 1-1000 -tc 100
#
echo + `date` $JOB_NAME started on $HOSTNAME in $QUEUE with jobID=$JOB_ID and taskID=$SGE_TASK_ID
#
set TID = $SGE_TASK_ID
./model -id $TID
#
echo = `date` $JOB_NAME for taskID=$SGE_TASK_ID done.
- This example requests 1,000 model runs, using task IDs ranging from 1 to 1,000, with no more than 100 running at the same time.
- It assumes that the model computation is run with the command ./model -id N, where N is a number between 1 and 1,000.
- This example also shows how to use the pseudo variable TASK_ID (not SGE_TASK_ID; yes, I agree this is confusing) to give the log file of each task a different name: the output of task 123 will be model-123.log in the current working directory.
Examples of How to Convert a Task ID to a More Useful Set of Parameters
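One common way to turn a task ID into a set of parameters is to keep one line per task in a text file and have each task read the line whose number matches its ID. The sketch below, in POSIX shell, illustrates the idea under stated assumptions: the parameter file, its two columns (temperature and pressure), and the helper params_for_task are all hypothetical, and the default task ID of 3 is used only when SGE_TASK_ID is unset.

```shell
#!/bin/sh
# Hypothetical parameter file, one line per task: "temperature pressure".
# It is created here only so the sketch is self-contained; in practice
# the file would already exist next to the job script.
params_file=$(mktemp)
cat > "$params_file" <<'EOF'
300 1.0
300 2.0
350 1.0
350 2.0
EOF

# Hypothetical helper: print line number $1 of file $2.
params_for_task() {
    sed -n "${1}p" "$2"
}

# Use the task ID set by the scheduler, or 3 when run by hand (assumption).
tid=${SGE_TASK_ID:-3}

# Split the selected line into positional parameters.
set -- $(params_for_task "$tid" "$params_file")
temp=$1
pres=$2
echo "task $tid: temperature=$temp pressure=$pres"
```

With such a file, the job array would be submitted with a range matching the number of lines, here -t 1-4.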
Example of How to Consolidate Small Jobs into Fewer Larger Jobs When Using Job Arrays
There is some overhead each time the GE starts a job (or task). So if you need to compute, say, 5,000 similar tasks, each taking 3 minutes, it may be convenient to submit a 5,000-task job array, but it will be inefficient: the system will spend 25 to 50% of its time starting and keeping track of a slew of small jobs. The following script illustrates a simple trick to consolidate such computations:
# /bin/csh
#
# simple wrapper to consolidate using the step size
#
#$ -N model-1k20 -cwd -j y -o model-$TASK_ID-by-20.log
#$ -t 1-1000:20
#
echo + `date` $JOB_NAME started on $HOSTNAME in $QUEUE with jobID=$JOB_ID
#
@ i = $SGE_TASK_ID
@ iTo = $i + $SGE_TASK_STEPSIZE - 1
if ($iTo > $SGE_TASK_LAST) @ iTo = $SGE_TASK_LAST
#
echo processing model.csh for taskIDs $i to $iTo
while ($i <= $iTo)
  ./model.csh $i >& model-$i.log
  @ i++
end
#
echo = `date` $JOB_NAME for taskIDs $SGE_TASK_ID to $iTo done.
This wrapper, which I call domodel.job, runs 20 models per task via the csh script model.csh, so instead of running 1,000 three-minute jobs, it runs 50 one-hour jobs.
The script model.csh is simply:
#!/bin/csh
#
set TID = $1
echo + `date` model.csh started for taskID=$TID
#
./model -id $TID
#
echo = `date` model.csh for taskID=$TID done.
The file model.csh must be made executable with the command
% chmod +x model.csh
You can use a bash script, or any other valid Linux command, in place of the line ./model.csh.
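For readers who prefer sh or bash, the range computation used by the consolidation wrapper can be sketched as follows. This is an illustrative POSIX shell equivalent, not a replacement for the csh script: chunk_bounds is a hypothetical helper, the defaults (981, 20, 1000) are assumptions used only when the SGE variables are unset, and the loop prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: chunk_bounds ID STEP LAST
# Print the first and last model index handled by the task with this ID,
# clipping the upper bound to LAST.
chunk_bounds() {
    from=$1
    to=$(( $1 + $2 - 1 ))
    if [ "$to" -gt "$3" ]; then
        to=$3
    fi
    echo "$from $to"
}

# Values come from the scheduler; the defaults are assumptions for a dry run.
set -- $(chunk_bounds "${SGE_TASK_ID:-981}" \
                      "${SGE_TASK_STEPSIZE:-20}" \
                      "${SGE_TASK_LAST:-1000}")
i=$1
iTo=$2
while [ "$i" -le "$iTo" ]; do
    # Dry run: show what would be executed for each model index.
    echo "would run: ./model.csh $i > model-$i.log 2>&1"
    i=$((i + 1))
done
```

Replacing the echo with the actual command turns the dry run into a working loop body.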
Last updated SGK.