- Example of a Job Script
- How to Consolidate Small Jobs into Fewer Larger Jobs Using Job Arrays
- Rules for Submitting Job Arrays that use Parallel Environments (like MPI)
The syntax for the -t flag is -t n[-m[:s]], namely:
|-t 1-20||run 20 tasks, with task IDs ranging from 1 to 20|
|-t 10-30||run 21 tasks, with task IDs ranging from 10 to 30|
|-t 50-140:10||run 10 tasks, with task IDs ranging from 50 to 140 by step of 10 (50, 60, ..., 140)|
|-t 20||run one task, with task ID 20|
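To make the n[-m[:s]] notation concrete, here is a minimal sketch that expands a range into the task IDs it denotes. The function expand_task_range is a hypothetical helper written for illustration only; it is not part of SGE and does no input validation.

```shell
#!/bin/sh
# Expand an SGE-style task range n[-m[:s]] into the task IDs it denotes.
# expand_task_range is a hypothetical helper, not an SGE command.
expand_task_range() {
    spec=$1
    first=${spec%%-*}            # n: the text before the first '-'
    last=$first                  # default: a single task
    step=1                       # default step size
    case $spec in
        *-*)
            rest=${spec#*-}      # "m" or "m:s"
            last=${rest%%:*}     # m: the text before the ':'
            case $rest in
                *:*) step=${rest#*:} ;;   # s: the text after the ':'
            esac
            ;;
    esac
    id=$first
    ids=""
    while [ "$id" -le "$last" ]; do
        ids="$ids $id"
        id=$((id + step))
    done
    echo $ids                    # unquoted on purpose: drops the leading space
}

expand_task_range 50-140:10      # prints: 50 60 70 80 90 100 110 120 130 140
expand_task_range 20             # prints: 20
```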
Each instantiation of the job will have access to the following four environment variables:
|SGE_TASK_ID||unique ID for the specific task|
|SGE_TASK_FIRST||ID of the first task, as specified with the -t flag|
|SGE_TASK_LAST||ID of the last task, as specified with the -t flag|
|SGE_TASK_STEPSIZE||task ID step size, as specified with the -t flag|
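As an illustration of how a task might use these variables, the sketch below derives per-task file names from SGE_TASK_ID. The file naming scheme (input.N.dat, task.N.log) is a made-up assumption, and the fallback values of 1 are only there so the script can be tried outside the scheduler.

```shell
#!/bin/sh
# Outside SGE these variables are unset, so fall back to a single-task default.
# The defaults are an assumption for local testing only.
: "${SGE_TASK_ID:=1}"
: "${SGE_TASK_FIRST:=1}"
: "${SGE_TASK_LAST:=1}"
: "${SGE_TASK_STEPSIZE:=1}"

# Hypothetical naming scheme: one input file and one log file per task.
input="input.$SGE_TASK_ID.dat"
log="task.$SGE_TASK_ID.log"

echo "task $SGE_TASK_ID (range $SGE_TASK_FIRST-$SGE_TASK_LAST:$SGE_TASK_STEPSIZE) reads $input and writes $log"
```

Each task then works on its own files, so tasks of the same array do not overwrite each other's output.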
You can also limit the number of concurrent tasks with the -tc flag.
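A sketch of combining -t and -tc is shown here as a dry run that prints the command instead of submitting it. The script name my_array.job and the numbers are illustrative assumptions: 100 tasks in total, with at most 20 of them running at the same time.

```shell
#!/bin/sh
# Dry run: build the qsub command line and print it rather than submit it.
# my_array.job is a hypothetical job script name.
n_tasks=100       # total number of tasks in the array
max_running=20    # at most this many tasks run concurrently
cmd="qsub -t 1-$n_tasks -tc $max_running my_array.job"
echo "$cmd"       # prints: qsub -t 1-100 -tc 20 my_array.job
```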
- By default, SGE makes a local copy of each job script on the compute nodes it runs on.
- Parallel job arrays should avoid this copy to prevent a race condition: for a small fraction of the tasks, the scheduler starts the script before it has been copied, so those tasks fail to start.
- The output of qstat -j 9616234 will show something like this:
error reason 11: 03/24/2016 11:09:11 [10464:63260]: unable to find job file "/opt/gridengine/default/spool/compute-2-2/job_scripts/94399389416234"
and the SGE reporting file will list:
job never ran -> schedule it again
- The output of qstat -f -explain E | grep QERROR will show something like this:
queue mThC.q marked QERROR as result of job 9616234's failure at host compute-1-2.local
leaving a queue entry in Error state.
Do not use embedded directives (sigh). Instead:
- Write a script (python, etc.) with the required steps, as for a job script.
- Make that script executable (chmod +x); you can use the #! mechanism (aka shebang) to specify the interpreter.
- Write a file with the qsub command and all the options that you would otherwise put as embedded directives.
- Pass the -b y option to qsub and specify the full path of the script to execute.
- Source that file to submit the parallel job array.
- Do not modify the executable script file while the job array is running.
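The steps above can be sketched as follows. This is a hedged illustration, not the site's reference example: the file names (demo.sh, qsub_demo.sou), the parallel environment name mthread, the slot count, the task range, and the placeholder /full/path/to are all assumptions to adapt to your own case. The files are written to a scratch directory so the sketch can be run without submitting anything.

```shell
#!/bin/sh
# Sketch: create the two files for a parallel job array in a scratch directory.
# All names and values below are illustrative assumptions.
workdir=$(mktemp -d)

# 1. The executable script: no embedded directives, interpreter set by the shebang.
cat > "$workdir/demo.sh" <<'EOF'
#!/bin/sh
echo "task $SGE_TASK_ID running on $(hostname) with $NSLOTS slots"
EOF
chmod +x "$workdir/demo.sh"

# 2. The file to source: the qsub command with the options that would
#    otherwise be embedded directives. No spaces after the trailing '\'.
cat > "$workdir/qsub_demo.sou" <<'EOF'
qsub -t 1-20 -tc 5 \
     -pe mthread 4 \
     -cwd -j y -N demo \
     -o 'demo.$TASK_ID.log' \
     -b y /full/path/to/demo.sh
EOF

cat "$workdir/qsub_demo.sou"
# To submit, on a host where qsub is available: source qsub_demo.sou
```

Because -b y tells qsub to treat the script as an executable rather than a job script, the script is run from the path given on the command line, which is why that path must be complete and the file must not change while tasks are still starting.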
The following job script with embedded directives must be broken into two files:
|one job script with embedded directives||is replaced by two files, a qsub_XXX.sou and a XXX.sh|
Note: there must be no spaces after the '\' at the end of a continued line.
This can be any type of executable script, but if you use:
Before submitting the job array, make sure the script is executable:
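A minimal sketch of such a check is given below. The function check_executable is a hypothetical helper, and the temporary file merely stands in for the real script so the example can be run anywhere.

```shell
#!/bin/sh
# check_executable is a hypothetical helper: it succeeds only if the file
# passed as its argument has the execute permission set.
check_executable() {
    if [ -x "$1" ]; then
        echo "ok: $1 is executable"
    else
        echo "not executable: $1 (run: chmod +x $1)" >&2
        return 1
    fi
}

script=$(mktemp)                                   # stand-in for the real script
check_executable "$script" || chmod +x "$script"   # fix the permission if needed
check_executable "$script"                         # now reports ok
```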
You can edit qsub_demo.sou to submit more tasks, but do not modify the executable script file while the job array is running.
Last updated SGK SGK.