- Introduction
- Conceptual Examples
- Serial Jobs
- Job Arrays
- Parallel Jobs
- MPI Jobs
- Multi-Threaded Jobs
- Available Queues
- High-CPUs Queues
- High-Memory Queues
- Very-High Memory Queues
- Other Queues (interactive, access to SSD, ...)
- Resource Limits
- Examples
- Help me Choose a Queue and Write a Job Script
- QSubGen: a Job Script Generator
- What Queue Should I Use?
- Why Can't I Queue that Job?
- Why Is my Job Queued but not Running?
1. Introduction
- Jobs are submitted from either login node to the job scheduler (the Grid Engine, or GE), using the command qsub and a job file;
- submitted jobs may wait in the queue until the requested resources are available or, if a user has reached a resource usage limit, until that limit has cleared.
- The scheduler will eventually run each job, starting it on one or several compute nodes, and the job will run in batch, not interactive, mode;
- it is the scheduler that selects which compute node(s) to run a job on, and
- if the job exceeds a limit, for example by using too much memory or consuming too much CPU time, the scheduler will kill the job.
- To run a computation, users must write a list of instructions, aka a job script, that specifies the steps needed to perform the said computation; and
- pass directives to the scheduler as to which resources are required (like amount of memory, CPU time, number of CPUs, etc).
- The job script can also contain these directives, aka embedded directives.
A job is submitted with the command qsub, with options (or embedded directives), followed by the name of the file containing the job script.
- A few compute nodes, currently 2, are set aside for interactive use; see the section below on the interactive queue.
- Do not use the login nodes to run any substantive computation: the login nodes are monitored, and a task running on one of the login nodes that consumes too many resources will at first have its priority reduced and will eventually be terminated.
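As an illustration of the points above, here is a minimal sketch of a job script with embedded directives. The file name hello.job, the job name and the log file name are hypothetical; the flags used (-S, -N, -cwd, -j, -o) are standard Grid Engine qsub options, but the values appropriate for this cluster may differ.

```shell
#!/bin/sh
# hello.job: a hypothetical minimal job script.
# Lines starting with "#$" are embedded directives, read by qsub;
# the shell itself treats them as ordinary comments.
#
# interpret the script with /bin/sh
#$ -S /bin/sh
# job name (hypothetical)
#$ -N hello
# run in the directory the job was submitted from
#$ -cwd
# merge stderr into stdout, written to hello.log (hypothetical name)
#$ -j y
#$ -o hello.log
#
# JOB_ID is set by the Grid Engine; fall back to a placeholder so
# the script can also be tried outside the scheduler.
msg="job ${JOB_ID:-not-under-GE} running on $(uname -n)"
echo "$msg"
```

Such a script would be submitted with "qsub hello.job"; the same directives could instead be passed as options on the qsub command line.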
2. Conceptual Examples
The Conceptual Examples page explains how to submit jobs and how to write simple job scripts.
The page presents:
- Trivial Example
- Better Example
- Example with Embedded Directive
- Example with Embedded Directive and Arguments
- Note
- Miscellaneous
- Environment Variables
- Catching Time Limits
3. Serial Jobs
- A serial job is a job that uses only one CPU.
- It is started either by using a dedicated job script file (i.e., a different one for each job you need to run), or by using a more sophisticated job script that takes one or several arguments.
- Refer to the conceptual examples to learn how to submit jobs.
4. Job Arrays
- Conceptually, a job array can be described like this:
"run my computation for 100 cases, identified by a task number ranging from 1 to 100 in steps of 1"; hence
- a job array is a set of computations, known as tasks, that run using a single job script file and a single number that identifies each task to be performed.
- The job script must thus be written to start a specific task using a simple integer as identifier;
there are plenty of ways to convert a single number into a specific set of parameters.
- A job array doesn't need to be a serial job, and the job script can take arguments.
So instead of queuing, let's say, 100 jobs (like 100 bootstraps, 100 light curves to analyze, 100 models to build) one at a time,
you submit one job and request that the GE run it 100 times, i.e., for 100 tasks.
- In fact, you can specify:
- the starting task number,
- the ending task number,
- the task number increment, and, if needed,
- the maximum number of tasks that should run concurrently.
The Job Arrays page explains how to submit job arrays.
- It also shows some tricks to convert a task identifier into a set of parameters, and
- how to consolidate a large number of small jobs into fewer larger ones.
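The sketch below shows one possible way to turn a task number into a pair of parameters inside a job array script. The file name array.job, the job name and the 10x10 parameter grid are hypothetical; -t (start-end:step) and -tc (maximum concurrent tasks) are standard Grid Engine options, and SGE_TASK_ID is the variable the GE sets for each task.

```shell
#!/bin/sh
# array.job: a hypothetical job array script.
#$ -S /bin/sh
#$ -N sweep
#$ -cwd
#$ -j y
# tasks 1 to 100 in steps of 1, at most 20 running concurrently
#$ -t 1-100:1
#$ -tc 20

# Map a task number (1..100) onto a 10x10 grid of two
# hypothetical parameters, each ranging from 0 to 9.
task_to_params() {
    task=$1
    row=$(( (task - 1) / 10 ))
    col=$(( (task - 1) % 10 ))
    echo "$row $col"
}

# Default to task 1 so the script can be tried outside the GE.
task_id=${SGE_TASK_ID:-1}
params=$(task_to_params "$task_id")
echo "task $task_id uses row=${params% *} col=${params#* }"
```

With the embedded -t directive, a single "qsub array.job" queues all 100 tasks; the same range could instead be given on the command line as "qsub -t 1-100:1 -tc 20 array.job".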
5. Parallel Jobs
- A parallel job is a job that uses more than one CPU.
Because the cluster is a shared resource, it is the GE that allocates CPUs, known as slots in GE jargon; hence a parallel job must request a set of slots when it is submitted;
- this is accomplished by passing -pe <pe-name> N to qsub, where <pe-name> is the name of the parallel environment (PE) and N is the number of requested slots.
- The choice of a parallel environment will determine how the GE will allocate CPUs and how the job gets started.
The Parallel Jobs page explains how to submit parallel jobs and describes which parallel environments are available.
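As a sketch of a slot request, the multi-threaded job script below asks for 8 slots and uses the number actually granted. The PE name mthread, the file name threads.job and the executable are assumptions made for illustration (check the Parallel Jobs page for the PE names defined on the cluster); NSLOTS is the variable the GE sets to the number of granted slots.

```shell
#!/bin/sh
# threads.job: a hypothetical multi-threaded job script.
#$ -S /bin/sh
#$ -N threads
#$ -cwd
#$ -j y
# request 8 slots in a PE assumed here to be called "mthread"
#$ -pe mthread 8

# Use as many threads as slots were granted; default to 1 so the
# script can also be tried outside the GE.
OMP_NUM_THREADS=${NSLOTS:-1}
export OMP_NUM_THREADS
echo "running with $OMP_NUM_THREADS thread(s)"

# A hypothetical OpenMP executable would be started here:
# ./my_openmp_program
```

Reading the thread count from NSLOTS, rather than hard-coding it, keeps the script consistent with whatever slot count is requested at submission time.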
6. Available Queues
Every job running on the cluster is started in a queue.
- The GE will select a queue based on the resources requested and the usage in each queue.
- If you don't specify the right queues or the right resource(s), your job will either
- not get queued,
- wait forever and never run, or
- start and get killed when it exceeds one of the limits of the queue it was started in.
The set of available queues is a matrix of queues:
- Four sets of queues: a high-CPU and a high-memory set of queues, complemented by a very-high-memory restricted queue and special restricted queues.
The high-CPU and high-memory sets of queues each come with different time limits: short, medium, long and unlimited.
Type | Description |
---|---|
high-CPU | for serial or parallel jobs that do not need a lot of memory, |
high-memory | for serial or multi-threaded parallel jobs that require a lot of memory, |
very-high-memory | reserved for jobs that need a very large amount of memory, |
other | for interactive use or projects that need special resources (SSD, ...). |
The Available Queues page describes the available queues in detail.
7. Resource Limits
While each queue has a set of limits (CPU, memory), the cluster also has some global limits.
What are the Resource Limits
There are limits on
- how many jobs can be queued simultaneously:
- there can't be more than 25,000 jobs queued at any time,
- a single user can't queue more than 2,500 jobs, and
- a job array can't request more than 10,000 tasks.
- how many jobs can run simultaneously, in particular there is:
- a limit on how many slots a single user can use (name=u_slots value=600)
- a limit on how many slots a user can grab in each queue, with fewer slots allowed in queues with longer time limits.
- and how much memory can be simultaneously reserved, in particular
- a limit on how much memory can be reserved by a single user in each queue.
For example, users can't grab more than 51 slots (or CPUs) and 1.7TB of reserved memory concurrently for jobs running in the long-time high-memory queue (lThM.q).
The actual limits are subject to change depending on the cluster usage and the hardware configuration.
The more resources a job uses (more CPU time, more memory), the fewer similar jobs a single user can run concurrently,
in other words you can run a lot of small jobs at the same time but fewer very big/long jobs.
How to Check the Resource Limits
To check the global limits:
% qconf -sconf global | grep max
and the explanation of these parameters can be found in
% man 5 sge_conf
To check the queue specific resource limits, use
% qconf -srqs
Note that these values get adjusted as needed.
The explanation of the resource quota set (rqs) can be found in
% man 5 sge_resource_quota
To check how much of these resources (queue quotas) is used overall, or by your job(s), use:
% qquota
or
% qquota -u $USER
You can also inquire about a specific resource (qquota -l mem_res), and use the local tool qquota+ (module load tools/local) to
- get a nicer printout of the reserved memory,
- get the % of usage with respect to its limit,
like in
% qquota+ +% -l slots -u hpc
(more info via qquota+ -help or man qquota+.)
To check the limits of a specific queue (CPU and memory), use
% qconf -sq sThC.q
and the explanation of these parameters can be found in
% man 5 queue_conf
under the RESOURCE LIMITS heading.
NOTES
- You can submit a job and tell the GE to let it start only after another job has completed, using the -hold_jid <jobid> flag to qsub:
% qsub -N FirstOne pre-process.job
Your job 12345678 ("FirstOne") has been submitted
% qsub -hold_jid 12345678 -N SecondOne post-process.job
Your job 12345679 ("SecondOne") has been submitted
- You can be more sophisticated (or use qchain, see below):
    #!/bin/csh
    #
    # arguments: the parameter passed to each job, and a name suffix
    set parameter = $1
    set name = $2
    #
    # -terse makes qsub print only the job ID, which is saved to chain the jobs
    set jid1 = `qsub -terse -N "pre-process-$name" pre-process.job $parameter`
    echo $jid1 submitted '("'pre-process-$name'")'
    set jid2 = `qsub -terse -hold_jid $jid1 -N "process-$name" process.job $parameter`
    echo $jid2 submitted '("'process-$name'")'
    set jid3 = `qsub -terse -hold_jid $jid2 -N "post-process-$name" post-process.job $parameter`
    echo $jid3 submitted '("'post-process-$name'")'
This example will submit 3 jobs, pre-process.job, process.job and post-process.job, to be run sequentially;
- each takes one argument, the parameter, and
- is given a compound name.
- The embedded directives in the three job scripts may request different resources, like
- lots of memory for pre-processing,
- lots of CPUs for processing, and
- neither for post processing.
You can use the qchain tool, by loading the tools/local module, to submit jobs that must run sequentially.
% module load tools/local
% qchain *.job
will submit the job files that match "*.job" in the order given by "echo *.job".
By using quotes, as follows:
% module load tools/local
% qchain '-N start first.job 123' '-N crunch second.job 123' '-N post-process finish.job 123'
qchain allows you to pass arguments to both qsub and the job scripts.
You can limit how many jobs you submit with the following trick:
How to limit the number of jobs submitted, using C-shell syntax
    # define how many jobs to queue
    @ NMAX = 250
    #
    loop:
    @ N = `qstat -u $USER | tail --lines=+3 | wc -l`
    if ($N >= $NMAX) then
      sleep 180
      goto loop
    endif
    #
This example counts how many jobs you have in the queue (running and waiting), using the command qstat (with tail and wc -l), and pauses for 3 minutes (180 seconds) for as long as that count is 250 or higher.
You would include these lines in a script that submits a slew of jobs but should not queue more than a given number at any time (to count only the queued jobs, add -s p to qstat).
Or you can use the tool q-wait (needs the module tools/local), which takes an argument and two options:
% q-wait blah
will pause until you have no job whose name has the string 'blah' left queued or running.
The options allow you to specify the number of jobs and how often to check, e.g.:
% q-wait -N 125 -wait 3600 crunch
will pause until there are 125 or fewer jobs whose name has the string 'crunch' left queued or running, checking once an hour.
- Avoid using the -V flag to qsub:
- The -V flag passes all the active environment variables to the script.
- While it may be convenient in some instances, it creates a dependency on the precise environment configuration when submitting the job; thus the same job script may fail when it is submitted at a later time (or from a different login) from a different configuration.
8. Examples
You can find examples of simple/trivial test cases, with source code, Makefile, job script files and resulting log files, under ~hpc/examples.
The examples are organized as follows:
- misc/: miscellaneous
- serial/: simple (hello world) serial job
- mpi/: using MPI
- openmp/: using OpenMP
- idl/: running IDL
- gdl/: running GDL
- java/: running JAVA
- lapack/: linking with LAPACK and Intel's MKL
- memtest/: large memory use and reservation
- c++11/: using the C++11 extension
You can use the command find to get a list of all the subdirectories under ~hpc/examples, i.e.:
% find ~hpc/examples -type d -print
9. Help me Choose a Queue and Write a Job Script
QSubGen: a Job Script Generator
There is a web page with an app to help you choose a queue and write a job script, mostly how to write the embedded directives and load modules.
- QSubGen: the job script generator.
What Queue Should I Use?
To choose a queue, you need to know
- whether it is a serial (single CPU) or parallel (multiple CPUs) job,
- if it is a parallel job, what kind,
- how much memory this job will need,
- how much CPU time it will require.
Indeed:
If your computation will use | your job script needs to | qsub option needed/recommended |
---|---|---|
more than one CPU (parallel job) | request a PE and N slots | -pe <pe-name> N or -pe <pe-name> N-M |
more than 2GB/CPU of memory | reserve the required memory | -l mres=X,h_data=X,h_vmem=X |
more than 6GB/CPU of memory | use a high-memory queue, and reserve the required memory | -l mres=X,h_data=X,h_vmem=X,himem |
up to T hours of CPU (per CPU) | specify the required amount | -l s_cpu=T:0:0 |
 | or specify the queue | -q mThC.q |
no idea how much CPU | use an unlimited, low priority queue | -q uThC.q -l lopri |
where X can be something like 2GB, and T can be something like 240 (for 240 hours, or 10 days).
- You may need to combine PE, memory and CPU resource requests.
- Remember that the more resources your job requests, the fewer similar jobs can run concurrently.
- Similar jobs will need similar resources, so when in doubt and before queuing a slew of similar jobs:
- run one job and monitor its resource usage, then
- queue the other jobs after trimming the requested resources (CPU and memory).
The local tool check-jobs.pl allows you to check the resources consumed by jobs that have completed.
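The memory rows of the table above can be sketched as a small helper that prints the suggested qsub options for a given memory need. The function name mem_options is hypothetical, the argument is assumed to be an integer number of GB per CPU, and the "G" suffix is the usual Grid Engine memory notation; like the table, the sketch uses the same value X for mres, h_data and h_vmem, which may need adjusting for parallel jobs.

```shell
#!/bin/sh
# mem_options GB: print the memory-related qsub options suggested by
# the table, for a job needing GB gigabytes per CPU (an integer).
# Prints an empty line when no memory reservation is suggested.
mem_options() {
    gb=$1
    if [ "$gb" -gt 6 ]; then
        # more than 6GB/CPU: reserve memory and ask for a high-memory queue
        echo "-l mres=${gb}G,h_data=${gb}G,h_vmem=${gb}G,himem"
    elif [ "$gb" -gt 2 ]; then
        # more than 2GB/CPU: reserve the required memory
        echo "-l mres=${gb}G,h_data=${gb}G,h_vmem=${gb}G"
    else
        echo ""
    fi
}

mem_options 4
mem_options 16
```

The printed options can be pasted after qsub on the command line, or placed after "#$" as an embedded directive in the job script.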
Why Can't I Queue that Job?
There can be different reasons why a job is rejected:
- inconsistency in your resource request, like asking for more CPU or memory than the limit of a given queue;
- unavailable resources, like asking for more CPUs or more memory on a single node than exist on any compute node;
- exceeding resource limits, like asking for more CPUs than are allowed per user in a given queue.
Use the -w v or the -verify flag to qsub to check a job script file; see queue selection validation and/or verification.
Why Is my Job Queued but not Running?
There can be different reasons why a job remains in the queue:
- the requested resources are not available, like there is no compute node with the requested number of CPUs or amount of memory currently available;
- the user resource quota has been reached, like the allowed total amount of CPUs or memory used by a single user was reached.
Use the command qconf -srqs or qquota; see how to check under resource limits.
The local tool check-qwait allows you to visualize the queue quota resources and which jobs are waiting.
Last Updated SGK