1 Cluster Status
The command
% qstat -g c
returns the cluster status in tabular form, e.g.:
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
--------------------------------------------------------------------------------
all.q                             0.28      0      0      0   4136      0   4136
lTIO.sq                           0.35      0      0      8      8      0      0
lTb2g.q                           0.23      0      0      8      8      0      0
lThC.q                            0.29    347      0   3029   3376      0      0
lThM.q                            0.31    100      0   1044   1144      0      0
mThC.q                            0.29    402      0   2974   3376      0      0
mThM.q                            0.31    330      0    814   1144      0      0
qGPU.iq                           0.00      0      0     40     40      0      0
qrsh.iq                           0.35     12      0     28     40      0      0
sThC.q                            0.29    456      0   2920   3376      0      0
sThM.q                            0.31      0      0   1144   1144      0      0
uTGPU.tq                          0.00      0      0    104    104      0      0
uTSSD.tq                          0.24      0      0    312    312      0      0
uThC.q                            0.29     31      0   3345   3376      0      0
uThM.q                            0.31      0      0   1144   1144      0      0
uTxlM.rq                          0.19     24      0    152    176      0      0
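To see at a glance how full each queue is, the tabular output can be post-processed with awk. The following is a minimal sketch, assuming the column order of the listing above (USED is the 3rd field and TOTAL the 6th on each queue line). The function name `queue_usage` is hypothetical, and the sample lines are copied from that listing.

```shell
# Print "queue used/total percent" for each queue line of `qstat -g c` output.
# Assumption: fields are in the order shown above (USED = $3, TOTAL = $6).
# The first two lines (column header and dashed separator) are skipped.
queue_usage() {
  awk 'NR > 2 && $6 > 0 { printf "%s %d/%d %.1f%%\n", $1, $3, $6, 100 * $3 / $6 }'
}

# Sample input taken from the listing above (header, separator, three queues).
qstat_sample='CLUSTER QUEUE CQLOAD USED RES AVAIL TOTAL aoACDS cdsuE
--------------------------------------------------------------------------------
lThC.q 0.29 347 0 3029 3376 0 0
mThM.q 0.31 330 0 814 1144 0 0
qGPU.iq 0.00 0 0 40 40 0 0'

printf '%s\n' "$qstat_sample" | queue_usage
```

On the cluster the same function would presumably be used as `qstat -g c | queue_usage`.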
You can also use
% qstat+ -gc
(no space in -gc) to get:
---- queue ---- ----- #nodes ---- - ---------- #slots ----------- - ------------
name       load total avail  down - total  used resvd  down avail -  %full  %eff
sThC.q    979.9    70    70     0 -  3376   474     0     0  2902 -   14.0
mThC.q    979.9    70    70     0 -  3376   402     0     0  2974 -   11.9
lThC.q    979.9    70    70     0 -  3376   347     0     0  3029 -   10.3
uThC.q    979.9    70    70     0 -  3376    31     0     0  3345 -    0.9  78.1
sThM.q    358.1    19    19     0 -  1144     0     0     0  1144 -    0.0
mThM.q    358.1    19    19     0 -  1144   330     0     0   814 -   28.8
lThM.q    358.1    19    19     0 -  1144   100     0     0  1044 -    8.7
uThM.q    358.1    19    19     0 -  1144     0     0     0  1144 -    0.0  83.3
uTxlM.rq   33.0     3     3     0 -   176    24     0     0   152 -   13.6 137.5
qrsh.iq    12.1     2     2     0 -    40    12     0     0    28 -   30.0
qGPU.iq     0.1     3     3     0 -    40     0     0     0    40 -    0.0   0.5
lTIO.sq     2.4     2     2     0 -     8     0     0     0     8 -    0.0
uTSSD.tq   74.6     5     5     0 -   312     0     0     0   312 -    0.0
uTGPU.tq    0.1     3     3     0 -   104     0     0     0   104 -    0.0
2 Compute Nodes Status
The command
% qhost
returns the list of hosts (compute nodes) and their respective properties.
Under UGE, qhost alone returns more columns (equivalent to qhost -cb under SGE). The option -ncb returns the same columns as in SGE.
You can restrict the list by specifying the hosts, like
% qhost -h compute-64-02 compute-64-03
but you cannot use REs there. Instead, use a filter such as egrep to parse its output:
% qhost | egrep 'LOAD|e-[46]'
This prints every line that contains either the string 'LOAD' (i.e., the header line) or a match for the RE "e-[46]", and thus lists the hosts whose names start with compute-4 or compute-6 (compute-64-02, etc.).
The utility egrep, combined with REs (regular expressions), can be a very powerful filter.
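The effect of such a filter can be tried without access to the cluster. The sketch below uses a few hypothetical host names (only compute-64-02 appears in this page) and a shortened header line standing in for the first line of the qhost output. It uses grep -E, the POSIX spelling of egrep.

```shell
# Hypothetical stand-in for qhost output: a shortened header line
# followed by host names only (real qhost lines carry more columns).
hosts='HOSTNAME ARCH NCPU LOAD
compute-43-01
compute-64-02
compute-84-05
login01'

# Keep the header (it contains LOAD) and any line containing e-4 or e-6.
printf '%s\n' "$hosts" | grep -E 'LOAD|e-[46]'

# Anchoring the pattern at the start of the line is stricter: this keeps
# the header and only the hosts whose names start with compute-6N-.
printf '%s\n' "$hosts" | grep -E '^(HOSTNAME|compute-6[0-9]-)'
```

The first filter keeps the header, compute-43-01 and compute-64-02. The second keeps the header and compute-64-02 only.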
The command qhost takes the "-q" or the "-j" option to show the queues or the jobs associated with each host:
| qhost -q -h compute-64-02 | show which queues include the compute node 64-02 |
| qhost -j -h compute-64-02 | show which jobs are running on the compute node 64-02 |
3 Query the Cluster Configuration
The command qconf is used to both set and query the queue configuration.
All the options of qconf that start with -s correspond to a query, i.e., they show something. The following options may be useful:
| show complex attributes |
| show a list of all local configurations |
| show configurations |
| show host group list |
| show host group |
| show resource quota set list |
| show resource quota set(s) |
| show all parallel environments |
| show a parallel environment |
| show a list of all queues |
| show the given queue |
| show scheduler configuration |
| show a list of all userset lists |
| show the given userset list |
Use the command
% qconf -srqs
to query the resource quota set, i.e. the limits on queues, or
% qconf -srqs u_slots
to query a specific quota.
Use the command
% qconf -sq sThM.q
to show the configuration of the sThM.q queue. Some of the options to the command qconf take REs, so, for example, the command:
% qconf -sq '?ThM.q' | egrep 'qname|s_cpu|s_rt'
returns the soft CPU-time (s_cpu) and real-time (s_rt) limits for all the hi-mem queues, using egrep to filter the output of qconf, namely:
qname                 lThM.q
s_rt                  1440:00:00
s_cpu                 720:00:00
qname                 mThM.q
s_rt                  144:00:00
s_cpu                 72:00:00
qname                 sThM.q
s_rt                  14:00:00
s_cpu                 7:00:00
qname                 uThM.q
s_rt                  INFINITY
s_cpu                 INFINITY
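When several queues are listed, it can be easier to read one line per queue. The following is a minimal sketch that collapses the filtered output with awk. It assumes, as in the listing above, that each qname line is followed by its s_rt line and then its s_cpu line. The function name `limits_per_queue` is hypothetical.

```shell
# Collapse "qname / s_rt / s_cpu" triplets into one line per queue.
# Assumption: s_rt precedes s_cpu for each queue, as in the listing above.
limits_per_queue() {
  awk '$1 == "qname" { q = $2 }
       $1 == "s_rt"  { rt = $2 }
       $1 == "s_cpu" { printf "%s s_rt=%s s_cpu=%s\n", q, rt, $2 }'
}

# Sample input taken from the listing above (two of the four queues).
qconf_sample='qname lThM.q
s_rt 1440:00:00
s_cpu 720:00:00
qname sThM.q
s_rt 14:00:00
s_cpu 7:00:00'

printf '%s\n' "$qconf_sample" | limits_per_queue
```

On the cluster this would presumably be chained after the filter, as in `qconf -sq '?ThM.q' | egrep 'qname|s_cpu|s_rt' | limits_per_queue`.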
4 Cluster Status Web Page
We also have a home-grown status web page that can be accessed at two locations:
These pages give you a good overview of the cluster's current and past usage, and include disk space usage information.
Last Updated SGK/PBF.