  1. Cluster Status
  2. Compute Nodes Status
  3. Query the Cluster Configuration
  4. Cluster Status Web Page

1 Cluster Status

The command

   % qstat -g c

returns the cluster status in tabular form, for example:

CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
all.q                             0.28      0      0      0   4136      0   4136 
lTIO.sq                           0.35      0      0      8      8      0      0 
lTb2g.q                           0.23      0      0      8      8      0      0 
lThC.q                            0.29    347      0   3029   3376      0      0 
lThM.q                            0.31    100      0   1044   1144      0      0 
mThC.q                            0.29    402      0   2974   3376      0      0 
mThM.q                            0.31    330      0    814   1144      0      0 
qGPU.iq                           0.00      0      0     40     40      0      0 
qrsh.iq                           0.35     12      0     28     40      0      0 
sThC.q                            0.29    456      0   2920   3376      0      0 
sThM.q                            0.31      0      0   1144   1144      0      0 
uTGPU.tq                          0.00      0      0    104    104      0      0 
uTSSD.tq                          0.24      0      0    312    312      0      0 
uThC.q                            0.29     31      0   3345   3376      0      0 
uThM.q                            0.31      0      0   1144   1144      0      0 
uTxlM.rq                          0.19     24      0    152    176      0      0 

You can also use

   % qstat+ -gc

(no space in -gc) to get:

   ---- queue ----  ----- #nodes ---- - ---------- #slots ----------- - ------------
   name       load  total avail  down - total used  resvd  down avail - %full  %eff

   sThC.q    979.9     70    70     0 -  3376   474     0     0  2902 -  14.0
   mThC.q    979.9     70    70     0 -  3376   402     0     0  2974 -  11.9
   lThC.q    979.9     70    70     0 -  3376   347     0     0  3029 -  10.3
   uThC.q    979.9     70    70     0 -  3376    31     0     0  3345 -   0.9  78.1

   sThM.q    358.1     19    19     0 -  1144     0     0     0  1144 -   0.0
   mThM.q    358.1     19    19     0 -  1144   330     0     0   814 -  28.8
   lThM.q    358.1     19    19     0 -  1144   100     0     0  1044 -   8.7
   uThM.q    358.1     19    19     0 -  1144     0     0     0  1144 -   0.0  83.3

   uTxlM.rq   33.0      3     3     0 -   176    24     0     0   152 -  13.6 137.5

   qrsh.iq    12.1      2     2     0 -    40    12     0     0    28 -  30.0
   qGPU.iq     0.1      3     3     0 -    40     0     0     0    40 -   0.0   0.5

   lTIO.sq     2.4      2     2     0 -     8     0     0     0     8 -   0.0
   uTSSD.tq   74.6      5     5     0 -   312     0     0     0   312 -   0.0
   uTGPU.tq    0.1      3     3     0 -   104     0     0     0   104 -   0.0
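
The %full column printed by qstat+ is consistent with used/total slots (e.g., 347/3376 = 10.3% for lThC.q). If only the plain qstat is available, a similar number can be derived from the output of qstat -g c. The following is a minimal sketch, not a replacement for qstat+: the function name queue_usage is made up for this example, and it assumes the column layout shown above (USED in column 3, TOTAL in column 6).

```shell
# Sketch: print each queue's used/total slots and the percentage in use.
# Reads `qstat -g c` output on stdin; queue_usage is a hypothetical helper name.
queue_usage() {
    awk '$6 ~ /^[0-9]+$/ && $6 + 0 > 0 {
        # $1 = queue name, $3 = USED slots, $6 = TOTAL slots
        # (the header and separator lines have no numeric 6th field and are skipped)
        printf "%s %d/%d %.1f%%\n", $1, $3, $6, 100 * $3 / $6
    }'
}

# Usage on the cluster (assumes qstat is in your PATH):
#   qstat -g c | queue_usage
```

Reading from standard input keeps the helper independent of the scheduler, so it can also be tried on a saved copy of the output.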


2 Compute Nodes Status

The command

   % qhost

returns the list of hosts (compute nodes) and their respective properties.

(warning) Under UGE, qhost alone returns more columns (equivalent to qhost -cb under SGE). Use the option -ncb to get the same columns as under SGE.

You can restrict the list by specifying the hosts, like

   % qhost -h compute-64-02 compute-64-03

but you cannot use regular expressions (REs) to select hosts. Instead, pass the output through a filter such as egrep:

   % qhost | egrep 'LOAD|e-[46]'

This prints every line that contains the string 'LOAD' (the header line) or that matches the RE "e-[46]", i.e., the hosts whose names start with compute-4 or compute-6 (such as compute-64-02).

The utility egrep combined with REs (regular expressions) can be a very powerful filter.
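
The filter above can be wrapped in a small shell function so that the header line is always kept. This is only a sketch: the name qhost_grep is invented for this example, and it uses grep -E, which accepts the same extended REs as egrep (recent versions of GNU grep flag egrep as obsolescent).

```shell
# Sketch: keep the qhost header line plus the hosts matching an extended RE.
# Reads `qhost` output on stdin; qhost_grep is a hypothetical helper name.
qhost_grep() {
    # "LOAD" is part of the header line, so the header is always printed.
    grep -E "LOAD|$1"
}

# Usage on the cluster (assumes qhost is in your PATH):
#   qhost | qhost_grep 'e-[46]'       # hosts named compute-4... or compute-6...
#   qhost | qhost_grep 'e-64-0[23]'   # only compute-64-02 and compute-64-03
```

Quote the RE with single quotes so that the shell does not try to expand the brackets itself.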

The command qhost takes the "-q" or the "-j" option to show the queues or the jobs associated with each host:

   qhost -q -h compute-64-02    show which queues include the compute node 64-02
   qhost -j -h compute-64-02    show which jobs are running on the compute node 64-02

3 Query the Cluster Configuration

The command qconf is used to both set and query the queue configuration.

All the options of qconf that start with -s are queries, i.e., they show something. The following options may be useful:

   -sc                     show complex attributes
   -sconfl                 show a list of all local configurations
   -sconf [host_list]      show configurations
   -shgrpl                 show host group list
   -shgrp group            show host group
   -srqsl                  show resource quota set list
   -srqs [rqs_list]        show resource quota set(s)
   -spl                    show all parallel environments
   -sp pe-name             show a parallel environment
   -sql                    show a list of all queues
   -sq [destin_id_list]    show the given queue
   -ssconf                 show scheduler configuration
   -sul                    show a list of all userset lists
   -su listname_list       show the given userset list

(warning) Use the command

   % qconf -srqs

to query the resource quota set, i.e. the limits on queues, or

   % qconf -srqs u_slots

to query a specific quota.

(lightbulb) Use the command

   % qconf -sq sThM.q

to show the configuration of the sThM.q queue. Some of the options to the command qconf take patterns (shell-style wildcards rather than full REs, with ? matching any single character), so for example the command:

   % qconf -sq '?ThM.q' | egrep 'qname|s_cpu|s_rt'

returns the soft CPU (s_cpu) and real-time (s_rt) limits for all the hi-mem queues, using egrep to filter the output of qconf, namely:

qname                 lThM.q
s_rt                  1440:00:00
s_cpu                 720:00:00
qname                 mThM.q
s_rt                  144:00:00
s_cpu                 72:00:00
qname                 sThM.q
s_rt                  14:00:00
s_cpu                 7:00:00
qname                 uThM.q
s_rt                  INFINITY
s_cpu                 INFINITY
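
The three-lines-per-queue listing above can be folded into one line per queue, which is easier to scan when many queues match. The following is a sketch under the assumption that the output keeps the "keyword value" layout shown above; the function name queue_limits is made up for this example.

```shell
# Sketch: fold the qname / s_rt / s_cpu lines into one line per queue.
# Reads `qconf -sq` output on stdin; queue_limits is a hypothetical helper name.
queue_limits() {
    awk '$1 == "qname" { q = $2; names[++n] = q }   # remember queues in order
         $1 == "s_rt"  { rt[q]  = $2 }              # soft real-time limit
         $1 == "s_cpu" { cpu[q] = $2 }              # soft CPU-time limit
         END {
             for (i = 1; i <= n; i++)
                 printf "%s s_rt=%s s_cpu=%s\n", names[i], rt[names[i]], cpu[names[i]]
         }'
}

# Usage on the cluster (assumes qconf is in your PATH):
#   qconf -sq '?ThM.q' | queue_limits
```

No egrep step is needed here, since the function ignores every line other than qname, s_rt, and s_cpu.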


4 Cluster Status Web Page

We also have a home-grown status web page that can be accessed at two locations:

  • from a trusted machine, here (at .si.edu), or
  • from anywhere, here (at .cfa.harvard.edu).

These pages give you a good overview of the cluster's current and past usage, and include disk space usage information.


Last Updated   SGK/PBF.
