  1. Cluster Status
  2. Compute Nodes Status
  3. Query the Cluster Configuration
  4. Ganglia Web Page
  5. Cluster Status Web Page

1 Cluster Status

The command

   % qstat -g c

returns the cluster status in tabular form, for example:

CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
all.q                             0.23      0      0     72   3084      0   3012 
lThC.q                            0.24    241      0   2567   2884     12     64 
lThM.q                            0.25      7      0    745    816      0     64 
mThC.q                            0.24    259      0   2561   2884     12     64 
mThM.q                            0.25    102      0    650    816      0     64 
sThC.q                            0.24      0      0   2808   2884     12     64 
sThM.q                            0.25      0      0    752    816      0     64 
uThC.q                            0.24    101      0   2707   2884     12     64 
uThM.q                            0.25      1      0    751    816      0     64 
uTspM.rq                          -NA-      0      0      0      0      0      0 
uTxlM.rq                          0.01      8      0     72     80      0      0 
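
If you only need the fraction of slots in use per queue, the table above can be condensed with a small awk filter. The following is a minimal sketch, not part of the cluster's tools: the function name queue_usage is hypothetical, and it assumes the column layout shown above (USED in column 3, TOTAL in column 6, two header lines).

```shell
# Summarize per-queue slot usage from `qstat -g c` output read on stdin.
# Assumed columns: CLUSTER QUEUE, CQLOAD, USED, RES, AVAIL, TOTAL, aoACDS, cdsuE
queue_usage() {
    awk 'NR > 2 && NF >= 6 {
        used = $3; total = $6
        # avoid dividing by zero for queues that have no slots
        pct = (total > 0) ? 100 * used / total : 0
        printf "%s %d/%d %.1f%%\n", $1, used, total, pct
    }'
}

# Typical use on the cluster:
#   qstat -g c | queue_usage
```

Each output line gives the queue name, used/total slots and the percentage used.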

You can also use

   % q+ -gc

(no space in -gc) to get:

   ---- queue ----  ----- #nodes ---- - ---------- #slots ----------- - ------
   name       load  total avail  down - total used  resvd  down avail - %full
   sThC.q    740.1     78    78     0 -  2884     0     0     0  2884 -  25.7
   mThC.q    740.1     78    78     0 -  2884   259     0     0  2625 -  25.7
   lThC.q    740.1     78    78     0 -  2884   241     0     0  2643 -  25.7
   uThC.q    740.1     78    78     0 -  2884   101     0     0  2783 -  25.7
   sThM.q    198.7     14    14     0 -   816     0     0     0   816 -  24.4
   mThM.q    198.7     14    14     0 -   816   102     0     0   714 -  24.4
   lThM.q    198.7     14    14     0 -   816     7     0     0   809 -  24.4
   uThM.q    198.7     14    14     0 -   816     1     0     0   815 -  24.4
   uTxlM.rq    1.1      2     2     0 -    80     8     0     0    72 -   1.4
   uTspM.rq   -NA-

 

2 Compute Nodes Status

The command

   % qhost

returns the list of hosts (compute nodes) and their respective properties.

You can restrict the list by specifying the hosts, like

   % qhost -h compute-2-2 compute-2-3

but qhost does not accept REs (regular expressions) in the host list. Instead, use a filter like egrep to select lines from its output:

   % qhost | egrep 'LOAD|e-[0123]'

This prints any line that contains the string 'LOAD' (the header line) or that matches the RE "e-[0123]". The RE matches compute-0, compute-1, etc., so the command prints the hosts in racks 0, 1, 2 and 3.

The utility egrep combined with REs (regular expressions) can be a very powerful filter.
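
The same filter can be wrapped in a shell function that takes the rack range as arguments. This is a sketch under stated assumptions: the function name hosts_in_racks is hypothetical, host names are assumed to follow the compute-<rack>-<n> pattern used above, and the range only handles single-digit rack numbers. The RE is anchored at the start of the line because the unanchored "e-[0123]" would also match a host like compute-10-3, should racks numbered 10 or above exist.

```shell
# Keep the qhost header line plus the hosts in racks $1 through $2.
# Reads qhost output on stdin; single-digit rack numbers only.
hosts_in_racks() {
    lo=$1
    hi=$2
    # "^compute-" anchors the match so "e-1" inside compute-10-3 is ignored
    grep -E "LOAD|^compute-[${lo}-${hi}]-"
}

# Typical use on the cluster (racks 0 to 3):
#   qhost | hosts_in_racks 0 3
```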

The command qhost takes the "-q" or the "-j" option to show the queues or the jobs associated with each host:

qhost -q -h compute-2-2    show which queues include the compute node 2-2
qhost -j -h compute-2-2    show which jobs are running on the compute node 2-2

3 Query the Cluster Configuration

The command qconf is used to both set and query the queue configuration.

All the options of qconf that start with -s correspond to a query, i.e., they show something. The following options may be useful:

-sc

show complex attributes

-sconfl

show a list of all local configurations

-sconf [host_list]

show configurations

-shgrpl

show host group list

-shgrp group

show host group

-srqsl

show resource quota set list

-srqs [rqs_list]

show resource quota set(s)

-spl

show all parallel environments

-sp pe-name

show a parallel environment

-sql

show a list of all queues

-sq [destin_id_list]

show the given queue

-ssconf

show scheduler configuration

-sul

show a list of all userset lists

-su listname_list

show the given userset list

Note: use the command

   % qconf -srqs

to query the resource quota set, i.e. the limits on queues, or

   % qconf -srqs u_slots

to query a specific quota.

Tip: use the command

   % qconf -sq sThM.q

to show the configuration of the sThM.q queue. Some of the options to the command qconf take wildcard patterns (here '?' stands for any single character), so for example the command:

   % qconf -sq '?ThM.q' | egrep 'qname|s_cpu|s_rt'

returns the soft CPU time (s_cpu) and soft run time (s_rt) limits for all the high-memory queues, using egrep to filter the output of qconf, namely:

qname                 lThM.q
s_rt                  1440:00:00
s_cpu                 720:00:00
qname                 mThM.q
s_rt                  144:00:00
s_cpu                 72:00:00
qname                 sThM.q
s_rt                  14:00:00
s_cpu                 7:00:00
qname                 uThM.q
s_rt                  INFINITY
s_cpu                 INFINITY
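
The three-lines-per-queue output above can be collapsed to one line per queue with awk, which is easier to scan or to feed to other tools. This is only a sketch: the function name queue_limits is hypothetical, and it assumes the input contains just the qname, s_rt and s_cpu lines, as produced by the egrep filter above.

```shell
# Collapse filtered `qconf -sq` output (read on stdin) into one line per
# queue, printed as: name s_rt s_cpu
queue_limits() {
    awk '
        # a new qname line starts a new queue: flush the previous one first
        $1 == "qname" { if (name != "") print name, rt, cpu
                        name = $2; rt = "-"; cpu = "-" }
        $1 == "s_rt"  { rt = $2 }
        $1 == "s_cpu" { cpu = $2 }
        END           { if (name != "") print name, rt, cpu }
    '
}

# Typical use on the cluster:
#   qconf -sq '?ThM.q' | egrep 'qname|s_cpu|s_rt' | queue_limits
```

A limit that is missing from the input is printed as "-".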

 

4 Ganglia Web Page

The Rocks distribution comes with a graphical interface to view the cluster's status.

From a trusted machine, you can view it here. It can be tricky to use, but it gives you access to a lot of information.

5 Cluster Status Web Page

We also have a home-grown status web page that can be accessed at two locations:

  • from a trusted machine, here (at .si.edu), or
  • from anywhere, here (at .cfa.harvard.edu).

These pages give you a good overview of the cluster's current and past usage, and include disk space usage information.


Last Updated  SGK/PBF.
