  1. Cluster Status
  2. Compute Nodes Status
  3. Query the Cluster Configuration
  4. Ganglia Web Page
  5. Cluster Status Web Page

1 Cluster Status

...

CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
all.q                             0.28      0      0      0   4136      0   4136 
lTIO.sq                           0.35      0      0      8      8      0      0 
lTb2g.q                           0.23      0      0      8      8      0      0 
lThC.q                            0.29    347      0   3029   3376      0      0 
lThM.q                            0.31    100      0   1044   1144      0      0 
mThC.q                            0.29    402      0   2974   3376      0      0 
mThM.q                            0.31    330      0    814   1144      0      0 
qGPU.iq                           0.00      0      0     40     40      0      0 
qrsh.iq                           0.35     12      0     28     40      0      0 
sThC.q                            0.29    456      0   2920   3376      0      0 
sThM.q                            0.31      0      0   1144   1144      0      0 
uTGPU.tq                          0.00      0      0    104    104      0      0 
uTSSD.tq                          0.24      0      0    312    312      0      0 
uThC.q                            0.29     31      0   3345   3376      0      0 
uThM.q                            0.31      0      0   1144   1144      0      0 
uTxlM.rq                          0.19     24      0    152    176      0      0 
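
The fraction of slots in use per queue can be derived from this kind of listing with a short filter. The following is only a sketch under stated assumptions: the function name queue_fill is hypothetical (it is not one of the cluster's tools), and it assumes the column order shown above, with USED in column 3 and TOTAL in column 6.

```shell
# queue_fill: read `qstat -g c` style output on stdin and print, for each
# queue, the used and total slots and the percentage of slots in use.
# Hypothetical helper; assumes USED is column 3 and TOTAL is column 6.
queue_fill() {
  awk '
    NR <= 2 { next }                      # skip the header and the dashed line
    NF >= 6 && $6 + 0 > 0 {               # ignore empty lines and empty queues
      printf "%-10s %5d / %5d  %5.1f%%\n", $1, $3, $6, 100 * $3 / $6
    }
  '
}
```

It would typically be used as `qstat -g c | queue_fill`.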

You can also use

   % qstat+ -gc

(no space in -gc) to get:

   ---- queue ----  ----- #nodes ---- - ---------- #slots ----------- - ------------
   name       load  total avail  down - total used  resvd  down avail - %full  %eff

   sThC.q    979.9     70    70     0 -  3376   474     0     0  2902 -  14.0
   mThC.q    979.9     70    70     0 -  3376   402     0     0  2974 -  11.9
   lThC.q    979.9     70    70     0 -  3376   347     0     0  3029 -  10.3
   uThC.q    979.9     70    70     0 -  3376    31     0     0  3345 -   0.9  78.1

   sThM.q    358.1     19    19     0 -  1144     0     0     0  1144 -   0.0
   mThM.q    358.1     19    19     0 -  1144   330     0     0   814 -  28.8
   lThM.q    358.1     19    19     0 -  1144   100     0     0  1044 -   8.7
   uThM.q    358.1     19    19     0 -  1144     0     0     0  1144 -   0.0  83.3

   uTxlM.rq   33.0      3     3     0 -   176    24     0     0   152 -  13.6 137.5

   qrsh.iq    12.1      2     2     0 -    40    12     0     0    28 -  30.0
   qGPU.iq     0.1      3     3     0 -    40     0     0     0    40 -   0.0   0.5

   lTIO.sq     2.4      2     2     0 -     8     0     0     0     8 -   0.0
   uTSSD.tq   74.6      5     5     0 -   312     0     0     0   312 -   0.0
   uTGPU.tq    0.1      3     3     0 -   104     0     0     0   104 -   0.0


2 Compute Nodes Status

...

returns the list of hosts (compute nodes) and their respective properties.

(warning) Under UGE, qhost alone returns more columns (equivalent to qhost -cb under SGE). The option -ncb returns the same columns as in SGE.

You can restrict the list by specifying the hosts, like

   % qhost -h compute-64-02 compute-64-03

but you cannot use REs there. Instead, pipe the output through a filter, like egrep:

   % qhost | egrep 'LOAD|e-[46]'

This will print any line that contains the string 'LOAD' (the header line) or that matches the RE "e-[46]". It will thus match compute-4, compute-6, etc., and hence print only the hosts in the corresponding racks.

The utility egrep combined with REs (regular expressions) can be a very powerful filter.
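
As an illustration of this filtering idea, the pipeline can be wrapped in a small shell function. This is a minimal sketch, not an existing tool: the function name hosts_matching is hypothetical, and it assumes that the header line of the listing contains the string LOAD, as described above. It uses grep -E, the POSIX spelling of egrep.

```shell
# hosts_matching RE: filter `qhost` style output read from stdin, keeping the
# header line (it contains the string LOAD) and every line that matches RE.
# Hypothetical helper; grep -E is the POSIX equivalent of egrep.
hosts_matching() {
  grep -E "LOAD|$1"
}
```

For instance, `qhost | hosts_matching 'e-64-'` would keep the header and the hosts whose names contain "e-64-".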

The command qhost takes the "-q" or the "-j" option to show the queues or the jobs associated with each host:

   qhost -q -h compute-64-02    show which queues include the compute node 64-02
   qhost -j -h compute-64-02    show which jobs are running on the compute node 64-02

3 Query the Cluster Configuration

...

All the options of qconf that start with -s correspond to a query, i.e., they show something. The following options may be useful:

-sc                    show complex attributes
-sconfl                show a list of all local configurations
-sconf [host_list]     show configurations
-shgrpl                show host group list
-shgrp group           show host group
-srqsl                 show resource quota set list
-srqs [rqs_list]       show resource quota set(s)
-spl                   show all parallel environments
-sp pe-name            show a parallel environment
-sql                   show a list of all queues
-sq [destin_id_list]   show the given queue
-ssconf                show scheduler configuration
-sul                   show a list of all userset lists
-su listname_list      show the given userset list

(warning) Use the command

   % qconf -srqs

...

qname                 lThM.q
s_rt                  1440:00:00
s_cpu                 720:00:00
qname                 mThM.q
s_rt                  144:00:00
s_cpu                 72:00:00
qname                 sThM.q
s_rt                  14:00:00
s_cpu                 7:00:00
qname                 uThM.q
s_rt                  INFINITY
s_cpu                 INFINITY
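
To compare these limits at a glance, the h:m:s values can be converted to days. The following is only a sketch: the function name limits_in_days is hypothetical, and it assumes the qname / s_rt / s_cpu layout shown above is supplied on stdin, with s_rt listed before s_cpu for each queue.

```shell
# limits_in_days: read "qname / s_rt / s_cpu" lines on stdin and print one
# line per queue with both limits converted from h:m:s to days.
# Hypothetical helper; INFINITY is reported as "unlimited".
limits_in_days() {
  awk '
    function days(t,   p, n) {
      if (t == "INFINITY") return "unlimited"
      n = split(t, p, ":")
      if (n != 3) return t                # leave unexpected values untouched
      return sprintf("%.1f d", (p[1] * 3600 + p[2] * 60 + p[3]) / 86400)
    }
    $1 == "qname" { q = $2 }
    $1 == "s_rt"  { rt = days($2) }
    $1 == "s_cpu" { printf "%-8s s_rt=%s s_cpu=%s\n", q, rt, days($2) }
  '
}
```

Since the output of `qconf -sq` contains qname, s_rt and s_cpu lines in that order, something like `qconf -sq lThM.q | limits_in_days` should work as well.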

 


The Rocks distribution comes with a graphical interface to view the cluster's status.

From a trusted machine, you can view it here. It can be tricky to use, but it gives you access to a lot of information.

...

4 Cluster Status Web Page

We also have a home-grown status web page that can be accessed at two locations:

  • from a trusted machine, here (at .si.edu), or
  • from anywhere, here (at .cfa.harvard.edu).

These pages give you a good overview of the cluster's current and past usage, and include disk space usage information.

...

Last Updated  SGK   SGK/PBF.