  1. What Disks to Use
  2. How to Copy Files to/from Hydra
    1. To/From another Linux machine
    2. From a computer running MacOS
    3. From a computer running Windows
    4. Using Globus
    5. Using Dropbox
    6. Using Firefox Send
  3. Disk Quotas
  4. Disk Configuration
  5. Disk Usage Monitoring
  6. NetApp Snapshots: How to Recover Old or Deleted Files
  7. Public Disks Scrubber
  8. SSD and Local Disk Space

Anchor
WhatDisks
WhatDisks
1. What Disks to Use

...

The available disk space is divided into several areas (aka volumes, filesets or partitions):

  • a small partition for basic configuration files and small storage, the /home partition,
  • a set of medium size partitions, one for SAO users, one for non-SAO users, the /data partitions,
  • a set of large partitions, one for SAO users, one for non-SAO users, the /pool partitions,
  • a second set of large partitions for temporary storage, the /scratch partitions.

Note

...

  • we impose quotas: limits on how much can be stored on each disk (partition/volume/fileset) by each user, and
  • we monitor disk usage;
  • /home should not be used to keep large files, use /pool instead;
  • /pool is for active temporary storage (i.e., while a job is running).
  • If you need even more disk space, or your job(s) use(s) a lot of I/Os, ask to be allowed to use /scratch.
    • Both partitions (/pool and /scratch) are scrubbed (see below): old stuff is deleted to make sure there is space for active users.
  • None of the disks on the cluster are for long term storage:
    • please copy your results back to your "home" computer and
    • delete what you don't need any longer.
  • While the disk system on Hydra is highly reliable, none of the disks on the cluster are backed up.
  • Once you reach your quota you won't be able to write anything on that partition until you delete stuff.
  • A few nodes have local SSDs (solid state disks), and
    for special cases it may be OK to use disk space local to the compute node.
    Contact us if your jobs can benefit from more disk space, SSDs or local disk space.

...

(warning) When copying to Hydra, especially large files, be sure to do it to the appropriate disk (and not /home or /tmp).

Anchor
CopyLinux
CopyLinux
2a. To/From Another Linux Machine

  • You can copy files to/from hydra using scp, sftp or rsync:
    • to Hydra you can only copy from trusted hosts (computers on SI or SAO/CfA trusted network, or VPN'ed),
    • from Hydra to any host that allows external ssh connections (if you can ssh from Hydra to it, you can scp, sftp and rsync to it).

  • For large transfers (over 70GB, sustained), we ask users to use rsync and to limit the bandwidth to 20 MB/s (70 GB/h) with the "--bwlimit=" option:
    • rsync --bwlimit=20000 ...
      If this poses a problem, contact us (Sylvain or Paul). (See the example after this list.)

    • Baseline transfer rate from SAO to HDC (Herndon data center) is around 300 Mbps, single thread, or ~36 MB/s or ~126 GB/h (as of Aug. 2016)
      The link saturates near 500 Mbps (50% of a 1 Gbps link), or 62 MB/s or 220 GB/h.

  • Remember that rm, mv and cp can also create high I/O load, so you should
    • limit your concurrent I/Os: do not start a slew of I/Os at the same time, and
    • serialize your I/Os as much as possible: run one after the other.
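
For reference, here is a minimal sketch of such a bandwidth-limited rsync transfer to Hydra (the login host name, user name and destination path are illustrative; the direction can be reversed to pull results back):

Code Block
languagetext
titleon your Linux machine
# --bwlimit is in KB/s, so 20000 corresponds to the ~20 MB/s limit mentioned above
% rsync -av --bwlimit=20000 my_results/ username@hydra-login01.si.edu:/pool/genomics/username/my_results/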

...

Anchor
CopyMacOS
CopyMacOS
2b. From a Computer Running MacOS

A trusted or VPN'd computer running MacOS can use scp, sftp or rsync:

...

Alternatively you can use a GUI based ssh/scp compatible tool like FileZilla. Note, Cyberduck is not recommended because it uses a lot of CPU cycles on Hydra.

You will still most likely need to run VPN.
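
For reference, a typical invocation from the Terminal application looks like the following (the login host name and destination directory are illustrative):

Code Block
languagetext
titleon your Mac
% scp -p my_file.tar.gz username@hydra-login01.si.edu:/pool/genomics/username/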

Anchor
CopyWindows
CopyWindows
2c. From a Computer Running Windows

(grey lightbulb) You can use scp, sftp or rsync if you install Cygwin. Note that Cygwin includes an X11 server.

Alternatively you can use a GUI based ssh/scp compatible tool like FileZilla or WinSCP. Note, Cyberduck is not recommended because it uses a lot of CPU cycles on Hydra.

You will still most likely need to run VPN.

2d. Using Globus

(instructions missing)

Anchor
UseDropbox
UseDropbox
2e. Using Dropbox

Files can be exchanged with Dropbox using the script Dropbox-Uploader: load the tools/dropbox_uploader module and run the dropbox or dropbox_uploader.sh script. Running this script for the first time will give instructions on how to configure your Dropbox account and create a ~/.dropbox_uploader config file with authentication information.

Using this method will not sync your Dropbox, but will allow you to upload/download specific files.
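
For example, a minimal sketch (the file names are illustrative; the first run of dropbox_uploader.sh will walk you through configuring access to your Dropbox account):

Code Block
languagetext
titleon Hydra
$ module load tools/dropbox_uploader
$ # upload a local file to your Dropbox, then fetch a file from it
$ dropbox_uploader.sh upload results.tar.gz results.tar.gz
$ dropbox_uploader.sh download data/sequences.tar.gz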

Anchor
UseFFSend
UseFFSend
2f. Using Firefox Send

  • Firefox Send is a free online file-sending service (or a file exchange mechanism).
  • Using this system along with the command ffsend available on Hydra (module load bioinformatics/ffsend), you can transfer files to/from Hydra without needing VPN.
  • Firefox Send is a two-step process,
    1. you first upload a file (or a set of files packed in an archive) to the Firefox Send server which will generate a unique URL for the upload, and
    2. you download the file using that URL and recover the file's original name. 

(lightbulb)You can upload up to 1GB at a time, and if you sign up for a Firefox account, that limit increases to 2.5GB.

Example 1: Sending to Hydra

a. Uploading files from your local machine (workstation/laptop) using the Firefox Send website:

  1. Open the Firefox Send website (send.firefox.com) from any browser.
  2. Choose a file to upload, and optionally:
    1. use tar, zip, etc. to upload an archive of several files.
    2. modify the expiration of the file (number of downloads or number of days); the default is to allow only one download and to expire within one day.
    3. add a password that's needed to download the file.
  3. Copy the URL generated for your upload.

(warning) You need to save that unique URL to get that file later. Unlike Dropbox or Google Drive, Firefox Send will not show you what you uploaded. 

b. Downloading on Hydra from Firefox Send using ffsend 

Code Block
languagetext
titleon Hydra
$ module load bioinformatics/ffsend
$ ffsend download https://send.firefox.com/download/7800f8272ba5ef7b/#cNSwgMaNqmdsdwG6RxM71A
Download complete

Example 2: Sending from Hydra

a. Uploading from Hydra using ffsend 


Code Block
languagetext
titleon Hydra
$ module load bioinformatics/ffsend
$ ffsend upload test.tar.gz                                                                         
Upload complete                                                                                                               
https://send.firefox.com/download/0324d02485dc9a02/#cxER28yNyf2dcwzwfIla6g

Optional: ffsend has options for setting a password and an expiration. See ffsend help for more information.

b. Downloading to your local machine (workstation/laptop)

Open the URL created on Hydra in a web browser to download the file to your local machine.
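
For reference, a hedged sketch of the optional password and download-limit flags mentioned above (the exact option names should be checked with ffsend upload --help, as they may differ between ffsend versions):

Code Block
languagetext
titleon Hydra
$ module load bioinformatics/ffsend
$ # prompt for a password and allow up to 5 downloads (flag names assumed, verify with "ffsend upload --help")
$ ffsend upload --password --downloads 5 test.tar.gz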

Anchor
DiskQuotas
DiskQuotas
3. Disk Quotas

...

Anchor
DiskConfig
DiskConfig
4. Disk Configuration

The table below lists each disk, its capacity, the quotas per user (soft/hard limits on disk space and on number of files), whether NetApp snapshots are enabled, and its purpose.

Disk name | Capacity | Disk space quota (soft/hard) | # of files quota (soft/hard) | NetApp snapshots | Purpose
/home | 20T* | 100/200G | 3.6/4M | yes: 4 weeks | For your basic configuration files, scripts and job files - your limit is low but you can recover old stuff up to 4 weeks.
/data/sao or /data/nasm | 45T | 1.9/2.0T | 4.8/5M | yes: 2 weeks | For important but relatively small files like final results, etc. - your limit is medium, you can recover old stuff, but disk space is not released right away. For SAO or NASM users.
/data/genomics | 45T | 0.8/1.0T | 2.4/2.5M | yes: 2 weeks | For important but relatively small files like final results, etc. - your limit is medium, you can recover old stuff, but disk space is not released right away. For non-SAO/NASM users.
/pool/sao or /pool/nasm | 80T | 1.9/2.0T | 4/5M | no | For the bulk of your storage - your limit is high, and disk space is released right away. For SAO or NASM users.
/pool/genomics | 80T | 1.9/2.0T | 4.8/5M | no | For the bulk of your storage - your limit is high, and disk space is released right away. For non-SAO users.
/pool/biology | 200G | 100/200G | 0.45/0.5M | no | For the bulk of your storage - your limit is high, and disk space is released right away. For non-SAO/NASM users.
/scratch/genomics | 350T | 9/10T | 25/26M | no | For temporary storage, if you need more than what you can keep in /pool - SAO, NASM or non-SAO/NASM users should use /scratch/sao, /scratch/nasm or /scratch/genomics, respectively.
/scratch/sao or /scratch/nasm | 350T | 9/10T | 25/26M | no | For temporary storage, if you need more than what you can keep in /pool. For SAO or NASM users.

Project specific disks (/pool):

Disk name | Capacity | Disk space quota (soft/hard) | # of files quota (soft/hard) | NetApp snapshots | Project
/pool/kistlerl | 21T | 20.0/21.0T | 49.9/52.5M | no | NMNH/Logan Kistler
/pool/kozakk | 11T | 10.5/11.0T | 26.1/27.5M | no | STRI/Krzysztof Kozak
/pool/nmnh_ggi | 21T | 15.0/15.8T | 37.4/39.4M | no | NMNH/GGI
/pool/sao_access | 21T | 15.0/15.8T | 37.4/39.4M | no | SAO/ACCESS
/pool/sao_rtdc | 10T* | 2.8/3.0T | 2.5/3.0M | no | SAO/RTDC
/pool/sylvain | 30T | 29/30T | 71/75M | no | SAO/Sylvain Korzennik

Project specific disks (/scratch):

Disk name | Capacity | Disk space quota (soft/hard) | # of files quota (soft/hard) | NetApp snapshots | Project
/scratch/bradys | 25T | - | - | no | NMNH/Seán Brady/BRADY_LAB
/scratch/usda_sel | 25T | 24/25T | 52M/62M | no | NMNH/Christopher Owen/USDA_SEL
/scratch/nzp_ccg | 25T | 24/25T | 52M/62M | no | NZP/Michael Campana/CCG
/scratch/kistlerl | 50T | - | - | no | NMNH/Logan Kistler
/scratch/meyerc | 25T | 24/25T | 52M/62M | no | NMNH/Christopher Meyer
/scratch/nmnh_ggi | 25T | 24/25T | 52M/62M | no | NMNH/GGI
/scratch/nmnh_lab | 25T | 4/5T | 10M/12M | no | NMNH/LAB
/scratch/stri_ap | 25T | 4/5T | 10M/12M | no | STRI/W. Owen McMillan/STRI_AP
/scratch/sao_atmos | 186T | 98/100T | 252M/261M | no | SAO/ATMOS
/scratch/sao_cga | 25T | 7/8T | 18M/20M | no | SAO/CGA
/scratch/sao_tess | 50T | 36/40T | 94M/210M | no | SAO/TESS
/scratch/sylvain | 50T | 48/50T | 115M/128M | no | SAO/Sylvain Korzennik
/scratch/schultzt | 25T | - | - | no | NMNH/Ted Schultz/SCHULTZ_LAB
/scratch/wrbu | 40T | 38/40T | 99M/100M | no | WRBU

Extra:

Disk name | Capacity | Disk space quota (soft/hard) | # of files quota (soft/hard) | NetApp snapshots | Purpose
/pool/admin | 10T* | 5.7/6.0T | 14.3/15.0M | no | Sys Admin
/pool/galaxy | 15T* | 10.7/11.3T | 26.7/28.1M | no | Galaxy

Near line (/store):

Disk name | Capacity | Disk space quota (soft/hard) | # of files quota (soft/hard) | Snapshots | Purpose
/store/public | 270T | 5/5T | n/a | yes: 8 weeks | Public, available upon request
/store/admin | 20T | - | n/a | yes: 8 weeks | Sys Admin
/store/bradys | 40T | - | n/a | yes: 8 weeks | NMNH/Seán Brady/BRADY_LAB
/store/nmnh_ggi | 40T | - | n/a | yes: 8 weeks | NMNH/GGI
/store/sao_atmos | 300T | - | n/a | yes: 8 weeks | SAO/ATMOS
/store/sylvain | 100T | - | n/a | yes: 8 weeks | SAO/Sylvain Korzennik
/store/schultzt | 40T | - | n/a | yes: 8 weeks | NMNH/Ted Schultz/SCHULTZ_LAB
/store/wrbu | 40T | - | n/a | yes: 8 weeks | WRBU

*: maximum size, disk size will increase up to that value if/when usage grows

(as of Nov 15, 2019)

Notes

  • The notation
    • 1.8/2.0TB means that the soft limit is 1.8TB and the hard limit is 2.0TB of disk space, while
    • 4/5M means that the soft limit is 4 million inodes and the hard limit is 5 million.

  • It is inefficient to store a slew of small files, and if you do you may reach your inode quota before your space quota (too many small files).
    •  Some of the disk monitoring tools show the inode usage.
    • If your %(inode)>%(space) your disk usage is inefficient,
      consider archiving your files into zip or tar-compressed sets.

  • While some of the tool(s) you use may force you to be inefficient while jobs are running, you should remember to
    • remove useless files when jobs have completed,
    • compress files that can benefit from compression (with gzip, bzip2 or compress), and
    • archive a slew of files into a zip or a tar-compressed set, as follows:
         % zip archive.zip dir/
      or
         % tar -czf archive.tgz dir/
      both examples archive the content of the directory dir/ into a single zip or a tgz file. You can then delete the content of dir/ with
         % rm -rf dir/
  • You can unpack each type of archive with
       % unzip archive.zip
    or
       % tar xf archive.tgz

  • The sizes of some of the partitions (aka the various disks) on the NetApp will "auto-grow" until they reach the listed maximum capacity,
    so the size shown by traditional Un*x commands, like df, does not necessarily reflect the maximum size.

    We have implemented a FIFO (first in, first out) model, where old files are deleted to make space, aka scrubbed.
    • There is an age limit, meaning that only files older than 180 days (or 90 days) get deleted.
    • Older files get deleted before the newer ones (FIFO).
    • We run the scrubber at regular intervals.
  • In any case, we ask you to remove from /pool and /scratch files that you do not need for active jobs.

  • For projects that want dedicated disk space, such space can be secured with project-specific funds when we expand the disk farm (contact us).

...

  • You can use the following Un*x commands:

    du: show disk use
    df: show disk free

    or

  • you can use Hydra-specific home-grown tools (these require that you load the tools/local or tools/local+ module):

    dus-report: run du and parse its output in a more user-friendly format
    disk-usage: run df and parse its output in a more user-friendly format


  • You can also view the disk status at the cluster status web pages, either

    • here (at cfa.harvard.edu)
      or
    • here (at si.edu).

...

No Format
nopaneltrue
% du -sh dir/
136M    dir/

 


The output of df can be very long and confusing.

...

No Format
nopaneltrue
% df -h /pool/sao
Filesystem           Size  Used Avail Use% Mounted on
10.61.10.1:/vol_sao   20T   15T  5.1T  75% /pool/sao

 



or try

% df -h --output=source,fstype,size,used,avail,pcent,file /scratch/genomics
Filesystem     Type  Size  Used Avail Use% File
gpfs01         gpfs  400T   95T  306T  24% /scratch/genomics


You can compile the output of du into a more useful report with the dus-report tool. This tool will run du for you (can take a while) and parse its output to produce a more concise/useful report.

...

No Format
nopaneltrue
% dus-report.pl /pool/sao/hpc
 612.372 GB            /pool/sao/hpc
                       capac.   20.000 TB (75% full), avail.    5.088 TB
 447.026 GB  73.00 %   /pool/sao/hpc/rtdc
 308.076 GB  50.31 %   /pool/sao/hpc/rtdc/v4.4.0
 138.950 GB  22.69 %   /pool/sao/hpc/rtdc/vX
 137.051 GB  22.38 %   /pool/sao/hpc/rtdc/vX/M100-test-oob-2
 120.198 GB  19.63 %   /pool/sao/hpc/rtdc/v4.4.0/test2
 120.198 GB  19.63 %   /pool/sao/hpc/rtdc/v4.4.0/test2-2-9
  83.229 GB  13.59 %   /pool/sao/hpc/c7
  83.229 GB  13.59 %   /pool/sao/hpc/c7/hpc
  65.280 GB  10.66 %   /pool/sao/hpc/sw
  64.235 GB  10.49 %   /pool/sao/hpc/rtdc/v4.4.0/test1
  49.594 GB   8.10 %   /pool/sao/hpc/sw/intel-cluster-studio
  46.851 GB   7.65 %   /pool/sao/hpc/rtdc/vX/M100-test-oob-2/X54.ms
  46.851 GB   7.65 %   /pool/sao/hpc/rtdc/vX/M100-test-oob-2/X54.ms/SUBMSS
  43.047 GB   7.03 %   /pool/sao/hpc/rtdc/vX/M100-test-oob-2/X220.ms
  43.047 GB   7.03 %   /pool/sao/hpc/rtdc/vX/M100-test-oob-2/X220.ms/SUBMSS
  42.261 GB   6.90 %   /pool/sao/hpc/c7/hpc/sw
  36.409 GB   5.95 %   /pool/sao/hpc/c7/hpc/tests
  30.965 GB   5.06 %   /pool/sao/hpc/c7/hpc/sw/intel-cluster-studio
  23.576 GB   3.85 %   /pool/sao/hpc/rtdc/v4.4.0/test2/X54.ms
  23.576 GB   3.85 %   /pool/sao/hpc/rtdc/v4.4.0/test2-2-9/X54.ms
  23.576 GB   3.85 %   /pool/sao/hpc/rtdc/v4.4.0/test2/X54.ms/SUBMSS
  23.576 GB   3.85 %   /pool/sao/hpc/rtdc/v4.4.0/test2-2-9/X54.ms/SUBMSS
  22.931 GB   3.74 %   /pool/sao/hpc/rtdc/v4.4.0/test2/X220.ms
  22.931 GB   3.74 %   /pool/sao/hpc/rtdc/v4.4.0/test2-2-9/X220.ms
report in /tmp/dus.pool.sao.hpc.hpc

You can rerun dus-report with different options on the same intermediate file, like

   % dus-report -n 999 -pc 1 /tmp/dus.pool.sao.hpc.hpc

to get a different report, for example to see the list down to 1%. Use

   % dus-report -help

to see how else you can use it.

 


The tool disk-usage runs df and presents its output in a more friendly format:

No Format
nopaneltrue
% disk-usage.pl -d all+
Filesystem                              Size     Used    Avail Capacity  Mounted on
netapp-n1:/vol_home                    46.40T   13.05T   23.35T  48%/38%  /home
netapp-n2:/vol_data_genomics           36.00T    4.83T   31.17T  14%/2%   /data/genomics
netapp-n2:/vol_data/sao                27.00T    8.65T   18.35T  33%/19%  /data/sao
netapp-n2:/vol_data/nasm               27.00T    8.65T   18.35T  33%/19%  /data/nasm
netapp-n2:/vol_data/admin              27.00T    8.65T   18.35T  33%/19%  /data/admin
netapp-n1:/vol_pool_bio               200.00G   30.25G  169.75G  16%/1%   /pool/biology
netapp-n2:/vol_pool_genomics           55.00T   37.98T   17.02T  70%/15%  /pool/genomics
netapp-n1:/vol_pool_sao                37.00T    7.68T   29.32T  21%/1%   /pool/sao
netapp-n1:/vol_pool_sao/nasm           37.00T    7.68T   29.32T  21%/1%   /pool/nasm
emc-isilon:/ifs/nfs/hydra              60.00T   39.82T   20.18T  67%/1%   /pool/isilon
gpfs01:genomics                       400.00T   94.60T  305.40T  24%/9%   /scratch/genomics
gpfs01:sao                            400.00T    5.04T  394.96T   2%/1%   /scratch/sao
netapp-n1:/vol_pool_kistlerl           21.00T   18.50T    2.50T  89%/1%   /pool/kistlerl
netapp-n2:/vol_pool_kozakk             11.00T    7.82T    3.18T  72%/1%   /pool/kozakk
netapp-n1:/vol_pool_nmnh_ggi           21.00T   14.79T    6.21T  71%/8%   /pool/nmnh_ggi
netapp-n1:/vol_pool_sao_access         21.00T    2.37T   18.63T  12%/2%   /pool/sao_access
netapp-n2:/vol_pool_sao_rtdc            2.00T   62.13G    1.94T   4%/1%   /pool/sao_rtdc
netapp-n1:/vol_pool_sylvain            30.00T   24.83T    5.17T  83%/36%  /pool/sylvain
gpfs01:nmnh_bradys                     25.00T   58.71G   24.94T   1%/1%   /scratch/bradys
gpfs01:usda_sel                        25.00T  651.81G   24.36T   3%/4%   /scratch/usda_sel
gpfs01:nzp_ccg                         25.00T  924.33G   24.10T   4%/1%   /scratch/nzp_ccg
gpfs01:nmnh_kistlerl                   50.00T   11.93T   38.07T  24%/1%   /scratch/kistlerl
gpfs01:nmnh_meyerc                     25.00T    0.00G   25.00T   0%/1%   /scratch/meyerc
gpfs01:nmnh_ggi                        25.00T    4.85T   20.15T  20%/1%   /scratch/nmnh_ggi
gpfs01:nmnh_lab                        25.00T    0.00G   25.00T   0%/1%   /scratch/nmnh_lab
gpfs01:stri_ap                         25.00T    0.00G   25.00T   0%/1%   /scratch/stri_ap
gpfs01:sao_atmos                      186.00T   51.15T  134.85T  28%/6%   /scratch/sao_atmos
gpfs01:sao_cga                         25.00T    8.14T   16.86T  33%/4%   /scratch/sao_cga
gpfs01:sao_tess                        50.00T    3.29T   46.71T   7%/4%   /scratch/sao_tess
gpfs01:sao_sylvain                     50.00T    6.63T   43.37T  14%/2%   /scratch/sylvain
gpfs01:nmnh_schultzt                   25.00T  376.87G   24.63T   2%/3%   /scratch/schultzt
gpfs01:wrbu                            40.00T    3.00T   37.00T   8%/1%   /scratch/wrbu
netapp-n1:/vol_pool_admin               3.92T    2.71T    1.21T  70%/5%   /pool/admin
netapp-n1:/vol_pool_galaxy            400.00G  194.15G  205.85G  49%/1%   /pool/galaxy
gpfs01:admin                           20.00T    1.96T   18.04T  10%/21%  /scratch/admin
gpfs01:bioinformatics_dbs              10.00T  868.14G    9.15T   9%/1%   /scratch/dbs
nas:/mnt/pool_01/admin                 20.00T    1.67T   18.33T   9%/1%   /store/admin
nas:/mnt/pool_02/nmnh_bradys           40.00T  306.52G   39.70T   1%/1%   /store/bradys
nas:/mnt/pool_02/nmnh_ggi              40.00T   22.09T   17.91T  56%/1%   /store/nmnh_ggi
nas:/mnt/pool_03/public               270.00T   22.55T  247.45T   9%/1%   /store/public
nas:/mnt/pool_01/sao_atmos            299.97T   68.73T  231.24T  23%/1%   /store/sao_atmos
nas:/mnt/pool_01/sao_sylvain          100.00T    8.39T   91.61T   9%/1%   /store/sylvain
nas:/mnt/pool_02/nmnh_schultzt         40.00T    2.49T   37.51T   7%/1%   /store/schultzt
nas:/mnt/pool_02/wrbu                  40.00T  618.24G   39.40T   2%/1%   /store/wrbu

Use

   % disk-usage -help

to see how else to use it.

You can, for instance, get the disk quotas and the max size, for all the disks, including /store, with:

No Format
nopaneltrue
% disk-usage.pl -d all+ -quotas
                                                                 quotas:  disk space    #inodes     max
Filesystem                              Size     Used    Avail Capacity    soft/hard    soft/hard   size Mounted on
netapp-n1:/vol_home                    46.40T   13.05T   23.35T  48%/38%     50G/100G    1.8M/2.0M   10T /home
netapp-n2:/vol_data_genomics           36.00T    4.83T   31.17T  14%/2%     486G/512G    1.2M/1.3M   30T /data/genomics
netapp-n2:/vol_data/*                  27.00T    8.65T   18.35T  33%/19%    1.9T/2.0T    4.8M/5.0M   40T /data/sao:nasm:admin
netapp-n1:/vol_pool_bio               200.00G   30.25G  169.75G  16%/1%     1.9T/2.0T    4.8M/5.0M     -  /pool/biology
netapp-n2:/vol_pool_genomics           55.00T   37.98T   17.02T  70%/15%    1.9T/2.0T    4.8M/5.0M     -  /pool/genomics
netapp-n1:/vol_pool_sao                37.00T    7.68T   29.32T  21%/1%     1.9T/2.0T    4.8M/5.0M     -  /pool/sao:nasm
emc-isilon:/ifs/nfs/hydra              60.00T   39.82T   20.18T  67%/1%         -            -        -  /pool/isilon
gpfs01:genomics                       400.00T   94.60T  305.40T  24%/9%     9.0T/10.0T    25M/26M     -  /scratch/genomics
gpfs01:sao:nasm                       400.00T    5.04T  394.96T   2%/1%     9.0T/10.0T    25M/26M     -  /scratch/sao
netapp-n1:/vol_pool_kistlerl           21.00T   18.50T    2.50T  89%/1%    20.0T/21.0T    50M/53M     -  /pool/kistlerl
netapp-n2:/vol_pool_kozakk             11.00T    7.82T    3.18T  72%/1%    10.5T/11.0T    26M/28M     -  /pool/kozakk
netapp-n1:/vol_pool_nmnh_ggi           21.00T   14.79T    6.21T  71%/8%    15.0T/15.8T    37M/39M     -  /pool/nmnh_ggi
netapp-n1:/vol_pool_sao_access         21.00T    2.37T   18.63T  12%/2%    15.0T/15.8T    37M/39M     -  /pool/sao_access
netapp-n2:/vol_pool_sao_rtdc            2.00T   62.13G    1.94T   4%/1%     2.9T/3.0T    7.1M/7.5M   10T /pool/sao_rtdc
netapp-n1:/vol_pool_sylvain            30.00T   24.83T    5.17T  83%/36%   28.5T/30.0T    71M/75M     -  /pool/sylvain
gpfs01:nmnh_bradys                     25.00T   58.71G   24.94T   1%/1%         -            -        -  /scratch/bradys
gpfs01:usda_sel                        25.00T  651.81G   24.36T   3%/4%    24.0T/25.0T    52M/62M     -  /scratch/usda_sel
gpfs01:nzp_ccg                         25.00T  924.33G   24.10T   4%/1%    24.0T/25.0T    52M/62M     -  /scratch/nzp_ccg
gpfs01:nmnh_kistlerl                   50.00T   11.93T   38.07T  24%/1%         -            -        -  /scratch/kistlerl
gpfs01:nmnh_meyerc                     25.00T    0.00G   25.00T   0%/1%    24.0T/25.0T    52M/62M     -  /scratch/meyerc
gpfs01:nmnh_ggi                        25.00T    4.85T   20.15T  20%/1%    24.0T/25.0T    52M/62M     -  /scratch/nmnh_ggi
gpfs01:nmnh_lab                        25.00T    0.00G   25.00T   0%/1%     4.0T/5.0T     10M/12M     -  /scratch/nmnh_lab
gpfs01:stri_ap                         25.00T    0.00G   25.00T   0%/1%     4.0T/5.0T     10M/12M     -  /scratch/stri_ap
gpfs01:sao_atmos                      186.00T   51.15T  134.85T  28%/6%    98.0T/100T   252M/261M     -  /scratch/sao_atmos
gpfs01:sao_cga                         25.00T    8.14T   16.86T  33%/4%     7.0T/8.0T     18M/20M     -  /scratch/sao_cga
gpfs01:sao_tess                        50.00T    3.29T   46.71T   7%/4%    36.0T/40.0T   94M/210M     -  /scratch/sao_tess
gpfs01:sao_sylvain                     50.00T    6.63T   43.37T  14%/2%    48.0T/50.0T  115M/128M     -  /scratch/sylvain
gpfs01:nmnh_schultzt                   25.00T  376.87G   24.63T   2%/3%         -            -        -  /scratch/schultzt
gpfs01:wrbu                            40.00T    3.00T   37.00T   8%/1%    38.0T/40.0T   99M/100M     -  /scratch/wrbu
netapp-n1:/vol_pool_admin               3.92T    2.71T    1.21T  70%/5%     5.7T/6.0T     14M/15M    10T /pool/admin
netapp-n1:/vol_pool_galaxy            400.00G  194.15G  205.85G  49%/1%    10.7T/11.3T    27M/28M    15T /pool/galaxy
gpfs01:admin                           20.00T    1.96T   18.04T  10%/21%        -            -        -  /scratch/admin
gpfs01:bioinformatics_dbs              10.00T  868.14G    9.15T   9%/1%         -            -        -  /scratch/dbs
nas:/mnt/pool_01/admin                 20.00T    1.67T   18.33T   9%/1%         -            -        -  /store/admin
nas:/mnt/pool_02/nmnh_bradys           40.00T  306.52G   39.70T   1%/1%         -            -        -  /store/bradys
nas:/mnt/pool_02/nmnh_ggi              40.00T   22.09T   17.91T  56%/1%         -            -        -  /store/nmnh_ggi
nas:/mnt/pool_03/public               270.00T   22.55T  247.45T   9%/1%       5T/5T          -        -  /store/public
nas:/mnt/pool_01/sao_atmos            299.97T   68.73T  231.24T  23%/1%         -            -        -  /store/sao_atmos
nas:/mnt/pool_01/sao_sylvain          100.00T    8.39T   91.61T   9%/1%         -            -        -  /store/sylvain
nas:/mnt/pool_02/nmnh_schultzt         40.00T    2.49T   37.51T   7%/1%         -            -        -  /store/schultzt
nas:/mnt/pool_02/wrbu                  40.00T  618.24G   39.40T   2%/1%         -            -        -  /store/wrbu

Monitoring Quota Usage

The Linux command quota works with the NetApp (/home, /data & /pool), but not on the GPFS (/scratch) or the NAS (/store).

For example:

Code Block
languagetext
title% quota -s
Disk quotas for user hpc (uid 7235):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
10.61.10.1:/vol_home
                  2203M  51200M    100G           46433   1800k   2000k
10.61.10.1:/vol_sao
                  1499G   1946G   2048G           1420k   4000k   5000k
10.61.10.1:/vol_scratch/genomics
                 48501M   2048G   4096G            1263   9000k  10000k
10.61.200.5:/vol/a2v1/genomics01
                   108M  14336G  15360G             613  10000k  12000k
10.61.10.1:/vol_home/hydra-2/dingdj
                  2203M  51200M    100G           46433   1800k   2000k

reports your quotas. The -s stands for --human-readable, hence the 'k' and 'G'. While

    % quota -q

will print only information on filesystems where your usage is over the quota. (man quota)

(lightbulb) The command quota+ (you need to load tools/local) returns disk quotas for all the disks (see the quota+ section in Additional Tools).

Other Tools

  • We compile a quota report 4x/day and provide tools to parse the quota report.
    • The daily quota report is written around 3:00, 9:00, 15:00, and 21:00
      • in a file called quota_report_YYDDMM_HH.txt, located in /data/sao/hpc/quota-reports/unified/.
    • The string YYMMDD_HH corresponds to the date & hour of the report: "160120_09" for the Jan 20 2016 9am report (see the example below).
    • The format of this file is not very user friendly and users are listed by their user ID.
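
For example, to locate the most recent report (the file name shown is only illustrative of the naming scheme):

No Format
nopaneltrue
% ls -t /data/sao/hpc/quota-reports/unified/quota_report_*.txt | head -1
/data/sao/hpc/quota-reports/unified/quota_report_191120_21.txt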

The Hydra-specific tools (these require that you load the tools/local module) are:

  • quota+ - show quota values
  • parse-disk-quota-reports - parse quota reports

Examples

  • quota+ - show quota values:
Code Block
languagetext
% quota+
Disk quotas for user sylvain (uid 10541):
Mounted on                             Used   Quota   Limit   Grace   Files   Quota   Limit   Grace
----------                          ------- ------- ------- ------- ------- ------- ------- -------
/home                                11.00G  50.00G  100.0G       0  73.13k   2.00M   2.00M       0
/data/sao                             1.92T   7.60T   8.00T       0   3.53M  78.00M  80.00M       0
/pool/sylvain                         8.79T  12.50T  14.00T       0   7.93M  71.00M  75.00M       0
/scratch/sao                         10.00G  11.00T  12.00T       0       2  25.17M  26.21M       0
/scratch/sylvain                      6.63T  50.00T  50.00T       0   1.89M  99.61M  104.9M       0
/store/admin                          1.00G    none    none
/store/sylvain                        8.39T    none    none

Use quota+ -h, or read the man page (man quota+), for the complete usage info.

  • parse-disk-quota-reports will parse the disk quota report file and produce a more concise report:
No Format
nopaneltrue
% parse-disk-quota-reports
Disk quota report: show usage above 85% of quota, (warning when quota > 95%), as of Wed Nov 20 21:00:05 2019.

Volume=NetApp:vol_data_genomics, mounted as /data/genomics
                     --  disk   --     --  #files --     default quota: 512.0GB/1.25M
Disk                 usage   %quota    usage  %quota     name, affiliation - username (indiv. quota)
-------------------- ------- ------    ------ ------     -------------------------------------------
/data/genomics       512.0GB 100.0%     0.17M  13.4% *** Paul Frandsen, OCIO - frandsenp

Volume=NetApp:vol_data_sao, mounted as /data/admin or /data/nasm or /data/sao
                     --  disk   --     --  #files --     default quota:  2.00TB/5M
Disk                 usage   %quota    usage  %quota     name, affiliation - username (indiv. quota)
-------------------- ------- ------    ------ ------     -------------------------------------------
/data/admin:nasm:sao  1.88TB  94.0%     0.01M   0.1%     uid=11599

Volume=NetApp:vol_home, mounted as /home
                     --  disk   --     --  #files --     default quota: 100.0GB/2M
Disk                 usage   %quota    usage  %quota     name, affiliation - username (indiv. quota)
-------------------- ------- ------    ------ ------     -------------------------------------------
/home                 96.5GB  96.5%     0.41M  20.4% *** Roman Kochanov, SAO/AMP - rkochanov
/home                 96.3GB  96.3%     0.12M   6.2% *** Sofia Moschou, SAO/HEA - smoschou
/home                 95.2GB  95.2%     0.11M   5.6% *** Cheryl Lewis Ames, NMNH/IZ - amesc
/home                 95.2GB  95.2%     0.26M  12.8% *** Yanjun (George) Zhou, SAO/SSP - yjzhou
/home                 92.2GB  92.2%     0.80M  40.1%     Taylor Hains, NMNH/VZ - hainst

Volume=NetApp:vol_pool_genomics, mounted as /pool/genomics
                     --  disk   --     --  #files --     default quota:  2.00TB/5M
Disk                 usage   %quota    usage  %quota     name, affiliation - username (indiv. quota)
-------------------- ------- ------    ------ ------     -------------------------------------------
/pool/genomics        1.71TB  85.5%     1.23M  24.6%     Vanessa Gonzalez, NMNH/LAB - gonzalezv
/pool/genomics        1.70TB  85.0%     1.89M  37.8%     Ying Meng, NMNH - mengy
/pool/genomics        1.45TB  72.5%     4.56M  91.3%     Brett Gonzalez, NMNH - gonzalezb
/pool/genomics       133.9GB   6.5%     4.56M  91.2%     Sarah Lemer, NMNH - lemers

Volume=NetApp:vol_pool_kistlerl, mounted as /pool/kistlerl
                     --  disk   --     --  #files --     default quota:  21.00TB/52M
Disk                 usage   %quota    usage  %quota     name, affiliation - username (indiv. quota)
-------------------- ------- ------    ------ ------     -------------------------------------------
/pool/kistlerl       18.35TB  87.4%     0.88M   1.7%     Logan Kistler, NMNH/Anthropology - kistlerl

Volume=NetApp:vol_pool_nmnh_ggi, mounted as /pool/nmnh_ggi
                     --  disk   --     --  #files --     default quota: 15.75TB/39M
Disk                 usage   %quota    usage  %quota     name, affiliation - username (indiv. quota)
-------------------- ------- ------    ------ ------     -------------------------------------------
/pool/nmnh_ggi       14.78TB  93.8%     8.31M  21.3%     Vanessa Gonzalez, NMNH/LAB - gonzalezv

Volume=NetApp:vol_pool_sao, mounted as /pool/nasm or /pool/sao
                     --  disk   --     --  #files --     default quota:  2.00TB/5M
Disk                 usage   %quota    usage  %quota     name, affiliation - username (indiv. quota)
-------------------- ------- ------    ------ ------     -------------------------------------------
/pool/nasm:sao        1.78TB  89.0%     0.16M   3.2%     Guo-Xin Chen, SAO/SSP-AMP - gchen

reports disk usage when it is above 85% of the quota.

Use parse-disk-quota-reports -h, or read the man page (man parse-disk-quota-reports), for the complete usage info.

...

Note

  • Users whose quotas are above the 85% threshold will receive a warning email once a week (issued on Monday mornings).
    • This is a warning, as long as you are below 100% you are OK.
    • Users won't be able to write on disks on which they have exceeded their hard limits.

Anchor
NetAppSnapshots
NetAppSnapshots
6. NetApp Snapshots: How to Recover Old or Deleted Files.

...

  • The "-p" will preserve the file creation date and the "-i" will prevent overwriting an existing file. 
  • The "XXXX" is to be replaced by either:
    • hourly.YYYY-MM-DD_HHMM
    • daily.YYYY-MM-DD_0010
    • weekly.YYYY-MM-DD_0015
      where YYYY-MM-DD is a date specification (e.g., 2015-11-01)
  • The files under .snapshot are read-only:
    • they can be recovered using cp, tar or rsync (see the example after this list); but
    • they cannot be moved (mv) or deleted (rm).
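
For example, a hedged sketch of recovering yesterday's copy of a job file from /home, using the cp -pi command and snapshot naming described above (the user name, snapshot name and file name are illustrative):

No Format
nopaneltrue
% ls /home/username/.snapshot/
% cp -pi /home/username/.snapshot/daily.2019-11-20_0010/my_script.job ~/my_script.job.recovered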

How to Use the NAS/ZFS Snapshots:

  • The snapshots on the /store  disks are:
    • located under /store/XXX/.zfs/snapshot (where XXX is, for example, public) and
    • in sub-directories named auto-YYMMDD.0230-8w, where YYMMDD represents the date of the snapshot.
  • Content of NAS/ZFS snapshots can be recovered as described above. 
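
For example (again a hedged sketch, with an illustrative snapshot name, user name and file):

No Format
nopaneltrue
% ls /store/public/.zfs/snapshot/
% cp -pi /store/public/.zfs/snapshot/auto-191120.0230-8w/username/old_results.tar.gz /pool/genomics/username/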

Anchor
PublicDisksScrubber
PublicDisksScrubber
7. Public Disks Scrubber

...

       list-scrubbed-dirs [-long|-all] /pool/genomics/frandsenp 160721 [<RE>|-n]

 where the <RE> is an optional regular-expression to limit the printout; without an RE you get the complete list, unless you specify -n, in which case you get the number of scrubbed directories.

The -long or -all option allows you to get more info (like age, size and owner)

  • To find out which old files were scrubbed:

       list-scrubbed-files [-long|-all] /pool/genomics/frandsenp 160721 [<RE>|-n]

 where again the <RE> is an optional regular-expression to limit the printout; without an RE you get the complete list, unless you specify -n, in which case you get the number of scrubbed files;

 the -long option will produce a list that includes the files' age and size, -all will list age, size and owner.

  • (lightbulb) The <RE> (regular expressions) are PERL-style RE:
    • .     means any char,
    •  .*  means any set of chars,
    • [a-z] means any single character between a and z,
    • ^     means start of match,
    • $     means end of match, etc (see gory details here).
  • for example:

...

  1. create a list with
    list-scrubbed-files /pool/genomics/frandsenp 160721 /pool/genomics/frandsenp/big-project > restore.list
     this will list all the scrubbed files under 'big-project/' and save the list in restore.list (see also the consolidated example after this list)

    (warning) Note that /pool/genomics/frandsenp/big-project means /pool/genomics/frandsenp/big-project*,
    if you want to restrict to /pool/genomics/frandsenp/big-project, add a '/', i.e.: use /pool/genomics/frandsenp/big-project/
     
  2.  edit the file 'restore.list' to trim it, with any text editor (if needed),
     
  3. verify with:
    verify-restore-list /pool/genomics/frandsenp  160721 restore.list
    or use
    verify-restore-list -d /pool/genomics/frandsenp  160721 restore.list
      if the verification produced an error.

  4. Only then, and if the verification produced no error, submit your scrubbed file restoration request as follows:
    • SAO users: email the file(s) or the location of the files to Sylvain at hpc@cfa.harvard.edu
    • non-SAO users: email the file(s) or the location of the files to SI-HPC@si.edu
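
Putting the restoration steps together, a hedged end-to-end sketch using the same user, date and directory as in the examples above:

No Format
nopaneltrue
% list-scrubbed-files /pool/genomics/frandsenp 160721 /pool/genomics/frandsenp/big-project/ > restore.list
% # edit restore.list with a text editor to keep only what you need, then:
% verify-restore-list /pool/genomics/frandsenp 160721 restore.list

If the verification reports no error, email your restoration request as described in step 4.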

Anchor
SSDnLocal
SSDnLocal
8. SSD

...

Local Disk Space

...

  • Local SSDs (solid state disks) are available on a few nodes available

...

...

  • from accessing local SSD.
  • How to use the SSD is explained here.


...

Last Updated   SGK/PBF.