...

  • we impose quotas: limits on how much can be stored on each partition by each user, and
  • we monitor disk usage;
  • /home should not be used to keep large files, use /pool instead;
  • /pool is for active temporary storage (i.e., for files needed while a job is running).
    • If you need even more disk space than what you can store under /pool, ask to be allowed to use /scratch.
    • Both partitions (/pool and /scratch) are scrubbed (see below): old stuff is deleted to make sure there is space for active users.
  • None of the disks on the cluster are for long-term storage; please copy your results back to your "home" computer and
    delete what you don't need any longer (see the example after this list).
  • While the disk system on Hydra is highly reliable, none of the disks on the cluster are backed up.
  • Once you reach your quota you won't be able to write anything on that partition until you delete stuff.
  • A few nodes have local SSDs (solid state disks), and
    for special cases it may be OK to use disk space local to the compute node.

    Contact us if your jobs can benefit from more disk space, SSDs or local disk space.
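    For copying results back to your "home" computer, scp or rsync, run from your own machine, work well. A minimal sketch - the hostname and paths below are placeholders, substitute your own login node, username and directories:

       # copy a results directory from Hydra to your local machine,
       # then remove it from Hydra once the copy is verified
       % rsync -av username@hydra-login01.si.edu:/pool/genomics/username/results ~/
       % ssh username@hydra-login01.si.edu rm -rf /pool/genomics/username/results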

...

Alternatively you can use a GUI-based ssh/scp compatible tool like FileZilla or WinSCP. Note: Cyberduck is not recommended because it uses a lot of CPU cycles on Hydra.

You will still most likely need to run VPN.

...

4. Disk Configuration

 



 Disk name                 Capacity  Quotas per user             NetApp
                           (max.)    disk space    no. of files  snapshots
                                     soft/hard     soft/hard     enabled?

 /home                     10TB      50/100GB      1.8/2M        yes: 4 weeks
     For your basic configuration files, scripts and job files - your limit
     is low, but you can recover old stuff up to 4 weeks.

 /data/sao or /data/nasm   40TB*     1.9/2.0TB     4.75/5M       yes: 2 weeks
     For important but relatively small files like final results, etc. -
     your limit is medium; you can recover old stuff, but disk space is not
     released right away. For SAO or NASM users.

 /data/genomics            30TB*     0.45/0.5TB    1.19/1.25M    yes: 2 weeks
     For important but relatively small files like final results, etc. -
     your limit is medium; you can recover old stuff, but disk space is not
     released right away. For non-SAO/NASM users.

 /pool/sao or /pool/nasm   37TB      1.9/2.0TB     4/5M          no
     For the bulk of your storage - your limit is high, and disk space is
     released right away. For SAO or NASM users.

 /pool/genomics            50TB      1.9/2.0TB     4.75/5M       no
     For the bulk of your storage - your limit is high, and disk space is
     released right away. For non-SAO users.

 /pool/biology             7TB       1.9/2.0TB     4.75/5M       no
     For the bulk of your storage - your limit is high, and disk space is
     released right away. For non-SAO/NASM users.

 /scratch                  100TB     9.5/10.0TB    23.75/25M     no
     For temporary storage, if you need more than what you can keep in
     /pool - SAO, NASM or non-SAO/NASM users should use /scratch/sao,
     /scratch/nasm or /scratch/genomics, respectively.

 /scratch/genomics01 or    50TB      14/15TB       2/2M          no
 /scratch/sao01
     Additional temporary storage, on old (slow) disks, that will
     eventually be retired.

 Project specific disks

 /pool/nmnh_ggi            21TB      15.0/15.8TB   37.4/39.4M    no   NMNH/GGI
 /pool/kistlerl            21TB      20.0/21.0TB   49.9/52.5M    no   NMNH/Logan Kistler
 /pool/kozakk              11TB      10.5/11.0TB   26.1/27.5M    no   STRI/Krzysztof Kozak
 /pool/sao_access          21TB      15.0/15.8TB   37.4/39.4M    no   SAO/ACCESS
 /pool/sao_atmos           36TB      8.0/10TB      9/10M         no   SAO/ATMOS
 /pool/sao_rtdc            10TB*     2.8/3.0TB     2.5/3.0M      no   SAO/RTDC
 /pool/cga                 8TB       7.09/15.8TB   19/20M        no   SAO/CGA
 /pool/sylvain             15TB      14/15TB       63/65M        no   SAO/Sylvain Korzennik

 Extra

 /pool/admin               10TB*     5.7/6.0TB     14.3/15.0M    no   Sys Admin
 /pool/galaxy              15TB*     10.7/11.3TB   26.7/28.1M    no   Galaxy

*: maximum size, the disk size will increase up to that value if/when usage grows

(as of May 1, 2018)

Notes

  • The notation
    • 1.8/2.0TB means that the soft limit is 1.8TB and the hard limit is 2.0TB of disk space, while
    • 4/5M means that the soft limit is 4 million inodes and the hard limit is 5 million.

  • It is inefficient to store a slew of small files; if you do, you may reach your inode quota before your space quota.
    • Some of the disk monitoring tools show the inode usage.
    • If your inode usage percentage is higher than your space usage percentage (%(inode) > %(space)), your disk usage is inefficient;
      consider archiving your files into zip or tar-compressed sets (see the sketch after these notes).

  • While some of the tool(s) you use may force you to be inefficient while jobs are running, you should remember to
    • remove useless files when jobs have completed,
    • compress files that can benefit from compression (with gzip, bzip2 or compress), and
    • archive a slew of files into a zip or a tar-compressed set, as follows:
         % zip -r archive.zip dir/
      or
         % tar -czf archive.tgz dir/
      both examples archive the content of the directory dir/ into a single zip or tgz file (note that zip needs -r to recurse into the directory). You can then delete the content of dir/ with
         % rm -rf dir/
  • You can unpack each type of archive with
       % unzip archive.zip
    or
       % tar xf archive.tgz

  • The sizes of some of the partitions (aka the various disks) on the NetApp will "auto-grow" until they reach the listed maximum capacity,
    so the size shown by traditional Un*x commands, like df, does not necessarily reflect the maximum size.

    We have implemented a FIFO (first in, first out) model, where old files are deleted to make space, aka scrubbed.
    • There is an age limit: only files older than 180 days (or 90 days) get deleted.
    • Older files get deleted before newer ones (FIFO), and
    • we run the scrubber at regular intervals.
  • In any case, we ask you to remove from /pool and /scratch files that you do not need for active jobs (the sketch after these notes shows how to find old files).

  • For projects that want dedicated disk space, such space can be secured with project-specific funds when we expand the disk farm (contact us).
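If you want to check whether a partition is filling up in inodes rather than in disk space, the standard df flags are enough. A quick sketch - the partition name is just an example:

   # space usage on a partition
   % df -h /pool/genomics
   # inode usage on the same partition (-i reports inodes, not blocks)
   % df -i /pool/genomics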
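To spot files of yours that are close to the scrubber's age limit, a find one-liner helps. A sketch, assuming your files live under /pool/genomics/username - substitute your own directory:

   # list your files that have not been modified in the last 180 days
   % find /pool/genomics/username -type f -mtime +180 -ls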

...

% du -sh dir/
136M    dir/

 


The output of df can be very long and confusing.

...

% df -h /pool/sao
Filesystem           Size  Used Avail Use% Mounted on
10.61.10.1:/vol_sao   20T   15T  5.1T  75% /pool/sao

 


You can compile the output of du into a more useful report with the dus-report.pl tool. This tool will run du for you (which can take a while) and parse its output to produce a more concise and useful report.

...

to see how else you can use it.

 


The tool disk-usage.pl runs df and presents its output in a more friendly format:

...

The format of this file is not very user friendly and users are listed by their user ID.
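Since users are listed by their numeric user ID, you may want to translate an ID back to a username; the standard getent command can do that (the ID below is made up):

   # look up the username for user ID 10234
   % getent passwd 10234 | cut -d: -f1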

 


The Hydra-specific tools (i.e., they require that you load the tools/local module):

...

for the complete usage info.

 


Users whose quotas are above the 75% threshold will receive a warning email once a week (issued on Monday mornings).

...

       list-scrubbed-dirs [-long|-all] /pool/genomics/frandsenp 160721 [<RE>|-n]

 where <RE> is an optional regular expression to limit the printout; without an RE you get the complete list, unless you specify -n, in which case you get the number of scrubbed directories.

The -long or -all option allows you to get more info (like age, size and owner).

  • To find out which old files were scrubbed:

       list-scrubbed-files [-long|-all] /pool/genomics/frandsenp 160721 [<RE>|-n]

 where again <RE> is an optional regular expression to limit the printout; without an RE you get the complete list, unless you specify -n, in which case you get the number of scrubbed files;

 the -long option will produce a list that includes the files' age and size, -all will list age, size and owner.

  • The <RE> (regular expressions) are Perl-style REs:
    • .     means any character,
    • .*    means any set of characters,
    • [a-z] means any single character between a and z,
    • ^     means start of match,
    • $     means end of match, etc. (see the gory details here).
  • for example:

...

We are in the process of making the local SSDs (solid state disks) on a few nodes available, and
for special cases it may be OK to use disk space local to the compute node.

You should contact us if your jobs can benefit from either SSDs or local disk space.

How to use the SSD is explained here.

...

Last Updated: SGK/PBF.