1. What Disks to Use

1.a Where to Store my Stuff

1.b Disk Configuration

All the disk space available on the cluster is mounted off a dedicated device (aka appliance or server), a NetApp filer.

The current disk configuration is as follows:

Disk name	Maximum disk capacity	Quota per user: disk space (soft/hard)	Quota per user: no. of files (soft/hard)	NetApp snapshots enabled?	What disk shall I use?
`/home`	`8TB`	`50/100GB`	`1.8/2M`	`yes: 4 weeks`	For your basic configuration files, scripts and job files - your limit is low but you can recover old stuff up to 4 weeks.
`/pool/sao`	`60TB`	`1.8/2.0TB`	`4/5M`	`no`	For the bulk of your storage - your limit is high, and disk space is released right away, for SAO users.
`/pool/genomics`	`50TB`	`1.8/2.0TB`	`1.8/2M`	`no`	For the bulk of your storage - your limit is high, and disk space is released right away, for non-SAO users.
`/data/sao`	`20TB`	`2.8/3.0TB`	`1/2M`	`yes: 2 weeks`	For important but relatively small files like final results, etc. - your limit is medium, you can recover old stuff, but disk space is not released right away. For SAO users.
`/data/genomics`	`10TB`	`1.0/2.0TB`	`1/2M`	`yes: 2 weeks`	For important but relatively small files like final results, etc. - your limit is medium, you can recover old stuff, but disk space is not released right away. For non-SAO users.
`/scratch`	`50TB`	`2.8/3.0TB`	`1/1M`	`no, FIFO model`	If you need more than what you can keep in `/pool` - SAO and non-SAO users should use `/scratch/sao` or `/scratch/genomics`, respectively. The FIFO (first in, first out) purging model has yet to be implemented while we tune the system.

Notes

The sizes of the file systems (aka the disks) on the NetApp will "auto-grow" until they reach the listed maximum capacity, so the size shown by the command df does not always reflect the maximum size.
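As a small illustration of the point above, the following sketch wraps df to show the current size of one or more file systems. The function name show_disk_size is made up for this example; the paths in the usage comment are taken from the table.

```shell
#!/bin/sh
# Hypothetical helper (not a cluster-provided tool): show the current size,
# usage and free space of the file systems holding the given paths.
# On an auto-growing volume, the size reported here is the size right now,
# which can be smaller than the maximum capacity listed in the table.
show_disk_size() {
    # -h: human-readable units; -P: POSIX format, one line per file system
    df -hP "$@"
}

# Example (paths from the table above):
#   show_disk_size /home /pool/genomics /scratch
```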

To prevent the disks from filling up and hosing the cluster, disk usage is limited to:
- the amount of disk space listed under quota per user, and
- the number of files and directories listed under no. of files (in fact "inodes": the sum of the number of files and the number of directories).

Exceeding the soft limit produces warnings, while the hard limit cannot be exceeded: attempts to do so produce errors.
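Since the file-count quota is on inodes, it can help to estimate how many a directory tree uses. The sketch below is only an approximation made with find; the function name count_inodes is hypothetical, and the figure in the daily quota report remains the authoritative one.

```shell
#!/bin/sh
# Hypothetical helper: approximate the number of inodes used under a
# directory by counting every entry that find reports (files, directories,
# symbolic links), including the top directory itself.
# Caveat: a name containing a newline is counted more than once.
count_inodes() {
    # -xdev: do not descend into other mounted file systems
    find "${1:-.}" -xdev | wc -l | tr -d ' '
}

# Example (the user directory is illustrative):
#   count_inodes /pool/genomics/$USER
```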

The Linux command quota is not (yet) working with the NetApp filer. We compile a daily quota report and provide tools to query the quotas and parse the quota report. (need to insert links to these tools)

Once we secure more space for /scratch, we will implement a FIFO (first in first out) model, where old files are deleted without warning to make space.

There will be a minimum age limit, meaning that only files older than (let's say) 3 months will be deleted.
We will try to keep /scratch from filling up by running a scrubber regularly.
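To see which of your files would be candidates under such an age limit, something along these lines may help. This is not the scrubber itself, which has not been implemented; it assumes that age is measured by last modification time and that 3 months is roughly 90 days. The function name list_old_files is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under a directory that were last
# modified more than a given number of days ago (default: 90).
# Assumption: the future scrubber may use a different criterion, such as
# access time or a different age limit.
list_old_files() {
    dir=$1
    days=${2:-90}
    # -xdev: stay on one file system; -mtime +N: modified more than N days ago
    find "$dir" -xdev -type f -mtime +"$days" -print
}

# Example (the user directory is illustrative):
#   list_old_files /scratch/genomics/$USER 90
```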

2. Disk Quotas

3. NetApp Snapshots: How to Recover Old or Deleted Files.

Some of the disks on the NetApp filer have the so-called "snapshot mechanism" enabled:

This allows users to recover deleted files or access an older version of a file.
The NetApp filer makes a "snapshot" copy of the file system (the content of the disk) every so often and keeps these snapshots up to a given age.
So if we enable hourly snapshots and set a two-week retention, you can recover a file as it was hours ago, days ago or weeks ago, but only up to two weeks ago.
The drawback of snapshots is that when files are deleted, the disk space is not freed until the deleted files age out, i.e., 2 or 4 weeks later, depending on the disk.

How to Use the NetApp Snapshots:

To recover an old version or a deleted file, foo.dat, that was (for example) in /data/genomics/frandsen/important/results/:

If the file was deleted:

 cd /data/genomics/.snapshot/XXXX/frandsen/important/results
 cp -pi foo.dat /data/genomics/frandsen/important/results/foo.dat

If you want to recover an old version:

 cd /data/genomics/.snapshot/XXXX/frandsen/important/results
 cp -pi foo.dat /data/genomics/frandsen/important/results/old-foo.dat

The -p will preserve the file's modification time and permissions, and the -i will prompt before overwriting an existing file.
The XXXX is to be replaced by one of:
- hourly.YYYY-MM-DD_HHMM
- daily.YYYY-MM-DD_0010
- weekly.YYYY-MM-DD_0015
  where YYYY-MM-DD is a date specification (e.g., 2015-11-01).
The files under .snapshot are read-only:
- they can be recovered using cp, tar or rsync; but
- they cannot be moved (mv) or deleted (rm).
- they cannot be moved (mv) or deleted (rm).
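Before copying anything back, it is often useful to see which snapshots still hold a copy of the file. The sketch below loops over the snapshot directories and lists every version it finds; the function name list_snapshot_versions is hypothetical, and it assumes the .snapshot layout described above.

```shell
#!/bin/sh
# Hypothetical helper: list every snapshot copy of a file.
#   $1: root of the file system, e.g. /data/genomics
#   $2: path of the file relative to that root
list_snapshot_versions() {
    fs=$1
    rel=$2
    for snap in "$fs"/.snapshot/*; do
        # Skip snapshots that do not contain the file
        if [ -e "$snap/$rel" ]; then
            ls -l "$snap/$rel"
        fi
    done
}

# Example, using the file from the recovery steps above:
#   list_snapshot_versions /data/genomics frandsen/important/results/foo.dat
```

The size and date shown for each copy should help pick which XXXX to use in the cp commands.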

4. Disk Usage Monitoring

The following tools can be used to monitor disk usage

(more to come)

5. Local Disk and SSDs

(more to come)

Last Updated 19 Jan 2016 SGK.