The disk space available on the cluster is served by two dedicated storage systems (NetApp and GPFS); a third one (NAS) is not accessible from all the compute nodes.

The NAS is accessible only from the login, interactive and I/O nodes (hence it is used as a near-line storage system).

The available public disk space is divided into several areas (a.k.a. partitions):

  • a small partition for basic configuration files and small storage, the /home partition,
  • a set of medium-size partitions, the /data partitions,
  • a set of large partitions, the /pool partitions,
  • a set of very large partitions, the /scratch partitions,
  • a set of medium-size, low-cost partitions, the /store partitions.

These partitions should be used as follows (each entry gives the partition name, the storage system, the size(*), and the typical use):

/home (NetApp, 40TB)
  For your basic configuration files, scripts and job files:
  • low quota limit but you can easily recover old stuff,
  • backup to AWS Glacier for disaster recovery (DR).

/data/{sao|genomics} (NetApp, 50TB)
/data/{biology|nasm} (NetApp, 5TB)
/data/{fellows|data_science} (NetApp, 5TB)
  For important but small files like final results, config files, etc.:
  • medium quota limit, you can easily recover old stuff,
  • but when deleting files disk space is not released right away,
  • we plan to backup to AWS Glacier for DR.

/pool/{sao|genomics} (NetApp, 120TB)
/pool/{biology|nasm} (NetApp, 5TB)
/pool/{fellows|data_science} (NetApp, 5TB)
  For the bulk of your storage:
  • high quota limit, and disk space is released right away when deleting files.

/scratch/genomics (GPFS, 300TB)
/scratch/sao (GPFS, 140TB)
/pool/fellows (GPFS, 30TB)
/pool/{biology|nasm|data_science} (GPFS, 5TB)
  For the bulk of your large storage:
  • faster storage,
  • high quota limit, and
  • disk space is released right away when deleting files.

/store/public (NAS, 175TB)
  For near-line storage.

(*): These sizes are only indicative, as we adjust them when needed.
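
To check how full a partition is before moving a lot of data there, a minimal Python sketch along the following lines can be run on a login node. The paths are examples based on the table above (e.g. /data/sao stands in for one of the /data partitions) and may need adjusting; note that this reports the free space of the whole partition, not how much of your personal quota remains.

    # Sketch: report total and free space for a few of the public partitions.
    # The paths below are examples taken from the table above; adjust them to
    # the partitions you actually use.
    import shutil

    partitions = ["/home", "/data/sao", "/pool/sao", "/scratch/sao", "/store/public"]

    for path in partitions:
        try:
            usage = shutil.disk_usage(path)
        except OSError:
            print(f"{path}: not mounted on this node")
            continue
        print(f"{path}: {usage.free / 1e12:.1f} TB free of {usage.total / 1e12:.1f} TB")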

Note

  • We impose quotas (a limit on how much each user can store on each partition) and we monitor disk usage;
    • /home should not be used for storage of large files, use /pool or /scratch instead;
    • /data is best to store things like final results, code, etc. (important but not too large);
  • We implement an automatic scrubber: old stuff gets deleted to make space,
    • files older than 180 days and empty directories on /pool or /scratch will be scrubbed (see the sketch after this list).
  • None of the disks on the cluster are for long-term storage:
    • please copy your results back to your "home" computer and
    • delete what you don't need any longer.
  • Once you reach your quota, you won't be able to write anything on that partition until you delete stuff.
  • A few compute nodes have local SSDs (solid-state disks), but since we now have GPFS, try /scratch first.
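
To see which of your files are approaching the scrubbing limit, a minimal sketch like the one below can be pointed at one of your /pool or /scratch directories. It assumes that age is judged by modification time (mtime); check the Disk Space Configuration page for the exact criterion the scrubber uses.

    # Sketch: list files under a directory that are older than 180 days and
    # are therefore candidates for the automatic scrubber.
    # Assumption: file age is judged by modification time (mtime).
    import os
    import sys
    import time

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    cutoff = time.time() - 180 * 24 * 3600   # 180 days, in seconds

    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue   # broken symlink, or file removed while scanning
            if mtime < cutoff:
                age_days = (time.time() - mtime) / 86400
                print(f"{path}  ({age_days:.0f} days old)")

Run it as, for example, python3 old_files.py /scratch/genomics/$USER (the script name and directory are only illustrative).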

A complete description is available at the Disk Space Configuration page.


Last updated SGK
