1. Introduction: What Disks to Use
  2. Disk Space Configuration
  3. How to Check Disk & Quota Usage
  4. How to Copy Files to/from Hydra
  5. How to Recover Old or Deleted Files using Snapshots
  6. Public Disks Scrubber and How to Request Scrubbed Files to be Restored
  7. How to Use Local SSD Space
  8. How to Use NAS Storage and the I/O Queue
  9. How to Use "bigtmp" - Access to Large Temporary Disk Space

1. Introduction: What Disks to Use

The disk space available on the cluster is mounted from a set of dedicated devices:

  1. A NetApp filer, via NFS,
  2. A GPFS, served by two NSD (network shared disk) servers, via the InfiniBand fabric,
  3. A low-cost NAS, via NFS, on only a subset of nodes.

The available disk space is divided into several areas (aka volumes, filesets, or partitions):

  • a small partition for basic configuration files and small storage, the /home partition,
  • a set of medium-size partitions, the /data partitions,
  • a set of large partitions, the /pool partitions,
  • a set of very large partitions for temporary storage, the /scratch partitions,
  • a set of medium-size, low-cost partitions, the /store partitions.
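
To get a quick look at how full these areas are, you can check the free space from any node where they are mounted. Below is a minimal sketch in Python; the top-level mount points are taken from the list above, but the actual per-group sub-volumes on Hydra may differ:

    # Minimal sketch: report free space on each partition family.
    # The mount points below are assumptions; actual volumes may be
    # split into per-group sub-volumes.
    import shutil

    for mount in ("/home", "/data", "/pool", "/scratch", "/store"):
        try:
            usage = shutil.disk_usage(mount)
        except OSError:
            print(f"{mount}: not mounted on this node")
            continue
        gib = 1024 ** 3
        print(f"{mount}: {usage.free / gib:,.0f} GiB free of {usage.total / gib:,.0f} GiB")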

(warning) The public /pool and /scratch partitions are scrubbed: files older than 180 days are automatically removed (warning)
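
To spot files the scrubber may target, you can list everything of yours that has not been touched in 180 days. A minimal sketch in Python (whether the scrubber keys on modification or access time is an assumption here; see the scrubber page listed above for the details):

    # Minimal sketch: print files under a directory whose modification
    # time is more than 180 days old. (Using mtime is an assumption;
    # the scrubber's exact age criterion is documented elsewhere.)
    import os, sys, time

    cutoff = time.time() - 180 * 86400   # 180 days ago, in seconds
    top = sys.argv[1] if len(sys.argv) > 1 else "."

    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime < cutoff:
                    print(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it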

SSD

A subset of nodes have local SSDs (solid-state disks) that can be used by applications that require very high I/O rates; such applications will complete faster when using SSDs.

  • Jobs that do not perform intensive I/O should not use the SSDs - this is a scarce shared resource.

These disks are local to the compute nodes, hence: 

  • you cannot see the SSDs from either login node,
  • your job can use the SSD only while it is running, hence your job needs to be adjusted accordingly and must request SSD space,
  • if your job exceeds the amount of SSD space it requested, it will no longer be able to write to the SSD,
  • consult the How to Use Local SSD Space page for more information.
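
A typical pattern is to stage your input onto the local SSD at the start of the job, run the I/O-intensive work there, and copy the results back to shared storage before the job ends. Here is a minimal sketch in Python, where the SSD mount point, job variable, and /pool paths are all placeholders (the actual values are documented on the How to Use Local SSD Space page):

    # Minimal sketch of the stage-in / work / stage-out pattern.
    # "/ssd" and JOB_ID are hypothetical; use the path and job
    # variables from the "How to Use Local SSD Space" page.
    import os, shutil

    work = os.path.join("/ssd", os.environ["USER"], os.environ.get("JOB_ID", "test"))
    os.makedirs(work, exist_ok=True)

    # Stage in: copy input from shared storage to the fast local disk.
    shutil.copy("/pool/myproject/input.dat", work)   # placeholder path

    # ... run the I/O-intensive computation against files under `work` ...

    # Stage out: copy results back before the job ends, then clean up --
    # the local SSD is a scarce, shared resource.
    shutil.copy(os.path.join(work, "output.dat"), "/pool/myproject/")
    shutil.rmtree(work)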

Remember

  • We impose quotas: limits on how much each user can store on each disk (partition/volume/fileset), and we monitor disk usage;
  • /home should not be used to keep large files; use /pool, /scratch, or /data instead;
  • /pool and /scratch are for active temporary storage (i.e., while a job is running).
  • If your jobs need a lot of disk space or perform a lot of I/O, use /scratch rather than /pool.
    • Public space on both partitions (/pool and /scratch) is scrubbed: old files are deleted to make sure there is space for active users.
  • None of the disks on the cluster are for long term storage:
    • please copy your results back to your home computer and
    • delete what you don't need any longer.
  • While the disk systems on Hydra are highly reliable, most of the disks on the cluster are not backed up, although snapshots let you recover some recently deleted files (see How to Recover Old or Deleted Files using Snapshots).
  • Once you reach your quota, you won't be able to write anything else to that partition until you delete some files.
  • (warning) Do not keep a very large number of files in the same directory:
    • (lightbulb) best practice is to keep fewer than 5,000 to 50,000 files in the same directory.
  • If you keep too many files in the same directory:
    • you may not be able to write more files, and
    • listing the contents of such a directory will be exceedingly slow.
  • What to do instead?
    • Use subdirectories to better organize your files (and your work), as the sketch below illustrates.
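
For example, a directory holding tens of thousands of files can be split into subdirectories keyed on part of each file name, keeping every directory comfortably below the counts above. A minimal sketch in Python, using a hypothetical two-character naming scheme:

    # Minimal sketch: move files from one overfull directory into
    # subdirectories keyed on the first two characters of each name.
    # The two-character scheme is illustrative; any scheme that
    # spreads files evenly works as well.
    import os, shutil, sys

    top = sys.argv[1] if len(sys.argv) > 1 else "."

    # Snapshot the file list first, so we don't iterate while moving.
    names = [entry.name for entry in os.scandir(top) if entry.is_file()]

    for name in names:
        subdir = os.path.join(top, name[:2].lower())
        os.makedirs(subdir, exist_ok=True)
        shutil.move(os.path.join(top, name), os.path.join(subdir, name))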

Last Updated SGK/PBF.
