- Introduction: What Disks to Use
- Disk Space Configuration
- How to Check Disk & Quota Usage
- How to Copy Files to/from Hydra
- How to Recover Old or Deleted Files using Snapshots
- Public Disks Scrubber and How to Request Scrubbed Files to be Restored
- How to Use Local SSD Space
- How to Use NAS Storage and the I/O Queue
1. Introduction: What Disks to Use
The disk space available on the cluster is mounted off a set of dedicated devices:
- A NetApp filer, via NFS,
- A two-NSD GPFS (parallel file system), via the InfiniBand fabric,
- A low-cost NAS, via NFS, on only a subset of nodes.
The available disk space is divided into several areas (aka volumes, filesets or partitions):
- a small partition for basic configuration files and small storage, the /home partition,
- a set of medium-size partitions, the /data partitions,
- a set of large partitions, the /pool partitions,
- a set of very large partitions for temporary storage, the /scratch partitions,
- a set of medium-size, low-cost partitions, the /store partitions.
The public /pool and /scratch partitions are scrubbed: files older than 180 days are automatically removed.
- Consult the Scrubber and How to Request Scrubbed Files to be Restored page for more information.
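Files at risk of being scrubbed can be spotted ahead of time with `find`. The sketch below uses a throw-away directory so it is self-contained; in practice you would point `find` at your own /pool or /scratch directory, and note that whether the scrubber tests modification time or access time is an assumption here (only the 180-day threshold is stated above).

```shell
# Minimal sketch: list files older than 180 days, the scrubber's age
# threshold (testing mtime, not atime, is an assumption in this sketch).
demo=$(mktemp -d)                        # stand-in for /pool or /scratch space
touch -d "200 days ago" "$demo/old.dat"  # simulate a file the scrubber would take
touch "$demo/new.dat"                    # a fresh file, safe from scrubbing
find "$demo" -type f -mtime +180         # prints only old.dat
rm -rf "$demo"
```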
SSD
A subset of nodes have local SSDs (solid state disks) that can be used for applications that require very high I/O rates, and will complete faster when using SSDs.
- Jobs that do not perform intensive I/O should not use the SSDs - this is a scarce shared resource.
These disks are local to the compute nodes, hence:
- you cannot see the SSDs from the login nodes,
- your job can use the SSD only while the job is running, hence your job needs to be adjusted accordingly and must request SSD space.
- If your job exceeds the amount of SSD space requested, it will no longer be able to write to the SSD,
- consult the How to Use Local SSD Space page for more information.
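The staging pattern such a job follows can be sketched as below. `$SSD_DIR` is a hypothetical stand-in for whatever local-SSD path the scheduler grants (simulated here with a temporary directory so the sketch runs anywhere); `tr` stands in for a real I/O-intensive application. See the How to Use Local SSD Space page for the actual mechanism.

```shell
# Sketch of the stage-in / compute / stage-out pattern for local SSD space.
# $SSD_DIR is a hypothetical placeholder for the granted SSD path.
SSD_DIR=$(mktemp -d)                       # stand-in for the node-local SSD
echo "some input" > "$SSD_DIR/input.dat"   # stage input onto the fast disk
tr 'a-z' 'A-Z' < "$SSD_DIR/input.dat" \
  > "$SSD_DIR/output.dat"                  # I/O-heavy step (stand-in command)
cp "$SSD_DIR/output.dat" .                 # copy results off before the job ends
rm -rf "$SSD_DIR"                          # clean up the SSD space
cat output.dat                             # → SOME INPUT
```

The key point is the last copy step: anything left on the SSD when the job ends is gone.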
Remember
- We impose quotas: limits on how much each user can store on each disk (partition/volume/fileset), and we monitor disk usage;
- /home should not be used to keep large files; use /pool, /scratch, or /data instead;
- /pool and /scratch are for active temporary storage (i.e., while a job is running).
- If your job(s) need a lot of disk space or perform a lot of I/O, use /scratch rather than /pool.
- Public space on both partitions (/pool and /scratch) is scrubbed: old stuff is deleted to make sure there is space for active users.
- None of the disks on the cluster are for long term storage:
- please copy your results back to your home computer and
- delete what you don't need any longer.
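Copying results back can be done with standard tools such as rsync. In the self-contained sketch below both ends are local directories; a real transfer would use a remote source such as `user@login-node:/pool/...` (hostname and paths are placeholders, see the How to Copy Files to/from Hydra page for specifics).

```shell
# Sketch: rsync results from a "cluster" directory to a "home" directory.
# In real use the source would be a remote path (user@login-node:/pool/...).
src=$(mktemp -d)                 # stand-in for your results dir on the cluster
dst=$(mktemp -d)                 # stand-in for a dir on your home computer
echo "final result" > "$src/run1.out"
rsync -a "$src/" "$dst/"         # -a preserves permissions and timestamps
cat "$dst/run1.out"              # → final result
rm -rf "$src" "$dst"             # delete what you no longer need
```

rsync only transfers what has changed, so an interrupted copy can simply be rerun.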
- While the disk systems on Hydra are highly reliable, most of the disks on the cluster are not backed up, although:
- some partitions have snapshots enabled: this allows you to 'undelete' files that were recently deleted (see How to Recover Old or Deleted Files using Snapshots);
- /home and /data are backed up to AWS Glacier for disaster recovery (DR).
- Once you reach your quota you won't be able to write anything on that partition until you delete stuff.
- Do not keep a very large number of files in the same directory:
- best practice is to keep fewer than 5,000 to 50,000 files in the same directory.
- If you keep too many files in the same directory:
- you may not be able to write more files,
- listing the contents of such a directory will be exceedingly slow.
- What to do instead?
- Use subdirectories to better organize your files (and your work).
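One common way to apply this is to bucket files into subdirectories keyed on a short prefix of each file name. The sketch below (with made-up file names, run in a throw-away directory) shows the idea.

```shell
# Sketch: move files into subdirectories named after the first character of
# each file name, so no single directory accumulates tens of thousands of entries.
demo=$(mktemp -d)
touch "$demo/apple.dat" "$demo/pear.dat" "$demo/plum.dat"
for f in "$demo"/*.dat; do
  name=$(basename "$f")
  bucket=$(printf '%s' "$name" | cut -c1)   # bucket = first character of the name
  mkdir -p "$demo/$bucket"
  mv "$f" "$demo/$bucket/"
done
ls "$demo/p"    # lists pear.dat and plum.dat; apple.dat is under a/
rm -rf "$demo"
```

Any stable key works as the bucket (first letters, a date, a run ID); the goal is simply to keep each directory's entry count modest.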
Last Updated SGK/PBF.