1. Introduction: What Disks to Use
  2. Disk Space Configuration
  3. How to Check Disk & Quota Usage
  4. How to Copy Files to/from Hydra
  5. How to Recover Old or Deleted Files using Snapshots
  6. Public Disks Scrubber and How to Request Scrubbed Files to be Restored
  7. How to Use Local SSD Space
  8. How to Use NAS Storage and the I/O Queue
  9. How to Use "bigtmp" - Access to Large Temporary Disk Space

1. Introduction: What Disks to Use

The disk space available on the cluster is mounted from a set of dedicated devices:

  1. A NetApp filer, via NFS,
  2. A GPFS, served by two NSD (network shared disk) servers, via the InfiniBand fabric,
  3. A low-cost NAS, via NFS, on only a subset of nodes.

The available disk space is divided into several areas (aka volumes, filesets, or partitions):

  • a small partition for basic configuration files and small storage, the /home partition,
  • a set of medium-size partitions, the /data partitions,
  • a set of large partitions, the /pool partitions,
  • a set of very large partitions for temporary storage, the /scratch partitions,
  • a set of medium-size, low-cost partitions, the /store partitions.
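
To get a quick look at how full these areas are, you can check the free space from any node where they are mounted. Below is a minimal sketch in Python; the top-level mount points are taken from the list above, but the actual per-group sub-volumes on Hydra may differ:

    # Minimal sketch: report free space on each partition family.
    # The mount points below are assumptions; actual volumes may be
    # split into per-group sub-volumes.
    import shutil

    for mount in ("/home", "/data", "/pool", "/scratch", "/store"):
        try:
            usage = shutil.disk_usage(mount)
        except OSError:
            print(f"{mount}: not mounted on this node")
            continue
        gib = 1024 ** 3
        print(f"{mount}: {usage.free / gib:,.0f} GiB free of {usage.total / gib:,.0f} GiB")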

(warning) The public /pool and /scratch partitions are scrubbed: files older than 180 days are automatically removed (warning)
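
To spot files the scrubber may target, you can list everything of yours that has not been touched in 180 days. A minimal sketch in Python (whether the scrubber keys on modification or access time is an assumption here; see the scrubber page listed above for the details):

    # Minimal sketch: print files under a directory whose modification
    # time is more than 180 days old. (Using mtime is an assumption;
    # the scrubber's exact age criterion is documented elsewhere.)
    import os, sys, time

    cutoff = time.time() - 180 * 86400   # 180 days ago, in seconds
    top = sys.argv[1] if len(sys.argv) > 1 else "."

    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime < cutoff:
                    print(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it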

SSD

A subset of nodes have local SSDs (solid-state disks) that can be used by applications that require very high I/O rates; such applications will complete faster when using SSDs.

  • Jobs that do not perform intensive I/O should not use the SSDs - this is a scarce shared resource.

These disks are local to the compute nodes, hence: 

  • you cannot see the SSDs from either login node,
  • your job can use the SSD only while it is running, hence your job needs to be adjusted accordingly and must request SSD space,
  • if your job exceeds the amount of SSD space it requested, it will no longer be able to write to the SSD,
  • consult the How to Use Local SSD Space page for more information.
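
A typical pattern is to stage your input onto the local SSD at the start of the job, run the I/O-intensive work there, and copy the results back to shared storage before the job ends. Here is a minimal sketch in Python, where the SSD mount point, job variable, and /pool paths are all placeholders (the actual values are documented on the How to Use Local SSD Space page):

    # Minimal sketch of the stage-in / work / stage-out pattern.
    # "/ssd" and JOB_ID are hypothetical; use the path and job
    # variables from the "How to Use Local SSD Space" page.
    import os, shutil

    work = os.path.join("/ssd", os.environ["USER"], os.environ.get("JOB_ID", "test"))
    os.makedirs(work, exist_ok=True)

    # Stage in: copy input from shared storage to the fast local disk.
    shutil.copy("/pool/myproject/input.dat", work)   # placeholder path

    # ... run the I/O-intensive computation against files under `work` ...

    # Stage out: copy results back before the job ends, then clean up --
    # the local SSD is a scarce, shared resource.
    shutil.copy(os.path.join(work, "output.dat"), "/pool/myproject/")
    shutil.rmtree(work)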

Remember

  • We impose quotas: limits on how much each user can store on each disk (partition/volume/fileset), and we monitor disk usage;
  • /home should not be used to keep large files; use /pool, /scratch, or /data instead;
  • /pool and /scratch are for active temporary storage (i.e., while a job is running).
  • If your jobs need a lot of disk space or perform a lot of I/O, use /scratch rather than /pool.
    • Public space on both partitions (/pool and /scratch) is scrubbed: old files are deleted to make sure there is space for active users.
  • None of the disks on the cluster are for long term storage:
    • please copy your results back to your home computer and
    • delete what you don't need any longer.
  • While the disk systems on Hydra are highly reliable, most of the disks on the cluster are not backed up, although snapshots let you recover some recently deleted files (see How to Recover Old or Deleted Files using Snapshots).
  • Once you reach your quota, you won't be able to write anything else to that partition until you delete some files.
  • (warning) Do not keep a very large number of files in the same directory:
    • (lightbulb) best practice is to keep fewer than 5,000 to 50,000 files in the same directory.
  • If you keep too many files in the same directory:
    • you may not be able to write more files, and
    • listing the contents of such a directory will be exceedingly slow.
  • What to do instead?
    • Use subdirectories to better organize your files (and your work), as the sketch below illustrates.
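
For example, a directory holding tens of thousands of files can be split into subdirectories keyed on part of each file name, keeping every directory comfortably below the counts above. A minimal sketch in Python, using a hypothetical two-character naming scheme:

    # Minimal sketch: move files from one overfull directory into
    # subdirectories keyed on the first two characters of each name.
    # The two-character scheme is illustrative; any scheme that
    # spreads files evenly works as well.
    import os, shutil, sys

    top = sys.argv[1] if len(sys.argv) > 1 else "."

    # Snapshot the file list first, so we don't iterate while moving.
    names = [entry.name for entry in os.scandir(top) if entry.is_file()]

    for name in names:
        subdir = os.path.join(top, name[:2].lower())
        os.makedirs(subdir, exist_ok=True)
        shutil.move(os.path.join(top, name), os.path.join(subdir, name))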

Last Updated SGK/PBF.
