- Introduction - What is Scrubbing
- What Disks Are Scrubbed
- How to Access the Scrubber Tools
- How to Check what Will be Scrubbed
- How to Find out what Was Scrubbed
- How to Request Scrubbed Files to be Restored
1. Introduction - What is Scrubbing?
In order to maintain free disk space on the public disks, we use a disk scrubber to remove old files and old empty directories.
The scrubber runs weekly. It deletes old empty directories outright, but old files are first moved to a staging location and then permanently deleted some 10 days later.
Please Note
Since the scrubber first moves old files away and deletes them later,
- there is a grace period between the scrubbing (move) and the permanent deletion, during which users can request that some scrubbed files be restored;
- reasonable requests to restore scrubbed files must be sent no later than 5pm on the Friday following the scrubbing;
- scrubbed files still "count" against the user quota until they are permanently deleted;
- once permanently deleted (aka zapped), the files can no longer be restored.
Requests to restore scrubbed files
- should be rare,
- should be reasonable (i.e. no blanket requests), and
- can only be granted while the scrubbed files are not yet permanently deleted.
Past the grace period, the files are no longer available, so users who want their scrubbed files restored have to act promptly.
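As a rough illustration of the deadline rule above, the following shell sketch computes the Friday that follows a given scrub date. It is not part of the scrubber tools: the function name restore_deadline is made up for this example, it assumes GNU date, it takes the scrub date in the YYMMDD form used elsewhere on this page, and it assumes that a scrub falling on a Friday would push the deadline to the next Friday.

```shell
# Hypothetical helper, not a scrubber tool: print the restore-request
# deadline (the Friday following the scrub, 5pm) for a YYMMDD scrub date.
# Assumes GNU date and a date in the 2000s.
restore_deadline() {
    # Turn YYMMDD into 20YY-MM-DD.
    iso=$(printf '%s\n' "$1" | sed -E 's/^([0-9]{2})([0-9]{2})([0-9]{2})$/20\1-\2-\3/')
    dow=$(date -d "$iso" +%u) || return 1   # 1 = Monday ... 7 = Sunday
    ahead=$(( (5 - dow + 7) % 7 ))          # days from the scrub date to Friday
    [ "$ahead" -eq 0 ] && ahead=7           # assumption: a Friday scrub waits a week
    echo "$(date -d "$iso +$ahead days" +%Y-%m-%d) 17:00"
}

restore_deadline 240721    # scrub on Sunday Jul 21 2024, prints 2024-07-26 17:00
```

The permanent deletion happens some days after this deadline, so the printed time is the cut-off for sending a request, not the moment the files disappear.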
2. What Disks Are Scrubbed
The disks that are scrubbed are:
/pool/public/{biology|genomics|nasm|sao} - files/empty directories older than 180 days
/scratch/public/{biology|genomics|nasm|sao} - files/empty directories older than 180 days
/scratch/genomics is the same as /pool/public/genomics, etc.
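The list above can be turned into a quick path check. This is a minimal sketch with a made-up function name (on_scrubbed_disk); it only knows the /pool/public and /scratch/public forms listed here and makes no attempt to resolve aliases or symbolic links.

```shell
# Hypothetical helper: succeed if a path lies on one of the scrubbed
# public disks listed above. Aliases of these paths are not handled.
on_scrubbed_disk() {
    case $1 in
        /pool/public/*)    rest=${1#/pool/public/} ;;
        /scratch/public/*) rest=${1#/scratch/public/} ;;
        *) return 1 ;;
    esac
    # The first path component after .../public/ names the unit.
    case ${rest%%/*} in
        biology|genomics|nasm|sao) return 0 ;;
        *) return 1 ;;
    esac
}

if on_scrubbed_disk /pool/public/genomics/frandsenp/big-project; then
    echo "on a scrubbed disk"
fi
```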
3. How to Access the Scrubber's Tools
To access the scrubber tools, you need to load the module:
module load tools/scrubber
- to get the list of tools, use:
module help tools/scrubber
- to get the man page, accessible after loading the module, use:
man <tool-name>
4. How to Check what Will Be Scrubbed
- To check what files will be scrubbed, use:
find-scrub [-in <dir>] [-age <age>]
This looks for files older than <age> days in <dir>; by default, <dir> is the current working directory and <age> is 173 or 83 days.
- This search taxes the file system (aka disk server), especially if you have a lot of files, so run it only when needed.
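To make "older than <age> days" concrete, here is a sketch that uses the standard find command. It is only an approximation under stated assumptions, not the logic of find-scrub itself: the function name old_files is invented, and it assumes that age is measured from the ctime (change time), as the note on restored files in section 6 suggests. It walks the whole directory tree, so the same caution about file system load applies.

```shell
# Hypothetical stand-in, not find-scrub: list regular files whose ctime
# is more than <age> days in the past (default: current directory, 173).
old_files() {
    dir=${1:-.}
    age=${2:-173}
    # -ctime +N selects files last changed more than N whole days ago.
    find "$dir" -type f -ctime +"$age" -print
}
```

For example, old_files /pool/public/genomics/frandsenp/big-project 173 would print the candidates under that directory, one path per line.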
5. How to Find Out what Was Scrubbed
You will receive an email if any of your files were scrubbed.
- To look at the report for what was scrubbed on Jul 21 2016 under /pool/public/genomics/frandsenp:
show-scrubber-report /pool/public/genomics/frandsenp 160721
- To find out which old empty directories were scrubbed:
list-scrubbed-dirs [-long|-all] /pool/public/genomics/frandsenp 160721 [<RE>|-n]
where <RE> is an optional regular expression that limits the printout; without an RE you get the complete list, unless you specify -n, in which case you get only the number of scrubbed directories.
The -long or -all option gives more information (like age, size and owner).
- To find out which old files were scrubbed:
list-scrubbed-files [-long|-all] /pool/public/genomics/frandsenp 160721 [<RE>|-n]
where again <RE> is an optional regular expression that limits the printout; without an RE you get the complete list, unless you specify -n, in which case you get only the number of scrubbed files;
the -long option produces a list that includes the files' age and size, and -all lists age, size and owner.
The <RE> (regular expressions) are Perl-style REs:
- . means any char,
- .* means any set of chars (including none),
- [a-z] means any single character between a and z,
- ^ means start of match,
- $ means end of match, etc. (see the Perl regular-expression documentation for the gory details).
- for example:
'^/pool/public/genomics/blah/project/.*\.log$'
means all the files that end in '.log' under '/pool/public/genomics/blah/project/'.
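An RE can be tried out on a few sample paths before it is passed to the scrubber tools. The sketch below uses grep -E, which implements POSIX extended REs rather than Perl-style REs; the two agree for the simple constructs listed above. The sample paths are made up for illustration.

```shell
# Try the example RE on a few made-up paths.
# grep -E is POSIX extended RE, which behaves like Perl-style RE for
# ., .*, [a-z], ^ and $.
pattern='^/pool/public/genomics/blah/project/.*\.log$'

sample_paths() {
    printf '%s\n' \
        /pool/public/genomics/blah/project/run1.log \
        /pool/public/genomics/blah/project/sub/run2.log \
        /pool/public/genomics/blah/project/run1.log.gz \
        /pool/public/genomics/blah/other/run3.log
}

# Prints only the first two paths.
sample_paths | grep -E "$pattern"
```

Since .* also matches the / character, the pattern picks up '.log' files in subdirectories of 'project/' as well, while 'run1.log.gz' is rejected by the $ anchor.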
6. How to Request Scrubbed Files to be Restored
To request that some of your scrubbed files be restored, you need to create a list of files, trim it down to what you really need, and verify that list.
We do not accept bulk restore requests.
To produce the list of files to restore (in this example, files that were under /pool/public/genomics/frandsenp/big-project), follow these steps:
- Load the scrubber module (under tools):
module load tools/scrubber
- Create a list (use the appropriate path):
list-scrubbed-files /pool/public/genomics/frandsenp 240721 > restore.list
to get a list of all the files scrubbed on Sunday Jul 21 2024, or for example:
list-scrubbed-files /pool/public/genomics/frandsenp 240721 /pool/public/genomics/frandsenp/big-project/ > restore.list
to get a list of all the scrubbed files under 'big-project/'; in both cases the list is saved in the file 'restore.list' (in the current working directory).
Note that, used as an RE, /pool/public/genomics/frandsenp/big-project means /pool/public/genomics/frandsenp/big-project* (any path that starts with that string), not /pool/public/genomics/frandsenp/big-project/
- Edit the file 'restore.list' to trim it down to what you really need, with any text editor (like vi, nano, emacs, etc.);
- Verify the 'restore.list' file:
verify-restore-list /pool/public/genomics/frandsenp 240721 restore.list
or, to get more info:
verify-restore-list -d /pool/public/genomics/frandsenp 240721 restore.list
If the verification produces an error, edit the file accordingly.
- You need a separate restore file per scrubbed date and per disk (i.e., /pool vs /scratch).
- Only then, and if the verification produced no error, submit your scrubbed file restoration request as follows:
- SAO users: email the location of the list file(s) to hpc@cfa.harvard.edu
- non-SAO users: email the location of the list file(s) to SI-HPC@si.edu
- While you can email the list file(s) themselves, it is a lot more convenient for us if these list files are already somewhere on Hydra.
You can also consult the man pages for the list-scrubbed-files and verify-restore-list commands, as follows:
module load tools/scrubber
man list-scrubbed-files
man verify-restore-list
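When trimming a list down to a single directory, the trailing slash matters: a bare 'big-project' prefix would also keep entries from sibling directories whose names start the same way. The following sketch shows one way to trim an existing list with standard tools. The function name under_dir and the sample entries (including 'big-project-old') are hypothetical, and the trimmed list should still be checked with verify-restore-list.

```shell
# Hypothetical helper: keep only the lines of a list that lie under a
# given directory. awk's index() is a literal prefix test, so no RE
# metacharacters are interpreted.
under_dir() {
    dir=${1%/}/        # make sure the prefix ends in a "/"
    awk -v d="$dir" 'index($0, d) == 1'
}

# Made-up list entries, for illustration only; the third one is dropped.
printf '%s\n' \
    /pool/public/genomics/frandsenp/big-project/a.dat \
    /pool/public/genomics/frandsenp/big-project/sub/b.dat \
    /pool/public/genomics/frandsenp/big-project-old/c.dat |
    under_dir /pool/public/genomics/frandsenp/big-project
```

On a real list this could be run as under_dir /pool/public/genomics/frandsenp/big-project < restore.list > restore.trimmed, where the output file name is only an example.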
The restored files will not be scrubbed for another 180 days:
- check with ls -lc filename or stat filename on a restored file; it is the ctime (change time) that matters.
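The ctime check can be turned into a rough count of the days left before a file becomes old enough to be scrubbed again. This is a sketch under stated assumptions: the function name days_until_scrub is invented, it relies on GNU stat (-c %Z prints the ctime in seconds since the epoch), and it takes the 180-day limit quoted above as its default.

```shell
# Hypothetical helper: print how many days remain before a file's ctime
# is older than the scrub limit (default 180 days). Assumes GNU stat.
days_until_scrub() {
    limit=${2:-180}
    ctime=$(stat -c %Z "$1") || return 1    # ctime, seconds since epoch
    now=$(date +%s)
    echo $(( limit - (now - ctime) / 86400 ))
}

# A brand-new file has the full period ahead of it, so this prints 180.
f=$(mktemp)
days_until_scrub "$f"
rm -f "$f"
```

A result of zero or less means the file is already old enough to be picked up by the next weekly run.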
Last Updated SGK/PBF