We have added SSDs (solid state disks) to a few compute nodes. To use them, you must request the right amount of SSD disk space: this is the maximum amount of disk space your job will need on the SSD at run-time (similar to the maximum amount of memory it will need).
Since we have only a few of them, they should be used only if your application greatly benefits from using an SSD.
The SSDs are currently available only through the uTSSD.rq queue, a restricted queue. Contact Sylvain or Rebecca if you want to be authorized to use it.
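For instance, a job script's embedded qsub directives might request the queue and the SSD space as follows (a minimal sketch: the 100G value is illustrative, and ssduse is the resource shown in the full job-script example near the end of this page):

#$ -q uTSSD.rq
#$ -l ssduse=100G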
Since you can't access the SSDs from a login node, you must prepare the data the job will need somewhere else, like on /pool (or on /scratch), before submitting the job.
Like for memory, you need to guesstimate how much SSD space your job will need. You will not be able to use more SSD space than you requested.
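If your input data set already exists on /pool, you can measure its size to get a lower bound for that estimate (remember to also leave room for output and intermediate files); for example:

% du -sh /pool/genomics/smart1/great/project/wild-cat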
Remember, your job will still be able to access the /home, /data, /pool, and /scratch disks, hence you don't have to copy everything to the SSD; only the I/O-intensive part of the analysis should use the SSD.
First, choose a location on /pool (or /scratch) and move or copy the data you will need there; for example (as user smart1):

cd /pool/genomics/smart1
mkdir -p great/project/wild-cat

then copy your input files into that directory and, if you wish, pack them into a compressed tar-ball:

cd great/project/wild-cat
tar cfz ../wild-cat.tgz .

/pool/genomics/smart1/great/project/wild-cat.tgz now holds your input data set.
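You can quickly verify the content of that tar-ball before submitting the job (standard tar listing, shown here for convenience):

% tar tfz /pool/genomics/smart1/great/project/wild-cat.tgz | head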
Your job script will need the following four parts:
Load the tools/ssd module and copy or extract your data set as follows:
module load tools/ssd
cp -pR /pool/genomics/smart1/great/project/wild-cat/* $SSD_DIR/.
or

module load tools/ssd
cd $SSD_DIR
tar xf /pool/genomics/smart1/great/project/wild-cat.tgz
The advantage of the compressed tar-ball is that the .tgz file is likely to be smaller than the content of the directory, hence there is less I/O transfer from the /pool disk, while un-compressing and writing to the SSD is fast.
Next, run the I/O-intensive part of the analysis against the SSD, by using $SSD_DIR wherever the job reads or writes these files. For example, instead of:

execute -o /pool/genomics/smart1/great/project/wild-cat/result.dat

use:

execute -o $SSD_DIR/result.dat
Let's assume that your analysis uses a file wow.conf where, for instance, the full path of some files must be listed, like:
# this is the configuration file of the fabulous WOW package
input=/pool/genomics/smart1/great/project/wild-cat/wiskers.dat
output=/pool/genomics/smart1/great/project/wild-cat/tail.dat
paws=4
eyes=2
Since the value of $SSD_DIR is known only at run-time, replace the wow.conf file by a generic wow.gen file as follows:
# this is the configuration file of the fabulous WOW package
input=XXXX/wiskers.dat
output=XXXX/tail.dat
paws=4
eyes=2
and create the wow.conf file from the wow.gen file at run-time by adding the following to the job script:
sed "s=XXXX=$SSD_DIR=" wow.gen > wow.conf
As long as XXXX is not used for anything else, this will replace every occurrence of XXXX by the value of the environment variable $SSD_DIR.
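You can add a quick sanity check right after the sed command (our suggestion, not part of the WOW example): grep prints any line where XXXX was left over, so no output means the substitution worked:

grep XXXX wow.conf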
At the end of the job script, you must add instructions to copy the results of your analysis back to /pool (or /scratch).
If the results are easily identifiable, you can use the commands mv, tar, and find; here are a few examples:
Move the directory where all the results are stored and the log file, then delete the rest:
# move results and log file back
cd $SSD_DIR
mv results /pool/genomics/smart1/great/project/wild-cat/.
mv wow.log /pool/genomics/smart1/great/project/wild-cat/.
#
# delete the rest
rm -rf *
Move the directory where all the results are stored and the log file, then delete only the input (a conservative approach, in case you missed something):
# move results and log file back
cd $SSD_DIR
mv results /pool/genomics/smart1/great/project/wild-cat/.
mv wow.log /pool/genomics/smart1/great/project/wild-cat/.
#
# delete input set and other stuff
rm -rf input
rm wow.gen wow.conf
Move using the --update flag of mv:
# move results using --update
cd $SSD_DIR
mv --update * /pool/genomics/smart1/great/project/wild-cat/.
#
# delete the rest
rm -rf *
Note: you can use mv --update on an explicit list (of files, directories, or a file specification), not just * (everything), and you do not have to remove the rest; you can remove only what you know you can safely remove (the conservative approach).
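For instance (a sketch reusing the results directory and wow.log file from the earlier examples):

cd $SSD_DIR
# move only the files you know hold results
mv --update results wow.log /pool/genomics/smart1/great/project/wild-cat/.
# and delete only what you know is safe to delete
rm -rf input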
Find newer files and move them: the trick is to create a 'timestamp' file before starting the analysis.
That file can be used later to find any newer file with the --newer= option of tar:
# set the timestamp
date > $SSD_DIR/started.txt
# run the analysis
...
# copy the new files in the subdir data/ to a compressed tar-ball
cd $SSD_DIR
tar --newer=$SSD_DIR/started.txt -czf /pool/genomics/smart1/great/project/wild-cat-results.tgz data/
# now remove it
rm -rf data/
# etc...
# delete everything, unless you prefer the conservative approach
rm -rf *
See the previous comments about what to tar and what to remove: once you've tar'd the new files into the .tgz file, you can safely remove them.
You can also use the timestamp file with the find command (see man find):
# set the timestamp
date > $SSD_DIR/started.txt
# run the analysis
...
# find the new files in the subdir data/
cd $SSD_DIR
find data/ -newer $SSD_DIR/started.txt -type f > /tmp/list
# do the same on logs/, append to the list
find logs/ -newer $SSD_DIR/started.txt -type f >> /tmp/list
# etc...
# now save what is in the list with one tar
tar --files-from=/tmp/list -czf /pool/genomics/smart1/great/project/wild-cat-results.tgz
# now remove data/ and logs/
rm -rf data/ logs/
# etc...
# delete everything, unless you prefer the conservative approach
rm -rf *
There are many more ways to accomplish this.
By the way, the advantage of writing a .tgz file, rather than moving files, is twofold, assuming your data is compressible:

- you write less to the .tgz file, so it should complete faster (reading and compressing should be fast; writing is the slow step), and
- you need less disk space for your output (since it is compressed).
The drawback is that you need to know how to handle/view/deal with a .tgz file.
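For reference, the usual tar idioms apply (standard tar usage, using the results file name from the examples above):

% tar tfz wild-cat-results.tgz    # list the contents
% tar xfz wild-cat-results.tgz    # extract everything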
Here is what a job script might look like:
#
#$ -N example
#$ -o example.log -cwd -j y
#$ -q uTxlM.rq
#$ -l ssduse=2560G,hm,mres=20G,h_data=20G,h_vmem=20G
#
# pseudo example using a fake package WOW, on the SSD
#
echo $JOB_NAME started `date` on $HOSTNAME in $QUEUE jobID=$JOB_ID
#
module load tools/ssd
module load special/wow
#
# create a wow config file from a generic version, to insert the SSD temp dir value
sed "s=XXXX=$SSD_DIR=" ~/wow/wild-cat.gen > ~/wow/wild-cat.conf
#
# cd to the SSD temp dir and copy the data set to it, using the existing .tgz file
cd $SSD_DIR
tar xf /pool/genomics/smart1/great/project/wild-cat.tgz
#
# create some sub dirs for output and logs
mkdir output
mkdir logs
#
# run the wow analysis (note how some files are not on the SSD)
wow --type=m --params=$HOME/wow/parameters.dat --config=$HOME/wow/wild-cat.conf -o $SSD_DIR/output -l $SSD_DIR/logs
#
# save the output and the logs in a compressed tar file
# (assumes wow did not change the current working directory,
#  otherwise insert: cd $SSD_DIR)
tar -czf /pool/genomics/smart1/great/project/wild-cat-results.tgz output/ logs/
#
# remove everything (in $SSD_DIR), or remove only what you know you can (conservative option)
rm -rf *
#
echo $JOB_NAME done `date`
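Assuming you saved this script in a file called, say, example.job (an arbitrary name), you would submit it as usual with:

% qsub example.job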
You can plot how much SSD space a job used with the plot-qssduse.pl tool:

% module load tools/local
% module load gnuplot
% plot-qssduse.pl -x 7420073
This example plots the SSD usage of job 7420073 to the screen, assuming you have an X-windows capable connection. Omit the -x to plot to a file instead. Use NNNN.TTT, instead of NNNN, to show the usage for a given task (TTT) of a job array (NNNN). See man plot-qssduse.pl for more information.
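For example, to plot the SSD usage of task 5 (an illustrative task number) of the job array 7420073, using the NNNN.TTT syntax described above:

% plot-qssduse.pl -x 7420073.5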
There is also a tool that plots a summary of the SSD usage:

% module load tools/local
% module load gnuplot
% plot-qssduse-summary.pl -x
Omit the -x to plot to a file. See man plot-qssduse-summary.pl for more information.
Last Updated SGK