We have added SSDs (solid state disks) on a few compute nodes.

(warning) Since we have only a few of them, they should be used only if your application greatly benefits from using an SSD.

(lightbulb) The SSDs are currently only available for the uTSSD.rq queue, a restricted queue. Contact Sylvain or Rebecca if you want to be authorized to use it.


Since you can't access the SSDs from a login node, you must prepare the data the job will need somewhere else, such as on /pool (or on /scratch), before submitting a job.

As with memory, you need to estimate how much SSD space your job will need. You will not be able to use more SSD space than you requested.
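A minimal sketch of one way to arrive at such an estimate: measure the input data with du and add a margin for output and temporary files. The stand-in data directory and the 50% margin are assumptions made for illustration, not site recommendations.

```shell
#!/bin/bash
# Stand-in for the real data directory (hypothetical); on the cluster this
# would be something like /pool/genomics/smart1/great/project/wild-cat
data_dir=$(mktemp -d)
# create 3 MB of dummy data so the sketch runs anywhere
head -c 3145728 /dev/urandom > "$data_dir/sample.dat"

# du -sk reports the total size of the directory in kB
used_kb=$(du -sk "$data_dir" | awk '{print $1}')
# add a 50% margin (an arbitrary choice) for output and temporary files
request_kb=$(( used_kb * 3 / 2 ))
# round up to whole GB: 1 GB = 1048576 kB
request_gb=$(( (request_kb + 1048575) / 1048576 ))
echo "input data: ${used_kb} kB, suggested SSD request: ${request_gb}G"

# remove the stand-in directory
rm -rf "$data_dir"
```

If the analysis writes large intermediate files, the margin probably needs to be much larger than the one assumed here.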

(lightbulb) Remember, your job will still be able to access the /home, /data, /pool, and /scratch disks, hence you don't have to copy everything to the SSD; only the I/O intensive part of the analysis should use the SSD.

How to Prepare my Data

  1. Create a subdirectory in /pool (or /scratch) and move or copy the data you will need, for example (as user smart1)
    cd /pool/genomics/smart1
    mkdir -p great/project/wild-cat
    You now have a directory for this case, and can copy the I/O intensive part of the required data set into it.

  2. While not required, you can pack these data in a compressed tar-ball
    cd /pool/genomics/smart1/great/project/wild-cat
    tar cfz ../wild-cat.tgz .
    The file /pool/genomics/smart1/great/project/wild-cat.tgz now holds your input data set.
    Being compressed, it is likely to be smaller than the content of /pool/genomics/smart1/great/project/wild-cat.
    That directory can be deleted, unless you will need it later.

  3. Ancillary data and/or configuration files that do not cause intensive I/O can stay in a location under /pool (or /scratch).
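Before deleting the original directory, it may be worth checking that the tar-ball is complete. The sketch below compares the number of files on disk with the number of files in the archive; the directory layout and file names are hypothetical stand-ins for the /pool paths used above.

```shell
#!/bin/bash
# Hypothetical stand-in for /pool/genomics/smart1/great/project
proj=$(mktemp -d)
mkdir -p "$proj/wild-cat/reads"
printf 'ACGT\n' > "$proj/wild-cat/reads/a.fa"
printf 'TTGA\n' > "$proj/wild-cat/reads/b.fa"

# pack the directory content, as in step 2
cd "$proj/wild-cat" || exit 1
tar cfz ../wild-cat.tgz .

# count regular files on disk and in the archive before deleting anything
n_disk=$(find . -type f | wc -l | tr -d ' ')
# archive entries ending in "/" are directories, so they are skipped
n_tar=$(tar tzf ../wild-cat.tgz | grep -vc '/$')
echo "files on disk: $n_disk, files in tar-ball: $n_tar"
if [ "$n_disk" -eq "$n_tar" ]; then
    echo "tar-ball looks complete"
fi

# remove the stand-in directory
cd / && rm -rf "$proj"
```

A matching count is only a quick sanity check; it does not verify the content of each file.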

How to Adjust a Job Script to Use the SSD

Your job script will need the following four parts:

Part 1:  Copy the Data to the SSD

module load tools/ssd
cp -pR /pool/genomics/smart1/great/project/wild-cat/* $SSD_DIR/.


or, if you packed the data in a compressed tar-ball:

module load tools/ssd
cd $SSD_DIR
tar xf /pool/genomics/smart1/great/project/wild-cat.tgz

(lightbulb) The advantage of the compressed tar-ball is that the .tgz file is likely to be smaller than the content of the directory, hence less I/O transfer from the /pool disk, while un-compressing and writing to the SSD is fast.
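A minimal sketch of the tar-ball variant that can be tried outside a job: it stops early if SSD_DIR is not set, then unpacks inside that directory. The temporary directories standing in for /pool and for $SSD_DIR, and the file names, are assumptions for illustration; on the cluster, SSD_DIR is set by module load tools/ssd.

```shell
#!/bin/bash
# Stand-ins so the sketch runs anywhere (both hypothetical): on the cluster,
# "module load tools/ssd" sets SSD_DIR and the tar-ball lives under /pool.
pool=$(mktemp -d)
SSD_DIR=$(mktemp -d)
mkdir "$pool/wild-cat"
echo "sample" > "$pool/wild-cat/input.dat"
tar -czf "$pool/wild-cat.tgz" -C "$pool/wild-cat" .

# stop early if SSD_DIR is empty, rather than unpacking somewhere unexpected
: "${SSD_DIR:?SSD_DIR is not set}"

# unpack inside the SSD directory
cd "$SSD_DIR" || exit 1
tar xf "$pool/wild-cat.tgz"
ls -l "$SSD_DIR"
unpacked=$(cat "$SSD_DIR/input.dat")

# remove the stand-in directories
cd / && rm -rf "$pool" "$SSD_DIR"
```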

Part 2: Adjust the Script or a Configuration File
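One way to do this, borrowed from the pseudo example further down this page, is to keep a generic configuration file with a placeholder and let sed insert the value of $SSD_DIR. The sketch below assumes a placeholder XXXX, a made-up SSD_DIR value, and hypothetical file content.

```shell
#!/bin/bash
# Hypothetical stand-ins: a generic config with the placeholder XXXX and an
# SSD_DIR value (normally set by "module load tools/ssd").
work=$(mktemp -d)
SSD_DIR=/ssd/example-job
printf 'tmpdir = XXXX\nthreads = 4\n' > "$work/wild-cat.gen"

# "=" is used as the sed delimiter because the replacement contains "/"
sed "s=XXXX=$SSD_DIR=" "$work/wild-cat.gen" > "$work/wild-cat.conf"
cat "$work/wild-cat.conf"
conf_line=$(head -n 1 "$work/wild-cat.conf")

# remove the stand-in directory
rm -rf "$work"
```

This substitution would break if the value of $SSD_DIR itself contained an "=" character; in that case another delimiter is needed.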

Part 3: Run the Analysis

Part 4: Copy the Results from the SSD

(lightbulb) The drawback of a compressed tar-ball is that you need to know how to handle, view, and otherwise deal with a .tgz file.
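A few tar invocations cover most of that handling. The sketch below builds a small results tar-ball, lists its content, extracts a single member, and prints another member to the screen; all file and directory names are hypothetical.

```shell
#!/bin/bash
# Hypothetical stand-in for a results tar-ball
work=$(mktemp -d)
mkdir -p "$work/src/output" "$work/src/logs"
echo "result 1" > "$work/src/output/summary.txt"
echo "log 1" > "$work/src/logs/run.log"
tar -czf "$work/results.tgz" -C "$work/src" output logs

# list the content without extracting anything
tar tzf "$work/results.tgz"

# extract a single member into a separate directory
mkdir "$work/peek"
tar xzf "$work/results.tgz" -C "$work/peek" output/summary.txt
summary=$(cat "$work/peek/output/summary.txt")

# print a member to stdout without writing it to disk
log_line=$(tar xzf "$work/results.tgz" -O logs/run.log)
echo "$log_line"

# remove the stand-in directory
rm -rf "$work"
```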

A Pseudo Example

Here is what a job script might look like:

#$ -N example
#$ -o example.log -cwd -j y
#$ -q uTxlM.rq
#$ -l ssduse=2560G,hm,mres=20G,h_data=20G,h_vmem=20G
# pseudo example using a fake package WOW, on the SSD
echo $JOB_NAME started `date` on $HOSTNAME in $QUEUE jobID=$JOB_ID
module load tools/ssd
module load special/wow
# create a wow config file from a generic version, to insert the SSD temp dir value
sed "s=XXXX=$SSD_DIR=" ~/wow/wild-cat.gen > ~/wow/wild-cat.conf
# cd to the SSD temp dir and copy the data set to it, using the existing .tgz file
cd $SSD_DIR
tar xf /pool/genomics/smart1/great/project/wild-cat.tgz
# create some sub dirs for output and logs
mkdir output
mkdir logs
# run the wow analysis (note how some files are not on the SSD)
wow --type=m --params=$HOME/wow/parameters.dat --config=$HOME/wow/wild-cat.conf -o $SSD_DIR/output -l $SSD_DIR/logs
# save the output and the logs in a tar compressed file
# (assumes wow did not change current working directory)
# otherwise insert: cd $SSD_DIR
tar -czf /pool/genomics/smart1/great/project/wild-cat-results.tgz output/ logs/
# remove everything (in $SSD_DIR), or remove what you know you can (conservative option)
rm -rf *
echo $JOB_NAME done `date`
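Because rm -rf * removes whatever is in the current working directory, a more defensive cleanup can check that the script really is in the SSD directory first. This is a sketch under assumptions, not part of the site's tooling; the temporary directory stands in for the $SSD_DIR set by module load tools/ssd.

```shell
#!/bin/bash
# Hypothetical stand-in for the SSD directory set by "module load tools/ssd"
SSD_DIR=$(mktemp -d)
touch "$SSD_DIR/scratch.dat"
mkdir "$SSD_DIR/output"

# only clean up when SSD_DIR is set and we can cd into it
if [ -n "$SSD_DIR" ] && cd "$SSD_DIR"; then
    # ./* keeps the glob anchored to the current directory
    rm -rf ./*
    left=$(ls -A | wc -l | tr -d ' ')
    echo "cleaned $SSD_DIR, $left entries left"
else
    echo "SSD_DIR not usable, nothing removed" >&2
fi

# remove the stand-in directory itself (not needed in a real job)
cd / && rmdir "$SSD_DIR"
```

Note that ./* does not match hidden files (names starting with a dot), so those would have to be removed separately.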

Usage Monitoring Tools

Per Job Basis

% module load tools/local
% module load gnuplot
% -x 7420073

This example plots the SSD usage of job 7420073 to the screen, assuming you have an X-Windows-capable connection.

Usage Summary

% module load tools/local
% module load gnuplot
% -x

As above:

Last Updated  SGK