
...

You need to replace all references to /pool/genomics/smart1/great/project/wild-cat with $SSD_DIR.
This is easy to do at the shell script level (i.e., for flags/options), but not in a configuration file. For instance,
execute -o /pool/genomics/smart1/great/project/wild-cat/result.dat
is replaced by
execute -o $SSD_DIR/result.dat

Here is a simple trick to modify a configuration file:

Let's assume that your analysis uses a file wow.conf, where for instance the full path of some files must be listed, like:

Code Block

language	bash
title	wow.conf

# this is the configuration file of the fabulous WOW package
input=/pool/genomics/smart1/great/project/wild-cat/wiskers.dat
output=/pool/genomics/smart1/great/project/wild-cat/tail.dat
paws=4
eyes=2

Replace the wow.conf file by a wow.gen file as follows:

Code Block

language	bash
title	wow.gen

# this is the configuration file of the fabulous WOW package
input=XXXX/wiskers.dat
output=XXXX/tail.dat
paws=4
eyes=2

And create the wow.conf file from the wow.gen at run-time by adding the following in the job script:
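Code Block

language	bash

sed "s=XXXX=$SSD_DIR=g" wow.gen > wow.conf

As long as XXXX is not used for anything else, this will replace every occurrence of XXXX with the value of the environment variable SSD_DIR (the g flag covers lines with more than one XXXX, and = serves as the sed delimiter because the path contains /).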
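
The substitution can be tried outside a job with the minimal sketch below, which runs in a throwaway directory. The stand-in value given to SSD_DIR and the final check for leftover placeholders are assumptions added for illustration; in a real job SSD_DIR is already set.

```shell
# Demo in a scratch directory; in a real job SSD_DIR is already set.
demo=$(mktemp -d)
SSD_DIR="$demo/ssd"          # stand-in value, for illustration only
mkdir -p "$SSD_DIR"
cd "$demo" || exit 1

# Template with the XXXX placeholder
cat > wow.gen <<'EOF'
# this is the configuration file of the fabulous WOW package
input=XXXX/wiskers.dat
output=XXXX/tail.dat
paws=4
eyes=2
EOF

# Generate the real configuration file ("=" is the sed delimiter,
# so this assumes the value of SSD_DIR contains no "=" or "&")
sed "s=XXXX=$SSD_DIR=g" wow.gen > wow.conf

# Stop early if a placeholder survived
if grep -q XXXX wow.conf; then
    echo "wow.conf still contains XXXX" >&2
    exit 1
fi
cat wow.conf
```

Checking for a surviving XXXX catches a template that was edited without rerunning sed.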

3 Run the Analysis

...

At the end of the job script, you must add instructions to copy the results of your analysis back to /pool (or /scratch, or /data).

If the results are easily identifiable, you can use the commands mv, tar, and find. Here are a few examples:

Move the directory where all the results are stored and the log file, delete the rest.

Code Block

language	bash
title	moving identifiable results, delete the rest

# move results and log file back
cd $SSD_DIR
mv results /pool/genomics/smart1/great/project/wild-cat/.
mv wow.log /pool/genomics/smart1/great/project/wild-cat/.
#
# delete the rest
rm -rf *

Move the directory where all the results are stored and the log file, delete the input (conservative approach, in case you missed something).
moving identifiable results, delete known input sets

# move results and log file back
cd $SSD_DIR
mv results /pool/genomics/smart1/great/project/wild-cat/.
mv wow.log /pool/genomics/smart1/great/project/wild-cat/.
#
# delete input set and other stuff
rm -rf input
rm wow.gen wow.conf
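
One way to make the conservative approach safer is to delete the input only when the move succeeded. The sketch below shows the idea in a throwaway directory; DEST and the demo files are placeholders standing in for the /pool location and the real results.

```shell
# Throwaway layout standing in for the SSD and the /pool destination
demo=$(mktemp -d)
SSD_DIR="$demo/ssd"          # stand-in value, for illustration only
DEST="$demo/pool"            # placeholder for the /pool project directory
mkdir -p "$SSD_DIR/results" "$SSD_DIR/input" "$DEST"
echo "result" > "$SSD_DIR/results/tail.dat"
echo "log"    > "$SSD_DIR/wow.log"
touch "$SSD_DIR/input/wiskers.dat" "$SSD_DIR/wow.gen" "$SSD_DIR/wow.conf"

cd "$SSD_DIR" || exit 1
# Remove the input set only if both moves succeeded
if mv results "$DEST/" && mv wow.log "$DEST/"; then
    rm -rf input
    rm -f wow.gen wow.conf
else
    echo "move failed, leaving $SSD_DIR untouched" >&2
fi
```

If either mv fails (for instance because the destination is full), nothing is deleted and the results remain on the SSD for inspection.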

Move using the --update flag of mv (see man mv)
moving using --update
# move results using --update
cd $SSD_DIR
mv --update * /pool/genomics/smart1/great/project/wild-cat/.
#
# delete the rest
rm -rf *
Note that you can use mv --update on an explicit list (of files, directories, or file specifications), not just on * (everything).
You also do not have to remove the rest: you can remove only what you know is safe to remove (conservative approach).
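
To see what --update does in practice, the sketch below moves two files into a directory that already holds a newer copy of one of them. It assumes GNU mv and GNU touch (as found on Linux); the directory and file names are made up for the demonstration.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/ssd" "$demo/pool"

# pool/ already holds a recent a.txt; the copy on the SSD is older
echo "pool copy" > "$demo/pool/a.txt"
echo "ssd copy"  > "$demo/ssd/a.txt"
touch -d '2020-01-01' "$demo/ssd/a.txt"
# b.txt exists only on the SSD
echo "new result" > "$demo/ssd/b.txt"

cd "$demo/ssd" || exit 1
mv --update * "$demo/pool/"

# a.txt was skipped (the destination is newer), b.txt was moved
ls "$demo/ssd" "$demo/pool"
```

Files that mv --update skips stay in the source directory, so a following rm -rf * discards them.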

Find newer files and move them: the trick is to create a 'timestamp' file before starting the analysis.
That file can be used later to find any newer file with the --newer= option of tar (see man tar):
Using a timestamp file and tar --newer=

# set the timestamp
date > $SSD_DIR/started.txt
# run the analysis
...
# copy the new files in the subdir data/ to a compressed tar-ball
cd $SSD_DIR
tar --newer=$SSD_DIR/started.txt -czf /pool/genomics/smart1/great/project/wild-cat-results.tgz data/
# now remove it
rm -rf data/
# etc...
# delete everything else, unless you take the conservative approach
rm -rf *

See the previous comments on what to tar and what to remove: once you have tar'd the new files in data/, remove data/, etc.
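
The timestamp approach can be tried on a small scale as sketched below. It assumes GNU tar, which treats a --newer argument starting with / or . as a file whose modification time gives the cutoff. The file names and the sleep calls (used only to keep the timestamps apart) are illustrative.

```shell
demo=$(mktemp -d)
SSD_DIR="$demo/ssd"           # stand-in for the real SSD directory
mkdir -p "$SSD_DIR/data"
cd "$SSD_DIR" || exit 1

echo "input" > data/old.dat   # exists before the analysis starts
sleep 1
date > "$SSD_DIR/started.txt" # set the timestamp
sleep 1
echo "result" > data/new.dat  # written by the "analysis"

# archive only what is newer than the timestamp file
tar --newer="$SSD_DIR/started.txt" -czf "$demo/results.tgz" data/

# list the archive: data/new.dat is in it, data/old.dat is not
tar -tzf "$demo/results.tgz"
```

GNU tar's --newer also looks at a file's status-change time, so an input file that was copied or renamed after the timestamp was set may be archived as well; --newer-mtime restricts the test to the modification time.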

Using the timestamp file and the find command (see man find):
Using find and a timestamp file

# set the timestamp
date > $SSD_DIR/started.txt
# run the analysis
...
# find the new files in the subdir data/
cd $SSD_DIR
find data/ -newer $SSD_DIR/started.txt -type f > /tmp/list
# do the same on logs/, append to the list
find logs/ -newer $SSD_DIR/started.txt -type f >> /tmp/list
# etc...
# now save what is in the list with one tar
tar --files-from=/tmp/list -czf /pool/genomics/smart1/great/project/wild-cat-results.tgz
# now remove data/ and logs/
rm -rf data/ logs/
# etc...
# delete everything else, unless you take the conservative approach
rm -rf *
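
A variant of the find approach, sketched under the assumption that GNU find, tar, and touch are available: the list goes into a unique temporary file created with mktemp rather than the fixed name /tmp/list, and names are NUL-separated so that file names containing spaces survive. Directory and file names are illustrative.

```shell
demo=$(mktemp -d)
SSD_DIR="$demo/ssd"          # stand-in for the real SSD directory
mkdir -p "$SSD_DIR/data" "$SSD_DIR/logs"
cd "$SSD_DIR" || exit 1

# a file that predates the analysis (modification time set in the past)
echo "input" > data/old.dat
touch -d '2020-01-01' data/old.dat

# set the timestamp (backdated one minute only to keep the demo instant)
touch -d '1 minute ago' started.txt

# files written by the "analysis"
echo "result" > "data/new result.dat"
echo "log"    > logs/run.log

# collect the new files from data/ and logs/ in one NUL-separated list
list=$(mktemp)
find data/ logs/ -newer started.txt -type f -print0 > "$list"

# save what is in the list with one tar
tar --null --files-from="$list" -czf "$demo/results.tgz"
rm -f "$list"

tar -tzf "$demo/results.tgz"
```

A unique list file also avoids clashes when several jobs run on the same node at the same time.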

There are many more ways to accomplish this ....

BTW, the advantage of writing a .tgz file, rather than moving files, is twofold, assuming your stuff is compressible:

You write less to the .tgz file, so it should be faster (reading and compressing should be fast, writing is the slow step).
You need less disk space for your output (since it is compressed).
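
The size difference is easy to check on a sample. The sketch below, with made-up file names, archives the same highly compressible file with and without compression and prints both sizes in bytes; real data may compress far less.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/data"
# 100000 identical lines: highly compressible sample data
yes "paws=4 eyes=2" | head -n 100000 > "$demo/data/sample.dat"

cd "$demo" || exit 1
tar -cf  plain.tar  data/    # uncompressed archive
tar -czf packed.tgz data/    # gzip-compressed archive

plain_size=$(wc -c < plain.tar)
packed_size=$(wc -c < packed.tgz)
echo "uncompressed: $plain_size bytes, compressed: $packed_size bytes"
```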

...
