...
- You need to replace all references to
      /pool/genomics/smart1/great/project/wild-cat
  by $SSD_DIR. This is easily done at the shell script level (but not in a configuration file), i.e., for flags/options:
      execute -o /pool/genomics/smart1/great/project/wild-cat/result.dat
  is replaced by
      execute -o $SSD_DIR/result.dat
- Here is a simple trick to modify a configuration file:
  Let's assume that your analysis uses a file wow.conf, where, for instance, the full path of some files must be listed, like:
  wow.conf
      # this is the configuration file of the fabulous WOW package
      input=/pool/genomics/smart1/great/project/wild-cat/wiskers.dat
      output=/pool/genomics/smart1/great/project/wild-cat/tail.dat
      paws=4
      eyes=2
  Replace the wow.conf file by a wow.gen file as follows:
  wow.gen
      # this is the configuration file of the fabulous WOW package
      input=XXXX/wiskers.dat
      output=XXXX/tail.dat
      paws=4
      eyes=2
  And create the wow.conf file from the wow.gen file at run-time by adding the following in the job script:
      sed "s=XXXX=$SSD_DIR=g" wow.gen > wow.conf
  As long as XXXX is not used for anything else, this will replace every occurrence of XXXX by the value of the environment variable SSD_DIR. The character = is used as the sed delimiter because the value of SSD_DIR contains /.
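The substitution step can be tried outside of a job with a sketch like the one below. It is only an illustration under stated assumptions: it works in a temporary directory, falls back to a made-up value (/tmp/ssd-demo) when SSD_DIR is not set, and the final check for leftover placeholders is an addition, not something the WOW package requires.

```shell
#!/bin/bash
# Sketch: generate wow.conf from wow.gen at run time.
# Assumption: outside of a job SSD_DIR may be unset, so a made-up
# value is used as a stand-in for this demonstration.
SSD_DIR="${SSD_DIR:-/tmp/ssd-demo}"
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# the template, with XXXX as the placeholder for the directory
cat > wow.gen <<'EOF'
# this is the configuration file of the fabulous WOW package
input=XXXX/wiskers.dat
output=XXXX/tail.dat
paws=4
eyes=2
EOF

# '=' is the sed delimiter because the path contains '/'
sed "s=XXXX=$SSD_DIR=g" wow.gen > wow.conf

# sanity check (an addition): no placeholder should remain
if grep -q XXXX wow.conf; then
  echo "error: unresolved XXXX in wow.conf" >&2
  exit 1
fi
cat wow.conf
```

The check catches a template that was edited by hand and still holds a placeholder that sed did not resolve. Note that the substitution itself would fail if the value of SSD_DIR contained the character =.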
3 Run the Analysis
...
At the end of the job script, you must add instructions to copy the results of your analysis back to /pool (or /scratch, or /data).
If/when the results are easily identifiable, you can use the commands mv or tar, and find. Here are a few examples:
Move the directory where all the results are stored and the log file, delete the rest.
moving identifiable results, delete the rest
    # move results and log file back
    cd $SSD_DIR
    mv results /pool/genomics/smart1/great/project/wild-cat/.
    mv wow.log /pool/genomics/smart1/great/project/wild-cat/.
    #
    # delete the rest
    rm -rf *
Move the directory where all the results are stored and the log file, then delete the input (conservative approach, in case you missed something).
moving identifiable results, delete known input sets
    # move results and log file back
    cd $SSD_DIR
    mv results /pool/genomics/smart1/great/project/wild-cat/.
    mv wow.log /pool/genomics/smart1/great/project/wild-cat/.
    #
    # delete input set and other stuff
    rm -rf input
    rm wow.gen wow.conf
Move using the --update flag of mv (see man mv).
moving using --update
    # move results using --update
    cd $SSD_DIR
    mv --update * /pool/genomics/smart1/great/project/wild-cat/.
    #
    # delete the rest
    rm -rf *
Note that you can use mv --update on an explicit list (of files, directories, or file specifications), not just * (everything), and you do not have to remove the rest: you can remove only what you know is safe to remove (conservative approach).
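The conservative variant described in the note could look like the sketch below. It assumes GNU mv (the --update flag is not available in every mv implementation), and it uses temporary directories and made-up file names as stand-ins for $SSD_DIR and the project directory under /pool.

```shell
#!/bin/bash
# Sketch: move only selected items with mv --update (GNU coreutils).
# Assumption: temporary directories stand in for $SSD_DIR and for the
# project directory under /pool; all file names are made up.
demo_ssd="$(mktemp -d)"
demo_dest="$(mktemp -d)"

cd "$demo_ssd" || exit 1
mkdir results input
echo "result" > results/out.dat
echo "log"    > wow.log
echo "input"  > input/in.dat

# move an explicit list, not '*'
mv --update results wow.log "$demo_dest"/

# conservative clean-up: remove only what is known to be disposable
rm -rf input

ls "$demo_dest"
```

Anything that was neither moved nor explicitly removed stays in place, so a forgotten output file is not lost by a blanket rm -rf *.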
Find newer files and move them: the trick is to create a 'timestamp' file before starting the analysis. That file can be used later to find any newer file with the --newer= option of tar (see man tar):
Using a timestamp file and tar --newer=
    # set the timestamp
    date > $SSD_DIR/started.txt
    # run the analysis
    ...
    # copy the new files in the subdir data/ to a compressed tar-ball
    # (the f option must come last, right before the archive name)
    cd $SSD_DIR
    tar --newer=$SSD_DIR/started.txt -czf /pool/genomics/smart1/great/project/wild-cat-results.tgz data/
    # now remove it
    rm -rf data/
    # etc...
    # delete everything, unless ...
    rm -rf *
See the previous comments on what to tar and what to remove: once you've tar'd the new stuff in data/, remove data/, etc.
Using the timestamp file and the find command (see man find):
Using find and a timestamp file
    # set the timestamp
    date > $SSD_DIR/started.txt
    # run the analysis
    ...
    # find the new files in the subdir data/
    cd $SSD_DIR
    find data/ -newer $SSD_DIR/started.txt -type f > /tmp/list
    # do the same on logs/, append to the list
    find logs/ -newer $SSD_DIR/started.txt -type f >> /tmp/list
    # etc...
    # now save what is in the list with one tar
    # (no directory argument here, otherwise all of it would be archived as well)
    tar --files-from=/tmp/list -czf /pool/genomics/smart1/great/project/wild-cat-results.tgz
    # now remove data/ and logs/
    rm -rf data/ logs/
    # etc...
    # delete everything, unless ...
    rm -rf *
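The find-based approach can be checked on a small scale with the sketch below. It is a self-contained illustration, not the actual job script: temporary directories stand in for $SSD_DIR and /pool, the "analysis" is two echo commands, and all file names are made up. As a design choice, the list is kept in the working directory rather than in /tmp, so that two jobs running on the same node cannot overwrite each other's list.

```shell
#!/bin/bash
# Sketch: archive only the files created after a timestamp file.
# Assumption: temporary directories stand in for $SSD_DIR and /pool;
# the file names are made up for this illustration.
demo_ssd="$(mktemp -d)"
demo_dest="$(mktemp -d)"
cd "$demo_ssd" || exit 1

mkdir data logs
echo "old input" > data/input.dat     # exists before the analysis

sleep 1
date > started.txt                    # set the timestamp
sleep 1                               # make sure new files are strictly newer

# stand-in for the analysis
echo "new result" > data/result.dat
echo "new log"    > logs/wow.log

# list the new files; the list lives outside data/ and logs/
find data logs -newer started.txt -type f > newer.list

# one compressed tar-ball holding only the listed files
tar -czf "$demo_dest/wild-cat-results.tgz" --files-from=newer.list

# show what was saved
tar -tzf "$demo_dest/wild-cat-results.tgz"
```

The file data/input.dat predates the timestamp and is therefore left out of the archive.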
There are many more ways to accomplish this ....
BTW, the advantage of writing a .tgz file, rather than moving files, is twofold, assuming your stuff is compressible:
- you write less in the .tgz file, so it should be done faster (reading and compressing should be fast, writing is the slow step);
- you need less disk space for your output (since it is compressed).
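How much is gained depends entirely on the data. A quick way to get a feel for it is to compare sizes on a sample, as in the sketch below. The content here is made up and deliberately repetitive, so the ratio it shows says nothing about real results.

```shell
#!/bin/bash
# Sketch: compare the size of a file with the size of its .tgz archive.
# Assumption: the content is made up and highly repetitive, hence far
# more compressible than typical analysis output.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
mkdir data

# write the same line 100000 times
awk 'BEGIN { for (i = 0; i < 100000; i++) print "paws=4 eyes=2" }' > data/result.dat

tar -czf results.tgz data

raw_size=$(wc -c < data/result.dat)
tgz_size=$(wc -c < results.tgz)
echo "raw:" $raw_size "bytes, compressed:" $tgz_size "bytes"
```

If the compressed size is close to the raw size (as for data that are already compressed), the .tgz route saves little and costs CPU time.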
...