...
At the end of the job script, you must add instructions to copy the results of your analysis back to `/pool` (or `/scratch`, or `/data`). If/when the results are easily identifiable, you can use the commands `mv` or `tar`, and `find`. Here are a few examples:

Move the directory where all the results are stored and the log file, and delete the rest.
```bash
# move results and log file back
cd "$SSD_DIR" || exit 1   # stop here rather than run rm -rf * in the wrong directory
mv results /pool/genomics/smart1/great/project/wild-cat/.
mv wow.log /pool/genomics/smart1/great/project/wild-cat/.
#
# delete the rest
rm -rf *
```
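A slightly safer variant of this first example, sketched under some assumptions: `SSD_DIR` and `DEST` are created here with `mktemp -d` only so the sketch runs anywhere (in a real job `SSD_DIR` is already set and `DEST` would be your `/pool` project directory), and `results/`, `wow.log` and `tmp.dat` are hypothetical stand-ins for your own files. Chaining the steps with `&&` means `rm -rf *` only runs if `cd` and both `mv` commands succeeded, so a failed copy (for example a full or over-quota destination) does not wipe your only copy of the results.

```bash
#!/bin/bash
# Hypothetical stand-ins so the sketch runs anywhere; in a real job
# SSD_DIR is already set and DEST would be your /pool project directory.
SSD_DIR=$(mktemp -d)
DEST=$(mktemp -d)

# Fake "analysis output" for the demo.
mkdir -p "$SSD_DIR/results"
echo "42" > "$SSD_DIR/results/out.txt"
echo "done" > "$SSD_DIR/wow.log"
echo "scratch" > "$SSD_DIR/tmp.dat"

# Only delete the local copy if every previous step succeeded.
cd "$SSD_DIR" \
  && mv results "$DEST/" \
  && mv wow.log "$DEST/" \
  && rm -rf -- *
```

Keep in mind that `*` does not match hidden files (names starting with `.`), so those survive `rm -rf *`.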
Move the directory where all the results are stored and the log file, and delete only the input (a conservative approach, in case you missed something).
```bash
# move results and log file back
cd "$SSD_DIR" || exit 1
mv results /pool/genomics/smart1/great/project/wild-cat/.
mv wow.log /pool/genomics/smart1/great/project/wild-cat/.
#
# delete input set and other stuff
rm -rf input
rm wow.gen wow.conf
```
Move using the `--update` flag of `mv` (see `man mv`):

```bash
# move results using --update
cd "$SSD_DIR" || exit 1
mv --update * /pool/genomics/smart1/great/project/wild-cat/.
#
# delete the rest
rm -rf *
```
Note that you can use `mv --update` on an explicit list (of files, directories, or a file specification), not just on `*` (everything). You also do not have to remove the rest: you can remove only what you know is safe to remove (conservative approach).
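As a sketch of that conservative variant: move an explicit list with `mv --update`, then delete only what is known to be safe. `SSD_DIR`, `DEST` and all file names are hypothetical stand-ins; `--update` is the GNU coreutils option, which moves a file only when the destination copy is missing or older than the source, and otherwise leaves the source where it is.

```bash
#!/bin/bash
# Hypothetical stand-ins; in a real job SSD_DIR is already set and
# DEST would be your /pool project directory.
SSD_DIR=$(mktemp -d)
DEST=$(mktemp -d)

# Demo files: an input set, a new result, and a summary whose copy
# in DEST is newer than the local one.
mkdir -p "$SSD_DIR/input"
echo "reads" > "$SSD_DIR/input/reads.fq"
echo "new result" > "$SSD_DIR/result.txt"
echo "old summary" > "$SSD_DIR/summary.txt"
touch -d '2 hours ago' "$SSD_DIR/summary.txt"
echo "newer summary" > "$DEST/summary.txt"

cd "$SSD_DIR" || exit 1
# Move an explicit list; --update skips summary.txt because DEST
# already holds a newer copy, so the local one is left in place.
mv --update result.txt summary.txt "$DEST/"
# Remove only what is known to be safe to delete (the input set).
rm -rf -- input
```

Anything `--update` skipped, like `summary.txt` here, is still in `$SSD_DIR`, so check it before any final `rm -rf *`.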
Find newer files and save them: the trick is to create a 'timestamp' file before starting the analysis. That file can later be used to select every newer file with the `--newer=` option of `tar` (see `man tar`):

```bash
# set the timestamp
date > "$SSD_DIR/started.txt"
# run the analysis
...
# copy the new files in the subdir data/ to a compressed tar-ball
# (--newer= treats a value starting with / or . as a file and uses its date;
#  -f must come last in -czf, since it takes the archive name as its argument)
cd "$SSD_DIR" || exit 1
tar --newer="$SSD_DIR/started.txt" -czf /pool/genomics/smart1/great/project/wild-cat-results.tgz data/
# now remove it
rm -rf data/
# etc...
# finally, delete everything else (unless you still need some of it)
rm -rf *
```
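Before removing `data/`, it can be worth checking that the tar-ball was actually written. A minimal sketch, leaving out `--newer=` for brevity (the same check applies with it): `SSD_DIR` and `ARCHIVE` are hypothetical stand-ins created with `mktemp`. Listing the archive with `tar -tzf` reads it back end to end, so a failed or truncated write should stop the chain before `rm` runs.

```bash
#!/bin/bash
# Hypothetical stand-ins; in a real job SSD_DIR is already set and
# ARCHIVE would be a path under /pool.
SSD_DIR=$(mktemp -d)
ARCHIVE=$(mktemp -d)/wild-cat-results.tgz

# Demo output in data/.
mkdir -p "$SSD_DIR/data"
echo "42" > "$SSD_DIR/data/result.txt"

cd "$SSD_DIR" || exit 1
# Write the tar-ball, read it back with -t to check it is usable,
# and only then remove data/.
tar -czf "$ARCHIVE" data/ \
  && tar -tzf "$ARCHIVE" > /dev/null \
  && rm -rf -- data/
```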
See the previous comments about what to tar and what to remove: once you have tar'd the new files in `data/`, remove `data/`, etc.

Using the timestamp file and the `find` command (see `man find`):

```bash
# set the timestamp
date > "$SSD_DIR/started.txt"
# run the analysis
...
# find the new files in the subdir data/
cd "$SSD_DIR" || exit 1
find data/ -newer "$SSD_DIR/started.txt" -type f > /tmp/list
# do the same on logs/, append to the list
find logs/ -newer "$SSD_DIR/started.txt" -type f >> /tmp/list
# etc...
# now save what is in the list with one tar
# (no trailing data/ argument: it would add all of data/, not just the new files)
tar -czf /pool/genomics/smart1/great/project/wild-cat-results.tgz --files-from=/tmp/list
# now remove data/ and logs/
rm -rf data/ logs/
# etc...
# finally, delete everything else (unless you still need some of it)
rm -rf *
```
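A variant of the `find` approach that copes with file names containing spaces or newlines, sketched with GNU `find -print0` and GNU tar's `--null` option. `SSD_DIR`, `ARCHIVE`, `LIST` and the demo files are hypothetical; the input file and `started.txt` are backdated with `touch -d` only to make the demo deterministic, whereas in a job you would create the timestamp just before the analysis, as above.

```bash
#!/bin/bash
# Hypothetical stand-ins; in a real job SSD_DIR is already set and
# ARCHIVE would be a path under /pool.
SSD_DIR=$(mktemp -d)
ARCHIVE=$(mktemp -d)/wild-cat-results.tgz
LIST=$(mktemp)

mkdir -p "$SSD_DIR/data" "$SSD_DIR/logs"
# Demo only: backdate the input and the timestamp so the files
# written below are unambiguously newer.
echo "input" > "$SSD_DIR/data/input.dat"
touch -d '2 hours ago' "$SSD_DIR/data/input.dat"
touch -d '1 hour ago' "$SSD_DIR/started.txt"

# "Analysis" output, including a name with a space.
echo "42" > "$SSD_DIR/data/my result.txt"
echo "ok" > "$SSD_DIR/logs/run.log"

cd "$SSD_DIR" || exit 1
# -print0 separates names with NUL bytes, so spaces or newlines in
# file names cannot break the list.
find data/ logs/ -newer started.txt -type f -print0 > "$LIST"
# --null must come before -T so tar reads the list as NUL-separated.
tar -czf "$ARCHIVE" --null -T "$LIST" \
  && rm -rf -- data/ logs/
```

Only the files named in the list end up in the archive; `data/input.dat`, older than the timestamp, is left out.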
There are many more ways to accomplish this.

BTW, the advantage of writing a `.tgz` file rather than moving files is twofold, assuming your output is compressible:
- you write less data to the `.tgz` file, so it should finish faster (reading and compressing should be fast; writing is the slow step);
- you need less disk space for your output (since it is compressed).
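To see the disk-space effect on your own kind of output, a quick check like the following can help. The file here is a made-up, very repetitive stand-in (`WORK` and `reads.txt` are hypothetical), so real data will typically compress less well; both sizes are byte counts from `wc -c`.

```bash
#!/bin/bash
# Hypothetical demo directory with a highly compressible text file.
WORK=$(mktemp -d)
mkdir -p "$WORK/data"
awk 'BEGIN { for (i = 0; i < 100000; i++) print "ACGTACGTACGTACGT" }' \
  > "$WORK/data/reads.txt"

cd "$WORK" || exit 1
tar -czf results.tgz data/

# Compare bytes on disk: raw file vs. compressed tar-ball.
RAW=$(wc -c < data/reads.txt)
TGZ=$(wc -c < results.tgz)
echo "raw: $RAW bytes, tgz: $TGZ bytes"
```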
...