Welcome to Hydra-5, the upgraded Hydra cluster.
Location of stuff and modules
We now use BCM instead of Rocks, hence the locations of things have changed: no more /opt, but lots of stuff under /cm.
If you load modules and use environment variables, everything should work. If you hardwired a system path, you need to fix it: switch to modules and environment variables.
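A minimal before/after sketch of that switch; the module name and the old path below are made up for illustration:

  # old (hardwired path, breaks on Hydra-5):
  #   /opt/gcc/bin/gcc myprog.c
  # new (let the module set PATH and friends):
  module load gcc
  gcc myprog.c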
Because we upgraded the OS version, the versions of a lot of the standard tools have changed as well, hence some things might behave differently or new options may be available.
Submitting jobs
You still use qsub and the like, but these commands are now provided by the uge module. That module is loaded for you, so unless you unload it, or do something weird, there is no need to do anything else.
The output of some GE commands will look different, and a few take different options.
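A quick sanity check, in case you suspect the module got unloaded (the job file name is made up):

  module list        # uge should appear among the loaded modules
  module load uge    # only needed if you had unloaded it
  qsub myjob.job     # submit as before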
The same queues are available, except that the interactive and I/O nodes are now different (our newest) nodes, and the GPU and SSD queues are not yet available: these two resources have yet to be configured.
Also, you will notice that the compute node naming has changed slightly: the compute nodes are all called compute-NN-MM, where NN and MM are always 2-digit strings and NN encodes the node model (for example, all compute-64-MM nodes are Dell R640s). The fully qualified name is no longer compute-NN-MM.local, but compute-NN-MM.cm.cluster.
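If you reference nodes by their fully qualified names anywhere (scripts, ssh configs), switch to the new domain; a quick check, with a made-up node number:

  # compute-64-01.local no longer resolves; the new name does:
  host compute-64-01.cm.cluster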
Memory reservation
While this may be confusing and/or annoying, we have decided to change how the memory reservation is computed: it is no longer a per-slot (thread, CPU) value, but a per-job value.
All your jobs MUST be adjusted to use the new value, which is the old value multiplied by the number of slots (threads, CPUs).
QSubGen has yet to be adjusted to reflect this change (stay tuned)!
In other words, a:
  -pe mthread 10 -l mres=20G,h_data=20G,h_vmem=20G,himem
must be replaced by:
  -pe mthread 10 -l mres=200G,h_data=200G,h_vmem=200G,himem
to reserve 200GB for the job (10 slots × 20GB each).
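The same change as embedded directives in a job file, sketched here with a made-up file name:

  # excerpt of a hypothetical job file (myjob.job) under the new per-job rule:
  # 10 slots x 20GB per slot = 200GB reserved for the whole job
  #$ -pe mthread 10
  #$ -l mres=200G,h_data=200G,h_vmem=200G,himem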
/scratch
The content of /scratch was copied over to the new system. But because we had a lot to copy (120TB), we started the copying on Aug 22 at 22:54, and then rsync'd the content. This means that any file you had under /scratch when we started copying and have deleted afterwards might have been copied anyway. So check your stuff if you were actively using /scratch after Aug 22 at 22:54.
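One way to spot files in your /scratch area that changed after the copy started, sketched with GNU find (the path below is a placeholder; adjust it to your own directory):

  # list files modified after the copy began (the date assumes the current year)
  find /scratch/$USER -type f -newermt "Aug 22 22:54"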
Also, /scratch is now a GPFS file system, no longer an NFS one. It should be faster, but a few commands, like df and quota, do not work the same way.
For the GPFS, we recommend that you use:
  df -h --output=source,fstype,size,used,avail,pcent,file
not just df -h, and that you use quota+ instead of quota:
  module load tools/local
  quota+
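If you do not want to type the long df form every time, a shell alias is one option (a bash sketch; add it to your ~/.bashrc if you like it):

  # make df show the useful columns on GPFS by default
  alias df='df -h --output=source,fstype,size,used,avail,pcent,file'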
If you want things to be the same, load tools/local-bc (bc stands for backward compatible).
If you want to figure out what is what, try:
  module help tools/local
  module help tools/local+
  module help tools/local-bc
The last one will show you what names have changed.