- IDL, GDL & FL
- MATLAB
- Python
- NumPy multi-threading
- Java
- R
- Anaconda and miniconda
- Julia
1. IDL, GDL & FL
- We have 5 interactive licenses and 128 run-time licenses for IDL, and have installed GDL (version 0.9.5) and FL (version 0.79.47).
- GDL and FL are open-source (read: no licenses needed) IDL-like packages.
1.a IDL
- The interactive licenses are available for all versions of IDL, from v6.1 to v8.7, and should be used only on the login nodes for pre- or post-processing.
- To access IDL, simply load the idl module:
% module load idl
- To view all the available versions, use:
% ( module -t avail ) |& grep idl
- Running IDL the normal way (i.e., interactively) in jobs submitted to the queue will quickly exhaust the available pool of licenses.
- Instead you must use the run-time licenses.
Using IDL with Run-Time Licenses
To run IDL with a run-time license, you must first compile your IDL code, and all the ancillary procedures and functions it may use, and save it in a save set.
The following example shows how to compile a procedure called reduce, stored in the file reduce.pro, into a complete save set that will be called reduce.sav.
How to compile reduce.pro and save it as a save set
% module load idl/6.1
% idl
IDL> .run reduce
IDL> resolve_all
IDL> save, /routine, file='reduce.sav'
- After creating a save set, changes to any segment of the code will not be reflected in the .sav file; you must recompile the code each time you modify it.
- To run IDL in run-time mode, you load the idl/rt module and use the -rt=XXX flag when invoking IDL.
- To let the GE know that you will pull an IDL run-time license, use the -l idlrt=1 flag when qsub'ing.
The GE will keep track of the number of licenses used and will limit the number of concurrent jobs using them to the number of available licenses. Here is an example of a job file reduce.job that will run the reduce procedure:
IDL run-time example job file
#!/bin/csh
#
#$ -cwd -j y -o reduce.log -N reduce
#$ -l idlrt
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
#
module load idl/rt
#
idl -rt=reduce.sav
#
echo = `date` job $JOB_NAME done
- You then run that job with:
% qsub reduce.job
Notes
- We have added run-time licenses for IDL 8.6 and higher: you can now use the modules idl/8.6 or idl/8.7 instead of idl/rt.
- The name of the save file must match the name of the top procedure to run, and
- there is no way to pass arguments to that top-level procedure when started with -rt=name.sav, i.e.:
if you want to run the procedure make_my_model, you should write make_my_model.pro and compile it to a make_my_model.sav file.
- IDL procedures can read parameters from a file, or from standard input (stdin), for example:
Example of reading from stdin (standard input)
idl -rt=make_my_model.sav <<EOF
5943.
124.
sun
EOF
the corresponding procedure would look like this:
Corresponding procedure make_my_model.pro
pro make_my_model, temperature, density, name
  ;;
  ;; this procedure computes a model using a given temperature and density
  ;; and saves it to a file whose name is derived from a given string
  ;;
  if n_params() eq 0 then begin
    ;;
    ;; no parameters passed, so we need to read the parameters from stdin
    ;; let's initialize them to set the variable type (double and string)
    ;;
    temperature = 0.D0
    density = 0.0D0
    name = ''
    ;;
    read, temperature
    read, density
    read, name
  endif
  ;;
  fullName = name + '.dat'
  print, 'running a model for T=', temperature, ', rho=', density
  print, 'saving it to "' + fullName + '"'
  ;;
  ;; [the code goes here]
  ;;
end
- If the reference to the .sav file is to a different (sub)directory, IDL will execute a cd to that directory, namely
idl -rt=src/make_my_model
is in fact equivalent to
cd src
idl -rt=make_my_model
- Some IDL procedures will use more than one thread (more than one CPU) if some operations can be parallelized.
In fact, IDL queries how many CPUs there are on the current machine to decide on the maximum number of threads it may start, effectively assuming that you are the sole user of that computer.
This is not appropriate for a shared resource, so that default should not be used.
To avoid this you must add the following instruction:
CPU, TPOOL_NTHREADS = 1
to limit IDL to using only one thread (one CPU).
Alternatively, you can request several slots (CPUs, threads) on a single compute node when submitting a job (qsub -pe mthread N, where N is a number between 2 and 64),
and tell IDL to use that same number of threads with the CPU command.
Of course, if your IDL code is not using N threads all the time, you will be grabbing more resources than you will be using, something that should be avoided.
- You can check if IDL was started in run-time mode with
IF lmgr(/runtime) THEN message, /info, 'runtime mode is on'
- Some IDL instructions are not allowed in run-time mode, to prevent emulating interactive mode; consult the manual.
Currently our run-time licenses allow us to run either IDL version 6.1 (and earlier) or IDL 8.7 (and 8.6).
IDL changed their license manager at version 8.5, hence the 'gap'.
- You can check how many run-time licenses the GE thinks are still available with
% qhost -F idlrt -h global
- You can check how many licenses (8.5 or earlier) are used by loading one of the idl modules and issuing either command (to check interactive use):
% module load idl
% $IDL_DIR/bin/lmstat -c /share/apps/idl/license/license.dat -f idl
or (to check run-time license use):
% module load idl
% $IDL_DIR/bin/lmstat -c /share/apps/idl/license/license.dat -f idl_rt
Note that the number of licenses returned by lmstat has to be divided by 6 (for historical reasons, each instance of IDL grabs 6 of these license tokens).
- For 8.6 or 8.7, the new license manager returns very little info (known problem); you can query the license info with
/share/apps/idl/flexnetls_2017.08.0/enterprise/flexnetlsadmin.sh -licenses
or
/share/apps/idl/flexnetls_2017.08.0/enterprise/flexnetlsadmin.sh -licenses -verbose
1.b GDL & FL
- GDL & FL are open source packages compatible with IDL:
- GDL is compatible with version 7.1, see http://sourceforge.net/projects/gnudatalanguage
- FL is compatible with version 8, see https://www.flxpert.hu/fl
- These are free software:
- you get what you paid for, but
- there are no licensing nor run-time limitations.
- Version 0.9.5 of GDL and version 0.79.47 of FL are available on Hydra, and are accessible by loading the respective modules:
% module load tools/gdl
or
% module load tools/fl
- The module will set the variable GDL_STARTUP to either ~/.gdl.startup.0.9.5 or ~/.gdl.startup, if either file exists (checked in that order).
Any GDL commands in the startup file will be executed every time GDL is started, as if typed at the prompt.
- Like IDL, some GDL procedures will use more than one thread (more than one CPU) if some operations can be parallelized.
GDL queries how many CPUs there are on the current machine to decide on the maximum number of threads it may start, effectively assuming that you are the sole user of that computer.
This is not appropriate for a shared resource, so that default should not be used. To avoid this you must add the following instruction:
CPU, TPOOL_NTHREADS = 1
to limit GDL to using only one thread (one CPU).
Alternatively, you can request several slots, as described for IDL above, with the same caveats.
2. MATLAB
- A full-fledged MATLAB is not available on Hydra:
- we would need to purchase licenses for Hydra,
- SAO users can use the MATLAB compiler to produce a run-time version of their MATLAB code.
- The MATLAB run-time environment is available on Hydra; to access it, load the right module:

Module               Description
matlab/R2014a        2014 first release
matlab/R2017b0       2017 second release (SAO/CF equivalent)
matlab/R2017b        2017 second release, with (latest) update (#9)
matlab/R2019a        2019 first release
matlab/R2019b        2019 second release
matlab/rt → R2019b   default run-time is set to use R2019b

Note that:
- you must compile your MATLAB application to run it on Hydra,
- SAO has a single (concurrent) seat license for the MATLAB compiler, available on all CF-managed machines.
3. Python
- The default python with CentOS 7.x is version 2.7.5; so unless you load a specific module, you will run that version of python.
Additional versions of python are available as follows:

Module             Version                              Comment
python2            Python 2.7.15                        BCM version
python36           Python 3.6.7                         BCM version
tools/python       Python 3.7.3                         Fully supported
tools/python/2.7   Python 2.7.16 :: Anaconda, Inc.      Fully supported
tools/python/3.7   Python 3.7.3                         Fully supported
intel/python       Python 3.6.9 :: Intel Corporation    Intel's version
intel/python/27    Python 2.7.16 :: Intel Corporation   Intel's version
intel/python/36    Python 3.6.9 :: Intel Corporation    Intel's version

If you are looking for the versions with all the usual packages, use the tools/python ones.
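Since several python versions coexist on the cluster, a script can guard against being started under the wrong interpreter. Below is a minimal sketch; the version threshold and the message text are just illustrative assumptions, not a site requirement:

```python
import sys

# Fail early, with a clear message, if the job was started with the
# wrong python (e.g. the CentOS default 2.7.5 instead of tools/python).
if sys.version_info < (3, 6):
    sys.exit("this script needs Python >= 3.6; run 'module load tools/python' first")

print("running under Python %d.%d" % sys.version_info[:2])
```

Putting such a check at the top of a script turns a confusing late SyntaxError into an immediate, readable failure in the job log.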
NUMPY Multi-Threading
- By default, NUMPY is built to use multi-threading, namely some numerical operations will use all the available CPUs on the node it is running on.
- This is NOT the way to use a shared resource, like Hydra;
- the symptom is that your job is oversubscribed.
The solution is to tell NUMPY how many threads to use, by loading the respective module:
serial case:
% module load tools/single-thread-numpy
or
multi-thread case:
% module load tools/mthread-numpy
Use
% module show <module-name>
to see what is done.
Example:
demo-mthread-numpy.job
#
# this example uses 4 threads
#$ -pe mthread 4
#$ -cwd -j y -o demo-mthread-numpy.log -N demo-mthread-numpy
#
echo + `date` $JOB_NAME started on $HOSTNAME in $QUEUE with id=$JOB_ID
echo NSLOTS = $NSLOTS
#
module load tools/mthread-numpy
python my-big-data-cruncher.py
#
echo = `date` $JOB_NAME done.
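The modules above set the thread controls for you; for reference, the same idea can be sketched from inside Python, assuming a NUMPY built against a BLAS back end that honors the OMP_NUM_THREADS, OPENBLAS_NUM_THREADS, or MKL_NUM_THREADS variables. The caps must be set before numpy is imported, since the thread pools are sized at import time:

```python
import os

# Cap the BLAS thread pools BEFORE importing numpy; NSLOTS is set by
# the GE to the number of slots requested with -pe mthread N
# (assume 1 when running outside a job).
nslots = os.environ.get("NSLOTS", "1")
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = nslots

import numpy as np

# the numerical result is unaffected by the thread count
a = np.arange(9.0).reshape(3, 3)
print(np.trace(a @ np.eye(3)))  # prints 12.0
```

This only works if the environment variables are assigned before the `import numpy` line; setting them afterwards has no effect on an already-initialized thread pool.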
4. Java
Java version 1.8 is available on Hydra by loading the appropriate module:
% module load java
or
% module load java/1.8
Java does not play nice w/ GE:
- Java, by default, wants to start as many threads and grab as much memory as possible.
When some memory-related parameters are not specified, java fails in every submitted job, with the following message:
Error occurred during initialization of VM
Could not reserve enough space for object heap
You should always start java with the following options:
java -d64 -server -XX:MaxHeapSize=1g
where the value "1g" should be adjusted to the memory needed by the application, and for the job to fit within the queue's and the requested resources' memory constraints.
The total amount of memory used by java is not just the maximum heap size.
- If you need more memory, be sure to adjust the memory resource request accordingly (
-l memres=X,h_data=X,h_vmem=X
), see the section about memory reservation in the Available Queues page.
- The complete documentation for all of java's options (v1.8) is posted here.
5. R
- R version 3.6.1 (built with gcc 4.9.2) is available by loading the corresponding module:
% module load bioinformatics/R
- Do not use tools/R any longer; these are old (Hydra-4) modules that will soon go away.
R version 3.4.1 (built with gcc 4.9.2) is available by loading the corresponding module:
% module load tools/R
- Installing Packages:
Users can install their own packages in a user-specific library. When the install.packages() command is used, there will be a notification that the system-wide library is not writable, and you will be prompted to create a personal library in your home folder. All future packages that you install will be installed into your personal library. You will then be prompted to choose a repository.
Packages only need to be installed one time to be accessible to all of your R jobs on all nodes of the cluster. Because compute nodes don't have internet access, you will need to run the install.packages() command interactively from the login node.
$NSLOTS in R scripts
It is best practice to use the environment variable NSLOTS in your scripts to specify the number of cores to use for R commands that support multiple cores. By using NSLOTS rather than hardcoding the number of cores in your R script, your job will utilize the requested number of slots, even if you change your qsub submission parameters.
Use the R base function Sys.getenv() to access the value of $NSLOTS from within your script. Note that Sys.getenv() returns a string, hence the as.integer().
## Script using Sys.getenv() to read the value of $NSLOTS
numcores <- as.integer(Sys.getenv("NSLOTS"))
Proper parallelization of makeCluster():
R packages that incorporate the makeCluster() function must specify type="FORK" as an argument. Without this, Hydra's scripts that kill zombie jobs will terminate the R processes created by makeCluster().
cl <- makeCluster(no_cores, type="FORK")
6. Anaconda and miniconda
- Users may find it convenient to use anaconda or miniconda to install and manage software in their user space. It is acceptable to use these systems, but like compiling and running any software on Hydra, the user should be familiar with how the software functions and uses resources (memory, CPUs, disk) so as not to overutilize cluster resources.
- Miniconda, rather than the full anaconda, is preferred on Hydra because it initially installs minimal packages, reducing disk usage; required dependencies will be installed by conda.
- To install miniconda, download a current 64-bit miniconda installer from here. Then follow these instructions to install in your user space. The default location to unpack the software is your home directory. This works well because /home is not scrubbed and the disk space needed by miniconda typically is within the user quotas for /home.
- After installation, you can use conda install to install software. See these instructions for full conda documentation: https://conda.io/projects/conda/en/latest/glossary.html#conda-repository-glossary
- To use software installed via conda in a submitted job, you can create a personal module file to add your miniconda bin directory to your PATH. See here for instructions.
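As a quick sanity check inside a submitted job, a short script can confirm that the python being run is actually the one from your miniconda installation. A minimal sketch; CONDA_PREFIX is set by conda when an environment is active, and absent otherwise:

```python
import os
import sys

# Print which interpreter is running and which conda environment, if
# any, is active; compare the path against your miniconda install
# location to catch a job that picked up the wrong python.
print("interpreter:", sys.executable)
print("conda env  :", os.environ.get("CONDA_PREFIX", "<none>"))
```

Running this as the first step of a job writes the answer into the job log, where it is easy to spot when debugging PATH problems.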
7. Julia
- Julia version 1.0.3 (julialang.org) is available by loading the corresponding module:
% module load tools/julia
- Check
% module help tools/julia
% man julia
and see julialang.org and juliaplots.org.
Last update SGK/MPK