
  1. IDL, GDL & FL
  2. MATLAB
  3. Python
    1. NUMPY multi-threading
  4. Java
  5. R
  6. Anaconda and miniconda
  7. Julia

1. IDL, GDL & FL

  • We have 5 interactive licenses and 128 run-time licenses for IDL, and have installed GDL (version 0.9.5) and FL (version 0.79.47).
  • GDL and FL are open-source (read: no licenses needed) IDL-like packages.

1.a IDL

  • The interactive licenses are available for all versions of IDL, from v6.1 to v8.7, and should be used only on the login nodes for pre- or post-processing.
  • To access IDL, simply load the idl module:
    % module load idl
  • To view all the available versions, use:
    % ( module -t avail ) |& grep idl
  • Running IDL the normal way (i.e., interactively) in jobs submitted to the queue will quickly exhaust the available pool of licenses.
  • Instead you must use the run-time licenses.

Using IDL with Run-Time Licenses

  • To run IDL with a run-time license you must first compile your IDL code, along with all the ancillary procedures and functions it may use, and save everything in a save set.

  • The following example shows how to compile a procedure called reduce, stored in the file reduce.pro, into a complete save set that will be called reduce.sav:

    How to compile reduce.pro and save it as a save set
    % module load idl/6.1
    % idl
    IDL> .run reduce
    IDL> resolve_all
    IDL> save, /routine, file='reduce.sav'
  • After creating a save set, changes to any segment of the code will not be reflected in the .sav file; you must recompile the code each time you modify it.
  • To run IDL in run-time mode, you load the idl/rt module and use the -rt=XXX flag when invoking IDL.
  • To let the GE know that you will pull an IDL run-time license, use the -l idlrt=1 flag when qsub'ing.
    The GE keeps track of the number of licenses in use and will limit the number of concurrent jobs using them to the number of available licenses.
  • Here is an example of a job file reduce.job that will run the reduce procedure:

    IDL run-time example job file
    #!/bin/csh
    #
    #$ -cwd -j y -o reduce.log -N reduce
    #$ -l idlrt=1
    #
    echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
    #
    module load idl/rt
    #
    idl -rt=reduce.sav
    #
    echo = `date` job $JOB_NAME done
  • You then run that job with:
    % qsub reduce.job

Notes

  • We have added run-time licenses for IDL 8.6 and higher: you can now use the modules idl/8.6 or idl/8.7 instead of idl/rt, for example:
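    Using the newer run-time modules with the save set from the earlier example
    % module load idl/8.7
    % idl -rt=reduce.sav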
  • The name of the save file must match the name of the top procedure to run, and
  • there is no way to pass arguments to that top-level procedure when started with -rt=name.sav, i.e.:
    If you want to run the procedure make_my_model, you should write make_my_model.pro and compile it to a make_my_model.sav file.
  • IDL procedures can read parameters from a file, or from standard input (stdin), for example:
  • Example of reading from stdin (standard input)
    idl -rt=make_my_model.sav<<EOF
    5943.
    124.
    sun
    EOF

    the corresponding procedure would look like this

    Corresponding procedure make_my_model.pro
    pro make_my_model, temperature, density, name
    ;;
    ;; this procedure computes a model using a given temperature and density
    ;;   and saves it to a file whose name is derived from given string
    ;;
    if n_params() eq 0 then begin
      ;;
      ;; no parameters passed, so we need to read the parameters from stdin
      ;; let's initialize them to set the variable type (double and string)
      ;;
      temperature = 0.D0
      density = 0.0D0
      name = ''
      ;;
      read, temperature
      read, density
      read, name
    endif
    ;;
    fullName = name + '.dat'
    print, 'running a model for T=', temperature,', rho=', density
    print, 'saving it to "'+fullName+'"'
    ;;
    ;; [the code goes here]
    ;;
    end
    • If the reference to the .sav file is to a different (sub)directory, IDL will execute a cd to that directory, namely
        idl -rt=src/make_my_model
      is in fact equivalent to
        cd src
        idl -rt=make_my_model

    • Some IDL procedures will use more than one thread (more than one CPU) if some operations can be parallelized.
      In fact IDL queries how many CPUs there are on the current machine to decide on the maximum number of threads it may start, assuming effectively that you are the sole user of that computer.
      (warning) This is not appropriate for a shared resource, so this default behavior should not be used.
      To avoid this you must add the following instruction:
         CPU, TPOOL_NTHREADS = 1
      to limit IDL to using only one thread (one CPU). 

      Alternatively, you can request several slots (CPUs, threads) on a single compute node when submitting a job (qsub -pe mthread N, where N is a number between 2 and 64),
      and tell IDL to use that same number of threads with the CPU command.
      Of course, if your IDL code is not using N threads all the time, you will be grabbing more resources than you will be using, something that should be avoided.
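      A convenient pattern is to size the thread pool from the job's slot count; a minimal sketch (NSLOTS is set by the GE for jobs submitted with -pe mthread N):

        matching the thread pool to the requested slots
           CPU, TPOOL_NTHREADS = long(getenv('NSLOTS'))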

    • You can check if IDL was started in run-time mode with
        IF lmgr(/runtime) THEN message, /info, 'runtime mode is on'

  • Some IDL instructions are not allowed in run-time mode to prevent emulating interactive mode; consult the manual.

  • Currently our run-time licenses allow us to run either IDL version 6.1 (and earlier) or IDL 8.7 (and 8.6).

    IDL changed their license manager at version 8.5, hence the 'gap'.

  • You can check how many run-time licenses the GE thinks are still available with
    %  qhost -F idlrt -h global
  • You can check how many licenses (8.5 or earlier) are used by loading one of the idl modules and issuing either command (to check interactive use):
    % module load idl
    % $IDL_DIR/bin/lmstat -c /share/apps/idl/license/license.dat -f idl
    or (to check run-time license use)
    % module load idl
    % $IDL_DIR/bin/lmstat -c /share/apps/idl/license/license.dat -f idl_rt
    Note that the number of licenses returned by lmstat has to be divided by 6 (for historical reasons, each instance of IDL grabs 6 of these license tokens).
  • For 8.6 or 8.7, the new license manager returns very little info (a known problem); you can query the license info with

/share/apps/idl/flexnetls_2017.08.0/enterprise/flexnetlsadmin.sh -licenses

or

/share/apps/idl/flexnetls_2017.08.0/enterprise/flexnetlsadmin.sh -licenses -verbose

1.b GDL & FL

  • GDL & FL are open-source packages compatible with IDL.
  • These are free software:
    • you get what you paid for, but
    • there are no licensing or run-time limitations.

  • Version 0.9.5 of GDL and version 0.79.47 of FL are available on Hydra; they are accessible by loading the respective modules:
    % module load tools/gdl
    or
    % module load tools/fl

  • The module will set the variable GDL_STARTUP to either ~/.gdl.startup.0.9.5 or ~/.gdl.startup, if either file exists (checked in that order).
    Any GDL commands in the startup file will be executed every time GDL is started as if typed at the prompt.
  • Like IDL, some GDL procedures will use more than one thread (more than one CPU) if some operations can be parallelized.
    GDL queries how many CPUs there are on the current machine to decide on the maximum number of threads it may start, assuming effectively that you are the sole user of that computer.
    (warning) This is not appropriate for a shared resource, so this default behavior should not be used. To avoid this you must add the following instruction:
       CPU, TPOOL_NTHREADS = 1
    to limit GDL to using only one thread (one CPU).

    Alternatively, you can request several slots, as described for IDL above, with the same caveats.
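    For convenience, you can put that instruction in your GDL startup file (see GDL_STARTUP above), so it is executed every time GDL starts; a minimal sketch:

       example ~/.gdl.startup
       CPU, TPOOL_NTHREADS = 1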
     

2. MATLAB

  • Full-fledged MATLAB is not available on Hydra:
    • we would need to purchase licenses for Hydra,
    • SAO users can use the MATLAB compiler to produce a run-time version of their MATLAB code.

  • The MATLAB run-time environment is available on Hydra
    • to access it, load the right module:

      Module               Description
      matlab/R2014a        2014 first release
      matlab/R2017b0       2017 second release (SAO/CF equivalent)
      matlab/R2017b        2017 second release, with (latest) update (#9)
      matlab/R2019a        2019 first release
      matlab/R2019b        2019 second release
      matlab/rt → R2019b   default run-time is set to use R2019b

      Note that:

      • you must compile your MATLAB application to run it on Hydra,
      • SAO has a single (concurrent) seat license for the MATLAB compiler, available on all CF-managed machines.
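
      As a hedged sketch of the overall workflow (the script name analyze.m is hypothetical, and the exact invocation of the compiled application may differ; consult the MATLAB compiler documentation and module show matlab/rt):

      Compiling and running a MATLAB application (sketch)
      # on a CF-managed machine, where the compiler seat is available:
      mcc -m analyze.m          # produces the standalone analyze (and a run_analyze.sh wrapper)
      # on Hydra, in a submitted job:
      module load matlab/rt
      ./analyze                 # assumes the module sets up the run-time environment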

3. Python

  • The default python with CentOS 7.x is version 2.7.5, so unless you load a specific module, you will run that version of python.
  • Additional versions of python are available as follows:

    Module            Version                              Comment
    python2           Python 2.7.15                        BCM version
    python36          Python 3.6.7                         BCM version
    tools/python      Python 3.7.3                         Fully supported
    tools/python/2.7  Python 2.7.16 :: Anaconda, Inc.      Fully supported
    tools/python/3.7  Python 3.7.3                         Fully supported
    intel/python      Python 3.6.9 :: Intel Corporation    Intel's version
    intel/python/27   Python 2.7.16 :: Intel Corporation   Intel's version
    intel/python/36   Python 3.6.9 :: Intel Corporation    Intel's version

    If you are looking for the versions with all the usual packages, use the tools/python ones.
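
    For example, to verify which interpreter you get after loading the fully supported module:

    % module load tools/python
    % python --version
    Python 3.7.3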

NUMPY Multi-Threading

  • By default, NUMPY is built to use multi-threading, namely some numerical operations will use all the available CPUs on the node it is running on.
    • (warning) This is NOT the way to use a shared resource, like Hydra,
    • the symptom is that your job is oversubscribed: it uses more CPUs than it requested.
  • The solution is to tell NUMPY how many threads to use by loading the respective module:

    serial case
    load tools/single-thread-numpy

    or

    multi-thread case
    load tools/mthread-numpy

    use module show <module-name> to see what is done.
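
    Under the hood these modules set environment variables honored by the threaded math libraries NUMPY is built against; a minimal sketch of the manual single-thread equivalent (the exact variables the modules set may differ, hence the module show suggestion):

    limiting NUMPY to one thread by hand (csh syntax)
    setenv OMP_NUM_THREADS 1
    setenv MKL_NUM_THREADS 1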

    Example:

      demo-mthread-numpy.job
      #
      # this example uses 4 threads
      #$ -pe mthread 4
      #$ -cwd -j y -o demo-mthread-numpy.log -N demo-mthread-numpy
      #
      echo + `date` $JOB_NAME started on $HOSTNAME in $QUEUE with id=$JOB_ID
      echo NSLOTS = $NSLOTS
      #
      module load tools/mthread-numpy
      python my-big-data-cruncher.py
      #
      echo = `date` $JOB_NAME done.

4. Java

  • Java version 1.8 is available on Hydra by loading the appropriate module:
       % module load java

    or

       % module load java/1.8


  • (warning) Java does not play nicely with the GE:

    • Java, by default, wants to start as many threads and grab as much memory as possible.
    • If you do not specify some memory-related parameters, java fails in every submitted job with the following message:

      Error occurred during initialization of VM
      Could not reserve enough space for object heap
    • (lightbulb) You should always start java with the following options:

      java -d64 -server -XX:MaxHeapSize=1g

      where the value "1g" should be adjusted to the memory needed by the application, so that the job fits within the memory constraints of the queue and the requested resources.

      The total amount of memory used by java is not just the maximum heap size.

    • If you need more memory, be sure to adjust the memory resource request accordingly (-l memres=X,h_data=X,h_vmem=X); see the section about memory reservation in the Available Queues page, and the sketch job file after this list.

  • (grey lightbulb) The complete documentation for all Java options (v1.8) is posted here.
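
    The following sketch ties these pieces together in a job file (the jar name crunch.jar and the memory values are hypothetical; adjust them to your application and keep the heap below the reservation):

    java example job file (sketch)
    #!/bin/csh
    #$ -cwd -j y -o crunch.log -N crunch
    #$ -l memres=8G,h_data=8G,h_vmem=8G
    #
    module load java/1.8
    #
    # cap the heap below the 8G reservation, since java uses memory beyond the heap
    java -d64 -server -XX:MaxHeapSize=6g -jar crunch.jar
    #
    echo = `date` job $JOB_NAME done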

5. R

  • R version 3.6.1 (built with gcc 4.9.2) is available by loading the corresponding module:
    % module load bioinformatics/R
  • The older R version 3.4.1 (built with gcc 4.9.2) is still available by loading the corresponding module:
    % module load tools/R
    Do not use tools/R any longer: these are old (Hydra-4) modules that should soon go away.

  • Installing Packages: 

    Users can install their own packages in a user-specific library. When the install.packages() command is used, there will be a notification that the system-wide library is not writable and you will be prompted to create a personal library in your home folder. All future packages that you install will be installed into your personal library. You will then be prompted to choose a repository.

    Packages only need to be installed once to be accessible to all of your R jobs on all nodes of the cluster. Because compute nodes don't have internet access, you will need to run the install.packages() command interactively from the login node, as shown below.
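
    For example (run on a login node; the package name is just an illustration):

    install.packages("data.table")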

  • $NSLOTS in R scripts
    It is best practice to use the environment variable NSLOTS in your scripts to specify the number of cores for R commands that support multiple cores. By using NSLOTS rather than hardcoding the number of cores in your R script, your job will utilize the requested number of slots, even if you change your qsub submission parameters.

    (lightbulb) Use the R Base function Sys.getenv() to access the value of $NSLOTS from within your script.

    ## Script using Sys.getenv() to read the value of $NSLOTS
    ## (Sys.getenv() returns a string, so convert it to an integer)

    numcores <- as.integer(Sys.getenv("NSLOTS"))
  • Proper parallelization of makeCluster():
    R packages that incorporate the makeCluster() function must specify type="FORK" as an argument. Without this, Hydra's scripts that kill zombie jobs will terminate the R processes created by makeCluster().

    cl <- makeCluster(no_cores, type="FORK")
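
    Putting the two previous points together, a minimal sketch (it assumes the parallel package; the parLapply() call is just an illustrative workload):

    library(parallel)
    ## size the cluster from the slots granted by the GE
    no_cores <- as.integer(Sys.getenv("NSLOTS"))
    cl <- makeCluster(no_cores, type="FORK")
    results <- parLapply(cl, 1:100, function(i) i^2)  # example workload
    stopCluster(cl)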

6. Anaconda and miniconda

  • Users may find it convenient to use anaconda or miniconda to install and manage software in their user space. It is acceptable to use these systems, but like compiling and running any software on Hydra, the user should be familiar with how the software functions and uses resources (memory, CPUs, disk) so as not to overutilize cluster resources.
  • Miniconda, rather than the full anaconda, is preferred on Hydra because it initially installs a minimal set of packages, reducing disk usage; required dependencies will be installed by conda.
  • To install miniconda, download a current 64-bit miniconda installer from here. Then follow these instructions to install in your user space. The default location to unpack the software is your home directory. This works well because /home is not scrubbed and the disk space needed by miniconda typically is within the user quotas for /home.
  • After installation, you can use conda install to install software. See these instructions for full conda documentation: https://conda.io/projects/conda/en/latest/glossary.html#conda-repository-glossary
  • To use software installed via conda in a submitted job, you can create a personal module file to add your miniconda bin directory to your PATH, as sketched below. See here for instructions.
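
    A minimal sketch of such a module file (the install path is an assumption; adjust it to wherever you unpacked miniconda):

    example personal module file (sketch)
    #%Module1.0
    ## prepend the personal miniconda bin directory to PATH
    ## (the path below is a placeholder, not the actual location)
    prepend-path PATH /home/username/miniconda3/bin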

7. Julia


Last update     SGK/MPK


