When submitting jobs to the cluster, you need to tell the job scheduler details about your run (run time, memory needed, CPUs needed etc.) AND the actual commands that you will be running for your analysis.

(lightbulb) You can use the QSub Generator for both (lightbulb).

To use the QSub Generator:

  1. Open the QSub Generation Utility webpage.
    (warning) This utility works best on Chrome or Firefox. Safari is not recommended.
    (warning) The first time you go to the website you will be prompted about a certificate error. You can safely continue past these warnings.

    (thumbs up) The question mark icons by each entry in the form provide useful information on each item, including formatting.
    (thumbs up) The example below shows how to set up a RAxML job that will use a single CPU to run bootstraps and a best tree analysis.

  2. Time and memory

    Time: Enter the maximum time allowed for your job in "CPU time" by entering an exact time in the text box or using the dropdown menu.
    (warning) After this time is reached, your job will be terminated.
    (thumbs up) The dropdown menu will update the time in the text box to show the length of the pre-defined times.
    (thumbs up) CPU time is the amount of time that the processor spends working on your job, not the "wall clock" time, which is the elapsed time since the job started.
    (thumbs up) For jobs using more than one CPU, CPU time is the product of the number of CPUs and the time allocated.
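
As a rough illustration of the last point, the CPU time to request is the run time on each CPU multiplied by the number of CPUs. The sketch below uses hypothetical values (4 CPUs running for 6 hours each), and the variable names are illustrative only; they are not part of the QSub Generator.

```shell
# Hypothetical values for illustration only
ncpus=4            # number of CPUs the job will use
hours_per_cpu=6    # expected run time on each CPU, in hours

# CPU time is the product of the number of CPUs and the time allocated
cpu_hours=$((ncpus * hours_per_cpu))
cpu_seconds=$((cpu_hours * 3600))

echo "Request ${cpu_hours} hours of CPU time (${cpu_seconds} seconds)"
```

With these values you would request 24 hours of CPU time, even though the job is expected to finish in about 6 hours of wall clock time.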

    Memory: Enter the maximum memory that your run will use.
    (thumbs up) Some programs will have memory use estimators. In this RAxML example you can use a memory estimator here.
    (warning) The memory requested is per job, independent of the number of CPUs. It only applies to multi-threaded jobs. If you need assistance with this, please post to the SI HPCC wiki.

  3. Parallel Environment

    In this section you specify how many CPUs your job will use. The design of the program determines if it is limited to one CPU (serial), multiple CPUs spread across servers in the cluster (MPI), or multiple CPUs residing on one server (multi-thread).
    In this example we are only using one CPU (serial).
    (thumbs up) Refer to the program documentation for information about the kinds of parallel use it supports. Program documentation through the module system (described below) may also give more information.
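
Inside a running job, the number of CPUs granted by the scheduler is normally available as the NSLOTS environment variable (the same variable used for MPI jobs in step 6). A minimal sketch, assuming you want the script to fall back to one CPU when NSLOTS is not set, for example when testing outside the scheduler; the function name is hypothetical:

```shell
# describe_job is a hypothetical helper for this sketch
describe_job() {
    # NSLOTS is set by the scheduler; default to 1 if it is missing
    ncpus="${NSLOTS:-1}"
    if [ "$ncpus" -gt 1 ]; then
        echo "parallel job using $ncpus CPUs"
    else
        echo "serial job using 1 CPU"
    fi
}

describe_job
```

Using the variable instead of a hard-coded number keeps your commands in step with whatever number of CPUs you select in this section.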

  4. Shell

    For novice bioinformatics users, we recommend you keep this set to sh.
    A shell is the program on the computer that interprets your commands. See this Wikipedia article for more info.

  5. Modules

    Hydra uses the modules system to load programs that you use on the command line. Each program on the server has a module that records where the program is located, the other programs it depends on, and some help information about starting the program on the cluster. Typically you will select one module for the program you are submitting a job for. The module may refer to a specific version or, as in this case, "bioinformatics/raxml/latest" will always refer to the newest version of RAxML installed on the cluster.
    (thumbs up) You can start typing a program name to see a list of modules that matches.
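
For reference, selecting a module in the form roughly corresponds to a module load line in the job file. The sketch below assumes the standard module command of the modules system; it checks that the command exists first, so it can be tried on a machine without modules installed. The exact lines written by the QSub Generator may differ.

```shell
raxml_module="bioinformatics/raxml/latest"   # module name used in this example

# The module command only exists where the modules system is installed
# (such as Hydra), so check for it before using it.
if command -v module >/dev/null 2>&1; then
    module load "$raxml_module"
    module list
else
    echo "module command not found; run this on Hydra" >&2
fi
```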

  6. Commands

    In this section you put your commands that will be run on the cluster. You start with the name of the executable, raxmlHPC-SSE3 in this case, and include all command line options and references to data files.
    (thumbs up) You can find the name of executables by logging into Hydra via the terminal and typing module help bioinformatics/raxml/latest (replacing with the module file you will be using).
    (thumbs up) For MPI jobs start with mpirun -np $NSLOTS followed by the executable name and command line options.
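
As an illustration, a single-CPU RAxML run that performs bootstraps and a best tree search might look like the sketch below. The input file name, run name, random seeds, model, and number of bootstrap replicates are hypothetical placeholders, not values from this page; check the RAxML documentation for the options appropriate to your data. The command is only printed here so that the snippet can be tried anywhere.

```shell
# Hypothetical placeholders, not values from this page
alignment="alignment.phy"   # input alignment file
run_name="test_run"         # name RAxML appends to its output files

# -f a   rapid bootstrap analysis plus search for the best-scoring ML tree
# -m     substitution model
# -p/-x  random number seeds (parsimony and rapid bootstrapping)
# -#     number of bootstrap replicates
# -s     input alignment; -n  output name
raxml_cmd="raxmlHPC-SSE3 -f a -m GTRGAMMA -p 12345 -x 12345 -# 100 -s $alignment -n $run_name"

echo "$raxml_cmd"
```

In the Commands box you would enter the command itself rather than the echo line.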

  7. Additional options

    In this section you give some more information that will be used to run your job and log the output.
    Job Name: Name the job will be called in the cluster job list (no spaces allowed)
    Log File Name: File where stdout will be sent. Will be filled in automatically with the Job Name.
    Error File Name: File where stderr will be sent if stderr and stdout are not joined. Will be filled in automatically with the Job Name.
    Change to CWD: Always check. Will put log files in the directory you start your job from.
    Join stderr & stdout: Recommended. Will put all program output into one file (named in "Log File Name").
    Send email notifications: Emails will be sent when your job starts and ends.
    Email: Where notifications are sent. This can be any email account.
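
In the generated job file, these choices typically appear as scheduler directives, which are lines starting with #$. The sketch below shows roughly how the options above map to standard Grid Engine qsub flags. The job name, log file name, and email address are hypothetical, and the exact lines written by the QSub Generator may differ.

```shell
#!/bin/sh
# Job Name
#$ -N raxml_test
# Log File Name (stdout)
#$ -o raxml_test.log
# Change to CWD
#$ -cwd
# Join stderr & stdout
#$ -j y
# Send email when the job begins (b) and ends (e)
#$ -m be
# Email (hypothetical address)
#$ -M user@example.org

# To the shell these directives are ordinary comments; the scheduler
# reads them when the job is submitted.
# JOB_NAME is set by the scheduler inside a running job.
job_name="${JOB_NAME:-raxml_test}"
echo "Running job: $job_name"
```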

  8. Check file and download
    Press the "Check if OK" button to confirm that the script (in the gray box) was generated correctly.

    Use the "Save it" button to save the .job file.

    (thumbs up) Above the Save it button is the time and total RAM being requested for your job.

  9. Upload to Hydra
    Upload the generated .job file to Hydra, into the /scratch, /pool or /data folder that you will use for your job.
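
One way to do the upload from a terminal is with scp, after which the job can be submitted with qsub from that folder on Hydra. The sketch below only prints the commands, because the host name, user name, folder, and job file name are all placeholders that you would replace with your own.

```shell
# All values below are placeholders, not real Hydra settings
job_file="raxml_test.job"
remote_login="user@hydra.example.org"
remote_dir="/pool/your_group/your_user"

# Command to copy the job file to the cluster (run on your own computer)
upload_cmd="scp $job_file $remote_login:$remote_dir/"
# Command to submit the job (run on Hydra, from inside $remote_dir)
submit_cmd="qsub $job_file"

echo "$upload_cmd"
echo "$submit_cmd"
```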

Last update  SGK
