1. Packages
    1. conda command
    2. Help with conda
    3. Listing installed packages
    4. How to find conda package
    5. Installing a package
    6. Adding bioconda and conda-forge to your channels
    7. conda install tricks
  2. Using conda in submitted jobs
    1. Create a module file
    2. Job file using your module file
    3. If you use the pre-installed Anaconda
  3. Environments
    1. Creating an environment
    2. Adding packages to the environment
    3. Using an environment in your job file
    4. Removing an environment
    5. Useful environments: python2 and R
    6. When there isn't a conda package for a pipeline
  4. Troubleshooting
  5. Advanced
  6. Appendix
    1. Definitions
    2. Channels
  7. Additional commands
    1. Remove packages
    2. Update packages

1. Packages

  • A conda package is the precompiled program and a link to all of the required packages that will also be downloaded and needed.
  • A whole pipeline can be defined in a package: all the scripts and dependent software (e.g. qiime2 or phyluce)
  • Packages are created by people in the community and shared publicly. Sometimes they’re created by the software developer, but they’re often by users of the software.
  • You can create your own packages which a great method for reproducible science and collaboration. We’re not covering that, but we can direct you to resources.

conda command

We'll be using the conda command for managing packages and controlling conda

Help with conda

Listing installed packages

Some packages come pre-installed with either Anaconda or Miniconda (a lot more if you're using Anaconda), for Minconda:

(base)$ conda list
# packages in environment at /home/user/miniconda3:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
asn1crypto                1.2.0                    py37_0
ca-certificates           2019.10.16                    0
certifi                   2019.9.11                py37_0
cffi                      1.13.0           py37h2e261b9_0
chardet                   3.0.4                 py37_1003
conda                     4.7.12                   py37_0
conda-package-handling    1.6.0            py37h7b6447c_0
cryptography              2.8              py37h1ba5d50_0
...

How to find conda packages

You can search the public package repositories at https://anaconda.org/ (star) What we suggest using

  • Use the anaconda.org search feature.
  • Copy the install command from the package's page

Or do a web search: "conda r" or "bioconda mafft"

Or use conda search "blast" (finds packages with blast anywhere in their name in your current channels)


  • Search for admixtools on anaconda.org
    • Only available on bioconda
    • conda install -c bioconda admixtools
  • Search for mafft on anaconda.org
    • Available on multiple channels
    • bioconda, conda-forge, anaconda, r are reliable channels. You can try other popular ones.

Installing a package

conda install ... installs a packages and its dependencies

  • It doesn’t matter what your current directory is, it will install in your "conda" directory!
  • (warning) if you use the pre-installed version of Anaconda, you won't be able to install (or update) anything in the base environment,
    • (lightbulb) you will need to create and active your own conda environment(s) first (see below) and you will see that name in lieu of (base) in the following examples.
(base)$ conda install -c bioconda mafft

The following packages will be downloaded:
...
The following NEW packages will be INSTALLED:
...
Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Test it:

(base)$ mafft --help
------------------------------------------------------------------------------
  MAFFT v7.471 (2020/Jul/3)
  https://mafft.cbrc.jp/alignment/software/
  MBE 30:772-780 (2013), NAR 30:3059-3066 (2002)
------------------------------------------------------------------------------

Adding bioconda and conda-forge to your channels

(base)$ conda config --add channels defaults
(base)$ conda config --add channels bioconda
(base)$ conda config --add channels conda-forge

conda config --get channels shows your current channels

Which channel is the highest and which is the lowest?

$ conda config --get channels
--add channels 'defaults'   # lowest priority
--add channels 'bioconda'
--add channels 'conda-forge'   # highest priority

conda-forge is the highest, defaults is the lowest.

Note: the more channels you have, the increased time to "solve" installation.

(base)$ conda install iqtree

...
The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2020.6.20  |       hecda079_0         145 KB  conda-forge
    certifi-2020.6.20          |   py37hc8dfbb8_0         151 KB  conda-forge
    conda-4.8.3                |   py37hc8dfbb8_1         3.0 MB  conda-forge
    iqtree-2.0.3               |       h176a8bc_0         2.8 MB  bioconda
    openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
    python_abi-3.7             |          1_cp37m           4 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         8.2 MB
...

conda install tricks

  • Installing multiple items
  • Specifying program versions
    • Show available versions with: conda search python
    • conda install python=3
    • =3 matches the most recent version starting with 3
    • ==3.7.5 matches exactly the version specified
    • ">3.7" matches most recent version greather than 3.7 (in quotes because of the > which will interfere with the shell's redirection features)
    • "<3.8" matches most recent version less than 3.8 (in quotes because of the < which will interfere with the shell's redirection features)
    • See the conda documentation for more information

2. Using conda in submitted jobs

There are a couple steps needed to utilize conda in your submitted jobs.

(warning) By default the installed programs won't be found.

If you want to use your Miniconda installation:

Create a module file

There is Hydra documentation about creating module files here.

We'll set one up for your miniconda install. Module files are text files with instructions on how the current environment should be modified.

We suggest creating a directory in your home folder called modulefiles and creating a module called miniconda in that directory.

(base)$ mkdir ~/modulefiles
(base)$ cd ~/modulefiles
(base)$ nano miniconda

and then enter into the new text file:

#%Module1.0
prepend-path PATH /home/your_username/miniconda3/bin

(warning) Make sure to change your_username to your Hydra username!

Job file using your module file

Use the module load command followed by the path and name of your module file to load your module: module load ~/modulefiles/miniconda

Example job file using:

# /bin/sh
# ----------------Parameters---------------------- #
#$ -S /bin/sh
#$ -q sThC.q
#$ -l mres=2G,h_data=2G,h_vmem=2G
#$ -cwd
#$ -j y
#$ -N minicondatest
#$ -o minicondatest.out
#
# ----------------Modules------------------------- #
module load ~/modulefiles/miniconda
#
# ----------------Your Commands------------------- #
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
echo + NSLOTS = $NSLOTS
#
echo "Conda location:"
which conda

echo
echo "Installed packages:"
conda list

#
echo = `date` job $JOB_NAME done

Excerpt of minicondatest.out:

...
Conda location:
/home/user/miniconda3/bin/conda

Installed packages:
# packages in environment at /home/user/miniconda3:
#
# Name                    Version                   Build  Channel
...

If you use the pre-installed Anaconda or Miniconda

  • No need to create a module
    • but you need to load the tools/conda module and run start-conda

Simply replace, in the examples above, the line

module load ~/modulefiles/miniconda

by the following two lines

module load tools/conda

start-conda

Even if you have these two lines in your ~/.bashrc  file. (lightbulb) You can save typing the first line in your job file if you insert it in a file called ~/.profile 

  • To enable conda  for your jobs that use the C shell (-S /bin/csh  or no -S  option passed):
    • either  add these two lines in your ~/.cshrc, or
    • add these two lines in you job file, or
    • the first one in your ~/.cshrc and the second in the job file.

(warning) If you need access to packages installed in an environment (see below) you will need to add the corresponding conda activate  command, like in

conda activate iqtree-v1

3. Wait, there's one more concept to keep things clean... environments

So far all installs we've done have done into one set of packages called (base).

What if programs have conflicting dependencies or you need different versions of the same program?

A conda Environment is a compartmentalized set of packages, and if you use the pre-installed Anaconda, it allows you to modify it (since you can't modify the pre-installed Anaconda base environment)

Also, the more packages you have, the longer it will take for conda on the "Solving Environment" step.


What if you wanted to have both iqtree1 and iqtree2 installed? Above we showed how to installed iqtree.

It installed the latest version: 2.0.3. If you also need iqtree 1.x installed, the conda command: conda install iqtree=1  will downgrade your current iqtree from 2.x to 1.x, it won't allow you to have more than one version installed. You can have environments to have access to the two versions. (Using iqtree=1 will install the latest version of iqtree that starts with a 1).

Creating an environment

(base)$ conda create -n iqtree-v1
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
  environment location: /home/user/miniconda3/envs/iqtree-v1
Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate iqtree-v1
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Adding packages to the environment

Although the output says to use conda activate..., we recommend using source activate... because it is compatible with job files submitted through qsub

(base)$ source activate iqtree-v1
(iqtree-v1)$ conda install iqtree=1

Note: you can create an environment and add packages do it in one step with: conda create -n iqtree-v1 iqtree=1 

Using an environment in your job file

When submitting a job that uses a specific conda environment, you need to use source activate ... after you load your miniconda module.

Example script:

...
# ----------------Modules------------------------- #
module load ~/modulefiles/miniconda
source activate iqtree-v1
#
# ----------------Your Commands------------------- #
...

Removing an environment

To remove an environment and all of its packages:

(warning) You have to deactivate the environment before removing it

(iqtree-v1)$ conda deactivate
(base)$ conda remove -n iqtree-v1 --all

Remove all packages in environment /home/user/miniconda3/envs/iqtree-v1:

## Package Plan ##

  environment location: /home/user/miniconda3/envs/iqtree-v1

The following packages will be REMOVED:
...
Proceed ([y]/n)? y

Useful environments: python2 and R

If you need to a use a program written in python2, you can create an environment with that version of Python

(base)$ conda create -n python2 python=2
(base)$ source activate python2

You can use conda to install R and R packages

(base)$ conda create -n tidyverse r r-tidyverse
(base)$ source activate tidyverse

When there isn't a conda package for a pipeline

Some pipelines don't have a conda package with all the requirements. You can use the program's documentation to install dependencies. 

For example the Hybpiper pipeline doesn't have a conda package (you could create one and contribute it to bioconda...), but The install instructions list these requirements:

  • Python 2.7 or later [We suggest python3 when possible -Hydra team]
  • BIOPYTHON 1.59 or later
  • EXONERATE
  • BLAST command line tools
  • SPAdes
  • GNU Parallel
  • BWA
  • samtools

Use conda to install the dependencies for HybPiper in your hybpiper environment. You may need to look up package names at anaconda.org or your favorite search engine. (Hint: pay special attention to GNU Parallel)

(base)$ conda create -n hybpiper install python=3 "biopython>1.59" exonerate blast spades parallel bwa samtools

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/user/miniconda3/envs/hybpiper

  added / updated specs:
  - biopython[version='>1.59']
  - blast
  - bwa
  - exonerate
  - parallel
  - python=3
  - samtools
  - spades

The following packages will be downloaded:
...
The following NEW packages will be INSTALLED:
...
Proceed ([y]/n)? y
...
Downloading and Extracting Packages
...

Now that the dependencies are installed, you can download the scripts for HybPiper:

cd ~
git clone https://github.com/mossmatters/HybPiper.git

(star) When you're done using an environment, you can go back to (base) with: conda deactivate

4.Troubleshooting

Finding online help:

Reinstalling

  • mv ~/miniconda3 ~/miniconda3-old (if you want to keep the old install) or rm -rf ~/miniconda3 (to delete the old install)
  • rm -rf .condarc .conda
  • Edit ~/.bashrc
    • delete lines from # >>> conda initialize >>> to # <<< conda initialize <<<
  • logout/back in

5. Advanced

  • Installing packages using pip if not available on conda
  • Exporting environment as environment.yml, and creating a simplified version for working on local computer
    • export: conda env export > environment.yml
    • import: conda env create -f environment.yml
    • This is how qiime2 install works

6. Appendix

Definitions

  • conda: open source package management tool
  • Anaconda®: commercial company that develops conda
  • Anaconda: distribution of conda with many data science tools pre-bundled
  • Miniconda: minimal distribution of conda, other packages can be added
  • Channel: grouping of packages managed by one group (e.g. bioconda, conda-forge)

Channels

Major conda channels:

  • Main: built and maintained by anaconda (setup by default)
  • R: R and packages for it (setup by default)
  • Conda-Forge: community contributions
  • Bioconda: contributions from biological community, find packages on https://bioconda.github.io

7. Additional commands

Remove packages

To remove a package from the current environment use conda remove [package] 

Update packages

To update a specific package use conda update [package]. To update all packages in the current environment use: conda update --all 

You can prevent conda update --all from updating some packages using these instructions



Last updated MPK/MT/SGK

  • No labels