Conda is an open source package management system.
Conda: User Guide
Loading and Initializing Conda
Before using Conda, the following two steps are required. Conda must be initialized, and a .condarc must be created.
1) Initialize Conda environment:
This step is necessary only once for an HPC user, unless the initialization settings are removed.
To load the system installed Conda, load the module and use init to add it to the path. Normally using a login file to automatically set the environment is strongly discouraged, but in the case of Conda, many features cannot be used without setting this initialization file. Log out and then back in again after using conda init.
module load conda
conda init tcsh
[log out then back in again]
Optional - remove old environments:
For users who have already been using different Conda environments and would like to begin installing with the new recommended procedures, clean out the remnants of old Conda environments by doing the following.
Check for these 'dot' files:
If these files contain information for old Conda environments, edit the files and delete this section:
# >>> conda initialize >>>
# <<< conda initialize <<<
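As a minimal sketch, the check and cleanup can be done from the shell. The list of login files is an assumption (conda init tcsh writes to ~/.tcshrc and conda init bash to ~/.bashrc; other login files may also hold an old block), and the helper names are hypothetical. The sketch relies on the block starting with the marker # >>> conda initialize >>> and ending with # <<< conda initialize <<<, which is what conda init writes.

```shell
# Hypothetical helper: succeed if the file contains a conda init block.
has_conda_block() {
    grep -q '^# >>> conda initialize >>>' "$1" 2>/dev/null
}

# Hypothetical helper: print the file with the conda init block removed.
# Nothing is modified in place; review the output before replacing the file.
strip_conda_block() {
    sed '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</d' "$1"
}

# Assumed list of login files; adjust to the shells you actually use.
for f in "$HOME/.tcshrc" "$HOME/.cshrc" "$HOME/.bashrc" "$HOME/.bash_profile"; do
    if has_conda_block "$f"; then
        echo "conda block found in $f"
    fi
done
```

To apply the cleanup, redirect the output of strip_conda_block to a new file, compare it with the original, and only then replace the original.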
2) Create a .condarc file:
This step is mandatory: Conda will fill the quota of the home directory if pkgs_dirs is not set in this file.
By default, Conda stores downloaded packages in the /home directory. The /home directory is too small for that, and the packages are only needed temporarily, so they should not take up space in permanent directories. To change the default location to /share, use a text editor to create a file called .condarc. The path to that file should be /home/$USER/.condarc, and it should set pkgs_dirs to the alternative location, e.g.:
In addition, many packages require adding a 'channel'. Common channels may be added before creating environments, either by using the conda config command or by editing the .condarc.
The following displays a sample .condarc file:
[unityid@login01 ~]$ cd
[unityid@login01 ~]$ more .condarc
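The sketch below writes a draft .condarc with the two kinds of settings discussed in this step, pkgs_dirs and channels. The /share path, the group name mygroup, and the conda-forge channel are placeholders, not the site's actual values; substitute the location and channels that apply to you. It writes to a scratch file rather than to /home/$USER/.condarc so that an existing file is not overwritten.

```shell
# Hypothetical package cache location under /share; replace with your own path.
# "unityid" is only a fallback placeholder when $USER is not set.
pkgs_dir="/share/mygroup/${USER:-unityid}/conda/pkgs"

# Write to a draft file so an existing ~/.condarc is left untouched.
condarc_draft="$(mktemp)"
cat > "$condarc_draft" <<EOF
# Keep downloaded packages out of /home
pkgs_dirs:
  - $pkgs_dir
# Example channel only; list the channels your packages need
channels:
  - conda-forge
EOF

cat "$condarc_draft"
```

After reviewing the draft, copy it to /home/$USER/.condarc. The same channel could instead be added with conda config --add channels conda-forge.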
Installing and activating a Conda environment
Before installing any software, including a Conda environment, request space for user-maintained software that can be used by all members of a Project. The path for that space is generally /usr/local/usrapps/groupname.
To install Conda environments, specify a prefix, which is the path where the environment will be installed. Choose a descriptive name for the environment. Conda will create the directory, so the directory should not already exist. For example, to create a Conda environment called env_ABC containing the packages AAA, BBB, and CCC, and install it in the directory /usr/local/usrapps/[your_path], do:
conda create --prefix /usr/local/usrapps/[your_path]/env_ABC AAA BBB CCC
To activate the environment, do:
conda activate /usr/local/usrapps/[your_path]/env_ABC
Once in a Conda environment, a user can install additional packages using either conda install or pip install (after doing conda install pip). However, this is not recommended, because it makes it harder to maintain an environment in which all software is compatible.
Best practice is to create a YAML file with all of the desired Conda packages. Conda will 'solve' the environment, that is, it will find a configuration where all desired packages are the correct version numbers to work together, assuming such a configuration exists. If a user needs a version of one software that is not compatible with another, then they would create two different Conda environments.
To create a Conda environment from a YAML file called ABC.yml, do:
conda env create --prefix /usr/local/usrapps/[your_path]/env_ABC -f ABC.yml
The YAML file will contain a name, a list of Conda channels to look for the packages, and a list of all the desired packages.
Here are some sample YAML files:
- datascience.yml - Contains many common data science programs
- sklearn.yml - Machine learning with scikit-learn
- biotools.yml - Contains applications for a bioinformatics workflow
- ncdfutil.yml - Used in sponsored software group ncdfutil, contains many NetCDF Utilities
- rlibs.yml - Used to create an environment with custom R libraries
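For illustration, a minimal YAML file with the three parts described above (a name, a list of channels, and a list of packages) could be generated as follows. The channel and the package list are hypothetical placeholders, and the version pin on python is only an example of the package=version form.

```shell
# Write a hypothetical ABC.yml into a scratch directory.
workdir="$(mktemp -d)"
cat > "$workdir/ABC.yml" <<'EOF'
name: env_ABC
channels:
  - conda-forge
dependencies:
  - python=3.9
  - numpy
  - matplotlib
EOF

cat "$workdir/ABC.yml"
```

The resulting file would then be passed to conda env create with -f, as in the command shown earlier.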
When Conda creates an environment, it finds a configuration such that all of the packages/dependencies are compatible. If a great many packages are added to a YAML file, it might be impossible for Conda to resolve the necessary environment. In that case, multiple Conda environments will need to be created.
To deactivate the environment, do:
conda deactivate
Running a Conda installed application
Activating a Conda environment sets the compute environment, and is similar to loading a module. Here is a sample batch script that uses an application called mycode that was installed via a Conda environment:
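#!/bin/tcsh
# tcsh matches the shell used with conda init above
#BSUB -n 1
#BSUB -W 120
#BSUB -J mycode
#BSUB -o stdout.%J
#BSUB -e stderr.%J
# Activate the environment, run the application, then deactivate
conda activate /usr/local/usrapps/mygroup/env_mycode
mycode
conda deactivate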
Warning: multithreading and MPI applications
Multi-threading: Many of the applications available in Conda environments are automatically run in parallel, that is, they may auto-detect the number of cores on the nodes and spawn the same number of tasks. See the following for more information on testing, and contact HPC Staff for further assistance.
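One common way to keep a threaded application within its allocation is to set thread-count variables in the batch script before running it. The sketch below is written for bash or sh and assumes the application honors OMP_NUM_THREADS, MKL_NUM_THREADS, or OPENBLAS_NUM_THREADS, which not every package does. It reads the LSF variable LSB_DJOB_NUMPROC and falls back to 1 when run outside a job.

```shell
# Number of cores granted by LSF; default to 1 when not running inside a job.
ncores="${LSB_DJOB_NUMPROC:-1}"

# Cap the thread pools of libraries that read these variables.
export OMP_NUM_THREADS="$ncores"
export MKL_NUM_THREADS="$ncores"
export OPENBLAS_NUM_THREADS="$ncores"

echo "Thread limit set to $OMP_NUM_THREADS"
```

In a tcsh batch script, the equivalent lines use setenv (for example, setenv OMP_NUM_THREADS 1) instead of export. Applications that ignore these variables need their own thread or core option.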
Using MPI: Conda is a package management system that installs applications and all of their dependencies. That means that if an installed package requires MPI, Conda will install its own MPI. User-installed MPI will not work properly with LSF; therefore, a user must use the system MPI. See the following documentation for using system MPI to install R or Python packages requiring MPI.
Tips and Troubleshooting
To list the packages installed in an activated Conda environment, do:
conda list
Alternate versions of Python and various packages may be specified by following the package name with =version.number, e.g. matplotlib=3.1.
All installations must be done from a login node. Packages cannot be downloaded from a compute node, and neither a compute node nor an HPC-VCL node can write to /usr/local/usrapps.
Use a YAML file! If you create a Conda environment and then attempt to add something to it, Conda may not be able to reconcile the existing applications and libraries with the new application. This is common in R or Python, when newer versions or libraries only work with newer releases. It also happens when attempting to install an older package into an existing environment. For example, older packages may need Python 2, and therefore cannot be used with a Python 3 environment.
The message will say "solving environment" and it will continue for quite some time before eventually giving up. If this happens, create a new environment that includes the new application. (Keep the old environment!)
Create multiple environments rather than adding to them. Using conda install on an existing environment may break that environment, resulting in your scripts suddenly not working anymore.
Check your syntax: don't forget either the 'env' or the '-f' in the conda env create --prefix /path/to/env_ABC -f ABC.yml command.
PackagesNotFoundError: If you get this error, first check your syntax. Next, make sure the package exists, and if so, which channel or channels it is available from. Some Python packages are not available through Conda, and some Conda packages are only available through specific channels. You can find this information by searching for the package name on the Anaconda.org search page.
The instructions use 'pip install'. Some packages are only available through GitHub or some other website or collaborator. In this case, go to the website for the package and carefully read the instructions. Create a YAML file with all of the dependencies listed in the instructions, including version numbers when indicated, and also include any additional applications or Python packages that you will use while doing your workflow. After creating and activating the Conda environment from the YAML file, follow the remaining instructions for the "pip install". The packages installed by pip will then be available when you activate that Conda environment.
Python notebooks: If the application includes examples using IPython or Jupyter notebook, they can be tested using the HPC-VCL. See these instructions for requesting access to and using the HPC-VCL.
To add Jupyter notebook to a Conda environment, add 'notebook' to the dependencies in the YAML file:
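A minimal sketch of such a YAML file follows. The environment name, the channel, and the python entry are hypothetical; only the notebook entry comes from the text above.

```shell
# Write a hypothetical YAML file that includes Jupyter notebook.
yml="$(mktemp)"
cat > "$yml" <<'EOF'
name: env_notebook
channels:
  - conda-forge
dependencies:
  - python
  - notebook
EOF

cat "$yml"
```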
What is a Conda environment anyway???
Most of the time, Conda installs applications by downloading precompiled binaries from package repositories. Sometimes it downloads source code, and then compiles it on Henry2, in which case other modules or packages may need to be installed or linked - for example, you may need to load the CUDA modules for ML/AI packages.
The transient files (tarballs, source code, etc.) are downloaded to a temporary directory, which was defined in the .condarc in 'pkgs_dirs'. If that is not set, the default is to download to the home directory. The home directory is too small for this, and as these packages are intermediate products, they should not be saved to a directory with limited space or that is backed up. Don't waste permanent storage on them - put them in /share, as shown by the example .condarc file.
After downloading, the applications will be installed to a location defined by '--prefix', which generally should be set to the project's space in /usrapps.
Conda creates the equivalent of a module, and 'conda activate' does the equivalent of 'module load', i.e., it sets variables such as PATH and LD_LIBRARY_PATH to point to the proper locations in '--prefix'. (See more about environment variables here.)
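The effect on PATH can be illustrated with a small stand-in. This is a conceptual sketch only, not Conda's actual activation code, and prepend_env_bin is a hypothetical helper; the prefix is the one used in the sample batch script.

```shell
# Conceptual stand-in for one part of activation: put the environment's
# bin directory at the front of PATH so its programs are found first.
prepend_env_bin() {
    env_prefix="$1"
    PATH="$env_prefix/bin:$PATH"
    export PATH
}

prefix="/usr/local/usrapps/mygroup/env_mycode"
prepend_env_bin "$prefix"

# Show the first PATH entry, which is now the environment's bin directory.
echo "$PATH" | cut -d: -f1
```

In a real session, comparing the output of echo $PATH before and after conda activate shows the same kind of change.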
Last modified: October 15 2021 13:04:13.