Using Anaconda on ARC systems
Use recent versions of Anaconda
ARC will keep older versions of software packages available on clusters, but we recommend using more recent packages where available. This is particularly true of Anaconda because the base code continues to evolve in functionality and integration with package repositories. There are many cases where current/recent environments are impossible to create or update when using older versions of Anaconda.
Use module spider anaconda to search our module system for the most recent Anaconda available on the system you’re using.
Do not run conda init
Running conda init is a convenience for managing Anaconda virtual environments on a single computer, but it does not produce portable results. The principal action of conda init seems to be adding lines like these to the user’s Bash startup script, ~/.bashrc:
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/apps/easybuild/software/tinkercliffs-rome/Anaconda3/2020.11/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/apps/easybuild/software/tinkercliffs-rome/Anaconda3/2020.11/etc/profile.d/conda.sh" ]; then
        . "/apps/easybuild/software/tinkercliffs-rome/Anaconda3/2020.11/etc/profile.d/conda.sh"
    else
        export PATH="/apps/easybuild/software/tinkercliffs-rome/Anaconda3/2020.11/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
Notably, this block makes explicit references to paths that are specific to a particular node type on ARC systems, like /apps/easybuild/software/tinkercliffs-rome/... Such a path exists on only one node type on Tinkercliffs and will fail on a different type of Tinkercliffs node or on any other cluster’s nodes. In short, conda init produces non-portable results, so we recommend against using it.
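To check whether an earlier conda init has already modified a startup file, a short script can look for the marker comments shown in the listing above. The following is a minimal sketch, not an ARC-provided tool: the function name find_conda_init_block is hypothetical, and it assumes the block is delimited by exactly the "# >>> conda initialize >>>" and "# <<< conda initialize <<<" lines.

```python
from pathlib import Path

# Marker comments that delimit the block written by conda init (see listing above)
START_MARKER = "# >>> conda initialize >>>"
END_MARKER = "# <<< conda initialize <<<"


def find_conda_init_block(text):
    """Return (first_line, last_line) of the conda init block, 1-indexed, or None.

    Hypothetical helper: it only looks for the two marker comments and does
    not inspect the lines between them.
    """
    start = None
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped == START_MARKER and start is None:
            start = number
        elif stripped == END_MARKER and start is not None:
            return (start, number)
    return None


if __name__ == "__main__":
    bashrc = Path.home() / ".bashrc"
    if bashrc.is_file():
        block = find_conda_init_block(bashrc.read_text(errors="replace"))
        if block is None:
            print(f"No conda init block found in {bashrc}")
        else:
            print(f"conda init block at lines {block[0]}-{block[1]} of {bashrc}")
    else:
        print(f"{bashrc} does not exist")
```

If a block is reported, removing those lines from ~/.bashrc (after making a backup copy) should take the hard-coded paths out of the startup script.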
Use source activate and do not use conda activate
Use of conda activate <envname> requires the Anaconda initialization from above, and so is not designed to work on systems where a single home directory is shared between several different nodes. Instead, use source activate <envname> to activate Anaconda virtual environments.
Create a virtual environment specifically for the type of node where it will be used
The Tinkercliffs cluster has at least three different node types, and so does Infer. Each node type has a different CPU microarchitecture, a slightly different operating system and/or kernel version, and a slightly different system configuration and set of packages. Each is tuned for efficiency on that node type's particular features. These system differences can make Anaconda virtual environments non-portable between node types.
As a result, you should create and build a virtual environment on a node of the type where you will use the environment.
Example 1:
The Tinkercliffs login nodes are essentially identical to normal_q partition nodes. So if you wish to use Anaconda for jobs on normal_q nodes, you can build the environment on the Tinkercliffs login nodes or on the normal_q nodes. But you should not use an environment that was built on another cluster (e.g. Cascades or Infer). Instead, use conda list to view and document the most important packages and versions in that environment, then build a new environment for the Tinkercliffs normal_q nodes matching those specifications.
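One way to document the important packages is to save the text printed by conda list and turn the packages you care about into name=version specs for the new environment. The sketch below assumes the usual column layout of conda list (name, version, build, channel, with comment lines starting with #). The helper names and the sample listing, including its versions and build strings, are made up for illustration.

```python
def parse_conda_list(text):
    """Return {package: version} parsed from text printed by `conda list`."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the commented header rows
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def to_install_specs(packages, keep):
    """Build `name=version` specs for the packages named in `keep`."""
    return [f"{name}={packages[name]}" for name in keep if name in packages]


# Hypothetical sample output; versions and build strings are illustrative only
SAMPLE = """\
# packages in environment at /home/jdoe2/env/old_env:
#
# Name                    Version                   Build  Channel
numpy                     1.21.2           py39h20f2e39_0
pandas                    1.3.4            py39h8c16a72_0
python                    3.9.7                h12debd9_1
"""

specs = to_install_specs(parse_conda_list(SAMPLE), keep=["python", "pandas"])
print("conda install " + " ".join(specs))
```

Listing only the packages you actually import, rather than every dependency, leaves conda free to choose builds of the remaining packages that suit the new node type.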
Example 2:
If you want to use Anaconda on Tinkercliffs a100_normal_q nodes, then you need to build the environment from a shell on those nodes.
The important commands for this are:
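command | purpose |
---|---|
interact --partition=a100_normal_q --nodes=1 --ntasks-per-node=4 --gres=gpu:1 --account=<account> | get an interactive command line shell on a compute node |
module spider anaconda | search for the latest anaconda module |
module load Anaconda3/2020.11 | load a module |
conda create -p $HOME/envname | create a new anaconda environment at the provided path |
source activate $HOME/envname | activate the newly created environment |
conda install <packages> | install packages into the environment |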
Note
$HOME “expands” in the shell to your home directory, e.g. /home/jdoe2. The envname above should be a short but meaningful name for the environment. Because environments are particular to a node type, we recommend referencing the node type in the name, for example tca100-science for Tinkercliffs a100_normal_q nodes or tcnq for Tinkercliffs normal_q nodes.
[jdoe2@tinkercliffs2 ~]$ interact --partition=a100_normal_q --nodes=1 --ntasks-per-node=4 --gres=gpu:1 --account=jdoeacct
srun: job 438605 queued and waiting for resources
srun: job 438605 has been allocated resources
[jdoe2@tc-gpu004 ~]$ module spider anaconda
---------------------------------------------------------------------------------------------------------------
Anaconda3:
---------------------------------------------------------------------------------------------------------------
Description:
Built to complement the rich, open source Python community, the Anaconda platform provides an enterprise-ready data analytics platform that empowers companies
to adopt a modern open data science analytics architecture.
Versions:
Anaconda3/2020.07
Anaconda3/2020.11
---------------------------------------------------------------------------------------------------------------
For detailed information about a specific "Anaconda3" package (including how to load the modules) use the module's full name.
Note that names that have a trailing (E) are extensions provided by other modules.
For example:
$ module spider Anaconda3/2020.11
---------------------------------------------------------------------------------------------------------------
[jdoe2@tc-gpu004 ~]$ module load Anaconda3/2020.11
[jdoe2@tc-gpu004 ~]$ conda create -p ~/env/a100_env
Collecting package metadata (current_repodata.json): done
Solving environment: done
Please update conda by running
$ conda update -n base conda
## Package Plan ##
environment location: /home/jdoe2/env/a100_env
Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
[jdoe2@tc-gpu004 ~]$ source activate /home/jdoe2/env/a100_env/
(/home/jdoe2/env/a100_env) [jdoe2@tc-gpu004 ~]$ conda install python=3.9 pandas
Collecting package metadata (current_repodata.json): done
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.10.3
latest version: 4.12.0
Please update conda by running
$ conda update -n base conda
## Package Plan ##
environment location: /home/jdoe2/env/a100_env
added / updated specs:
- pandas
- python=3.9
Using kernels with a Conda Environment
A Jupyter kernel lets you use a conda environment inside a Jupyter notebook. Each kernel can be used to run different cells according to its language and package requirements. For example, if you have a notebook that uses two different sets of packages, each installed in a different conda environment, then you can use Jupyter kernels to switch between those two sets of packages.
To create a kernel that is associated with a specific conda environment, you need to load the Anaconda module, activate the conda environment, and create a kernel inside that environment:
[jdoe2@tinkercliffs2 ~]$ module load Anaconda3
[jdoe2@tinkercliffs2 ~]$ source activate torch_env
(torch_env) [jdoe2@tinkercliffs2 ~]$ ipython kernel install --user --name=torch_env_kernel
Installed kernelspec torch_env_kernel in ~/.local/share/jupyter/kernels/torch_env_kernel
Then, when launching the Jupyter interactive app from Open OnDemand and opening a notebook, select the code cell that you wish to run using the kernel created before. From the top menu, select Kernel -> Change kernel -> torch_env_kernel, then execute your cell.
GPU - CUDA compatibility
While nvidia-smi will display a version of CUDA, this is just the base CUDA on the node and can be overridden by:
loading a different CUDA module (search with module spider cuda)
activating an Anaconda environment which has cudatoolkit (check with conda list cudatoolkit)
installing a conda package built with a different CUDA (run conda list tensorflow and check the build string)
A100 GPUs require CUDA 11.0 or greater
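Once you have a CUDA version string, for instance the version column from conda list cudatoolkit, it can be compared against the A100 minimum numerically. This is a minimal sketch; the helper names are hypothetical and the version strings in the usage lines are illustrative inputs, not values read from any ARC system.

```python
# Minimum CUDA version stated above for A100 GPUs
A100_MIN_CUDA = (11, 0)


def parse_cuda_version(version):
    """Convert a version string such as "11.3.1" into (major, minor) integers."""
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def supports_a100(version):
    """Return True if the given CUDA version is 11.0 or greater."""
    return parse_cuda_version(version) >= A100_MIN_CUDA


for candidate in ("10.2.89", "11.0", "11.3.1"):
    status = "ok" if supports_a100(candidate) else "too old"
    print(f"CUDA {candidate}: {status} for A100")
```

Comparing integer tuples rather than raw strings matters here: as strings, "9.2" sorts after "11.0", which would give the wrong answer.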
Check CUDA version in TensorFlow
import tensorflow as tf
sys_details = tf.sysconfig.get_build_info()
cuda_version = sys_details["cuda_version"]
print(cuda_version)
Check cuDNN version in TensorFlow
cudnn_version = sys_details["cudnn_version"]
print(cudnn_version)
Check CUDA version in PyTorch
import torch
print(torch.version.cuda)