Python

Introduction

Python is a free, open-source programming language for scientific computing, data analysis, and graphics, and is used heavily in the AI/ML space.

Availability

Python is available on all clusters and in all queues (partitions) through Python modules, Anaconda modules, or Singularity containers.

Interface

There are two types of environments in which Python can be used on ARC resources:

  • Graphical interface via OnDemand using Jupyter

  • Command-line interface: start Python from the command line after loading the required software module

Note

Larger computations should be submitted as jobs, via a traditional job submission script.

Managing environments

The power of Python comes from extending the base functionality via Python packages. Managing and configuring your local Python environment is best accomplished through a combination of a package manager (pip or conda) and an environment manager such as conda, mamba, or Python virtual environments. Environments let you create an isolated environment, add packages to it with automatic dependency resolution, and reuse the environment in later sessions. You can have several environments, each with different software packages installed, and activate only the environment you need. Commonly, you will create a conda environment, activate it, and install software into it via conda or pip. For example:

# load the conda module
module load Miniconda3
# create an environment named mypy3 with Python 3.10 and pip
conda create -n mypy3 python=3.10 pip
# activate the environment
source activate mypy3
# install packages into the active environment
conda install ipykernel
pip install plotly kaleido

Activating the environment in this way ensures that later conda or pip installs place packages into the environment's location.
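To confirm which interpreter and environment a python command is actually using, you can check from within Python itself. A minimal sketch (run after activating mypy3 as above):

# check which Python environment is active
import sys

# sys.executable is the interpreter being run; with mypy3 active it should
# point into the environment's bin directory
print("interpreter:", sys.executable)

# sys.prefix is the installation prefix that packages are resolved against
print("prefix:", sys.prefix)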

Running without environments

If you prefer to use Python without an environment, you will need to set the PYTHONUSERBASE environment variable to a location you can write to. For example:

#load a python module
module reset; module load Python/3.8.6-GCCcore-10.2.0
#give python a directory where it can install/load personalized packages
#you may want to make this more specific to cluster/node type/python version
export PYTHONUSERBASE=$HOME/python3
#install a package (--user tells python to install to the location 
#specified by PYTHONUSERBASE)
pip install --user plotly
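
To verify that user-level installs are landing under PYTHONUSERBASE, you can inspect the user base and user site directories from Python. A short sketch (assumes the export and the plotly install above have already been done):

# verify a --user install under PYTHONUSERBASE
import site
import plotly

# site.USER_BASE reflects PYTHONUSERBASE; USER_SITE is where packages land
print("user base:", site.USER_BASE)
print("user site:", site.USER_SITE)

# the imported package should resolve from the user site directory
print("plotly loaded from:", plotly.__file__)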

Command line running of Python scripts

First, we need a Python script and (likely) a conda environment. For this example we use the mypy3 environment created above (plotly and kaleido are needed to write the image).

## violins.py
import plotly.express as px 
# using the tips dataset
df = px.data.tips() 
# plotting the violin chart
fig = px.violin(df, x="day", y="total_bill")
fig.write_image("fig1.jpeg")

Second, we need a shell script to submit to the Slurm scheduler. The script needs to specify the required compute resources, load the required software, and finally run the Python script.

#!/bin/bash

### python.sh
###########################################################################
## environment & variable setup
####### job customization
#SBATCH -N 1
#SBATCH -n 16
#SBATCH -t 1:00:00
#SBATCH -p normal_q
#SBATCH -A <your account>
####### end of job customization
# end of environment & variable setup
###########################################################################
#### add modules:
module load Anaconda/2020.11
module list
#end of add modules
###########################################################################
###print script to keep a record of what is done
cat python.sh
echo "python code"
cat violins.py
###########################################################################
echo start load env and run python

source activate mypy3
python violins.py

exit;

Finally, to submit the batch script, which in turn runs Python, we type:

sbatch python.sh

This will output a job number. You will have two output files:

  • fig1.jpeg

  • slurm-JOBID.out

The Slurm output file contains any output you would have seen had you typed python violins.py at the command line.

Parallel Computing in Python

Coming soon-ish. In the meantime, an mpi4py example is provided as part of ARC’s examples repository.

ProcessPoolExecutor

A simple example of process-based parallelism on a single node using concurrent.futures:

from concurrent.futures import ProcessPoolExecutor
from time import sleep
import numpy as np

def task(message):
   sleep(2)
   # do some busy work: gcd of random integer pairs (1000 is an arbitrary upper bound)
   for ii in range(100):
      np.gcd(np.random.randint(1000), np.random.randint(1000))
   return message

def main():
   executor = ProcessPoolExecutor(5)
   future = executor.submit(task, "Completed")
   print(future.done())

   sleep(2)
   print(future.done())
   print(future.result())

   executor.shutdown()

if __name__ == '__main__':
   main()
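
To run many independent tasks across the pool, executor.map is often more convenient than submitting futures one at a time. A minimal sketch (the worker function and inputs are illustrative, not part of the example above):

from concurrent.futures import ProcessPoolExecutor
import numpy as np

def gcd_of_pair(seed):
    # each task draws two random integers and returns their gcd
    rng = np.random.default_rng(seed)
    a, b = rng.integers(1, 1000, size=2)
    return int(np.gcd(a, b))

def main():
    # distribute 10 tasks over 5 worker processes; results return in input order
    with ProcessPoolExecutor(max_workers=5) as executor:
        results = list(executor.map(gcd_of_pair, range(10)))
    print(results)

if __name__ == '__main__':
    main()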

MPI4PY on a DGX node

I do not load any system software module for MPI. Everything is provided by the conda environment:

(/home/brownm12/env/dgx_cu) [brownm12@tc-dgx001 tc]$ module list

Currently Loaded Modules:
  1) shared   2) slurm/20.11.9   3) apps   4) useful_scripts   5) site/tinkercliffs/easybuild/setup   6) DefaultModules   7) Anaconda3/2022.05

My conda environment was fairly minimal, but included cupy. cupy versions newer than 11.6 (i.e., the 12.x series) appear to require a newer NVIDIA device driver than what is available on the DGX nodes as of 2023/06/30, though we regularly update drivers during maintenance outages. So I specified conda install cupy=11.6.0.

(/home/brownm12/env/dgx_cu) [brownm12@tc-dgx001 tc]$ conda list
# packages in environment at /home/brownm12/env/dgx_cu:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
ca-certificates           2023.5.7             hbcca054_0    conda-forge
cudatoolkit               11.8.0              h37601d7_11    conda-forge
cupy                      11.6.0          py311hb8138a5_0    conda-forge
fastrlock                 0.8             py311ha362b79_3    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
libblas                   3.9.0           17_linux64_openblas    conda-forge
libcblas                  3.9.0           17_linux64_openblas    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.1.0               he5830b7_0    conda-forge
libgfortran-ng            13.1.0               h69a702a_0    conda-forge
libgfortran5              13.1.0               h15d22d2_0    conda-forge
libgomp                   13.1.0               he5830b7_0    conda-forge
liblapack                 3.9.0           17_linux64_openblas    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.23          pthreads_h80387f5_0    conda-forge
libsqlite                 3.42.0               h2797004_0    conda-forge
libstdcxx-ng              13.1.0               hfd8a6a1_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
mpi                       1.0                       mpich    conda-forge
mpi4py                    3.1.4           py311h7edb0b5_0    conda-forge
mpich                     4.1.1              h846660c_100    conda-forge
ncurses                   6.4                  hcb278e6_0    conda-forge
numpy                     1.25.0          py311h64a7726_0    conda-forge
openssl                   3.1.1                hd590300_1    conda-forge
pip                       23.1.2             pyhd8ed1ab_0    conda-forge
python                    3.11.4          hab00c5b_0_cpython    conda-forge
python_abi                3.11                    3_cp311    conda-forge
readline                  8.2                  h8228510_1    conda-forge
setuptools                68.0.0             pyhd8ed1ab_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
wheel                     0.40.0             pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge

Notice that the environment provides mpiexec (pulled in through mpi4py's MPICH dependency). It works, but has minimal features because it comes from MPICH, which is a minimal MPI implementation. Slurm's srun seems to be compatible with mpi4py and provides more features:

(/home/brownm12/env/dgx_cu) [brownm12@tc-dgx001 tc]$ which mpiexec
~/env/dgx_cu/bin/mpiexec

(/home/brownm12/env/dgx_cu) [brownm12@tc-dgx001 tc]$ which srun
/cm/shared/apps/slurm/20.11.9/bin/srun

Here is a Python script that exercises a basic MPI collective communication function (scatter). Source: https://mpi4py.readthedocs.io/en/stable/tutorial.html#collective-communication

(/home/brownm12/env/dgx_cu) [brownm12@tc-dgx001 tc]$ cat mpi_scatter.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

if rank == 0:
    data = [(i+1)**2 for i in range(size)]
else:
    data = None
data = comm.scatter(data, root=0)
assert data == (rank+1)**2

print('rank:', rank, 'data:', data)

My job's allocation had --ntasks-per-node=32 --gres=gpu:2 on a DGX node; srun uses a subset of that allocation here:

(/home/brownm12/env/dgx_cu) [brownm12@tc-dgx001 tc]$ srun --ntasks=8 python mpi_scatter.py
rank: 0 data: 1
rank: 1 data: 4
rank: 2 data: 9
rank: 3 data: 16
rank: 5 data: 36
rank: 6 data: 49
rank: 4 data: 25
rank: 7 data: 64
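
For comparison, here is a gather sketch (not part of the original notes) that collects one value from each rank back to rank 0; it can be launched the same way with srun:

from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

# each rank contributes one value; rank 0 receives the full list
data = (rank + 1)**2
gathered = comm.gather(data, root=0)

if rank == 0:
    assert gathered == [(i + 1)**2 for i in range(size)]
    print('rank 0 gathered:', gathered)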

Using cupy here to get GPU device attributes:

(/home/brownm12/env/dgx_cu) [brownm12@tc-dgx001 tc]$ cat mpi_cupy.py
from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

dev = cp.cuda.Device(rank)
print("rank:",rank,'bus_id:', dev.pci_bus_id)
print(dev.mem_info)

I needed custom GPU binding to get the separate MPI ranks to “see” different GPU devices:

(/home/brownm12/env/dgx_cu) [brownm12@tc-dgx001 tc]$ srun --ntasks=2 --gpu-bind=single:1 python mpi_cupy.py
rank: 0 bus_id: 0000:0F:00.0
(84732477440, 85198045184)
rank: 0 bus_id: 0000:07:00.0
(84732477440, 85198045184)