Python
Introduction
Python is free, open-source software for computing and graphics that is used heavily in the AI/ML space.
Availability
Python is available on all clusters in all queues (partitions) through Python modules, Anaconda modules or Singularity containers.
Interface
There are two interfaces through which Python can be used on ARC resources:
Graphical interface via OnDemand using Jupyter
Command-line interface: start Python from a shell prompt after loading the required software module.
Note
Larger computations should be submitted as batch jobs via a traditional job submission script.
Managing environments
The power of Python comes from extending the base functionality with Python packages. Managing and configuring your local Python environment is best accomplished through a combination of a package manager (pip or conda) and an environment manager such as conda, mamba, or virtual environments. Environments let you create an isolated environment, add packages with automatic dependency resolution, and reuse that environment in later sessions. You can have several environments, each with different software packages installed, and activate only the one you need. Commonly, you will create a conda environment, activate it for use, and install software into it via conda or pip. For example:
module load Miniconda3
conda create -n mypy3 python=3.10 pip
source activate mypy3
conda install ipykernel
pip install plotly kaleido
Source-activating the environment ensures that subsequent conda or pip installs go into the environment's location.
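To confirm that packages were installed into the environment, a quick check from within the activated environment can help. This is a minimal sanity-check sketch, assuming the mypy3 environment and the plotly install shown above:
# env_check.py -- confirm the active interpreter and package locations (mypy3 assumed)
import sys
import plotly
print("python executable:", sys.executable)    # should point inside the mypy3 environment
print("plotly version:   ", plotly.__version__)
print("plotly location:  ", plotly.__file__)   # should be under the environment's site-packages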
Running without environments
If you prefer to use Python without an environment, you will need to set the PYTHONUSERBASE environment variable to a location you can write to. For example:
#load a python module
module reset; module load Python/3.8.6-GCCcore-10.2.0
#give python a directory where it can install/load personalized packages
#you may want to make this more specific to cluster/node type/python version
export PYTHONUSERBASE=$HOME/python3
#install a package (--user tells python to install to the location
#specified by PYTHONUSERBASE)
pip install --user plotly
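To verify where user-level installs are placed, you can inspect Python's user site configuration. This is a minimal sketch, assuming PYTHONUSERBASE is set and plotly was installed as above:
# check_user_site.py -- show where pip install --user places packages
import site
import plotly
print("user base:          ", site.USER_BASE)             # should match $PYTHONUSERBASE
print("user site-packages: ", site.getusersitepackages())
print("plotly imported from:", plotly.__file__)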
Running Python scripts from the command line
First, we need both a Python script and (most likely) a conda environment. The environment for this example is the mypy3 environment created above.
## violins.py
import plotly.express as px
# using the tips dataset
df = px.data.tips()
# plotting the violin chart
fig = px.violin(df, x="day", y="total_bill")
fig.write_image("fig1.jpeg")
Second, we need a shell script to submit to the Slurm scheduler. The script needs to specify the required compute resources, load the required software, and finally run the actual Python script.
#!/bin/bash
### python.sh
###########################################################################
## environment & variable setup
####### job customization
#SBATCH -N 1
#SBATCH -n 16
#SBATCH -t 1:00:00
#SBATCH -p normal_q
#SBATCH -A <your account>
####### end of job customization
# end of environment & variable setup
###########################################################################
#### add modules:
module load Anaconda/2020.11
module list
#end of add modules
###########################################################################
###print script to keep a record of what is done
cat python.sh
echo "python code"
cat violins.py
###########################################################################
echo start load env and run python
source activate mypy3
python violins.py
exit;
Finally, to submit the batch script, which in turn runs the Python script, we type:
sbatch python.sh
This will output a job number. You will have two output files:
fig1.jpeg
slurm-JOBID.log
The Slurm log contains any output you would have seen had you typed python violins.py at the command line.
Parallel Computing in Python
Coming soon-ish. In the meantime, an mpi4py example is provided as part of ARC’s examples repository.
ProcessPoolExecutor
from concurrent.futures import ProcessPoolExecutor
from time import sleep
import numpy as np

def task(message):
    # simulate a CPU-bound task: pause, then compute GCDs of random integers
    # (the upper bound of 1000 is an arbitrary choice)
    sleep(2)
    for ii in range(100):
        np.gcd(np.random.randint(1, 1000), np.random.randint(1, 1000))
    return message

def main():
    # pool of 5 worker processes
    executor = ProcessPoolExecutor(5)
    future = executor.submit(task, "Completed")
    print(future.done())    # False: the task is still running
    sleep(2)
    print(future.done())    # may still be False, depending on timing
    print(future.result())  # blocks until the task finishes, then prints "Completed"

if __name__ == '__main__':
    main()
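For many independent tasks, executor.map distributes the work across the pool and returns results in input order. Below is a minimal sketch building on the example above; the worker count and workload sizes are arbitrary choices for illustration, not ARC recommendations:
# pool_map.py -- run many independent CPU-bound tasks across worker processes
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def coprime_count(seed):
    # toy CPU-bound workload: count coprime pairs among 10,000 random pairs
    rng = np.random.default_rng(seed)
    a = rng.integers(1, 1_000_000, size=10_000)
    b = rng.integers(1, 1_000_000, size=10_000)
    return int(np.sum(np.gcd(a, b) == 1))

def main():
    seeds = range(16)
    # max_workers should not exceed the cores requested from Slurm (e.g. #SBATCH -n 16)
    with ProcessPoolExecutor(max_workers=16) as executor:
        for seed, count in zip(seeds, executor.map(coprime_count, seeds)):
            print(f"seed {seed}: {count} coprime pairs")

if __name__ == '__main__':
    main()
Arguments and results are passed between processes by pickling, so keep task inputs and outputs simple (numbers, strings, arrays).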
MPI4PY on a DGX node
I do not load any system software module for MPI. Everything is provided by the Anaconda environment:
(/home/brownm12/env/dgx_cu) [brownm12@tc-dgx001 tc]$ module list
Currently Loaded Modules:
1) shared 2) slurm/20.11.9 3) apps 4) useful_scripts 5) site/tinkercliffs/easybuild/setup 6) DefaultModules 7) Anaconda3/2022.05
My conda environment was fairly minimal, but included cupy. cupy versions newer than 11.6 (i.e., 12.x) appear to require a newer NVIDIA device driver than what is available on DGX nodes as of 2023/06/30, though we regularly update drivers during maintenance outages. So I specified conda install cupy=11.6.0.
(/home/brownm12/env/dgx_cu) [brownm12@tc-dgx001 tc]$ conda list
# packages in environment at /home/brownm12/env/dgx_cu:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
ca-certificates 2023.5.7 hbcca054_0 conda-forge
cudatoolkit 11.8.0 h37601d7_11 conda-forge
cupy 11.6.0 py311hb8138a5_0 conda-forge
fastrlock 0.8 py311ha362b79_3 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
libblas 3.9.0 17_linux64_openblas conda-forge
libcblas 3.9.0 17_linux64_openblas conda-forge
libexpat 2.5.0 hcb278e6_1 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 13.1.0 he5830b7_0 conda-forge
libgfortran-ng 13.1.0 h69a702a_0 conda-forge
libgfortran5 13.1.0 h15d22d2_0 conda-forge
libgomp 13.1.0 he5830b7_0 conda-forge
liblapack 3.9.0 17_linux64_openblas conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libopenblas 0.3.23 pthreads_h80387f5_0 conda-forge
libsqlite 3.42.0 h2797004_0 conda-forge
libstdcxx-ng 13.1.0 hfd8a6a1_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
mpi 1.0 mpich conda-forge
mpi4py 3.1.4 py311h7edb0b5_0 conda-forge
mpich 4.1.1 h846660c_100 conda-forge
ncurses 6.4 hcb278e6_0 conda-forge
numpy 1.25.0 py311h64a7726_0 conda-forge
openssl 3.1.1 hd590300_1 conda-forge
pip 23.1.2 pyhd8ed1ab_0 conda-forge
python 3.11.4 hab00c5b_0_cpython conda-forge
python_abi 3.11 3_cp311 conda-forge
readline 8.2 h8228510_1 conda-forge
setuptools 68.0.0 pyhd8ed1ab_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
tzdata 2023c h71feb2d_0 conda-forge
wheel 0.40.0 pyhd8ed1ab_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
Notice that mpi4py provides mpiexec. It works, but has minimal features because it comes from MPICH, which is a minimal MPI implementation. Slurm's srun appears to be compatible with mpi4py and provides more features:
(/home/brownm12/env/dgx_cu) [brownm12@tc-dgx001 tc]$ which mpiexec
~/env/dgx_cu/bin/mpiexec
(/home/brownm12/env/dgx_cu) [brownm12@tc-dgx001 tc]$ which srun
/cm/shared/apps/slurm/20.11.9/bin/srun
Python script to exercise a basic MPI collective communication function (scatter). Source: https://mpi4py.readthedocs.io/en/stable/tutorial.html#point-to-point-communication
(/home/brownm12/env/dgx_cu) [brownm12@tc-dgx001 tc]$ cat mpi_scatter.py
from mpi4py import MPI
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
if rank == 0:
    data = [(i+1)**2 for i in range(size)]
else:
    data = None
data = comm.scatter(data, root=0)
assert data == (rank+1)**2
print('rank:', rank, 'data:', data)
My job's allocation had --ntasks-per-node=32 --gres=gpu:2 on a DGX node; srun uses a subset of those tasks here:
(/home/brownm12/env/dgx_cu) [brownm12@tc-dgx001 tc]$ srun --ntasks=8 python mpi_scatter.py
rank: 0 data: 1
rank: 1 data: 4
rank: 2 data: 9
rank: 3 data: 16
rank: 5 data: 36
rank: 6 data: 49
rank: 4 data: 25
rank: 7 data: 64
Using cupy here to get GPU device attributes:
(/home/brownm12/env/dgx_cu) [brownm12@tc-dgx001 tc]$ cat mpi_cupy.py
from mpi4py import MPI
import cupy as cp
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
dev = cp.cuda.Device(rank)
print("rank:",rank,'bus_id:', dev.pci_bus_id)
print(dev.mem_info)
I needed custom gpu-binding (--gpu-bind) to get the separate MPI ranks to “see” different GPU devices:
(/home/brownm12/env/dgx_cu) [brownm12@tc-dgx001 tc]$ srun --ntasks=2 --gpu-bind=single:1 python mpi_cupy.py
rank: 0 bus_id: 0000:0F:00.0
(84732477440, 85198045184)
rank: 0 bus_id: 0000:07:00.0
(84732477440, 85198045184)