CUDA

Author

Marie-Hélène Burle

Concepts

CUDA kernel

CUDA

Devices

The terminology below follows the CUDA Programming Guide.

CPU = host
CPU memory = system memory or host memory
GPU = device
GPU memory = device memory
Code running on GPU = device code
Function executed on GPU = kernel

Code starts on the host: CUDA APIs copy data from host memory to device memory, the device performs the computation, then the results are copied back to host memory.
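This copy–compute–copy pattern can be sketched with CuPy (introduced below). As an illustrative sketch only, the code falls back to NumPy when no GPU is present, so the same flow runs anywhere:

```python
import numpy as np

try:
    import cupy as cp   # requires an NVIDIA GPU and the CUDA toolkit
    xp = cp
except ImportError:     # fall back to NumPy so the sketch runs without a GPU
    cp = None
    xp = np

x_host = np.arange(16, dtype=np.float32)          # data starts in host memory
x_dev = xp.asarray(x_host)                        # copy host -> device (no-op with NumPy)
y_dev = xp.sqrt(x_dev)                            # the computation runs on the device
y_host = cp.asnumpy(y_dev) if xp is cp else y_dev # copy device -> host
print(y_host[4])   # 2.0
```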

[Figure: the host CPU and the current CUDA device]

CUDA for Python

Low-level wrappers

PyCUDA: an older, community-driven wrapper.
CUDA Python: a newer wrapper by NVIDIA.

For new projects and unless you are already familiar with PyCUDA, favour CUDA Python.

High-level for NumPy

CuPy is a high-level drop-in replacement for NumPy.
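Because CuPy mirrors the NumPy API, existing array code can often be moved to the GPU by changing only the import. A minimal sketch (shown with NumPy as the stand-in so it runs without a GPU):

```python
import numpy as np
# On a machine with an NVIDIA GPU, the only change needed would be:
# import cupy as xp
xp = np   # CPU stand-in for this sketch

a = xp.linspace(0.0, 1.0, 5)
print(xp.sum(a * a))   # 1.875
```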

High-level for NumPy with JIT

Numba is a NumPy-aware optimizing compiler that uses LLVM for just-in-time (JIT) compilation.
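A minimal sketch of Numba's JIT on the CPU (the function name is made up for illustration; a no-op stand-in is used if Numba is not installed, so the sketch still runs):

```python
import numpy as np

try:
    from numba import njit          # JIT-compiles the function via LLVM
except ImportError:
    def njit(f):                    # no-op stand-in when Numba is absent
        return f

@njit
def dot(a, b):
    # Explicit loop: slow in pure Python, compiled to machine code by Numba
    s = 0.0
    for i in range(a.shape[0]):
        s += a[i] * b[i]
    return s

x = np.arange(4.0)
print(dot(x, x))   # 14.0
```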

The numba-cuda package, maintained by NVIDIA, lets Numba compile kernels that run on GPUs through CUDA.
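With numba-cuda, a Python function decorated with `@cuda.jit` is compiled into a CUDA kernel. A sketch (the kernel name is hypothetical; the GPU part is skipped when no CUDA device is available):

```python
import numpy as np

try:
    from numba import cuda
    HAVE_GPU = cuda.is_available()
except ImportError:
    HAVE_GPU = False          # Numba not installed: skip the GPU part

if HAVE_GPU:
    @cuda.jit                 # compile this function into a CUDA kernel
    def add_one(a):
        i = cuda.grid(1)      # absolute index of this thread in the grid
        if i < a.size:
            a[i] += 1.0

    x = np.zeros(256, dtype=np.float32)
    add_one[1, 256](x)        # launch 1 block of 256 threads
    print(x[0])
```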

High-level for DataFrames

RAPIDS cuDF is a GPU DataFrame library with a pandas-like API.
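Because cuDF mirrors the pandas API, typical DataFrame code carries over largely unchanged. A sketch shown with pandas as the CPU stand-in (column names are made up for illustration):

```python
import pandas as pd   # on a GPU machine: import cudf as pd

df = pd.DataFrame({"species": ["a", "a", "b"],
                   "mass":    [1.0, 2.0, 3.0]})
means = df.groupby("species")["mass"].mean()   # runs on the GPU under cuDF
print(means)
```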

High-level for ML

RAPIDS cuML provides GPU implementations of machine learning algorithms with a scikit-learn-like API.

High-level for DL

JAX, PyTorch, and TensorFlow all run on GPUs through CUDA.