CUDA
Concepts
CUDA kernel
Devices
CPU = host; CPU memory = system memory or host memory
GPU = device; GPU memory = device memory
Code running on the GPU = device code
Function executed on the GPU = kernel
Code starts on the host. CUDA APIs copy data from host memory to device memory, run computations on the device, then copy the results back to host memory.
| Term | Meaning |
|------|---------|
| Host | CPU |
| Device | GPU |
| Current device | GPU targeted by subsequent CUDA calls and kernel launches |
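The host → device → host pattern above can be sketched as follows. The GPU part is shown in comments using CuPy names as an illustration (it assumes CuPy and a CUDA GPU); the runnable part is a CPU stand-in with NumPy.

```python
import numpy as np

h_x = np.arange(4)           # data starts in host (CPU) memory

# On a GPU the pattern would be (assuming CuPy is installed):
#   import cupy as cp
#   d_x = cp.asarray(h_x)    # copy host memory -> device memory
#   d_y = d_x * 2            # the computation runs on the device
#   h_y = cp.asnumpy(d_y)    # copy the result device -> host

# CPU stand-in for the same computation:
h_y = h_x * 2
print(h_y.tolist())  # [0, 2, 4, 6]
```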
CUDA for Python
Low-level wrappers
PyCUDA: an older, community-driven wrapper.
CUDA Python: a newer wrapper by NVIDIA.
For new projects, and unless you are already familiar with PyCUDA, favour CUDA Python.
High-level for NumPy
CuPy is a high-level drop-in replacement for NumPy.
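The "drop-in" aspect means the same code runs with either library. A minimal sketch, using NumPy on the CPU; on a machine with a CUDA GPU, replacing `np` with `cp` (after `import cupy as cp`) runs the same calls on the device.

```python
import numpy as np  # with CuPy installed: import cupy as cp

a = np.arange(6, dtype=np.float32).reshape(2, 3)
# The same calls exist in CuPy: cp.sqrt(a).sum()
result = np.sqrt(a).sum()
print(round(float(result), 3))  # 8.382
```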
High-level for NumPy with JIT
Numba is a just-in-time (JIT) compiler, built on LLVM, that optimizes NumPy-oriented Python code.
The numba-cuda package, maintained by NVIDIA, allows Numba-compiled code to run on GPUs using CUDA.
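With numba-cuda, a kernel is an ordinary Python function decorated with `@cuda.jit`. The kernel below is a sketch shown in comments, since it needs numba-cuda and a GPU to run; the runnable code is the equivalent element-wise addition with NumPy.

```python
# Sketch of a Numba CUDA kernel (requires numba-cuda and a CUDA GPU):
#
#   from numba import cuda
#
#   @cuda.jit
#   def add_kernel(x, y, out):
#       i = cuda.grid(1)        # global thread index
#       if i < x.size:
#           out[i] = x[i] + y[i]
#
# The same element-wise addition on the CPU with NumPy:
import numpy as np

x = np.arange(5, dtype=np.float32)
y = np.ones(5, dtype=np.float32)
out = x + y
print(out.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
```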
High-level for DataFrames
RAPIDS cuDF
High-level for ML
RAPIDS cuML
High-level for DL
JAX, PyTorch, TensorFlow