Bayesian inference in


Marie-Hélène Burle

February 25, 2025


On probabilities

Two interpretations of probabilities


Frequentist [figure]

Bayesian [figure]

Frequentist

Frequentist approach to probabilities: interprets probabilities as the long-run frequency of events

It doesn’t assign probabilities to non-random variables such as hypotheses or parameters

Instead, probability is assigned to the limit of the relative frequency of an event over infinitely many trials. We can, for instance, assign a probability to the event that a new random sample would produce a confidence interval containing the unknown parameter

This is not how we intuitively think and the results are hard to interpret. This approach is also often artificially constrained and limits the integration of various forms of information

It is, however, computationally simple and fast: samples are randomly selected from the sample space and it yields test statistics such as p-values and confidence intervals. This is why it remained the dominant approach for a long time: we knew how to do it

Bayesian

Bayesian approach: assigns probabilities to our beliefs about an event

Based on Bayes’ theorem of conditional probabilities, which allows us to calculate the probability of a cause given its effect:

$$P(A|X) = \frac{P(X|A) \, P(A)}{P(X)}$$

where:

  • P(A) is the prior probability of A—our belief about event A
  • P(X) is the marginal probability of event X (some observed data)
  • P(X|A) is the likelihood or conditional probability of observing X given A
  • P(A|X) is the posterior probability—our updated belief about A given the data
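
As a concrete numerical illustration (all numbers made up for a hypothetical diagnostic test), here is the theorem evaluated in plain Python:

# Hypothetical diagnostic-test example (all numbers invented for illustration)
p_A = 0.01             # P(A): prior probability of having the disease
p_X_given_A = 0.95     # P(X|A): probability of a positive test if diseased
p_X_given_notA = 0.05  # probability of a positive test if healthy

# P(X): marginal probability of a positive test
p_X = p_X_given_A * p_A + p_X_given_notA * (1 - p_A)

# P(A|X): posterior probability of disease given a positive test
p_A_given_X = p_X_given_A * p_A / p_X
print(round(p_A_given_X, 3))  # ≈ 0.161: the data updated our 1% prior belief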

Which approach to choose?

Bayesian statistics:

  • aligns better with the way we intuitively think about the world (easier to interpret)
  • allows for the incorporation of prior information and diverse data
  • is more informative as it provides a measure of uncertainty (returns probabilities)
  • is extremely valuable when there is little data (with little data, inference is unstable: frequentist estimates have large variance and wide confidence intervals)

But beyond extremely simple examples, Bayesian inference is mathematically extremely arduous

It is also much more computationally heavy and only became possible to apply widely with the advent of powerful computers and new algorithms such as Markov chain Monte Carlo (MCMC)

Bayesian computing

Algorithms

A Bayesian approach to statistics often leads to posterior probability distributions that are too complex or too high-dimensional to be studied by analytical techniques

Markov chain Monte Carlo (MCMC) is a class of sampling algorithms which explore such distributions

Different algorithms move in different ways across the N-dimensional space of the parameters, accepting or rejecting each new position based on its adherence to the prior distribution and the data

The sequence of accepted positions constitutes the trace
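
As a rough illustration of the idea (not the algorithm any particular PPL uses), here is a minimal one-dimensional random-walk Metropolis sampler in plain Python; the target log-posterior passed in is just a toy standard normal:

import numpy as np

def metropolis(logpost, x0, n_samples=5000, step=0.5, seed=0):
    """Random-walk Metropolis: propose a jump, accept or reject it
    based on the (unnormalized) log posterior."""
    rng = np.random.default_rng(seed)
    x = x0
    trace = []
    for _ in range(n_samples):
        proposal = x + step * rng.normal()
        # Accept with probability min(1, p(proposal) / p(x))
        if np.log(rng.uniform()) < logpost(proposal) - logpost(x):
            x = proposal
        trace.append(x)  # the accepted positions form the trace
    return np.array(trace)

# Toy target: posterior proportional to a standard normal
trace = metropolis(lambda x: -0.5 * x**2, x0=0.0)
print(trace.mean(), trace.std())  # roughly 0 and 1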

Probabilistic Programming Language

Probabilistic programming languages (PPLs), explained simply in this (slightly outdated) blog post, are computer languages specialized in building probabilistic models and performing inference on them

Model components are first-class primitives

They can be based on a general programming language (e.g. Python, Julia) or domain specific

First Bayesian PPLs

Relied on Gibbs sampling:

  • WinBUGS replaced by OpenBUGS, written in Component Pascal
  • JAGS, written in C++

BUGS = Bayesian inference Using Gibbs Sampling
JAGS = Just Another Gibbs Sampler

Stan

Stan (see also website and paper) is a domain-specific language

Stan scripts can be executed from R, Python, or the shell via RStan, PyStan, etc.

Also used as the backend for the R package brms which doesn’t require learning Stan but only works for simple models

Relies on No-U-Turn sampler (NUTS), a variant of Hamiltonian Monte Carlo (HMC) (see also HMC paper)

HMC and variants require burdensome calculations of derivatives. Stan solved that by creating its own reverse-mode automatic differentiation engine

Superior to Gibbs sampler ➔ made Stan a very popular PPL for years

PPLs based on deep learning frameworks

Since HMC and NUTS require automatic differentiation, many Python PPLs built on deep learning frameworks (which provide autodiff) have emerged in recent years, following the explosion of deep learning

Examples:

  • Pyro based on PyTorch
  • Edward, then Edward2 as well as TensorFlow Probability based on TensorFlow

Enter JAX


Had JAX existed when we started coding Stan in 2011, we would’ve used that rather than rolling our own autodiff system.

Bob Carpenter, one of Stan’s creators, in a recent blog post

What is JAX?

JAX is a library for Python that:

  • makes use of the extremely performant XLA compiler
  • runs on accelerators (GPUs/TPUs)
  • provides automatic differentiation
  • uses just-in-time compilation
  • allows batching and parallelization

⇒ perfect tool for Bayesian statistics


See our introductory JAX course and webinar for more details
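
A minimal sketch of these features (the logdensity function below is just a toy standard normal log-density chosen for illustration):

import jax
import jax.numpy as jnp

# A pure function: log-density of a standard normal (up to a constant)
def logdensity(x):
    return -0.5 * jnp.sum(x ** 2)

grad_logdensity = jax.grad(logdensity)    # automatic differentiation
fast_grad = jax.jit(grad_logdensity)      # just-in-time compilation through XLA
batched_grad = jax.vmap(grad_logdensity)  # vectorization over a batch of positions

x = jnp.array([1.0, -2.0])
print(fast_grad(x))                        # [-1.  2.]
print(batched_grad(jnp.stack([x, 2 * x])))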

[Diagram: the JAX compilation pipeline]
Pure Python functions → tracing → jaxprs (JAX expressions, an intermediate representation) → just-in-time (JIT) compilation → high-level optimized (HLO) program → XLA (Accelerated Linear Algebra) → CPU / GPU / TPU
Transformations (vectorization with vmap, parallelization with pmap, differentiation with grad) operate on the jaxprs

JAX idiosyncrasies

JAX is a sublanguage of Python that requires pure functions instead of Python’s usual object-oriented style

It has other quirks

The only one you really need to understand in order to use PPLs is pseudorandom number generation

PRNG keys

Traditional pseudorandom number generators are based on the nondeterministic state of the OS. This is slow and problematic for parallel executions

JAX relies on an explicitly-set random state called a key:

from jax import random
key = random.key(18)

Each key can only be used for one random function, but it can be split into new keys:

key, subkey = random.split(key)

The first key shouldn’t be used anymore. We overwrote it with a new key to ensure we don’t accidentally reuse it

We can now use subkey in random functions in our code (and keep key to generate new subkeys as needed)
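
Putting the pieces together (the seed and shape below are arbitrary):

from jax import random

key = random.key(18)                   # explicit random state
key, subkey = random.split(key)        # fresh key (kept) + subkey (consumed below)
x = random.normal(subkey, shape=(3,))  # each subkey is used in exactly one random function
print(x)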

JAX use cases

New JAX backends added to many PPLs

Edward2 and TensorFlow Probability can now use JAX as backend

PyMC relies on building a static graph. It is based on PyTensor, which provides JAX compilation (PyTensor is a fork of Aesara, itself a fork of Theano)
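
For instance, TensorFlow Probability exposes its JAX backend as a “substrate”; a minimal sketch (distribution and numbers chosen arbitrarily):

from tensorflow_probability.substrates import jax as tfp

dist = tfp.distributions.Normal(loc=0.0, scale=1.0)  # a TFP distribution running on JAX
print(dist.log_prob(0.5))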

NumPyro

NumPyro is a library based on Pyro but using NumPy and JAX
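
A minimal sketch of a NumPyro model and NUTS run (the data and priors are made up for illustration):

import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

# Made-up data: estimate the mean of a normal with known scale
data = jnp.array([1.2, 0.8, 1.5, 0.9, 1.1])

def model(obs):
    mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))     # prior on the mean
    numpyro.sample("obs", dist.Normal(mu, 1.0), obs=obs)  # likelihood

mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(0), obs=data)
mcmc.print_summary()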

Blackjax

Not a PPL but a library of MCMC samplers built on JAX

It can be used directly if you want to write your own log-probability density functions, or together with several PPLs that define the model for you (the model then has to be translated into a log-probability function)

Also provides building blocks for experimentation with new algorithms


Example Blackjax sampler: HMC

Example Blackjax sampler: NUTS
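
Since the slides’ example code is not reproduced here, the following is only a rough sketch of what a Blackjax NUTS loop can look like, assuming the blackjax.nuts interface with a hand-written log-density and a fixed step size and mass matrix (in practice these would come from window adaptation):

import jax
import jax.numpy as jnp
import blackjax

# Hand-written log-probability density: a 2-dimensional standard normal
def logdensity(x):
    return -0.5 * jnp.sum(x ** 2)

# NUTS kernel; step_size and inverse_mass_matrix are fixed by hand for brevity
nuts = blackjax.nuts(logdensity, step_size=0.5, inverse_mass_matrix=jnp.ones(2))
state = nuts.init(jnp.zeros(2))

rng_key = jax.random.key(0)
positions = []
for _ in range(1000):
    rng_key, subkey = jax.random.split(rng_key)
    state, info = nuts.step(subkey, state)
    positions.append(state.position)

print(jnp.stack(positions).mean(axis=0))  # should be close to [0, 0]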

Which tool to choose?

All these tools are in active development (JAX was released and started shaking up the field in 2018). Things are evolving fast. Reading the blogs of the main developers, posts on Hacker News, Discourse forums, etc. helps to keep an eye on developments in the field

This recent conversation between Bob Carpenter (Stan core developer) and Ricardo Vieira (PyMC core developer) in the PyMC discourse forum is interesting

A lot of it also comes down to user preferences

Resources

How to get started with Bayesian computing?

The book Probabilistic Programming & Bayesian Methods for Hackers by Cameron Davidson-Pilon provides a code-based (using PyMC) and math-free introduction to Bayesian methods for the real beginner

Several resources on the PyMC website, including an introduction to Bayesian methods with PyMC

NumPyro tutorials

More advanced: tutorials from Blackjax Sampling Book Project

How to transition from Stan to a JAX-based PPL?

The code from the classic Bayesian textbook Statistical Rethinking by Richard McElreath has been translated by various people to modern JAX-based PPLs

