Ulam - Elixir interface to the Stan probabilistic programming language

Ulam

Elixir interface to Stan, inspired by the Python project CmdStanPy. Why should Python programmers have all the Bayesian fun?

Why?

I’m interested in developing probabilistic programming languages that compile to Stan. I have found that doing so in Python is pretty inconvenient, so I have decided to give Elixir a try and see how far I can go.

Installation

The package must be installed from GitHub. It’s not currently stable enough to be uploaded to Hex.

Examples

See the example in the tests. Relevant code:

alias Ulam.Stan.StanModel

# Some simple data for the model
data = %{
  N: 10,
  y: [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
}

# Compile the model from the Stan program file
model = StanModel.compile_file("test/stan/models/bernoulli/bernoulli.stan")

# Sample from the model and save it in a dataframe
dataframe =
  StanModel.sample(model, data,
    nr_of_samples: 1000,
    nr_of_warmup_samples: 1000,
    nr_of_chains: 8,
    show_progress_bars: false
  )

Is this a good idea?

Of course not, but if Elixir can have classical machine learning with Nx for big data, why can’t it have cool Bayesian model fitting for “small data”?

7 Likes

Kudos for the name

For anyone interested: because the Rust toolchain is much friendlier than the C++ build chain (which Stan requires, since Stan effectively writes and compiles a C++ program for each model), I’m working on becoming independent of Stan by implementing a sampler based on this Rust project: GitHub - pymc-devs/nuts-rs: A implementation of NUTS in rust

The sampler seems to be at least as good as the Stan sampler according to some benchmarks.

The idea would be to compile the model into a Rust logp minimization program using the above Rust library, and then use Rustler to compile it into a NIF that can interact with Elixir. Bonus points if I can make it work without generating any Rust source files.

The hard part here is writing vectorized implementations of the functions that will make up our likelihood, probably in Elixir (if we don’t vectorize and instead implement everything directly according to the mathematical definitions, the dynamic compute graph will blow up like crazy), and implementing the gradient calculations (on the Rust side).

1 Like

Nice work with Ulam! And congrats on the Rustler-based alternative implementation; it seems very promising.

Maybe I’m not understanding the problem well enough, but wouldn’t it be possible to use pure Nx for this vectorized hard part? Would it not help?

I don’t know; I never managed to get Nx to work on a CPU backend. And making it work on commercial-grade CPUs is important to me (and is probably more performant than a GPU for the typical workloads given to Stan).

1 Like

What does the name mean?

The “backend” that does all the smart probabilistic calculations is called Stan. The name Stan comes from the name of the Polish/American mathematician Stanisław Ulam, so I have decided to call my “frontend” Ulam, based on the same name. There is actually another Ulam project in the R programming language.

3 Likes

I have decided that until I have something that works practically with Stan (including basic Bayesian plots and diagnostics) I won’t pursue the Rust sampler option. I can always add that later.

I have added support for energy plots using my Quartz plotting package (this can’t yet be reproduced from the versions on GitHub, but it shows how easy it can be to make some basic plots):

defmodule Ulam.Demo.InferenceDataPlots.LinearRegression do
  alias Statistics.Distributions.Normal

  alias Quartz.{Figure, Length, Plot2D}

  # We'll need the UlamModel.new/2 macro to compile our models
  require Ulam.UlamModel, as: UlamModel
  alias Ulam.InferenceData

  # Give an explicit file path so that we can inspect the generated Stan code
  linear_regression_model_file =
    Path.join(~w[
      demo models inference_data_plots
      linear_regression linear_regression.stan
    ])

  # Compile Elixir AST into Stan code
  linear_regression_model =
    UlamModel.new stan_file: linear_regression_model_file do
      data do
        n :: int(lower: 0)
        x :: vector(n)
        y :: vector(n)
      end

      parameters do
        # Parameters for the linear regression
        intercept :: real()
        slope :: real()
        error :: real(lower: 0)
        # Prior on x
        mu_x :: real()
        sigma_x :: real(lower: 0)
      end

      model do
        x <~> normal(mu_x, sigma_x)
        y <~> normal(x * slope + intercept, error)
      end
    end

  # Store the model in a compile-time module attribute.
  # In addition to the compiled struct, the model depends
  # on artifacts stored in the file system.
  @linear_regression_model UlamModel.compile(linear_regression_model)

  # Generative model to simulate the data
  def generate_data(%{} = params) do
    n = 160
    x = for _i <- 1..n, do: Normal.rand(params.mu_x, params.sigma_x)
    y = for x_i <- x, do: Normal.rand(x_i * params.slope + params.intercept, params.error)

    %{x: x, y: y, n: n}
  end

  def energy_plot() do
    params = %{
      mu_x: 1.2,
      sigma_x: 0.7,
      slope: 0.7,
      intercept: 2.8,
      error: 0.15
    }

    # Simulate data
    data = generate_data(params)

    # Run the model in order to get an InferenceData struct with the posterior
    idata = UlamModel.sample(@linear_regression_model, data, show_progress_bars: true)
    %InferenceData{} = idata

    figure =
      Figure.new([width: Length.cm(10), height: Length.cm(10)], fn _fig ->
        [[b1, b2], [b3, b4]] = Figure.bounds_for_plots_in_grid(nr_of_rows: 2, nr_of_columns: 2)

        for {plot_bounds, chain_id} <- Enum.with_index([b1, b2, b3, b4], 1) do
          letter = Enum.at(["A", "B", "C", "D"], chain_id - 1)

          Figure.add_plot_2d(fn plot ->
            plot
            |> InferenceData.plot_energy_for_chain(idata, chain_id)
            |> Plot2D.set_bounds(plot_bounds)
            |> Plot2D.put_title("#{letter}. Chain ##{chain_id}")
          end)
        end
      end)

    Figure.render_to_png_file!(figure, "examples/inference_data_plots/energy_plot.png")
  end
end

The code above defines the model, simulates dummy data according to the generative model, and fits the Ulam model to that data.

When you run the energy_plot/0 function you get the following figure:

This is a high-resolution PNG file, rendered by resvg, which renders everything on the CPU and claims to be reproducible across devices (I haven’t actually checked this), a desirable property if you’re going to generate scientific plots. In future versions I’d like to be able to generate an accessible PDF or SVG, but that would require embedding fonts in SVGs or PDFs, which is not actually that easy.

2 Likes

I think that your question reveals a common misconception which is worth clearing up, so I will expand a bit on my answer.

wouldn’t it be possible to use pure Nx for this vectorized hard part?

First, the main benefit of Nx is not necessarily vectorization but automatic differentiation. When you implement Hamiltonian Monte Carlo using NUTS (the No-U-Turn Sampler), you need to work with the first-order (multidimensional) gradient of the function that defines the likelihood. A gradient is a multidimensional derivative made up of N components, where N is the number of parameters of the model. In very simple terms, you can think of NUTS as a fancy way of minimizing an arbitrary likelihood function from R^N to R. This means that any “framework” you use to implement NUTS needs to be able to evaluate gradients of arbitrary functions, which requires some form of automatic differentiation. Nx provides automatic differentiation, which would make it useful here. However, even with the function and its gradient in hand, you still need the whole NUTS algorithm. Maybe I could implement NUTS on top of Nx (I don’t know, because I have no knowledge of the internals of the NUTS algorithm), but even if that’s possible, it’s not something I’m eager to do.
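To make the gradient requirement concrete, here is a minimal sketch (hypothetical code, not Ulam or nuts-rs) of the contract a NUTS implementation depends on: given a parameter vector, return the log-density together with its N-component gradient. For a standard normal log-density the gradient is known analytically, so no autodiff is needed in this toy case:

```rust
// Minimal sketch of the logp-plus-gradient contract a NUTS sampler needs.
// For a standard normal log-density, logp(theta) = -0.5 * sum(theta_i^2)
// (up to an additive constant), and the gradient is simply -theta:
// one component per model parameter.
fn logp_and_grad(theta: &[f64]) -> (f64, Vec<f64>) {
    let logp = -0.5 * theta.iter().map(|t| t * t).sum::<f64>();
    let grad = theta.iter().map(|t| -t).collect();
    (logp, grad)
}

fn main() {
    let (logp, grad) = logp_and_grad(&[1.0, 2.0]);
    println!("logp = {logp}, grad = {grad:?}");
}
```

For any realistic model the gradient has no such closed form, which is exactly where an automatic differentiation framework (Nx, or a Rust AD library) earns its keep.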

Regarding vectorization: although vectorization may speed up the kinds of numerical computing you need for a probabilistic model, what you actually need is a fast, strongly typed language such as C++ or Rust. If the programming language you’re using is fast, you don’t actually need to vectorize anything, although vectorization may make some operations somewhat faster.

If that is the case, why do we talk so much about vectorization in Elixir (or Python)? Because Elixir is terribly slow, way too slow for real-world numerical computing. This is mostly because the BEAM (the virtual machine Elixir runs on) needs to check the type of each argument to every function each time it does anything. In a fast low-level language such as C, Rust or C++, the runtime doesn’t need to do this, and everything is orders of magnitude faster.

In the Elixir world, vectorization is just a way to replace slow Elixir code with fast C code, and it should be thought of in that way. This also means that common advice such as “don’t use array indexing, use multidimensional array operations” is only true because such operations would be slow in Elixir, whereas the standard array operations are handled by fast C code. When you’re already programming in C or Rust, array indexing is often the fastest way to do whatever you want to do!

Why doesn’t vectorization help much in many real-world probabilistic models? Vectorization shines when you’re doing things such as multiplying large matrices, summing large arrays or computing vector cross products. But most of the time your models won’t be doing that, unless you’re fitting some really basic models such as generalized linear models or something equivalent. As soon as you want to use if statements or array indexing operations (which are pretty much required to handle missing data), vectorization won’t lead to a meaningful speedup (you can read about this in the Stan docs).
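As a hypothetical illustration of that point (not Ulam code), consider a Bernoulli log-likelihood over data with missing observations: each element needs a branch, so there is no single large array operation left to vectorize, yet a plain indexed loop in Rust is already fast:

```rust
// Hypothetical example: Bernoulli log-likelihood where observations may be
// missing (None). The per-element branch prevents expressing this as one
// big vectorized array operation, but a plain Rust loop is fast anyway.
fn bernoulli_loglik(p: f64, ys: &[Option<u8>]) -> f64 {
    let mut total = 0.0;
    for i in 0..ys.len() {
        match ys[i] {
            Some(1) => total += p.ln(),
            Some(_) => total += (1.0 - p).ln(),
            None => {} // missing observation: contributes nothing
        }
    }
    total
}

fn main() {
    let ys = [Some(1), Some(0), None, Some(0)];
    println!("loglik = {}", bernoulli_loglik(0.25, &ys));
}
```

In Elixir the equivalent element-wise loop would crawl, which is why the vectorized style feels mandatory there; in Rust it simply isn’t.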

In summary:

  1. If your model is implemented in Rust, vectorization isn’t such a big deal for real-world models
  2. Nx isn’t a magic way to speed up your code, although it can help with vectorization
  3. Probabilistic programming with NUTS is not easy, and you probably don’t want to implement the NUTS algorithm yourself

The only reason I’m considering doing this in Rust is that someone else has already implemented the NUTS algorithm in Rust, and as a user you only need to write the model. The model is simply a function which returns something like {log_likelihood_value, gradient = [g1, g2, g3, ..., gn]}, where log_likelihood_value is the value of the log-likelihood function and gradient is an n-dimensional vector representing the gradient at that (n-dimensional) point.
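As a sketch of that returned pair (hypothetical code, not the actual generated model), here is the Bernoulli model from the README above, with the gradient written out analytically and spot-checked against a central finite difference, which is essentially the sanity check you would run against an autodiff library’s output:

```rust
// Hypothetical sketch of the (logp, gradient) pair the Rust sampler consumes,
// for a Bernoulli model with k successes out of n trials:
//   logp(p) = k*ln(p) + (n - k)*ln(1 - p)
//   dlogp/dp = k/p - (n - k)/(1 - p)
fn bernoulli_logp_and_grad(p: f64, k: f64, n: f64) -> (f64, Vec<f64>) {
    let logp = k * p.ln() + (n - k) * (1.0 - p).ln();
    let grad = vec![k / p - (n - k) / (1.0 - p)];
    (logp, grad)
}

fn main() {
    // k = 2 successes out of n = 10 trials, as in the README data above
    let (logp, grad) = bernoulli_logp_and_grad(0.3, 2.0, 10.0);

    // Spot-check the analytic gradient against a central finite difference
    let h = 1e-6;
    let fd = (bernoulli_logp_and_grad(0.3 + h, 2.0, 10.0).0
        - bernoulli_logp_and_grad(0.3 - h, 2.0, 10.0).0)
        / (2.0 * h);
    println!("logp = {logp}, grad = {:?}, finite diff = {fd}", grad);
}
```

A real model would have many parameters and no hand-derived gradient, which is where the autodiff library described below comes in.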

How do I get the gradient value? I use a Rust automatic differentiation library (again, written by someone else, although I have customized it a bit) that automatically evaluates the gradient from almost normal Rust code.

The Elixir side only has to generate the Rust code for the likelihood function.

The main idea here is that vectorization is not that important; the main reason people think it is important is that they use dynamic languages such as Elixir, in which everything vectorized is fast and everything non-vectorized is slow. I suggest everyone reading this think of vectorization as “running fast code written in C” instead of some undefined procedure that makes your Elixir code faster.

1 Like

I have decided to rework the architecture of the plotting functions in order to make things easier for the common cases. In particular, those functions will take care of setting the bounds of the plots when used inside a figure. If you need more control over the plot bounds, you can easily customize them (say you want to draw both energy plots and posterior plots in the same figure, or you want to change the titles of the plots).

Some examples follow. The energy plots really need a label to tell the viewer which of the datasets is the energy transitions and which is the energy deviations, but I haven’t implemented general labels yet and I don’t have the underlying abstractions done.

Energy plots as histograms:

Energy plots as KDEs:

Posterior plots as histograms (one color per chain):

Posterior plots as KDEs (one color per chain):

The code for the above is much simpler:

  def energy_plots() do
    idata = InferenceData.from_parquet!("demo/models/inference_data_plots/linear_regression/linear_regression.parquet")

    figure_histogram =
      Figure.new([width: Length.cm(10), height: Length.cm(10)], fn _fig ->
        InferenceData.draw_energy_plots_for_chains(idata, plot_type: :histogram)
      end)

    figure_kde =
      Figure.new([width: Length.cm(10), height: Length.cm(10)], fn _fig ->
        InferenceData.draw_energy_plots_for_chains(idata, plot_type: :kde)
      end)

    Figure.render_to_png_file!(figure_histogram, "examples/inference_data_plots/energy_plot_histogram.png")
    Figure.render_to_png_file!(figure_kde, "examples/inference_data_plots/energy_plot_kde.png")
  end

  def posterior_plots() do
    idata = InferenceData.from_parquet!("demo/models/inference_data_plots/linear_regression/linear_regression.parquet")

    figure_histogram =
      Figure.new([width: Length.cm(6), height: Length.cm(18), debug: true], fn _fig ->
        _plots = InferenceData.draw_posterior_plots(idata, plot_type: :histogram)
      end)

    figure_kde =
      Figure.new([width: Length.cm(6), height: Length.cm(18), debug: true], fn _fig ->
        _plots = InferenceData.draw_posterior_plots(idata, plot_type: :kde)
      end)

    Figure.render_to_png_file!(figure_histogram, "examples/inference_data_plots/posterior_plots_histogram.png")
    Figure.render_to_png_file!(figure_kde, "examples/inference_data_plots/posterior_plots_kde.png")
  end

I have implemented new functions to draw posterior intervals side by side with rank plots and trace plots.

In the plots below, the labels are automatically generated from the variables contained in the idata struct, but they can be customized with a very flexible and general system. Ulam can also plot posterior parameters with histograms instead of KDEs.

Trace plots

Rank plots

Code

The code below needs additional imports and aliases to work, obviously.

  def posterior_and_trace_plots() do
    idata = InferenceData.from_parquet!("demo/models/inference_data_plots/linear_regression/linear_regression.parquet")

    figure_kde =
      Figure.new([width: Length.cm(10), height: Length.cm(18)], fn _fig ->
        _plots = InferenceData.draw_posterior_and_trace_plots(idata)
      end)
      
    Figure.render_to_png_file!(figure_kde, "examples/inference_data_plots/posterior_plots_kde_with_trace_plots.png")
  end

  def posterior_and_rank_plots() do
    idata = InferenceData.from_parquet!("demo/models/inference_data_plots/linear_regression/linear_regression.parquet")

    figure_kde =
      Figure.new([width: Length.cm(10), height: Length.cm(18)], fn _fig ->
        _plots = InferenceData.draw_posterior_and_rank_plots(idata, plot_type: :kde)
      end)
      
    Figure.render_to_png_file!(figure_kde, "examples/inference_data_plots/posterior_plots_kde_with_rank_plots.png")
  end

Because of improvements in Quartz, Ulam now supports legends in plots where it makes sense (when plots are very similar and the color/style conventions are the same, it tries not to repeat the label if it isn’t necessary):

Energy plots, with labels as used by the rstan visualizations (using KDEs):

Energy plots (using histograms):

Posterior plots with trace plots side by side:

This has been made possible by improvements in the Quartz library. I hope to be able to publish both Quartz and Ulam on Hex soon.