suchasurge

Timeseries upsampling with linear interpolation in Explorer

Hi there.

Last week I had to work on some timeseries data, specifically upsampling it with linear interpolation.

This is quite easy in Python with pandas:

import pandas as pd

# Your data
data = [
    ["2023-02-13", 100],
    ["2023-02-15", 100.01],
    ["2023-02-16", 100.09],
    ["2023-02-17", 101.02],
    ["2023-02-20", 105.00],
    ["2023-02-22", 103.06]
]

# Convert to DataFrame
df = pd.DataFrame(data, columns=['Date', 'Value'])

# Convert 'Date' column to datetime type and set as index
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Resample to daily frequency and interpolate missing values
upsampled = df.resample('D').interpolate(method='linear')

print(upsampled)

But the project where I need to do this is, obviously 🙂, written in Elixir.
So the Explorer library came to mind.

I haven’t used Explorer before and had a hard time working with the timeseries data in a Livebook.

In the end I built the upsampling with linear interpolation in plain Elixir.

Now I’m wondering whether any of you think it’s currently possible to translate the above Python/pandas code to Elixir/Explorer.

By using Explorer I hope to get some performance benefits, because right now my plain Elixir solution looks slow and consumes more CPU than we saw for the app before.
This is especially true for quite large timeseries (about 30 years) with long gaps (weeks to months).

Can I even get performance improvements by writing this in Explorer? Do any of you have experience with this?

When trying to translate this to Explorer, I’m stuck at the upsampling part:

alias Explorer.DataFrame, as: DF
require Explorer.DataFrame

data = [
    ["2023-02-13", 100],
    ["2023-02-15", 100.01],
    ["2023-02-16", 100.09],
    ["2023-02-17", 101.02],
    ["2023-02-20", 105.00],
    ["2023-02-22", 103.06]
]

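# Build the DataFrame first; `df` cannot be referenced inside the pipeline that defines it.
df =
  data
  |> Enum.map(fn [date, value] -> %{date: date, value: value} end)
  |> DF.new()

# Parse the string column into proper datetimes.
df = DF.put(df, "date", Explorer.Series.strptime(df["date"], "%Y-%m-%d"))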

So the proper dates are there, but how do I now loop through the data to fill in the missing rows (upsampling)?

Cheers
Frank

billylanchantin

I don’t believe Explorer supports interpolation at this time. Until it does, I think you’ll need to use a workaround.

Here is an example workaround:

alias Explorer.DataFrame, as: DF
alias Explorer.Series, as: S
require DF

defmodule Interpolate do
  def evenly_spaced(x_step, df, x_col, y_col) do
    n_rows = DF.n_rows(df)
    last = DF.slice(df, n_rows - 1, 1)[[x_col, y_col]]

    DF.new(
      x1: S.head(df[x_col], n_rows - 1),
      x2: S.tail(df[x_col], n_rows - 1),
      y1: S.head(df[y_col], n_rows - 1),
      y2: S.tail(df[y_col], n_rows - 1)
    )
    |> DF.to_rows_stream()
    |> Stream.map(fn %{"x1" => x1, "y1" => y1, "x2" => x2, "y2" => y2} ->
      m = (y2 - y1) / (x2 - x1)
      x = x1..x2//x_step |> Enum.to_list() |> S.from_list()
      n = S.size(x)
      x = S.head(x, n - 1)
      y = x |> S.subtract(x1) |> S.multiply(m) |> S.add(y1)

      DF.new(%{x_col => x, y_col => y})
    end)
    |> Enum.reduce(&DF.concat_rows(&2, &1))
    |> DF.concat_rows(last)
  end
end

# Build data
data = [
  ["2023-02-13", 100],
  ["2023-02-15", 100.01],
  ["2023-02-16", 100.09],
  ["2023-02-17", 101.02],
  ["2023-02-20", 105.00],
  ["2023-02-22", 103.06]
]

df = (
  data
  |> Enum.map(fn [date, value] ->
    %{date: Date.from_iso8601!(date), value: value}
  end)
  |> DF.new()
)

# Cast date column to integer.
# (Units make working with Date/Durations a bit awkward.)
df = DF.mutate(df, date_int: cast(date, :integer))

# Run interpolation.
interp = Interpolate.evenly_spaced(1, df, "date_int", "value")

# Cast back to date.
interp = DF.mutate(interp, date: cast(date_int, :date))

# Print result.
interp[["date", "value"]] |> DF.print(limit: :infinity)

# +---------------------------------------------+
# | Explorer DataFrame: [rows: 10, columns: 2]  |
# +------------------+--------------------------+
# |       date       |          value           |
# |      <date>      |         <float>          |
# +==================+==========================+
# | 2023-02-13       | 100.0                    |
# +------------------+--------------------------+
# | 2023-02-14       | 100.005                  |
# +------------------+--------------------------+
# | 2023-02-15       | 100.01                   |
# +------------------+--------------------------+
# | 2023-02-16       | 100.09                   |
# +------------------+--------------------------+
# | 2023-02-17       | 101.02                   |
# +------------------+--------------------------+
# | 2023-02-18       | 102.34666666666666       |
# +------------------+--------------------------+
# | 2023-02-19       | 103.67333333333333       |
# +------------------+--------------------------+
# | 2023-02-20       | 105.0                    |
# +------------------+--------------------------+
# | 2023-02-21       | 104.03                   |
# +------------------+--------------------------+
# | 2023-02-22       | 103.06                   |
# +------------------+--------------------------+

This workaround will be quite slow. If you need a performant solution, you’ll need to shell out to Polars proper.

Also, note that because of the simplicity of your use case (evenly spaced dates), I was able to use a simple algorithm. The general case is more complicated.
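To make the per-segment arithmetic in the workaround concrete, here is a minimal sketch in plain Python (the language of the pandas example in the question), using only the standard library. The function name `upsample_daily` is hypothetical and not part of pandas or Explorer. It assumes the points are sorted by date with no duplicate dates, and it only fills gaps on a fixed one-day grid.

```python
from datetime import date, timedelta


def upsample_daily(points):
    """Linearly interpolate sorted (date, value) pairs onto a daily grid.

    Hypothetical helper for illustration; assumes no duplicate dates.
    """
    if not points:
        return []

    result = []
    # Walk over consecutive pairs of original points (one pair per segment).
    for (d1, y1), (d2, y2) in zip(points, points[1:]):
        gap = (d2 - d1).days
        slope = (y2 - y1) / gap
        # Emit d1 and every missing day up to, but not including, d2.
        for offset in range(gap):
            result.append((d1 + timedelta(days=offset), y1 + slope * offset))

    # The final original point is never the start of a segment, so add it here.
    result.append(points[-1])
    return result


points = [
    (date(2023, 2, 13), 100.0),
    (date(2023, 2, 15), 100.01),
    (date(2023, 2, 16), 100.09),
    (date(2023, 2, 17), 101.02),
    (date(2023, 2, 20), 105.00),
    (date(2023, 2, 22), 103.06),
]

daily = upsample_daily(points)
for day, value in daily:
    print(day.isoformat(), value)
```

For the sample data this produces 10 rows, with 2023-02-14 at approximately 100.005, in line with the table above.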

suchasurge

Thanks for your Explorer-based solution.

I had hoped there was a solution that is more performant than plain Elixir.

Anyway, I think I’ll follow your suggestion and try to use Polars directly.

Hopefully this will give me enough knowledge to contribute to Explorer some day.

Thx again 🙂

Cheers
Frank

billylanchantin

Sure thing 🙂

Some other thoughts while it’s on my mind.

The interpolation operation is not very array-programming friendly. Check out what these two major implementations are doing:

Both are quite loop heavy. The reason is that the two arrays in question – the original array and the array of points to sample at – are different sizes. Fundamentally, some work needs to be done to figure out which values of the original array are relevant to each sampling point.

Now this fact is essentially irrelevant to Explorer since Explorer is using Polars under the hood (at least the primary backend is). Adding interpolation to Explorer is more about API design than it is about algorithms. I mostly note this fundamental limitation of interpolation to set expectations about what’s possible from an Elixir-based solution.
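As a rough illustration of the bookkeeping described above (matching each sampling point to the original values on either side of it), here is a minimal Python sketch using only the standard library. The function name `interpolate_at` is hypothetical and this is not how Polars or any particular library implements it. It assumes `xs` is strictly increasing with at least two entries, `sample_xs` is sorted ascending, and every sample lies inside the range of `xs`.

```python
def interpolate_at(xs, ys, sample_xs):
    """Linearly interpolate the curve (xs, ys) at each point in sample_xs.

    Hypothetical helper for illustration. Assumes xs is strictly increasing
    (length >= 2) and sample_xs is sorted ascending within [xs[0], xs[-1]].
    """
    out = []
    i = 0  # index of the left end of the segment currently being used
    for s in sample_xs:
        if not xs[0] <= s <= xs[-1]:
            raise ValueError(f"sample {s} is outside the original range")
        # Advance to the segment that contains s. Because both arrays are
        # sorted, i only ever moves forward across the whole loop.
        while i < len(xs) - 2 and xs[i + 1] < s:
            i += 1
        x1, x2 = xs[i], xs[i + 1]
        y1, y2 = ys[i], ys[i + 1]
        out.append(y1 + (y2 - y1) * (s - x1) / (x2 - x1))
    return out


# Days since 2023-02-13 for the sample data from the question.
xs = [0, 2, 3, 4, 7, 9]
ys = [100.0, 100.01, 100.09, 101.02, 105.00, 103.06]

# The sampling points need not line up with the original spacing.
samples = [0.5, 2, 5.5, 9]
print(interpolate_at(xs, ys, samples))
```

The inner `while` loop is the part that does not map cleanly onto whole-array operations: how far it advances depends on where each sampling point falls relative to the original data.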
