suchasurge
Timeseries upsampling with linear interpolation in explorer
Hi there.
The last week I had to work on some timeseries data. Specifically upsampling with linear interpolation of timeseries data.
This is quite easy in python with pandas:
import pandas as pd
# Your data
data = [
["2023-02-13", 100],
["2023-02-15", 100.01],
["2023-02-16", 100.09],
["2023-02-17", 101.02],
["2023-02-20", 105.00],
["2023-02-22", 103.06]
]
# Convert to DataFrame
df = pd.DataFrame(data, columns=['Date', 'Value'])
# Convert 'Date' column to datetime type and set as index
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
# Resample to daily frequency and interpolate missing values
upsampled = df.resample('D').interpolate(method='linear')
print(upsampled)
But the project where I need to do it is, obviously
, written in elixir.
So the explorer library came into my mind.
I haven’t used explorer before and had a hard time playing with the timeseries data in a livebook.
In the end I build the upsampling with linear interpolation in plain elixir.
Now I’m wondering if any of you think that it’s currently possible to translate the above python/pandas code to elixir/explorer?
With the usage of explorer I hope to get some performance benefits because as of now it looks like that my plain elixir solution is slow and consumes more CPU than we saw for the app before.
This is especially true for quite large timeseries (about 30 years) with long gaps (weeks to month for the large timeseries).
Can I even get performance improvements writing this in explorer? Does anyone of you have any experience?
When trying to translate to explorer I’m stuck at the upsampling part:
alias Explorer.DataFrame, as: DF
require Explorer.DataFrame
data = [
["2023-02-13", 100],
["2023-02-15", 100.01],
["2023-02-16", 100.09],
["2023-02-17", 101.02],
["2023-02-20", 105.00],
["2023-02-22", 103.06]
]
df = data
|> Enum.map(fn [date, value] ->
%{
date: date,
value: value
}
end)
|> DF.new()
|> DF.put("date", Explorer.Series.strptime(df["date"], "%Y-%m-%d"))
So proper dates are there but how do I loop through the data now to be able to fill the missing rows (upsampling)?
Cheers
Frank
Most Liked
billylanchantin
I don’t believe Explorer supports interpolation at this time. Until it does, I think you’ll need to use a workaround.
Here is an example workaround:
alias Explorer.DataFrame, as: DF
alias Explorer.Series, as: S
require DF
defmodule Interpolate do
def evenly_spaced(x_step, df, x_col, y_col) do
n_rows = DF.n_rows(df)
last = DF.slice(df, n_rows - 1, 1)[[x_col, y_col]]
DF.new(
x1: S.head(df[x_col], n_rows - 1),
x2: S.tail(df[x_col], n_rows - 1),
y1: S.head(df[y_col], n_rows - 1),
y2: S.tail(df[y_col], n_rows - 1),
)
|> DF.to_rows_stream()
|> Stream.map(fn %{"x1" => x1, "y1" => y1, "x2" => x2, "y2" => y2} ->
m = (y2 - y1) / (x2 - x1)
x = x1..x2//x_step |> Enum.to_list() |> S.from_list()
n = S.size(x)
x = S.head(x, n - 1)
y = x |> S.subtract(x1) |> S.multiply(m) |> S.add(y1)
DF.new(%{x_col => x, y_col => y})
end)
|> Enum.reduce(&DF.concat_rows(&2, &1))
|> DF.concat_rows(last)
end
end
# Build data
data = [
["2023-02-13", 100],
["2023-02-15", 100.01],
["2023-02-16", 100.09],
["2023-02-17", 101.02],
["2023-02-20", 105.00],
["2023-02-22", 103.06]
]
df = (
data
|> Enum.map(fn [date, value] ->
%{date: Date.from_iso8601!(date), value: value}
end)
|> DF.new()
)
# Cast date column to integer.
# (Units make working with Date/Durations a bit awkward.)
df = DF.mutate(df, date_int: cast(date, :integer))
# Run interpolation.
interp = Interpolate.evenly_spaced(1, df, "date_int", "value")
# Cast back to date.
interp = DF.mutate(interp, date: cast(date_int, :date))
# Print result.
interp[["date", "value"]] |> DF.print(limit: :infinity)
# +---------------------------------------------+
# | Explorer DataFrame: [rows: 10, columns: 2] |
# +------------------+--------------------------+
# | date | value |
# | <date> | <float> |
# +==================+==========================+
# | 2023-02-13 | 100.0 |
# +------------------+--------------------------+
# | 2023-02-14 | 100.005 |
# +------------------+--------------------------+
# | 2023-02-15 | 100.01 |
# +------------------+--------------------------+
# | 2023-02-16 | 100.09 |
# +------------------+--------------------------+
# | 2023-02-17 | 101.02 |
# +------------------+--------------------------+
# | 2023-02-18 | 102.34666666666666 |
# +------------------+--------------------------+
# | 2023-02-19 | 103.67333333333333 |
# +------------------+--------------------------+
# | 2023-02-20 | 105.0 |
# +------------------+--------------------------+
# | 2023-02-21 | 104.03 |
# +------------------+--------------------------+
# | 2023-02-22 | 103.06 |
# +------------------+--------------------------+
This workaround will be quite slow. If you need a performant solution, you’ll need to shell out to Polars proper.
Also, note that because of the simplicity of your use case (evenly spaced dates), I was able to use a simple algorithm. In the general case is more complicated.
suchasurge
Thanks for your explorer based solution.
I had the hope that there is a solution which is more performant than with plain elixir.
Anyways. I think I follow your suggestion and try to directly use polars.
Hopefully with this I collect enough knowledge to contribute to explorer some day.
Thx again ![]()
Cheers
Frank
billylanchantin
Sure thing ![]()
Some other thoughts while it’s on my mind.
The interpolation operation is not very array-programming friendly. Check out what these two major implementations are doing:
- Polars
- Numpy
Both are quite loop heavy. The reason is that the two arrays in question – the original array and the array of points to sample at – are different sizes. Fundamentally, some work needs to be done to figure out which values of the original array are relevant to each sampling point.
Now this fact is essentially irrelevant to Explorer since Explorer is using Polars under the hood (at least the primary backend is). Adding interpolation to Explorer is more about API design than it is about algorithms. I mostly note this fundamental limitation of interpolation to set expectations about what’s possible from an Elixir-based solution.







