Hi there.
Last week I had to work on some time series data, specifically upsampling it with linear interpolation.
This is quite easy in Python with pandas:
```python
import pandas as pd

# Your data
data = [
    ["2023-02-13", 100],
    ["2023-02-15", 100.01],
    ["2023-02-16", 100.09],
    ["2023-02-17", 101.02],
    ["2023-02-20", 105.00],
    ["2023-02-22", 103.06],
]

# Convert to DataFrame
df = pd.DataFrame(data, columns=['Date', 'Value'])

# Convert 'Date' column to datetime type and set as index
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Resample to daily frequency and interpolate missing values
upsampled = df.resample('D').interpolate(method='linear')
print(upsampled)
```
But the project where I need to do this is, obviously, written in Elixir.
So the Explorer library came to mind.
I haven't used Explorer before and had a hard time playing with the time series data in a Livebook.
In the end I built the upsampling with linear interpolation in plain Elixir.
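For reference, the logic of such a hand-rolled approach is roughly the following. This is only a minimal sketch, written in Python rather than Elixir so it lines up with the pandas example, and it is not my actual Elixir code. The function name `upsample_daily` is made up for illustration, and it assumes the input is already sorted by date.

```python
from datetime import date, timedelta


def upsample_daily(points):
    """Fill every missing day between known points by linear interpolation.

    `points` is a list of (date, value) pairs, assumed sorted by date.
    """
    if not points:
        return []

    result = []
    # Walk over each pair of neighbouring known points.
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        gap = (d1 - d0).days
        # Emit the left point plus every missing day up to, but excluding, d1.
        for step in range(gap):
            day = d0 + timedelta(days=step)
            value = v0 + (v1 - v0) * step / gap
            result.append((day, value))

    # The final known point is never emitted by the loop above.
    result.append(points[-1])
    return result


data = [
    (date(2023, 2, 13), 100),
    (date(2023, 2, 15), 100.01),
    (date(2023, 2, 16), 100.09),
    (date(2023, 2, 17), 101.02),
    (date(2023, 2, 20), 105.00),
    (date(2023, 2, 22), 103.06),
]

upsampled = upsample_daily(data)
for day, value in upsampled:
    print(day, round(value, 4))
```

The work is proportional to the number of output rows, so a 30-year daily series means roughly 11,000 interpolated points per series.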
Now I'm wondering whether it's currently possible to translate the pandas code above to Elixir/Explorer.
I hope Explorer will bring some performance benefits, because my plain Elixir solution is slow and uses more CPU than the app needed before.
This is especially true for large time series (about 30 years) with long gaps (weeks to months).
Can I even expect a performance improvement from writing this with Explorer? Does anyone have experience with this?
When trying to translate it to Explorer, I'm stuck at the upsampling part:
```elixir
alias Explorer.DataFrame, as: DF
alias Explorer.Series
require Explorer.DataFrame

data = [
  ["2023-02-13", 100],
  ["2023-02-15", 100.01],
  ["2023-02-16", 100.09],
  ["2023-02-17", 101.02],
  ["2023-02-20", 105.00],
  ["2023-02-22", 103.06]
]

# Build the data frame first, so that `df` exists before it is referenced
df =
  data
  |> Enum.map(fn [date, value] -> %{date: date, value: value} end)
  |> DF.new()

# Parse the string column into proper datetimes
df = DF.put(df, "date", Series.strptime(df["date"], "%Y-%m-%d"))
```
So the proper dates are there, but how do I now loop through the data to fill in the missing rows (the upsampling step)?
Cheers
Frank