Partitioning data frames - is there a way to efficiently split a dataframe into multiple dataframes?

Elixir friends, I have a dataframe that I need to partition by a grouping column, sending each partition to a different system. Is there a way to efficiently split a dataframe into multiple dataframes?

Polars has partition_by and R has group_split, but neither appears to be available in Explorer. Do I have other options?

DataFrame.from_query!(conn, "select * from events;")
|> DataFrame.group_by(:event_type)
|> ...split into multiple frames
|> Enum.each(&dispatch_to_subsystem/1)

We do appear to be missing this functionality!

Here’s an example workaround:

require Explorer.DataFrame, as: DF
require Explorer.Series, as: S

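df = Explorer.Datasets.iris()
column = "species"

# Build a map of distinct value => sub-frame, one filter pass per value.
# The partition column is dropped from each sub-frame.
for value <- df[column] |> S.distinct() |> S.to_list(), into: %{} do
  {value, df |> DF.filter(col(^column) == ^value) |> DF.discard(column)}
end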
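
To connect this back to the original pipeline, here is a minimal sketch that wraps the same workaround in a helper and then iterates over the result. `Partition.by/2` is a hypothetical name, not part of Explorer, and the `IO.puts` call is only a stand-in for `dispatch_to_subsystem/1`. The `Mix.install` version requirement is an assumption; adjust it to whatever you already depend on. Unlike the snippet above, this one keeps the partition column in each sub-frame. It still filters once per distinct value, and rows where the column is nil will probably not land in any partition.

```elixir
# Assumption: a recent Explorer release; the version requirement is illustrative.
Mix.install([{:explorer, "~> 0.8"}])

defmodule Partition do
  # require is needed because DF.filter/2 is a macro
  require Explorer.DataFrame, as: DF
  alias Explorer.Series, as: S

  # Hypothetical helper (not an Explorer function): returns a map of
  # distinct value => sub-frame, keeping the partition column in each frame.
  def by(df, column) when is_binary(column) do
    for value <- df[column] |> S.distinct() |> S.to_list(), into: %{} do
      {value, DF.filter(df, col(^column) == ^value)}
    end
  end
end

parts = Partition.by(Explorer.Datasets.iris(), "species")

# Stand-in for dispatch_to_subsystem/1: print each partition's key and size.
Enum.each(parts, fn {value, frame} ->
  IO.puts("#{value}: #{Explorer.DataFrame.n_rows(frame)} rows")
end)
```

Returning a map keeps the partition key next to its frame, which is handy if the key decides which system receives it. If you only need the frames, `Map.values(parts)` gives you a plain list to pipe into `Enum.each/2`.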

Thank you. Works as requested 🙂
