I’d like to query an Explorer DataFrame that contains a “select” column so that the output gives, for each row, the value from the column named in that row’s “select” entry. Example:
iex> DF.print(my_df)
+--------------------------------------------+
| Explorer DataFrame: [rows: 3, columns: 3]  |
+--------------+--------------+--------------+
|     col1     |     col2     |    select    |
|   <string>   |   <string>   |   <string>   |
+==============+==============+==============+
| a            | b            | col1         |
+--------------+--------------+--------------+
| c            | d            | col2         |
+--------------+--------------+--------------+
| e            | f            | col2         |
+--------------+--------------+--------------+
The output I want is:
#Explorer.Series<
  Polars[3]
  string ["a", "d", "f"]
>
This pops up while processing categorical data and is related to a use-case involving one-hot encoding that was discussed here.
I was hoping to be able to use syntax that is fairly short, something like
my_df[.., my_df["select"]]
…but I haven’t been able to get that working yet. What I have gotten working is:
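defmodule Janky do
  alias Explorer.DataFrame, as: DF
  alias Explorer.Series, as: S

  def dynamic_select(dataframe) do
    # use mutate_with combined with Series.select
    DF.mutate_with(dataframe, fn df ->
      [
        selection:
          Enum.reduce(
            # every column except "select" is a candidate source column
            Enum.reject(dataframe.names, fn x -> x == "select" end),
            # fallback value for rows whose "select" matches no column
            "nil",
            fn x, acc ->
              # true for the rows whose "select" entry names column x
              this_column = S.equal(df["select"], x)
              S.select(this_column, df[x], acc)
            end
          )
      ]
    end)
    |> DF.pull("selection")
  end
end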
iex> Janky.dynamic_select(my_df)
#Explorer.Series<
  Polars[3]
  string ["a", "d", "f"]
>
This seems like a lot of code to do something fairly straightforward—is there anything more sleek built into Explorer?
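One shorter option, offered as a sketch rather than as a pattern recommended by Explorer, is to drop the lazy `mutate_with` expression and look the value up row by row in plain Elixir. It assumes the frame is small enough to materialize with `DF.to_rows/1`. The `Mix.install` version pin and the rebuilt `my_df` are illustrative assumptions, not part of the original setup:

```elixir
# Assumption: a standalone script; the version pin is illustrative only.
Mix.install([{:explorer, "~> 0.8"}])

alias Explorer.DataFrame, as: DF
alias Explorer.Series, as: S

# Rebuild the example frame from the printed table above.
my_df =
  DF.new(
    col1: ["a", "c", "e"],
    col2: ["b", "d", "f"],
    select: ["col1", "col2", "col2"]
  )

selection =
  my_df
  # to_rows/1 returns one map per row, keyed by column name (strings by default)
  |> DF.to_rows()
  # use each row's "select" entry as the key into that same row
  |> Enum.map(fn row -> row[row["select"]] end)
  |> S.from_list()

IO.inspect(S.to_list(selection))
# prints ["a", "d", "f"]
```

This trades the column-wise computation for an `Enum` pass over every row, so it is probably only reasonable for small frames. A row whose “select” entry names no column yields `nil` rather than the string `"nil"` used above.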
In Pandas, there used to be DataFrame.lookup(), which I guess has been replaced by something slightly more verbose. I feel like this type of dynamic selection is supported by dplyr too…
Anything built into Explorer (or on the roadmap) for this?