Hi,
I’m having trouble figuring out how to take a dataframe, split one of the text cells based on a function, and create new rows with the results, with all other column data duplicated for each row.
I just can’t figure out how to convert the list of text fragments into something that can be expanded into additional rows.
My (naïve) stab at it looks as follows for now:
require Explorer.DataFrame, as: DF
require Explorer.Series, as: S

defmodule MyDF do
  def apply(df, column, new_column, func) do
    series = DF.pull(df, column)
    list = S.to_list(series)

    new_series =
      list
      |> Enum.map(&func.(&1))
      |> S.from_list()

    DF.put(df, new_column, new_series)
  end
end

df = DF.new(
  class: [1, 2, 1, 3],
  text: ["AAA, BBB", "CCC, DDD", "EEE", "FFF, GGG, HHH"]
)

df
|> MyDF.apply("text", "new_text", &String.split(&1, ","))
Resulting in:
#Explorer.DataFrame<
  Polars[4 x 3]
  class integer [1, 2, 1, 3]
  text string ["AAA, BBB", "CCC, DDD", "EEE", "FFF, GGG, HHH"]
  new_text list[string] [
    ["AAA", " BBB"],
    ["CCC", " DDD"],
    ["EEE"],
    ["FFF", …]
  ]
>
But I would like the lists in new_text to be spread across multiple rows, one fragment per row.
I’m coming from R and the tidyverse (dplyr), where I would do something simple like:
df %>%
  dplyr::rowwise() %>%
  mutate(new_text = parse_text(text))
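One direction that might work for spreading the list column across rows, sketched under the assumption that the installed Explorer release provides `DF.explode/2`: build the list column as in the attempt above, then explode it. The names `explode_with/4` and `split_and_trim` are made up for illustration, and the `Mix.install` line is only there so the snippet can run as a standalone script.

```elixir
# Assumption: a recent Explorer release that ships DF.explode/2.
Mix.install([:explorer])

require Explorer.DataFrame, as: DF
require Explorer.Series, as: S

defmodule MyDF do
  # Hypothetical helper: applies `func` (string -> list of strings) to every
  # cell of `column`, stores the lists in `new_column`, then turns each list
  # element into its own row. The other columns are repeated per fragment.
  def explode_with(df, column, new_column, func) do
    fragments =
      df
      |> DF.pull(column)
      |> S.to_list()
      |> Enum.map(func)

    df
    |> DF.put(new_column, S.from_list(fragments))
    |> DF.explode(new_column)
  end
end

df =
  DF.new(
    class: [1, 2, 1, 3],
    text: ["AAA, BBB", "CCC, DDD", "EEE", "FFF, GGG, HHH"]
  )

# Hypothetical splitter: splits on commas and trims the leading spaces
# that a plain String.split(&1, ",") leaves behind.
split_and_trim = fn text ->
  text
  |> String.split(",")
  |> Enum.map(&String.trim/1)
end

result = MyDF.explode_with(df, "text", "new_text", split_and_trim)
DF.print(result)
```

If that assumption holds, each fragment should end up on its own row, with `class` and `text` repeated for every fragment that came from the same original row.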
Thank you