This feels like a dumb question but I am stuck. Using Explorer I want to count the number of rows for each group (groups may be defined by one column, or multiple columns).
I would think this would be:
alias Explorer.{Datasets, DataFrame}
Datasets.fossil_fuels()
|> DataFrame.group_by("country")
|> DataFrame.summarise(country: [:count])
And the result would be a data frame like:
#Explorer.DataFrame<
Polars[222 x 2]
country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
country_count integer [5, 5, 5, 5, 5, ...]
>
Instead I get this error:
%Inspect.Error{
message: "got RuntimeError with message \"Polars(NotFound(\\\"country_count\\\"))\"
while inspecting %{__struct__: Explorer.DataFrame, data: %{__struct__:
Explorer.PolarsBackend.DataFrame, resource: #Reference<0.3895999949.566099982.184593>},
dtypes: %{\"country\" => :string, \"country_count\" => :string}, groups: [], names:
[\"country\", \"country_count\"]}"
}
It works fine if I group and count by different variables, e.g.,
Datasets.fossil_fuels()
|> DataFrame.group_by("country")
|> DataFrame.summarise(total: [:count])
returns:
#Explorer.DataFrame<
Polars[222 x 2]
country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", ...]
total_count integer [5, 5, 5, 5, 5, ...]
>
but it seems odd that I have to summarise an arbitrary other variable. What am I missing?