Using Series.filter inside DataFrame.summarise

cc @billylanchantin

Is it possible to use Explorer.Series.filter inside Explorer.DataFrame.summarise?

I currently have some code that looks like this, but it's giving me a compiler error: (ArgumentError) expected a variable to be given to var!, got: Explorer.DataFrame.pull( var!(df, Explorer.Query), :df )

    data_frame
    |> DataFrame.mutate(
      foo_id:
        if result in ["alice", "bob", "carol", "dave"] do
          bar_id
        else
          nil
        end
    )
    |> DataFrame.group_by(["some_idx"])
    |> DataFrame.summarise(
      baz:
        if count(foo_id) == 0 do
          "other/none"
        else
          first(filter(foo_id, is_not_nil(_)))
        end
    )

cc perhaps @josevalim and @cigrainger

I changed the summarise call to use only the function versions, and now I get an invalid_struct Erlang error:

    |> DataFrame.summarise_with(fn ldf ->
      [
        hit:
          if count(ldf[:foo_id]) == 0 do
            "none"
          else
            first(filter_with(ldf[:foo_id], &is_not_nil(&1)))
          end
      ]
    end)

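Okay, I ended up figuring it out using a different approach: drop the nil rows before grouping, take the first foo_id per group, then right-join the result back onto the distinct some_idx values and fill the missing entries with "none".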

    some_idx = data_frame |> DataFrame.distinct([:some_idx])

    data_frame
    |> DataFrame.mutate(
      foo_id:
        if result in ["alice", "bob", "carol", "dave"] do
          foo_id
        else
          nil
        end
    )
    |> DataFrame.drop_nil([:foo_id])
    |> DataFrame.group_by(["some_idx"])
    |> DataFrame.summarise(foo: first(foo_id))
    |> DataFrame.join(some_idx, on: [:some_idx], how: :right)
    |> DataFrame.mutate(foo: fill_missing(foo, "none"))

Hi @mhanberg!

I think this is actually a bug. Unless there’s a limitation I’m missing, your original query should work. I’ve opened an issue here:

Awesome job working around the problem!