Can't filter rows where column A value is nil

Using Explorer, I have a df whose rows have id and size columns. The df has multiple rows per id, and some of those rows have nil in the size column.

I’m trying to grab all the ids that have at least one nil value in the size column, so that I can later exclude them from my df.

I tried this:

ids_to_exclude = 
  df
  |> DataFrame.filter(size == nil)

But I get the following error:

** (ArgumentError) cannot invoke Explorer.Series.equal/2 with mismatched dtypes: :string and nil
    (explorer 0.8.1) lib/explorer/series.ex:6274: Explorer.Series.dtype_mismatch_error/3
    #cell:jtyrwno7aej2bj3n:3: (file)
    #cell:jtyrwno7aej2bj3n:3: (file)

I can’t figure out the correct/expected way to handle this without dropping down to plain Elixir.

Any help would be much appreciated!

Thanks :)

You can do something like:

ids_to_exclude = 
  df
  |> DataFrame.filter(is_nil(size))
  |> DataFrame.pull("id")

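Inside an Explorer query, == is translated to Explorer.Series.equal/2, which expects both sides to have matching dtypes, so comparing a :string column with nil raises the error you saw. You can use any operation from Explorer.Series inside a query, and is_nil/1 is the one that checks for missing values.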
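
To make that concrete, here is a minimal, self-contained sketch. The sample data and the Mix.install version constraint are assumptions made up for illustration, not taken from your notebook. It keeps only the rows whose size is nil and then collects the distinct ids:

```elixir
# Assumed dependency constraint; adjust to whatever your project uses.
Mix.install([{:explorer, "~> 0.8"}])

# filter/2 is a macro, so Explorer.DataFrame has to be required.
require Explorer.DataFrame, as: DataFrame
alias Explorer.Series

# Hypothetical sample data: id 1 has one nil size, id 2 has none.
df = DataFrame.new(id: [1, 1, 2, 2], size: ["S", nil, "M", "L"])

ids_to_exclude =
  df
  # Keep only the rows where the size column is missing.
  |> DataFrame.filter(is_nil(size))
  # Extract the id column as a series.
  |> DataFrame.pull("id")
  # Drop repeated ids in case an id has several nil rows.
  |> Series.distinct()

ids_list = Series.to_list(ids_to_exclude)
# ids_list is [1]
```

Series.distinct/1 is optional here; it only keeps the list of ids short when an id has more than one nil row.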

Hey Hugo, thanks for getting back to me. I ended up using is_nil like this:

order_ids_to_exclude = 
  df
  |> DataFrame.filter(is_nil(product_size_category))
  |> DataFrame.pull("Order ID")
  |> Explorer.Series.cast(:integer)

followed by:

df =
  df
  |> DataFrame.filter_with(fn df -> 
    Series.in(df["Order ID"], order_ids_to_exclude) 
    |> Series.not()
  end)
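
For anyone landing here later, the two steps can be sketched end to end on toy data. The sample values below are hypothetical, the Mix.install constraint is an assumption, and the sketch assumes "Order ID" is already an integer column, so it leaves out the cast:

```elixir
# Assumed dependency constraint; adjust to whatever your project uses.
Mix.install([{:explorer, "~> 0.8"}])

require Explorer.DataFrame, as: DataFrame
alias Explorer.Series

# Hypothetical sample data: order 1 has a nil category, order 2 does not.
df =
  DataFrame.new(%{
    "Order ID" => [1, 1, 2, 2],
    "product_size_category" => ["S", nil, "M", "L"]
  })

# Step 1: collect the ids of orders with at least one nil category.
order_ids_to_exclude =
  df
  |> DataFrame.filter(is_nil(product_size_category))
  |> DataFrame.pull("Order ID")

# Step 2: drop every row that belongs to one of those orders,
# including the rows of that order that do have a category.
cleaned =
  DataFrame.filter_with(df, fn ldf ->
    ldf["Order ID"]
    |> Series.in(order_ids_to_exclude)
    |> Series.not()
  end)

remaining_ids = cleaned |> DataFrame.pull("Order ID") |> Series.to_list()
# remaining_ids is [2, 2]
```

filter_with/2 is used for the second step because the column name contains a space, so it is accessed as ldf["Order ID"] rather than as a bare variable in a query.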