I have a couple of questions about Explorer’s DataFrame and the Access behaviour.
In the selecting columns section of the Explorer manual, it mentions that the Access behaviour is implemented for DataFrames.
Because of this, I would expect accessing a column that doesn’t exist with the bracket notation to return nil, and using DataFrame.fetch to access a column that doesn’t exist to return :error.
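To be concrete about what I expected, here is the same kind of access sketched with a plain map instead of a DataFrame. The `columns` map is only a stand-in I made up for illustration, and I am assuming Elixir 1.10 or later, where `Access.fetch!/2` is available.

```elixir
# A plain map standing in for a keyed container; Explorer is not involved here.
columns = %{"a" => [1, 2, 3], "b" => [10, 20, 30]}

# Bracket access goes through Access and falls back to nil for a missing key.
missing_via_brackets = columns["c"]

# Map.fetch/2 returns :error for a missing key and {:ok, value} otherwise.
missing_via_fetch = Map.fetch(columns, "c")
present_via_fetch = Map.fetch(columns, "a")

# Access.fetch!/2 raises a KeyError (not an ArgumentError) for a missing key.
raised =
  try do
    Access.fetch!(columns, "c")
  rescue
    error in KeyError -> error
  end

IO.inspect(missing_via_brackets, label: "brackets")
IO.inspect(missing_via_fetch, label: "fetch (missing)")
IO.inspect(present_via_fetch, label: "fetch (present)")
IO.inspect(raised.__struct__, label: "fetch! raised")
```

With a map, none of the missing-key lookups raise except `fetch!`, and that one raises a `KeyError`.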
However, this is not the case. Here are some examples.
Examples
alias Explorer.DataFrame

df =
  DataFrame.new(
    a: [1, 2, 3],
    b: [10, 20, 30]
  )
And here are some ways to access it.
Using brackets
iex> df["a"]
#Explorer.Series<
  Polars[3]
  s64 [1, 2, 3]
>
iex> df["c"]
** (ArgumentError) could not find column name "c". The available columns are: ["a", "b"].
If you are attempting to interpolate a value, use ^c.
Trying to access column c raises an ArgumentError, but I would expect that to return nil given the Access behaviour.
Using fetch
Using fetch was also surprising:
iex> Explorer.DataFrame.fetch(df, "a")
{:ok,
 #Explorer.Series<
   Polars[3]
   s64 [1, 2, 3]
 >}
iex> Explorer.DataFrame.fetch(df, "c")
** (ArgumentError) could not find column name "c". The available columns are: ["a", "b"].
If you are attempting to interpolate a value, use ^c.
The first makes sense ({:ok, val}), but the second raises an ArgumentError, where I would expect it to return :error given the Access behaviour.
I assume that this is the intended way for it to work given that it is included in the test suite. See this test for example, which shows that an ArgumentError is expected to be raised.
(The fact that it is an ArgumentError is also interesting given that fetch!/2 from Access raises a KeyError exception rather than ArgumentError.)
Other Access behaviour functions
Another interesting thing is that not all of the functions in the Access behaviour are available. E.g.,
iex> Explorer.DataFrame.fetch!(df, "c")
** (UndefinedFunctionError) function Explorer.DataFrame.fetch!/2 is undefined or private.
Summary
To summarize, here are my questions:
- Why does accessing columns in a DataFrame not behave in the way that the Access behaviour docs imply that it should behave?
- Why aren’t all the Access behaviour functions available on DataFrame?
- How are you supposed to check if a column exists in a DataFrame other than using a try block?