Undefined variable in across with Explorer.DataFrame

Hello, I started the Machine Learning in Elixir book, and the first code sample in it raises an error for me :smiling_face_with_tear:

require Explorer.DataFrame, as: DF
iris = Explorer.Datasets.iris()
cols = ["sepal_width", "sepal_length", "petal_length", "petal_width"]

DF.mutate(iris,
  for col <- across(cols) do
    {col.name, (col - mean(col)) / variance(col)}
  end
)

Error:

** (ArgumentError) undefined variable "cols"
    (explorer 0.6.1) lib/explorer/query.ex:365: Explorer.Query.traverse/3
    (elixir 1.15.2) lib/enum.ex:1819: Enum."-map_reduce/3-lists^mapfoldl/2-0-"/3
    (explorer 0.6.1) lib/explorer/query.ex:375: Explorer.Query.traverse/3
    (explorer 0.6.1) lib/explorer/query.ex:401: Explorer.Query.traverse_for/3
    (explorer 0.6.1) lib/explorer/query.ex:298: anonymous fn/3 in Explorer.Query.traverse/2
    (elixir 1.15.2) lib/enum.ex:1819: Enum."-map_reduce/3-lists^mapfoldl/2-0-"/3
    (explorer 0.6.1) lib/explorer/query.ex:296: Explorer.Query.traverse/2
    (explorer 0.6.1) expanding macro: Explorer.Query.query/1

I am using Livebook v0.10.0 and these deps:

Mix.install([
  {:axon, "~> 0.5"},
  {:nx, "~> 0.5"},
  {:explorer, "~> 0.5"},
  {:kino, "~> 0.5"}
])

Note that if I pass the list directly, it works without a problem:

require Explorer.DataFrame, as: DF
iris = Explorer.Datasets.iris()

DF.mutate(iris,
  for col <- across( ["sepal_width", "sepal_length", "petal_length", "petal_width"]) do
    {col.name, (col - mean(col)) / variance(col)}
  end
)

Why does it raise this error?
Thank you in advance.

Hey @shahryarjb :wave:

I haven’t tried it and I don’t own the book (yet), but based on the documentation, if you want to access all the columns in a for-comprehension, you just need to use across().

See this section of the docs: Explorer.Query — Explorer v0.6.1

A for-comprehension can have multiple generators and filters. For instance, if you want to apply standardization to all float columns, we can use across/0 to access all columns and then use a filter to keep only the float ones:

Hope it helps, enjoy the reading :slight_smile:

The columns in my list are not all of the columns in the data frame.

But as in my code above, if I use

for col <- across( ["sepal_width", "sepal_length", "petal_length", "petal_width"]) do

it works, but

cols = ["sepal_width", "sepal_length", "petal_length", "petal_width"]

DF.mutate(iris,
  for col <- across(cols) do

does not work.

If I use

DF.mutate(iris,
  for col <- across() do

it shows me

** (ArgumentError) Explorer.Series.mean/1 not implemented for dtype :string. Valid dtypes are [:integer, :float]
    (explorer 0.6.1) lib/explorer/series.ex:4692: Explorer.Series.dtype_error/3
    /Users/shahryar/Desktop/ml.livemd#cell:anjlmkx4hegczaf3ubcadze2wqyuweq5:7: (file)
    /Users/shahryar/Desktop/ml.livemd#cell:anjlmkx4hegczaf3ubcadze2wqyuweq5:6: (file)
    /Users/shahryar/Desktop/ml.livemd#cell:anjlmkx4hegczaf3ubcadze2wqyuweq5:5: (file)

Ah I see, sorry I missed that.

I guess you can apply a filter based on the column type:

DF.mutate(iris,
  for col <- across(), col.dtype == :float do
    {col.name, (col - mean(col)) / variance(col)}
  end
)

:thinking:

Honestly, I don’t know why the book suggests a usage of the across macro that is not covered by the documentation, or maybe I’m missing something :upside_down_face:

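It seems like Explorer.Query adopted a similar approach to Ecto.Query in its use of ^ for interpolation: a variable defined outside the query, such as cols, has to be interpolated with ^ before the query can see it.

References: "Interpolation and Casting" in the Ecto.Query docs and "Interpolation" in the Explorer.Query docs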

p.s. It’s worth noting that the ^ in Ecto and Explorer is not the same as the ^ pin operator in Elixir – learn more about it here: elixir - Why do Ecto queries need the pin (^) operator? - Stack Overflow
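
Applied to the snippet from the first post, that would look roughly like the sketch below. I'm assuming the same Livebook setup as in the question (explorer 0.6.x already installed through Mix.install); normalized_iris is just a name I picked for the result, and the only change to the book's code is the ^ in front of cols.

```elixir
# Assumes :explorer is already installed via the Mix.install call
# from the first post (explorer 0.6.x).
require Explorer.DataFrame, as: DF

iris = Explorer.Datasets.iris()
cols = ["sepal_width", "sepal_length", "petal_length", "petal_width"]

# `^cols` interpolates the outer variable into the query.
# A bare `cols` here raises the "undefined variable" ArgumentError
# shown in the first post.
normalized_iris =
  DF.mutate(iris,
    for col <- across(^cols) do
      {col.name, (col - mean(col)) / variance(col)}
    end
  )
```

Since the list only names the four numeric columns, the string column never reaches mean/1, so no dtype filter should be needed in this variant.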


I think the author should change it in the book, @seanmor5

Yeah, this is tracked here as well: Machine Learning in Elixir: Error in Chapter 1, first code snippet of "Preparing the Data for Training" (page 13) - PragProg Customers - Devtalk

It will be fixed when the next version of the book is released.
