Missing functions in Explorer DataFrames/Series

Compared to Python's pandas, I am missing some functions for DataFrames and Series in Explorer, or I have to apply a workaround, which makes everything less expressive.

DataFrames have a drop_nil function. Why don't Series have it? Well, I can convert the series to a list, drop the nils from the list and then convert back…

According to the documentation, Series have a contains function that uses a regex, but I could not get it working with a compiled regex. I did a mapping instead:

filter = df["content"]
|> S.to_list()
|> Enum.map(&Regex.match?(tags,&1))
|> S.from_list()

This returns a series of booleans. But I cannot use it to filter a DataFrame or a Series? At least I did not find a way in the reference, and no workaround either. It is neither filter, filter_with, nor slice. Slice needs indices. How do I get the indices where the value is true?

The last question is about Date(time) columns. How can I convert a Date to a calendar week?

Update:

post_filter = df["content"]
|> S.downcase()
|> S.to_list()
|> Enum.map(&Regex.match?(tags,&1))
|> Enum.with_index()
|> Enum.filter(&(elem(&1, 0) == true))
|> Enum.map(&(elem(&1, 1)))

This list of indices can be used as input for slicing…

I believe you want Explorer.Series.mask/2.

content = S.from_list(["aa", "ab", "bc"])
# #Explorer.Series<
#   Polars[3]
#   string ["aa", "ab", "bc"]
# >
mask = S.transform(content, &Regex.match?(~r/a.+/, &1))
# #Explorer.Series<
#   Polars[3]
#   boolean [true, true, false]
# >
S.mask(content, mask)
# #Explorer.Series<
#   Polars[2]
#   string ["aa", "ab"]
# >

Sorry I missed this part. Also I’m not sure what a “calendar week” is. Is that the index of the week in the calendar year? If so this could work:

S.from_list([~D[2023-01-01], ~D[2023-01-08], ~D[2023-02-23]])
# #Explorer.Series<
#   Polars[3]
#   date [2023-01-01, 2023-01-08, 2023-02-23]
# >
|> S.transform(&Date.day_of_year/1)
# #Explorer.Series<
#   Polars[3]
#   integer [1, 8, 54]
# >
|> S.quotient(7)
# #Explorer.Series<
#   Polars[3]
#   integer [0, 1, 7]
# >

We will also gladly accept PRs that add missing functions. We know that we are missing many of them. :slight_smile:

Thanks, @billylanchantin. The mask function helped a lot. Sorry that I was not able to find it.

The calendar week is somewhat more complex, because the first days of a year may be counted as part of the last week of the previous year. I had to look it up. Normally I would expect it to be part of the date(time) library.

Wikipedia states that the ISO week number is computed as follows:

Algorithm

  1. Subtract the weekday number from the ordinal day of the year.
  2. Add 10.
  3. Divide by 7, discard the remainder.
  • If the week number thus obtained equals 0, it means that the given date belongs to the preceding (week-based) year.
  • If a week number of 53 is obtained, one must check that the date is not actually in week 1 of the following year.

These edge cases are what I meant, and they make me wonder whether I really understand them.
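
For illustration, the three steps and both edge cases from the quoted algorithm can be sketched in plain Elixir, without Explorer. This is only a minimal sketch: the module name IsoWeekSketch and the helper weeks_in_year/1 are hypothetical, and it assumes the default Monday-based numbering of Date.day_of_week/1 (1 = Monday, 7 = Sunday).

```elixir
defmodule IsoWeekSketch do
  # Hypothetical helper module for illustration; not part of Explorer.

  # Returns {iso_year, iso_week} for a Date.
  def week_of_year(%Date{year: year} = date) do
    week = raw_week(date)

    cond do
      # Week 0: the date belongs to the last week of the preceding year.
      week == 0 -> {year - 1, weeks_in_year(year - 1)}
      # Week 53 in a year with only 52 ISO weeks: it is week 1 of the next year.
      week == 53 and weeks_in_year(year) == 52 -> {year + 1, 1}
      true -> {year, week}
    end
  end

  # Steps 1 to 3: (ordinal day - weekday + 10) divided by 7, remainder discarded.
  defp raw_week(date) do
    div(Date.day_of_year(date) - Date.day_of_week(date) + 10, 7)
  end

  # December 28 always falls in the last ISO week of its year,
  # so its raw week number is the number of ISO weeks (52 or 53).
  defp weeks_in_year(year) do
    raw_week(Date.new!(year, 12, 28))
  end
end

IO.inspect(IsoWeekSketch.week_of_year(~D[2023-01-01])) # => {2022, 52}
IO.inspect(IsoWeekSketch.week_of_year(~D[2024-12-30])) # => {2025, 1}
```

The results can be cross-checked against Erlang's :calendar.iso_week_number/1, which takes a {year, month, day} tuple and also returns {year, week}.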

You’re welcome. And no worries! There are a lot of functions to sift through.

Ah gotcha. Yeah that looks tricky to implement yourself.

As @josevalim said, there are missing functions. If you look at the Elixir side of things:

https://hexdocs.pm/explorer/Explorer.Series.html#functions-datetime-ops

They do not have a function that corresponds to what you want. But you’re in luck because Polars does!

Unless I’m missing something, it should be straightforward to add that function to Explorer if you wanted to do a PR :slight_smile:

And if you can’t submit a PR, open up an issue and we will look into it :slight_smile:

Sorry for answering late. The messages from the Elixir Forum always end up in my spam filter. About the PR… a) I am very new to Elixir and not so bold as to mess with core libraries :wink:. b) I also barely have the time.

But creating an issue is doable. :wink:

Hi @sehHeiden,

The function is now on main (José reviews PRs quick!):

Add `day_of_year` and `week_of_year` by billylanchantin · Pull Request #717 · elixir-explorer/explorer · GitHub

So if you get the latest ref, you can now do:

[~D[2023-01-01], ~D[2023-01-08], ~D[2023-02-23]]
|> S.from_list()
|> S.week_of_year()
# #Explorer.Series<
#   Polars[3]
#   integer [52, 1, 8]
# >

Since you’re new to Elixir, you can get the latest ref by adding the following to your project’s mix.exs:

  # list of deps
  [
    # ...
    {:explorer,
     git: "https://github.com/elixir-explorer/explorer.git",
     # This is the current ref of main.
     # It may change, but as long as the ref is after this one,
     # you'll have access to the function.
     ref: "aef274989ab490b0a392ccd19ec24b286a8cda1c",
     override: true},
    # ...
  ]

There are other ways of doing this too, but that should get you started.

Thanks for your message, @billylanchantin. I had a look at the commits. That was a lot of changes for adding a single function.

Hi @billylanchantin, I tried to do so, but I get an error message from Rustler:

==> explorer
Compiling 24 files (.ex)

== Compilation error in file lib/explorer/polars_backend/native.ex ==
** (ErlangError) Erlang error: :enoent
    (elixir 1.14.5) lib/system.ex:1060: System.cmd("cargo", ["metadata", "--format-version=1"], [cd: "native/explorer"])
    (rustler 0.29.1) lib/rustler/compiler/config.ex:83: Rustler.Compiler.Config.metadata!/1
    (rustler 0.29.1) lib/rustler/compiler/config.ex:70: Rustler.Compiler.Config.build/1
    (rustler 0.29.1) lib/rustler/compiler.ex:9: Rustler.Compiler.compile_crate/2
    lib/explorer/polars_backend/native.ex:11: (module)

I have no problem building the Hex version.

You will need Rust installed to use the git version. Check out the steps in the README. :slight_smile:

Thanks for your reply, @josevalim. It works after installing rustup. The asdf plugin for Rust that I found has no nightly Rust?!
Compiling takes a while. During compilation the Explorer version was shown as 0.1. How do I check that?

The week_of_year function works.

The explorer internal library in Rust is always 0.1 but that’s not a concern. :slight_smile: It is always internal and never published.