billylanchantin

How should I undo a one-hot encoded variable?

I have a dataset with categorical variables, e.g.:

alias Explorer.DataFrame, as: DF

df = DF.new(%{
  age: [1, 5, 3],
  animal: ["dog", "cat", "dog"],       # categorical
  color: ["brown", "black", "brindle"] # categorical
})

I currently have the variables one-hot encoded:

df_one_hot = DF.new(%{
  age: [1, 5, 3],
  animal_cat_1_of_2: [1, 0, 1], # animal == "dog"
  animal_cat_2_of_2: [0, 1, 0], # animal == "cat"
  color_cat_1_of_3: [1, 0, 0], # color == "brown"
  color_cat_2_of_3: [0, 1, 0], # color == "black"
  color_cat_3_of_3: [0, 0, 1], # color == "brindle"
})

Neural nets generally do well with that encoding. Tree-based models, however, may benefit from the original encoding or ordinal encoding. So while experimenting with different models, I found myself needing to undo the one-hot encoding.
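For reference, here is a minimal plain-Elixir sketch of what "undoing" the encoding means for the animal columns. It does not use Explorer at all; the `labels` list and the `one_hot_columns` layout are assumptions taken from the comments in the example above, and it assumes exactly one flag is set per row.

```elixir
# Hypothetical label order, matching the comments on the one-hot columns.
labels = ["dog", "cat"]

one_hot_columns = [
  [1, 0, 1], # animal_cat_1_of_2
  [0, 1, 0]  # animal_cat_2_of_2
]

animal =
  one_hot_columns
  # Turn the column-wise lists into one tuple of flags per row.
  |> Enum.zip()
  |> Enum.map(fn flags ->
    # Position of the flag that is set picks the label for this row.
    index = flags |> Tuple.to_list() |> Enum.find_index(&(&1 == 1))
    Enum.at(labels, index)
  end)

IO.inspect(animal)
```

This row-by-row version is only meant to pin down the expected result; it says nothing about how to do the same thing efficiently on a data frame.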

I came up with a solution which I’ll post below, but I wanted to see how others would approach the problem. This is a somewhat computationally intensive operation, and I worry that I’m not taking full advantage of Explorer. In particular, this looked like a job for across, but I couldn’t make it work.

Marked As Solved

josevalim

Creator of Elixir

If you can hardcode the fields, then you can do:

require Explorer.DataFrame, as: DF

DF.mutate(df_one_hot,
  animal: cond do
    animal_cat_1_of_2 == 1 -> "dog"
    animal_cat_2_of_2 == 1 -> "cat"
  end,
  color: cond do
    color_cat_1_of_3 == 1 -> "brown"
    color_cat_2_of_3 == 1 -> "black"
    color_cat_3_of_3 == 1 -> "brindle"
  end
)

If you cannot, then you can port your approach to mutate_with. mutate_with gives you access to the columns and allows you to dynamically build a query based on the field. Then use Series.select to build the cond. Scroll down to find the answer (I added some padding in case you want to try it out by yourself before seeing the solution):


















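# Group the one-hot column names by variable, keeping each column's number, e.g.
# "animal" => [{"animal_cat_1_of_2", 1}, {"animal_cat_2_of_2", 2}].
# Enum.group_by/3 already returns a map, so no extra Map.new step is needed.
categorical_cols =
  df_one_hot.names
  |> Enum.map(&Regex.run(~r/(.+)_cat_(\d+)_of_\d+/, &1))
  |> Enum.reject(&is_nil/1)
  |> Enum.group_by(
    fn [_, group, _] -> group end,
    fn [col, _, num] -> {col, String.to_integer(num)} end
  )

# For each variable, fold its columns into nested selects. The new column holds
# the category number (not the original label), or -1 where no column equals 1.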
DF.mutate_with(df_one_hot, fn df ->
  Enum.map(categorical_cols, fn {group, col_to_num} ->
    expr = Enum.reduce(col_to_num, -1, fn {col_name, num}, acc ->
      equal = Explorer.Series.equal(df[col_name], 1)
      Explorer.Series.select(equal, num, acc)
    end)

    {group, expr}
  end)
end)
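
To make the two steps easier to follow, here is a rough plain-Elixir sketch that runs the same grouping on the example column names and then applies the same fold to a single row. It does not touch Explorer; the `names` list and the `row` map are hypothetical stand-ins for `df_one_hot.names` and the second row of `df_one_hot`.

```elixir
# Stand-in for df_one_hot.names (assumed column order).
names = [
  "age",
  "animal_cat_1_of_2",
  "animal_cat_2_of_2",
  "color_cat_1_of_3",
  "color_cat_2_of_3",
  "color_cat_3_of_3"
]

# Same grouping as above: "age" does not match the pattern and is dropped.
categorical_cols =
  names
  |> Enum.map(&Regex.run(~r/(.+)_cat_(\d+)_of_\d+/, &1))
  |> Enum.reject(&is_nil/1)
  |> Enum.group_by(
    fn [_, group, _] -> group end,
    fn [col, _, num] -> {col, String.to_integer(num)} end
  )

# Stand-in for the second row of df_one_hot (a black cat), as a plain map.
row = %{
  "animal_cat_1_of_2" => 0,
  "animal_cat_2_of_2" => 1,
  "color_cat_1_of_3" => 0,
  "color_cat_2_of_3" => 1,
  "color_cat_3_of_3" => 0
}

# Same fold as the reduce above, applied to one row: start from -1 and
# replace it with a column's number whenever that column equals 1.
decoded =
  Map.new(categorical_cols, fn {group, col_num_pairs} ->
    code =
      Enum.reduce(col_num_pairs, -1, fn {col, num}, acc ->
        if row[col] == 1, do: num, else: acc
      end)

    {group, code}
  end)

IO.inspect(categorical_cols)
IO.inspect(decoded)
```

For this row, `decoded` comes out as `%{"animal" => 2, "color" => 2}`, i.e. the ordinal codes rather than the original strings. Turning those codes back into labels would need a separate lookup that the column names alone do not provide.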
