Getting the ID (first element of each tuple) into a map while transforming the second element

Justinbenfit23 · July 10, 2021, 7:13pm

I’m new to Elixir. I built three functions that count how many times any word from a list of special words occurs in each car dealership review I scraped from the internet. Before the functions are applied, the scraped responses look like this:

[
  {"July 03, 2021",
   "We had been looking for a 2021 Suburban for about 6 months and no one could find exactly what we wanted! We contacted Adrian at McKaig and he told us he could order one for us! Our Suburban was delivered in 4 weeks and had everything on it that we wanted! Adrian, Brandon, Dennis and Freddie all worked with us to get exactly what we wanted! They made phone calls and deals for us right on the spot and we drove out with a beautiful black Suburban! We will definitely use Adrian and McKaig Chevrolet again! Thank you for a fun car buying experience!"},
  {"July 03, 2021",
   "Adrian first educated me on trade ins. And Joe helped me have a car with a better fit and one I can feel good about! "},
  {"July 03, 2021",
   "Adrian was able to finally help my fiance get the truck he needed for quite some time now! We left Mckaig extremely happy and grateful! As always customer service was amazing especially Adrian’s and Joe’s!"},... ]

The count looks only at the second element (the review text itself) and is saved under the key score. I’m trying to keep the date so it can act as an ID, but I can’t figure out how to include it in the comprehension without breaking things.

Here is the code, the current result, and the desired result:

def list_conv() do
  input = get_body()
  new_input = Enum.map(input, fn n -> elem(n, 1) end)

  for n <- new_input do
    String.downcase(n)
    |> String.split(" ", trim: true)
  end
end

def special_words() do
  special_words = ["extremely", "definitely", "amazing", "very", "best", "great", "excellent", "awesome", "incredibly", "beyond", "loved", "really", "highly"]

  input = list_conv()

  for i <- input do
    {Enum.count(i, &(&1 in special_words)), i}
  end
end

def spec_map() do
  nl = special_words()

  for {score, _words} <- nl do
    %{score: score}
  end
end

Result:

[
  %{score: 0},
  %{score: 1},
  %{score: 0}, ...]

Desired Result:

[
  %{score: 0, id: "July 10, 2021"},
  %{score: 1, id: "July 03, 2021"},
  %{score: 0, id: "July 03, 2021"}... ]
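A minimal sketch of one way to get there, assuming the scraped data is a list of {date, text} tuples as shown above: the comprehension can destructure each tuple directly in its generator, so the date stays in scope next to the computed score. The module name ReviewScores and the functions count_special/2 and scores/2 are hypothetical, and the two sample reviews are shortened excerpts of the ones in the post.

```elixir
defmodule ReviewScores do
  # Hypothetical helper: counts how many words of `text` appear in `special_words`.
  def count_special(text, special_words) do
    text
    |> String.downcase()
    |> String.split()
    |> Enum.count(&(&1 in special_words))
  end

  # Destructure each {date, text} tuple in the generator so the date is
  # still available when the map is built.
  def scores(reviews, special_words) do
    for {date, text} <- reviews do
      %{id: date, score: count_special(text, special_words)}
    end
  end
end

special_words = ~w(extremely definitely amazing very best great excellent
                   awesome incredibly beyond loved really highly)

reviews = [
  {"July 03, 2021", "We will definitely use Adrian and McKaig Chevrolet again!"},
  {"July 03, 2021", "Adrian first educated me on trade ins."}
]

ReviewScores.scores(reviews, special_words)
# => [%{id: "July 03, 2021", score: 1}, %{id: "July 03, 2021", score: 0}]
```

Note that String.split/1 splits on whitespace only, so punctuation stays attached to words: "amazing!" would not match "amazing".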

Thank you!

thiagomajesk · July 10, 2021, 7:39pm

Hi @Justinbenfit23!

See whether something like this gives you an idea of how to tackle it:

reviews = [
  {"July 03, 2021",
   "We had been looking for a 2021 Suburban for about 6 months and no one could find exactly what we wanted! We contacted Adrian at McKaig and he told us he could order one for us! Our Suburban was delivered in 4 weeks and had everything on it that we wanted! Adrian, Brandon, Dennis and Freddie all worked with us to get exactly what we wanted! They made phone calls and deals for us right on the spot and we drove out with a beautiful black Suburban! We will definitely use Adrian and McKaig Chevrolet again! Thank you for a fun car buying experience!"},
  {"July 03, 2021",
   "Adrian first educated me on trade ins. And Joe helped me have a car with a better fit and one I can feel good about! "},
  {"July 03, 2021",
   "Adrian was able to finally help my fiance get the truck he needed for quite some time now! We left Mckaig extremely happy and grateful! As always customer service was amazing especially Adrian’s and Joe’s!"}
]

special_words = ["extremely", "definitely","amazing","very","best","great","excellent","awesome","incredibly","beyond","loved","really","highly"]

reviews
|> Stream.map(fn {k, v} -> {k, String.split(v)} end)
|> Stream.map(fn {k, v} -> {k, Enum.filter(v, &(&1 in special_words))} end)
|> Stream.map(fn {k, v} -> {k, Enum.frequencies(v)} end)
|> Stream.map(fn {k, v} -> Map.put(v, "id", k) end)
|> Enum.to_list()

This will yield something like this:

[
  %{"definitely" => 1, "id" => "July 03, 2021"},
  %{"id" => "July 03, 2021"},
  %{"amazing" => 1, "extremely" => 1, "id" => "July 03, 2021"}
]

The structure is a bit different from what you asked for, but it should give you a rough idea of how to transform your data into the desired result with Elixir.
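If only the total matters, each frequency map can be collapsed into a single score by setting the "id" entry aside and summing the remaining values. A small sketch, assuming the map shape shown in the output above (a string "id" key plus one key per matched special word); the name to_score is hypothetical.

```elixir
# Hypothetical helper: turns %{"id" => id, "word" => n, ...} into %{id: id, score: total}.
to_score = fn freqs ->
  # Map.pop/2 returns {value, map_without_that_key}
  {id, word_counts} = Map.pop(freqs, "id")
  # An empty map sums to 0, so reviews with no special words score 0
  %{id: id, score: word_counts |> Map.values() |> Enum.sum()}
end

freqs_list = [
  %{"definitely" => 1, "id" => "July 03, 2021"},
  %{"id" => "July 03, 2021"},
  %{"amazing" => 1, "extremely" => 1, "id" => "July 03, 2021"}
]

Enum.map(freqs_list, to_score)
# => [%{id: "July 03, 2021", score: 1}, %{id: "July 03, 2021", score: 0}, %{id: "July 03, 2021", score: 2}]
```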

Cheers!

Justinbenfit23 · July 10, 2021, 10:16pm

@thiagomajesk Thank you so much!
This is very helpful! Since I’m not concerned with the count of each individual “special word” but rather want the total number of special-word occurrences per review, I adapted what you gave me for the special-word count, and also for another function that counts all the words in each review. After those changes I couldn’t get it to add the ID or to return a list of maps, so now it returns a list of tuples, but at least the code is much cleaner and shorter!

def special_words do
  special_words = ["extremely", "definitely", "amazing", "very", "best", "great", "excellent", "awesome", "incredibly", "beyond", "loved", "really", "highly"]

  get_body()
  |> Stream.map(fn {k, v} -> {k, String.downcase(v)} end)
  |> Stream.map(fn {k, v} -> {k, String.split(v)} end)
  |> Stream.map(fn {k, v} -> {k, Enum.count(v, &(&1 in special_words))} end)
  |> Enum.to_list()
end

def count() do
  get_body()
  |> Stream.map(fn {k, v} -> {k, String.downcase(v)} end)
  |> Stream.map(fn {k, v} -> {k, String.split(v)} end)
  |> Stream.map(fn {k, v} -> {k, Enum.count(v)} end)
  |> Enum.to_list()
end

def get_scores() do
  list1 = special_words()
  list2 = count()

  list1 |> Enum.zip(list2)
end

get_scores() returns this:

[
  {{"July 10, 2021", 7}, {"July 10, 2021", 94}},
  {{"July 10, 2021", 0}, {"July 10, 2021", 42}},... ]

(The end goal is to find the most overly positive reviews by dividing the number of special words by the total number of words, which gives a “score” for each review.) I realized that using the date as the ID won’t work anyway, because there are multiple reviews per day. That leaves two problems: 1) creating an ID for each review, and 2) figuring out how to access the values inside these nested tuples so I can divide one by the other. Thank you again for your help! If you have any other suggestions, I’m all ears!
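Both open points can be handled with pattern matching. The nested tuples returned by get_scores() can be destructured in a single function head, and Enum.with_index/2 is one way to attach a sequential ID (an alternative to hashing, with the caveat that the IDs depend on the order of the list). A sketch, assuming the {{date, special}, {date, total}} shape shown above; the module ScoreRatios and function ratio_maps/1 are hypothetical, and the sample input is the two tuples from the post.

```elixir
defmodule ScoreRatios do
  # Hypothetical helper: builds one map per review from the zipped
  # {{date, special_count}, {date, total_count}} tuples.
  def ratio_maps(scores) do
    scores
    # with_index pairs each element with its position; offset 1 starts IDs at 1
    |> Enum.with_index(1)
    |> Enum.map(fn {{{date, special}, {_date, total}}, id} ->
      # Guard against an empty review to avoid dividing by zero
      score = if total == 0, do: 0.0, else: special / total
      %{id: id, date: date, score: score}
    end)
  end
end

scores = [
  {{"July 10, 2021", 7}, {"July 10, 2021", 94}},
  {{"July 10, 2021", 0}, {"July 10, 2021", 42}}
]

ScoreRatios.ratio_maps(scores)
```

Piping the result into Enum.sort_by(& &1.score, :desc) would then put the most positive reviews first.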

gregvaughn · July 10, 2021, 10:32pm

Consider something like this, given the reviews and special_words from your top post:

iex(6)> compute_score = fn review -> {special, total} = review |> String.downcase |> String.split |> Enum.reduce({0, 0}, fn w, {sp, to} ->
...(6)> if w in special_words, do: {sp + 1, to + 1}, else: {sp, to + 1} end)
...(6)> special / total
...(6)> end
#Function<44.79398840/1 in :erl_eval.expr/5>
iex(7)> for {d, r} <- reviews, do: %{id: :erlang.phash2(r), date: d, score: compute_score.(r)}
[
  %{date: "July 03, 2021", id: 59481064, score: 0.009615384615384616},
  %{date: "July 03, 2021", id: 47140814, score: 0.0},
  %{date: "July 03, 2021", id: 76368188, score: 0.05714285714285714}
]

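compute_score passes a two-tuple accumulator to Enum.reduce to keep track of both the number of words that match special_words and the total number of words. I also used :erlang.phash2 to derive an identifier from the review text. Since it is a hash, it is unique in practice but not guaranteed: identical reviews share an ID, and collisions between different reviews are possible, though unlikely.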
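The same two-tuple accumulator idea can be moved into a named function. This is a sketch, not the code from the post: the module name Positivity is hypothetical, the special words live in a module attribute so the function does not depend on a variable bound in IEx, and the total == 0 branch is an addition that avoids an ArithmeticError (0 / 0) on an empty review.

```elixir
defmodule Positivity do
  @special_words ~w(extremely definitely amazing very best great excellent
                    awesome incredibly beyond loved really highly)

  # Returns the fraction of words in `review` that are special words.
  def compute_score(review) do
    {special, total} =
      review
      |> String.downcase()
      |> String.split()
      # Accumulator is {special_count, total_count}; every word bumps the total
      |> Enum.reduce({0, 0}, fn word, {sp, to} ->
        if word in @special_words, do: {sp + 1, to + 1}, else: {sp, to + 1}
      end)

    # An empty or whitespace-only review has no words, so skip the division
    if total == 0, do: 0.0, else: special / total
  end
end

Positivity.compute_score("We left Mckaig extremely happy and grateful!")
```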