How to find unique values by two parameters?

I have a list of tuples containing some data, and the last element of each tuple is a map. I want to remove duplicates and keep the unique entries that have the maximum value in %{u: [{_, _, value}]} inside the map.

[
  {"Elixir", 2019,
   %{
     values: %{u: [{:b, :r1,  1}]},
     status: true
   }},
  {"Elixir", 2020,
   %{
     values: %{
       u: [
         {:b, :r2, 1},
         {:b, :r3, 2}
       ]
     },
     status: true
   }},
  {"Elixir", 2020,
   %{
     values: %{
       u: [
         {:b, :r2, 2},
         {:b, :r3, 2}
       ]
     },
     status: true
   }}
]

The final output should look like

[
  {"Elixir", 2019,
   %{
     values: %{u: [{:b, :r1,  1}]},
     status: true
   }},
  {"Elixir", 2020,
   %{
     values: %{
       u: [
         {:b, :r2, 2},
         {:b, :r3, 2}
       ]
     },
     status: true
   }}
]

So it also removed the duplicated {"Elixir", 2020} entry and kept the tuple with the maximum value of 2.

I tried using groups, but I don’t know how to go inside the maps and then compare two different tuples.

Thanks

Wouldn’t:

Enum.uniq_by(list, fn {name, year, %{values: %{u: us}}} ->
  {name, year, Enum.max_by(us, &elem(&1, 2))}
end)

Do the job?

Unfortunately, it’s still returning the same list. Also, you hardcoded the name and year; there can be more or fewer than the specified arguments.

Thanks

Name and year are not hard-coded in @hauleth’s solution. Those are variables assigned via pattern-matching, a very common syntax in Elixir.

The reason it’s not producing your expected output is that both of the examples given have the same maximum value of 2. Nothing in your requirements says that the [2, 2] list should be preferred over the [1, 2] list.
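To make the tie concrete, here is a small sketch (the module name KeyDemo is made up for illustration) that computes the key Enum.uniq_by/2 would see for the two 2020 entries. Both keys carry a maximum of 2, but because Enum.max_by/2 returns the first maximal element, they also carry different identifiers, so the two entries count as distinct. Even with identical keys, Enum.uniq_by/2 would keep the first entry it meets, not the "best" one.

```elixir
defmodule KeyDemo do
  # Same key function as the Enum.uniq_by/2 call earlier in the thread.
  def key({name, year, %{values: %{u: us}}}) do
    {name, year, Enum.max_by(us, &elem(&1, 2))}
  end
end

first = {"Elixir", 2020, %{values: %{u: [{:b, :r2, 1}, {:b, :r3, 2}]}, status: true}}
second = {"Elixir", 2020, %{values: %{u: [{:b, :r2, 2}, {:b, :r3, 2}]}, status: true}}

IO.inspect(KeyDemo.key(first))   # {"Elixir", 2020, {:b, :r3, 2}}
IO.inspect(KeyDemo.key(second))  # {"Elixir", 2020, {:b, :r2, 2}}
```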

Also, you could consider “cleaning up” the data by converting it into a more easily accessible data structure, before attempting the comparison.


What do you consider a duplicate here? The way I am seeing both {elixir, 2020} groups of data, even the u values aren’t the exact same lists of other values.

I did a quick prototype of how one might implement this manually:

defmodule Test do
  def unique(todo, seen \\ %{})

  def unique([], seen), do: Map.values(seen)

  def unique([h | t], seen) do
    {language, year, map} = h
    prev_best = Map.get(seen, {language, year})

    if is_nil(prev_best) or check_max_value(map) >= check_max_value(elem(prev_best, 2)) do
      new_seen = Map.put(seen, {language, year}, h)
      unique(t, new_seen)
    else
      unique(t, seen)
    end
  end

  def check_max_value(map) do
    {_, _, value} = Enum.max_by(map.values.u, &elem(&1, 2))
    value
  end
end

Like I mentioned above, checking by maximum value alone might not always give the expected result, so you might need to tweak this. In order to get your desired result from the example data, I used >= comparison so that the latter item would override the former in case of a tie. But I suspect this might not be sufficiently robust for all cases.
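If relying on list order for ties feels too fragile, one possible tweak is to group by the key and pick the winner with an explicit score. This is only a sketch: the module name DedupSketch is hypothetical, and using the sum of the counts as a tie-breaker is my assumption, not something stated in the requirements.

```elixir
defmodule DedupSketch do
  # Keeps one entry per {name, year}: the one with the highest score.
  def unique_by_max(list) do
    list
    |> Enum.group_by(fn {name, year, _map} -> {name, year} end)
    |> Enum.map(fn {_key, entries} -> Enum.max_by(entries, &score/1) end)
    # Sort explicitly so the result order does not depend on map iteration order.
    |> Enum.sort_by(fn {name, year, _map} -> {name, year} end)
  end

  # Score is {maximum count, sum of counts}; tuples compare element by element,
  # so the sum only matters when the maxima are equal (assumed tie-breaker).
  defp score({_name, _year, %{values: %{u: us}}}) do
    counts = Enum.map(us, &elem(&1, 2))
    {Enum.max(counts), Enum.sum(counts)}
  end
end
```

With the example data, the two 2020 entries score {2, 3} and {2, 4}, so the second one is kept regardless of the order in which they appear.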

The requirements are not clear.

That’s always a good idea.
For example: why is there a map with only one key?
Also nested data is often a pain in the ***.

Thanks for such a valuable response. Actually, by hard-coded I mean that the parameters won’t always be name and year. There can be one or more arguments.

The difference between [2, 1] and [2, 2] is that the latter holds the updated value of r1, which was incremented by 1.

Which structure would be more accessible? Should I use keyword lists or structs? Kindly help me with that.

Thanks

Actually, by hard-coded I mean that the parameters won’t always be name and year. There can be one or more arguments.

When you use tuples, you are saying that the data format (e.g. number of arguments) will always be the same. If you plan on having additional arguments, you should be using a List, or preferably a Map/Struct so anyone else reading your code can easily see what the values are supposed to represent.

The difference between [2, 1] and [2, 2] is that the latter holds the updated value of r1, which was incremented by 1.

If r is being incremented, you should not use atoms for this, but an Integer.

Which structure would be more accessible? Should I use keyword lists or structs? Kindly help me with that.

Some of your key names are ambiguous - like, I have no idea, what values.u means, or what :b signifies, and I’m only just now learning the significance of :r1, :r2 etc… So it’s difficult for me to say what specifically would be the best way to structure your data, but here is a try:

%{
  language: "Elixir",
  year: 2020,
  user_values: [
    %{type: "b", revision: 2, value: 1},
    %{type: "b", revision: 3, value: 2}
  ],
  other_param: "foo",
  status: true
}

and here is the code I gave above, updated for this structure:

defmodule Test do
  def unique(todo, seen \\ %{})

  def unique([], seen), do: Map.values(seen)

  def unique([h | t], seen) do
    prev_best = Map.get(seen, {h.language, h.year})

    if is_nil(prev_best) or latest_revision(h) >= latest_revision(prev_best) do
      new_seen = Map.put(seen, {h.language, h.year}, h)
      unique(t, new_seen)
    else
      unique(t, seen)
    end
  end

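  def latest_revision(map) do
    # Return the highest revision number, so the comparison in unique/2
    # is between integers rather than between whole maps.
    map.user_values |> Enum.map(& &1.revision) |> Enum.max()
  end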
end

Elixir and 2020 : these are just the values and it can be as many values.

You say these are the values, but then you have a different map called :values. Whatever is the “payload” - the values you need for your API, those should be kept together and made easily accessible.

Last Value in Tuple : Last value in the tuple will be a map, struct, keyword list which will always have two keys i.e. values and status .

You will notice that there is not even a function in Elixir to get the last value from a tuple. You must hard-code the specific index you want. This is because you are not supposed to have variable-length tuples. (I already mentioned this in my previous post)

u : It is a key for storing values. we can have many different keys like j , s , c which means union, join, select or cross product.

Why does it matter which query type was used? It doesn’t make sense to me that you would do multiple queries for the same data, and merge them into the same entry, but then still have problems with duplicate entries.

{:b, :r2, 1} : :r2 is just the unique identifier while 1 is the count of elixir and 2020 in the table where the greater value tuple will replace the smaller value.

If :r2 is a unique identifier, then it shouldn’t be hidden away inside a tuple, it should be used as a key in a map, which will ensure that only one can exist at any given time. Then if a new :r2 payload comes in, your code should decide right then whether or not to replace the existing one.
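As a rough sketch of that idea (CountStore and put_count are made-up names, and "keep the larger count" is taken from the description of {:b, :r2, 1} quoted above), Map.update/4 lets the decision happen at insertion time:

```elixir
defmodule CountStore do
  # Stores `count` under `id`; if `id` already exists, keeps the larger count.
  def put_count(counts, id, count) when is_map(counts) and is_integer(count) do
    Map.update(counts, id, count, fn existing -> max(existing, count) end)
  end
end

counts = %{r2: 1, r3: 2}
IO.inspect(CountStore.put_count(counts, :r2, 2))  # %{r2: 2, r3: 2}
```

Because the identifier is the map key, two payloads for the same identifier can never coexist, so no separate deduplication pass is needed.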


I use elem(tuple, tuple_size(tuple) - 1)

You are right; in fact, I can make r1 the key and 1 its value. But then how will I utilize the :u or :j … That’s a bit of a confusing part. It’s necessary for my structure.

I use elem(tuple, tuple_size(tuple) - 1)

Yes I know there is this workaround, but it is not the intended use of tuples. From the docs:

Tuples are intended as fixed-size containers for multiple elements. To manipulate a collection of elements, use a list instead. Enum functions do not work on tuples.
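For comparison, here is a minimal sketch of what the same access looks like if each row is a list instead of a tuple. RowSketch and split_row are hypothetical names, and the assumption is that the payload map is always the last element of a non-empty row.

```elixir
defmodule RowSketch do
  # Splits a row into its leading key fields and its trailing payload.
  # Enum.split/2 with a negative count counts from the end of the list.
  def split_row(row) when is_list(row) and row != [] do
    {keys, [payload]} = Enum.split(row, -1)
    {keys, payload}
  end
end

IO.inspect(RowSketch.split_row(["Elixir", 2020, %{status: true}]))
# {["Elixir", 2020], %{status: true}}
```

The number of key fields can then vary freely, and the keys list can be used directly as a grouping key.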

…

But then how will I utilize the :u or :j … That’s a bit of a confusing part. It’s necessary for my structure.

If you want advice on this, you must give us some more details about why it is necessary (i.e. how this data is used by your API).
