Accessing element in a List

kodepett · November 13, 2021, 7:01pm

Is there a straightforward way of retrieving an element in this list [ {"server", "nginx"}, {"date", "Sat, 13 Nov 2021 18:25:48 GMT"}, {"content-type", "UTF-8"}, {"content-length", "23"}, {"freeflow", "FB"} ]

I’m afraid the ordering might change and hence can’t rely on elem . I’ve perused the doc and can’t find a straightforward of extracting individual fields using the first element(string) in the tuple as the key.

thiagomajesk · November 13, 2021, 7:22pm

If you can’t rely on the order I guess you’d have to use Enum.filter and friends.

PS: You could also transform this list into a map and use that map as a lookup.

mpope · November 13, 2021, 7:51pm

If the elements are always tuples of two, I think you can use Keyword.get/3. Something like:

req =
  [ 
    {"server", "nginx"},
    {"date", "Sat, 13 Nov 2021 18:25:48 GMT"}, 
    {"content-type", "UTF-8"},
    {"content-length", "23"},
    {"freeflow", "FB"}
  ]
server = Keyword.get(req, "server")

krstfk · November 13, 2021, 7:57pm

Not necessarily the most efficient but should work (not tested) :

def get_elem([{key, v}| _tl], str) when key == str do
   {:ok, v}
end
def get_elem([_ | tl ], str), do: get_elem(tl, str)
def get_elem([], _str), do: :error_not_found

It’s also not necessarily the most idiomatic, Enum.find would be better.
Keyword won’t work since the first element of your tuple is a string and not an atom.

josevalim · November 13, 2021, 7:58pm

Unfortunately Keyword expects the keys to be atoms. To lookup for any key, use List.keyfind/3 instead, such as:

List.keyfind(list, “server”, 0)

The above will look up on a list of tuples returning the first element where the position 0 of the tuple is the string ”server”.

kodepett · November 13, 2021, 8:00pm

Awesome, thank you.

krstfk · November 13, 2021, 8:08pm

TIL List.keyfind/3 is backed by a BIF. Nice, thanks.

eksperimental · November 13, 2021, 8:16pm

Filter will return all the elements that have “server” as the first element of the tuple.

eksperimental · November 13, 2021, 8:33pm

Just to illustrate another possibility,

You can use Enum.find/3 and Enum.find_value/3

iex> Enum.find_value(list, fn {"server", v} -> v; _ -> nil end)   
"nginx"

Enum.find_value is useful since you already know the key, you are probably interested in the value. Anyway I think it is more performant List.keyfind + pattern matching since it is a built-in internal function.

Another comment is that Elixir v1.13 is introducing List.keyfind!/3 you can read more about it here: List — Elixir v1.13.0-rc.0

krstfk · November 14, 2021, 5:35pm

I don’t want to hijack this topic, however the fact that List.keyfind/3 is backed by a bif got me curious.

I suspect, that this is because there’s no way to implement that function with arbitrary length tuples (should I say n-uples ?), and keys at arbitrary position, in pure Erlang, rather than raw speed considerations.

Benchmarking this leads to surprising results, on my machine with otp 24 with jit enabled:

defmodule Experiments do
  def get_elem(str, [{str, el} | _tl]) , do: el
  def get_elem(_str, []), do: :error_not_found
  def get_elem(str, [_ | tl]), do: get_elem(str, tl)

end

with

list = Enum.to_list(1..10_000_000)
tuples = Enum.map(list, fn el -> {"#{el}", "#{el}" } end)

Benchee.run(%{
  "get elem" => fn -> Experiments.get_elem("9999999", tuples) end,
  "keyfind" => fn -> List.keyfind(tuples, "9999999", 0) end,
}
)

gives the following result :

Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.12.3
Erlang 24.1.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 14 s

Benchmarking get elem...
Benchmarking keyfind...

Name               ips        average  deviation         median         99th %
get elem          7.31      136.76 ms     ±4.42%      134.81 ms      163.94 ms
keyfind           5.63      177.49 ms     ±6.72%      174.30 ms      234.07 ms

Comparison:
get elem          7.31
keyfind           5.63 - 1.30x slower +40.73 ms

That’s not to say one should use a custom function in that situation (as always, with performances, one should consider the input size, strike a balance between readability and gains, and benchmark if needed), it was just interesting.

josevalim · November 14, 2021, 7:56pm

You can implement it using regular Elixir:

defmodule Experiments do
  def keyfind([head | tail], val, pos) when elem(head, pos) === val, do: head
  def keyfind([_ | tail], val, pos), do: keyfind(tail, val, pos)
  def keyfind([], _val, _pos), do: nil
end

krstfk · November 14, 2021, 11:10pm

First, I meant efficiently (sorry for the misunderstanding), although I was wrong, kind of - more on that in a bit.

Second, thank you I’ve learned something really cool. It’s probably featured prominently in the guards documentation. When a guard raises, the next clause is evaluated, really cool! (for people, who, like me were unaware that means that even if :

test = [{:a, :b}, {:a,:b,:c}]
elem(hd(test), 2)
** (ArgumentError) errors were found at the given arguments:

  * 1st argument: out of range

    :erlang.element(3, {:a, :b})

Jose’s function works :

 test = [{:a, :b}, {:a,:b,:c}]
 Experiments.keyfind(test, :c, 2)
>>> {:a, :b, :c}

Third, regarding performances, again on OTP 24, with jit

Here’s my amended module :

defmodule Experiments do
  def get_elem(str, [{str, _} = row | _tl]) , do: row 
  def get_elem(_str, []), do: :error_not_found
  def get_elem(str, [_ | tl]), do: get_elem(str, tl)

  def get_elem_pos_10(str, [{_, _, _, _, _, _, _, _, _, str} = row | _tl]) , do: row 
  def get_elem_pos_10(_str, []), do: :error_not_found
  def get_elem_pos_10(str, [_ | tl]), do: get_elem(str, tl)

  def keyfind([head | _tail], val, pos) when elem(head, pos) === val, do: head
  def keyfind([_ | tail], val, pos), do: keyfind(tail, val, pos)
  def keyfind([], _val, _pos), do: nil

end

Here’s my first benchmark :

list = Enum.to_list(1..10_000_000)
tuples = Enum.map(list, fn el -> {"#{el}", "#{el}" } end)


IO.inspect("######### first_elem")
Benchee.run(%{
  "get elem" => fn -> Experiments.get_elem("9999999", tuples) end,
  "elixir keyfind" => fn -> Experiments.keyfind(tuples, "9999999", 0) end,
  "keyfind" => fn -> List.keyfind(tuples, "9999999", 0) end,
}
)

Results :

"######### first_elem"
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.12.3
Erlang 24.1.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 21 s

Benchmarking elixir keyfind...
Benchmarking get elem...
Benchmarking keyfind...

Name                     ips        average  deviation         median         99th %
get elem                7.46      133.99 ms     ±2.23%      133.04 ms      143.36 ms
elixir keyfind          6.41      156.09 ms     ±5.26%      153.76 ms      196.42 ms
keyfind                 5.73      174.38 ms     ±1.24%      173.72 ms      183.61 ms

Comparison:
get elem                7.46
elixir keyfind          6.41 - 1.16x slower +22.10 ms
keyfind                 5.73 - 1.30x slower +40.39 ms

Trying to find at the 10th position or around, by matching directly is clearly a loser, however the results are still surprising with your implementation:

list = Enum.to_list(1..10_000_000)

tuples10 = Enum.map(list, fn el -> {"#{el}", "#{el}","#{el}", "#{el}","#{el}",
  "#{el}","#{el}", "#{el}", "#{el}", "#{el}"  } end)

IO.inspect("########### eigth_elem")
Benchee.run(%{
  "elixir keyfind 8" => fn -> Experiments.keyfind(tuples10, "9999999", 7) end,
  "keyfind 8" => fn -> List.keyfind(tuples10, "9999999", 7) end,
},
  time: 20
)
IO.inspect("########### tenth_elem")
Benchee.run(%{
  #"get elem 10" => fn -> Experiments.get_elem_pos_10("9999999", tuples10) end,
  "elixir keyfind 10" => fn -> Experiments.keyfind(tuples10, "9999999", 9) end,
  "keyfind 10" => fn -> List.keyfind(tuples10, "9999999", 9) end,
},
  time: 20
)

Results:

"########### eigth_elem"
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.12.3
Erlang 24.1.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 20 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 44 s

Benchmarking elixir keyfind 8...
Benchmarking keyfind 8...

Name                       ips        average  deviation         median         99th %
elixir keyfind 8          4.16      240.32 ms     ±4.66%      237.33 ms      323.00 ms
keyfind 8                 4.02      248.68 ms     ±3.69%      246.13 ms      304.43 ms

Comparison:
elixir keyfind 8          4.16
keyfind 8                 4.02 - 1.03x slower +8.36 ms
"########### tenth_elem"
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.12.3
Erlang 24.1.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 20 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 44 s

Benchmarking elixir keyfind 10...
Benchmarking keyfind 10...

Name                        ips        average  deviation         median         99th %
keyfind 10                 3.46      288.84 ms     ±5.17%      284.73 ms      356.50 ms
elixir keyfind 10          3.42      292.25 ms     ±4.59%      289.32 ms      349.21 ms

Comparison:
keyfind 10                 3.46
elixir keyfind 10          3.42 - 1.01x slower +3.41 ms

However, looking for the 20th element seems breaks the elixir implementation :

list = Enum.to_list(1..10_000_000)
tuples20 = Enum.map(list, fn el -> {"#{el}", "#{el}","#{el}", "#{el}","#{el}",
  "#{el}","#{el}", "#{el}", "#{el}", "#{el}", "#{el}",
  "#{el}","#{el}", "#{el}","#{el}", "#{el}","#{el}", "#{el}", "#{el}", "#{el}"  } end)
IO.inspect("########### twentieth_elem")
Benchee.run(%{
  "elixir keyfind 20" => fn -> Experiments.keyfind(tuples20, "9999999", 19) end,
  "keyfind 20" => fn -> List.keyfind(tuples20, "9999999", 19) end,
}
)

Results:

"########### twentieth_elem"
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.12.3
Erlang 24.1.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 14 s

Benchmarking elixir keyfind 20...
Benchmarking keyfind 20...

Name                        ips        average  deviation         median         99th %
keyfind 20                 0.33         2.99 s     ±0.00%         2.99 s         2.99 s
elixir keyfind 20          0.25         3.94 s     ±0.00%         3.94 s         3.94 s

Comparison:
keyfind 20                 0.33
elixir keyfind 20          0.25 - 1.32x slower +0.95 s

Now, I have more questions than answers:

Would the results be the same on OTP 23 (sadly I can’t test this right now)
Would they hold for any key type (sadly I can’t test this right now - but glancing at the bif implementation, it would seem all keys are not created equal)
If they were, wouldn’t it be advantageous to switch to the pure elixir implementation, unless, maybe, many users use crazy long tuples, and reach for the n-th key where n >= 10?

Anyways, thanks for the interesting and stimulating answer.

stefanchrobot · November 15, 2021, 12:45pm

Are those HTTP headers? Your library might provide a helper for that, e.g. Plug.Conn.get_req_header/2 or Tesla.get_header/2.

kodepett · November 15, 2021, 12:52pm

I’m using Finch which is based on MINT - not sure there’s a function for that.

stefanchrobot · November 16, 2021, 8:40am

That seems to be the case. If you decide to roll your own version it’s worth looking at both those APIs. Plug.Conn seems to be doing the “more correct” thing of returning a list of values.