Duplicate binaries in Erlang process info

Hello, I’m trying to figure out how much binary memory a process is holding onto by summing the sizes returned by Process.info(pid, :binary), but the totals are far too large. On one machine with 128GB of RAM it says a process is using over 1TB of memory.

I think I’ve tracked the problem down to Process.info returning a list that contains the same binaries multiple times.

Here’s an Elixir script that demonstrates what I’m seeing. The “junk_file” is a file I created that has a bunch of JSON objects with one object per line.

result =
  Stream.iterate(0, & &1 + 1)
  |> Stream.take(10_000)
  |> Flow.from_enumerable(max_demand: 1)
  |> Flow.flat_map(fn _ ->
    File.read!("junk_file") |> String.split("\n")
  end)
  |> Enum.to_list()

system_binary_bytes =
  :erlang.memory() |> Keyword.get(:binary)

{:binary, binaries} =
  Process.info(self(), :binary)

binary_ids =
  Enum.map(binaries, &elem(&1, 0))

binary_bytes =
  Enum.map(binaries, &elem(&1, 1))
  |> Enum.sum()

unique_binary_bytes =
  Enum.uniq_by(binaries, fn {id, _, _} -> id end)
  |> Enum.map(&elem(&1, 1))
  |> Enum.sum()

unique_ids =
  MapSet.new(binary_ids)

unique_ids_times_ref =
  Enum.uniq_by(binaries, fn {id, _, _} -> id end)
  |> Enum.map(& elem(&1, 2))
  |> Enum.sum()

IO.puts("binary_ids           #{Enum.count(binary_ids)}")
IO.puts("unique_ids           #{Enum.count(unique_ids)}")
IO.puts("binary_bytes         #{binary_bytes}")
IO.puts("unique_bytes         #{unique_binary_bytes}")
IO.puts("unique_ids_times_ref #{unique_ids_times_ref}")
IO.puts("system_binary_bytes  #{system_binary_bytes}")

Enum.count(result)

Here are the results from an example run:

binary_ids           2166004
unique_ids           10003
binary_bytes         373817119944
unique_bytes         1725843360
unique_ids_times_ref 2166009
system_binary_bytes  1731508128

As you can see, quite a few ids are repeated, and summing the reported size of each binary after deduplicating by binary id gives a total much closer to the system total.
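The deduplication itself can be boiled down to a small helper. This is only a sketch: `BinaryUsage` and `unique_bytes/1` are hypothetical names, the sample tuples are made up for illustration, and it assumes each entry has the `{id, size, refcount}` shape seen above and that an id identifies exactly one off-heap binary, which is the very assumption in question.

```elixir
defmodule BinaryUsage do
  # Hypothetical helper, not part of Elixir or OTP.
  # Takes a list of {id, size, refcount} tuples and sums the size of
  # each distinct id once, assuming ids are unique per off-heap binary.
  def unique_bytes(binaries) do
    binaries
    |> Enum.uniq_by(fn {id, _size, _refc} -> id end)
    |> Enum.reduce(0, fn {_id, size, _refc}, acc -> acc + size end)
  end
end

# Made-up sample: id 1 is listed twice, id 2 once.
sample = [{1, 100, 2}, {1, 100, 2}, {2, 50, 1}]
sample_unique_bytes = BinaryUsage.unique_bytes(sample)

# Applied to the current process.
{:binary, binaries} = Process.info(self(), :binary)
live_unique_bytes = BinaryUsage.unique_bytes(binaries)
IO.puts("unique bytes held by self(): #{live_unique_bytes}")
```

For the made-up sample this counts 100 + 50 bytes rather than 250, which mirrors the gap between binary_bytes and unique_bytes in the results above.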

Is it safe to assume that binary ids are unique and can be used to deduplicate the list? Is this the correct way to resolve my problem?

While I am not sure what exactly you are trying to achieve (sorry for that), have you taken a look at these functions?

  • :erts_debug.size(term)
  • :erts_debug.flat_size(term)
  • :erts_debug.size_shared(term)
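A quick sketch of how the first two differ, using a made-up term that contains the same list twice. Both return sizes in machine words, not bytes, so the example converts using :erlang.system_info(:wordsize). As far as I know these measure the term on the process heap, so for large reference-counted binaries they may not include the off-heap payload; treat that as an assumption to verify.

```elixir
# Build a term at runtime that references the same list twice.
shared = List.duplicate(:a, 100)
term = {shared, shared}

# size/1 takes sharing into account; flat_size/1 counts every
# occurrence as if the term had been fully copied.
shared_words = :erts_debug.size(term)
flat_words = :erts_debug.flat_size(term)

# Convert words to bytes for the running VM.
word_bytes = :erlang.system_info(:wordsize)

IO.puts("size:      #{shared_words} words (#{shared_words * word_bytes} bytes)")
IO.puts("flat_size: #{flat_words} words (#{flat_words * word_bytes} bytes)")
```

The gap between the two numbers gives a rough idea of how much of a term is shared structure.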