Duplicate binaries in Erlang process info

Hello, I’m trying to figure out how much binary memory a process is holding onto by summing the sizes returned from Process.info(pid, :binary), but the results are wildly inaccurate. On a machine with 128 GB of RAM it reports a single process using over 1 TB of memory.

I think I’ve tracked the problem down to Process.info returning a list that contains the same binaries multiple times.
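A minimal way to reproduce the duplication, assuming (as I believe is the case on current OTP) that String.split/2 produces sub-binaries that all reference the same underlying refc binary:

```elixir
# Build a ~200-byte refc binary at runtime (large enough to be
# stored off-heap rather than as a heap binary).
bin = :binary.copy("a\n", 100)

# Splitting yields many sub-binaries, each referencing `bin`'s
# underlying off-heap binary.
parts = String.split(bin, "\n")

{:binary, refs} = Process.info(self(), :binary)
ids = Enum.map(refs, &elem(&1, 0))

# The same binary id shows up once per reference, so the list
# contains duplicates even though there is only one refc binary.
duplicated? = length(ids) > length(Enum.uniq(ids))
```

Each entry is a `{id, size, refcount}` tuple where `size` is the size of the whole off-heap binary, which is why naively summing over-counts so badly.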

Here’s an Elixir script that demonstrates what I’m seeing. The “junk_file” is a file I created that has a bunch of JSON objects with one object per line.

result =
  Stream.iterate(0, &(&1 + 1))
  |> Stream.take(10_000)
  |> Flow.from_enumerable(max_demand: 1)
  |> Flow.flat_map(fn _ ->
    File.read!("junk_file") |> String.split("\n")
  end)
  |> Enum.to_list()

system_binary_bytes =
  :erlang.memory() |> Keyword.get(:binary)

{:binary, binaries} =
  Process.info(self(), :binary)

binary_ids =
  Enum.map(binaries, &elem(&1, 0))

binary_bytes =
  Enum.map(binaries, &elem(&1, 1))
  |> Enum.sum()

unique_binary_bytes =
  Enum.uniq_by(binaries, fn {id, _, _} -> id end)
  |> Enum.map(&elem(&1, 1))
  |> Enum.sum()

unique_ids =
  Enum.uniq(binary_ids)
unique_ids_times_ref =
  Enum.uniq_by(binaries, fn {id, _, _} -> id end)
  |> Enum.map(&elem(&1, 2))
  |> Enum.sum()

IO.puts("binary_ids           #{Enum.count(binary_ids)}")
IO.puts("unique_ids           #{Enum.count(unique_ids)}")
IO.puts("binary_bytes         #{binary_bytes}")
IO.puts("unique_bytes         #{unique_binary_bytes}")
IO.puts("unique_ids_times_ref #{unique_ids_times_ref}")
IO.puts("system_binary_bytes  #{system_binary_bytes}")


Here are the results from an example run:

binary_ids           2166004
unique_ids           10003
binary_bytes         373817119944
unique_bytes         1725843360
unique_ids_times_ref 2166009
system_binary_bytes  1731508128

As you can see, quite a few ids are repeated, and summing the reported size of each binary after deduplicating by binary id gives a total much closer to the system-wide figure.
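The deduplication idea can be packaged as a small helper. This is only a sketch of my current approach (the module and function names are mine), and note it still attributes the full size of a binary to this process even when other processes share it:

```elixir
defmodule BinMem do
  @doc """
  Estimate the off-heap binary bytes referenced by `pid` by
  deduplicating the Process.info/2 entries on their binary id
  (first tuple element) before summing the sizes.
  """
  def referenced_bytes(pid) do
    {:binary, refs} = Process.info(pid, :binary)

    refs
    |> Enum.uniq_by(fn {id, _size, _refcount} -> id end)
    |> Enum.map(fn {_id, size, _refcount} -> size end)
    |> Enum.sum()
  end
end
```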

Is it safe to assume that binary ids are unique and can be used to deduplicate the list? Is this the correct way to resolve my problem?

While I’m not sure exactly what it is you’re trying to achieve – sorry about that – have you taken a look at these functions?

  • :erts_debug.size(term)
  • :erts_debug.flat_size(term)
  • :erts_debug.size_shared(term)
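For illustration, here is a quick comparison of the first two (my understanding is that these report sizes in machine words, not bytes, and that for refc binaries only the small on-heap handle is counted, not the off-heap payload):

```elixir
# A term that contains the same sub-term twice.
shared = Enum.to_list(1..3)
term = {shared, shared}

# flat_size/1 counts shared sub-terms once per occurrence;
# size/1 counts each shared sub-term only once, so it is
# never larger than flat_size/1 for the same term.
words_flat = :erts_debug.flat_size(term)
words      = :erts_debug.size(term)
```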