Trying to figure out why a 2.3 MB JSON binary allocates 35 MB of heap

I’m trying to debug large memory spikes in our production environment, so I tried to reproduce something similar with as minimal an application as I could. I’m using the newest Erlang and Elixir, but I saw the same behaviour with OTP 21 and Elixir 1.10.

So I created a new app with mix new, installed Jason, and opened up iex.

I wanted to create a fairly large JSON, so I decided to quickly create a map that I would use as the value for a bigger map. So here’s the value:

iex(1)> value = %{"foo" => ["a", "b", "c"], "bar" => [1, 2, 3]}

I wanted to see what the memory allocation looked like before I created the bigger map, so I opened up Observer and saw something like this in the Memory allocators tab:
(screenshot: Memory allocators tab before creating the map)

Total, eheap_alloc and binary_alloc are the most interesting ones.

Then I created the bigger map

iex(3)> map = 1..50_000 |> Enum.reduce(%{}, fn idx, acc -> key = "key#{idx}"; Map.put(acc, key, value) end)

It’s a bit hard to read, but it basically creates a map with 50,000 keys where the value is always taken from the value variable created before.
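For reference, the same map can be built a bit more readably with Map.new/2:

```elixir
value = %{"foo" => ["a", "b", "c"], "bar" => [1, 2, 3]}

# builds %{"key1" => value, "key2" => value, ..., "key50000" => value}
map = Map.new(1..50_000, fn idx -> {"key#{idx}", value} end)
```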

So now the memory allocations look as follows
(screenshot: Memory allocators tab after creating the map)

So eheap_alloc went up ~13 MB and binary_alloc about 2 MB. This kind of makes sense, as :erts_debug.flat_size(map) * 8 (one word is 8 bytes on a 64-bit system) gives about 23 MB, and flat_size overcounts because parts of that map are shared in memory.
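To see how much sharing skews the numbers: :erts_debug.size/1 counts shared subterms only once, while :erts_debug.flat_size/1 counts them once per reference. A quick sketch:

```elixir
value = %{"foo" => ["a", "b", "c"], "bar" => [1, 2, 3]}
shared = List.duplicate(value, 1000)

# flat_size counts `value` 1000 times, size counts it once
IO.inspect(:erts_debug.flat_size(shared), label: "flat_size (words)")
IO.inspect(:erts_debug.size(shared), label: "size (words)")
```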

What I don’t understand, however, is the memory allocation of the next one

iex(4)> json = Jason.encode!(map)

(screenshot: Memory allocators tab after Jason.encode!)

eheap_alloc goes up almost 35 MB, while binary_alloc goes up about 2.4 MB, which is pretty much the size of the binary:

iex(4)> byte_size(json)
2338895

I’ve learned that binaries larger than 64 bytes are stored outside the process heap as reference-counted binaries in a shared area, which I would assume is represented by binary_alloc, as it nicely matches the size of the binary.
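A quick way to see the 64-byte cutoff between on-heap "heap binaries" and off-heap refc binaries (the exact word counts are internal details, so treat this as a sketch):

```elixir
# up to 64 bytes a binary lives on the process heap (heap binary);
# above that, only a small handle (ProcBin) stays on the heap and the
# payload goes to the shared binary allocator (binary_alloc)
small = :binary.copy("a", 64)
large = :binary.copy("a", 65)

IO.inspect(:erts_debug.flat_size(small), label: "heap binary words")
IO.inspect(:erts_debug.flat_size(large), label: "refc handle words")
```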

But where does that 35 MB of (process?) heap allocation come from?

I also tried producing iodata with Jason.encode_to_iodata!, and it allocates even more heap.

Thanks for any insights!


It seems like the garbage collector doesn’t get a chance to clean up the memory. If you move the Jason.encode! into a Task.async + Task.await, then it doesn’t increase as much.

Can you try measuring those outside of IEx, using a tool such as Benchee or a regular script file? IEx, for example, keeps a history of all values, which means intermediate data won’t be garbage collected. Code is also evaluated rather than compiled, which is slower and may allocate more garbage.
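For a rough script-based check, something like this samples node-wide allocation with :erlang.memory/1 (the numbers include everything on the node, so run it on an otherwise idle VM):

```elixir
before = :erlang.memory(:processes)

# build a largish map, then compare process heap allocation
map = Map.new(1..50_000, fn idx -> {"key#{idx}", %{"foo" => [1, 2, 3]}} end)

after_build = :erlang.memory(:processes)
IO.puts("process heap delta: #{after_build - before} bytes (map_size: #{map_size(map)})")
```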


Thanks @olafura and @josevalim, I’ll play with those.

One thing I noticed is that if I do json = "#{inspect map, limit: :infinity}" instead of encoding with Jason, the memory spikes higher (closer to 200 MB total) for a second, but comes right back down to where it was (except for the increased binary allocation of around 3 MB, which is in line with the byte size).

So to me it looks like with inspect the garbage collector does its work, but not with Jason. I guess inspect spawns a new process and shuts it down, which triggers the garbage collector :thinking:


So I created the following script

defmodule FooTest do
  def run() do
    value = %{"foo" => ["a", "b", "c"], "bar" => [1, 2, 3]}

    :observer.start()
    Process.sleep(10000)

    IO.puts("creating map")
    map =
      1..50_000
      |> Enum.reduce(%{}, fn idx, acc ->
        key = "key#{idx}"
        Map.put(acc, key, value)
      end)

    Process.sleep(10000)

    IO.puts("creating json")
    Jason.encode!(map)

    Process.sleep(10000)
  end
end

FooTest.run()

and ran it with mix run footest.exs

eheap_alloc rose from about 4 MB to 9 MB after creating the map. Creating the JSON raised it to almost 60 MB. I ran it multiple times and the results were pretty similar.

Then I replaced Jason.encode!(map) with

Task.async(fn -> Jason.encode!(map) end) |> Task.await()

which led to pretty inconsistent results. About 50% of the time I saw no memory spike whatsoever (probably because encoding plus garbage collection was faster than the stats refresh rate); in the rest of the cases, eheap_alloc jumped to 40-110 MB.

So it seems that JSON encoding still allocates far more memory than the data requires; how fast it clears up just depends on the process lifecycle and thus garbage collection.

Thoughts on why it allocates so much memory, even if temporarily?

We’re using Phoenix in production, and we have an HTTP API that can potentially create a JSON response of over 3 MB. With many concurrent users, the memory consumption explodes, especially when Phoenix renders it as an iolist.

We can surely make the JSON smaller and cut down on the number of requests being made, but the memory consumption still seems strangely high. As a quick fix, we’ll probably encode the JSON to a binary in a separate process to help with the situation.
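A minimal sketch of that quick fix, with the encoder passed in so the pattern works with any encode function (IsolatedEncode is a made-up name):

```elixir
defmodule IsolatedEncode do
  # Run the encode in a short-lived process: all intermediate garbage
  # dies with the process, and only the result binary (refc, shared)
  # is handed back to the caller.
  def run(term, encode_fun \\ &Jason.encode!/1) do
    fn -> encode_fun.(term) end
    |> Task.async()
    |> Task.await(:infinity)
  end
end
```

Calling :erlang.garbage_collect/0 after encoding in-process is another option, but an isolated process avoids growing the caller’s heap in the first place.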

Here you go:

Environment

asdf current
elixir          ref:master      /home/…/.tool-versions
erlang          23.2.3          /home/…/.tool-versions
…
dependencies
Resolving Hex dependencies...
Dependency resolution completed:
New:
  benchee 1.0.1
  deep_merge 1.0.0
  jason 1.2.2
* Getting benchee (Hex package)
* Getting jason (Hex package)
* Getting deep_merge (Hex package)
==> deep_merge
Compiling 2 files (.ex)
Generated deep_merge app
==> benchee
Compiling 39 files (.ex)
Generated benchee app
==> jason
Compiling 8 files (.ex)
Generated jason app
operating system
$ uname -a
Linux archlinux 5.10.9-arch1-1 #1 SMP PREEMPT Tue, 19 Jan 2021 22:06:06 +0000 x86_64 GNU/Linux

$ cat /proc/version
Linux version 5.10.9-arch1-1 (linux@archlinux) (gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.35.1) #1 SMP PREEMPT Tue, 19 Jan 2021 22:06:06 +0000

$ cat /etc/*release
LSB_VERSION=1.4
DISTRIB_ID=Arch
DISTRIB_RELEASE=rolling
DISTRIB_DESCRIPTION="Arch Linux"
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://www.archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
LOGO=archlinux

Benchee

without jason

example.exs
Mix.install([:benchee])
value = %{"foo" => ["a", "b", "c"], "bar" => [1, 2, 3]}

Benchee.run(
  %{
    "map" => fn -> 1..50_000 |> Enum.reduce(%{}, fn idx, acc -> Map.put(acc, "key#{idx}", value) end) end
  },
  time: 10,
  memory_time: 2
)
output
Not all of your protocols have been consolidated. In order to achieve the
best possible accuracy for benchmarks, please ensure protocol
consolidation is enabled in your benchmarking environment.

Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
Number of Available Cores: 8
Available memory: 15.52 GB
Elixir 1.12.0-dev
Erlang 23.2.3

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
parallel: 1
inputs: none specified
Estimated total run time: 14 s

Benchmarking map...

Name           ips        average  deviation         median         99th %
map          17.77       56.29 ms    ±88.83%       38.89 ms      311.39 ms

Memory usage statistics:

Name    Memory usage
map         25.17 MB

**All measurements for memory usage were the same**

with jason

example.exs
Mix.install([:benchee, :jason])
value = %{"foo" => ["a", "b", "c"], "bar" => [1, 2, 3]}

Benchee.run(
  %{
    "map" => fn -> 1..50_000 |> Enum.reduce(%{}, fn idx, acc -> Map.put(acc, "key#{idx}", value) end) |> Jason.encode!() end
  },
  time: 10,
  memory_time: 2
)
output
Not all of your protocols have been consolidated. In order to achieve the
best possible accuracy for benchmarks, please ensure protocol
consolidation is enabled in your benchmarking environment.

Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
Number of Available Cores: 8
Available memory: 15.52 GB
Elixir 1.12.0-dev
Erlang 23.2.3

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
parallel: 1
inputs: none specified
Estimated total run time: 14 s

Benchmarking map...

Name           ips        average  deviation         median         99th %
map           4.66      214.74 ms    ±44.25%      181.33 ms      567.80 ms

Memory usage statistics:

Name    Memory usage
map         73.19 MB

**All measurements for memory usage were the same**

NB: you may well find that the jiffy NIF uses significantly less interim memory. It doesn’t do exactly the same as the pure-Elixir libraries, but maybe it’s close enough.


with jiffy

example.exs
Mix.install([:benchee, :jiffy])
value = %{"foo" => ["a", "b", "c"], "bar" => [1, 2, 3]}

Benchee.run(
  %{
    "map" => fn ->
      1..50_000
      |> Enum.reduce(%{}, fn idx, acc -> Map.put(acc, "key#{idx}", value) end)
      |> :jiffy.encode()
    end
  },
  time: 10,
  memory_time: 2
)
$ elixir example.exs
Not all of your protocols have been consolidated. In order to achieve the
best possible accuracy for benchmarks, please ensure protocol
consolidation is enabled in your benchmarking environment.

Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
Number of Available Cores: 8
Available memory: 15.52 GB
Elixir 1.12.0-dev
Erlang 23.2.3

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
parallel: 1
inputs: none specified
Estimated total run time: 14 s

Benchmarking map...

Name           ips        average  deviation         median         99th %
map           6.92      144.60 ms    ±55.27%      124.78 ms      521.25 ms

Memory usage statistics:

Name    Memory usage
map         30.78 MB

**All measurements for memory usage were the same**

Note: jiffy isn’t doing quite the same work as Jason here. It looks like it breaks large JSON into an iolist when encoding, and data |> :jiffy.encode() |> :jiffy.decode() is not equal to data.
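If you need a single binary for byte_size or comparisons, the iodata result can be flattened with IO.iodata_to_binary/1 first, e.g.:

```elixir
# encoders may return nested iodata for large inputs;
# IO.iodata_to_binary/1 flattens it into one binary
iodata = ["{", [~s("a"), ":", "1"], "}"]
json = IO.iodata_to_binary(iodata)
# json == ~s({"a":1})
```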


When I test it, the Task wrapper seems to fix the memory usage, which suggests some things aren’t cleaned up when Jason.encode! runs directly in the calling process.

Mix.install([:benchee, :jason])
value = %{"foo" => ["a", "b", "c"], "bar" => [1, 2, 3]}

Benchee.run(
  %{
    "map" => fn ->
      map =
        1..50_000
        |> Enum.reduce(%{}, fn idx, acc -> Map.put(acc, "key#{idx}", value) end)
      task = Task.async(fn -> Jason.encode!(map) end)
      Task.await(task)
    end
  },
  time: 10,
  memory_time: 2
)

I have recently considered using a Rust JSON encoder/decoder to make the operation as fast and memory-efficient as possible. It is something our projects do very often, and the JSON can get very large. It makes sense to do it as fast and efficiently as possible, particularly since it is a very well-defined operation.

https://hexdocs.pm/jsonrs/readme.html


You will always find me shilling for Rust, but in this case you might be even better served by jiffy (linked by @skunkwerks earlier), which is a well-established C/C++ library with Erlang bindings.

jsonrs looks fantastic. AFAICT it doesn’t support yielding, which means you are competing for the dirty scheduler pool every time. I’m curious to know how it performs in a busy BEAM as a result.


That’s my main concern when writing the Rust code for my sqlite3 database library: the Elixir<=>Rust bridge (rustler) doesn’t seem to support yielding yet.


Jiffy looks great, as does jsonrs, but I’d assume jsonrs requires a Rust compiler, and I’d like to avoid installing that in our pipelines.

Any ideas how to encode structs like Date with jiffy? It basically turns the whole struct into an object instead of a date string. It’s quite easy to forget to turn it into a string manually in Phoenix views before encoding, and one might not realize that the JSON is not what was expected.
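One workaround is pre-walking the data and converting such structs yourself before handing the term to jiffy (a sketch; JiffyPrep and walk/1 are made-up names, and this mirrors what the Jason.Encoder protocol otherwise does for you):

```elixir
defmodule JiffyPrep do
  # Recursively turn structs the encoder doesn't know about into
  # JSON-friendly terms before encoding.
  def walk(%Date{} = date), do: Date.to_iso8601(date)
  def walk(%DateTime{} = dt), do: DateTime.to_iso8601(dt)
  def walk(%_{} = struct), do: struct |> Map.from_struct() |> walk()
  def walk(%{} = map), do: Map.new(map, fn {k, v} -> {k, walk(v)} end)
  def walk(list) when is_list(list), do: Enum.map(list, &walk/1)
  def walk(other), do: other
end
```

Usage would then be something like map |> JiffyPrep.walk() |> :jiffy.encode().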

The Rust toolchain is trivial to install; I’d argue that on certain Linux distros it’s actually easier to install than the normal C/C++ build toolchain.

I actually opted to write the NIFs in my Elixir libraries only in Rust, because it was a matter of seconds on Windows 10, as opposed to a pretty long dance with MinGW, GnuWin32, Cygwin, or Visual Studio’s environment (or all of them :003:).

This can be easier with @ityonemo’s Zigler.

Never tried Zig, even though I’ve only heard good things about it (and the creator is very dedicated and improves the tooling regularly, which is a huge plus). I suppose I’m happy knowing only Rust on the low-level side.

But is Zig as easy to install as Rust? With Rust, you install its version manager with one command, then 1-2 more commands install the latest stable version. It’s elementary, even on Windows.


It seems to be so easy that people are using Zig as a general C and C++ compiler. See this report, for example: Zig Makes Go Cross Compilation Just Work - DEV Community


Yeah, exactly that kind of tooling is what I love. Zig’s author has my respect for what he did!
