I’m managing my Postgres database with Ecto and have run into the following situation: some fields in some of my records hold empty values, in this case an empty list (`[]`).
Empty list or `nil`: which is better or more efficient?
Empty list.
See this PR, which was an optimization to MapSet:
elixir-lang:master ← x4lldux:optimize-mapset-etf (opened 05:05PM, 31 Mar 17 UTC)
MapSet is internally represented as a map whose keys are the set's elements and whose stub values are `true`, like `%{ :key => true }`. When a MapSet in its current format is written to a DETS file, or serialized using `:erlang.term_to_binary/2`, it is **two times** as big as the smallest possible case! If the stub value is replaced with an empty list, giving the format `%{ :key => [] }`, there is a significant space reduction:
```Elixir
iex(1)> ml = for i<- 1..1_000_000, do: {i,[]}, into: %{}; :ok
:ok
iex(2)> mt = for i<- 1..1_000_000, do: {i,true}, into: %{}; :ok
:ok
iex(3)> {:ok, table_ml} = :dets.open_file(:disk_storage_ml, [type: :duplicate_bag])
{:ok, :disk_storage_ml}
iex(4)> {:ok, table_mt} = :dets.open_file(:disk_storage_mt, [type: :duplicate_bag])
{:ok, :disk_storage_mt}
iex(5)> size_ml = File.lstat!("disk_storage_ml").size
5432
iex(6)> size_mt = File.lstat!("disk_storage_mt").size
5432
iex(7)> :dets.insert_new(:disk_storage_ml, {1, ml})
true
iex(8)> :dets.insert_new(:disk_storage_mt, {1, mt})
true
iex(9)> size_ml_dets = File.lstat!("disk_storage_ml").size
6004693
iex(10)> size_mt_dets = File.lstat!("disk_storage_mt").size
12004693
iex(11)> size_ml_etf = ml |> :erlang.term_to_binary |> byte_size
5999241
iex(12)> size_mt_etf = mt |> :erlang.term_to_binary |> byte_size
11999241
iex(13)> size_ml_dets/size_mt_dets
0.5001954652234755
iex(14)> size_ml_etf/size_mt_etf
0.49996837299959224
```
Since only the map's keys matter, changing the stub value from `true` to `[]` doesn't change any functionality. MapSet is an opaque type, so nobody should be relying on its internal format either.
This changes only how much space a MapSet uses when it is serialized to ETF; no change in RAM usage was observed.
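To see where the saving comes from, one can compare the encoded size of the two stub values on their own. This is a minimal sketch, assuming a stock `iex` session: `[]` is encoded as the ETF version byte plus a single `NIL_EXT` tag byte, while `true` is encoded as an atom (tag, length, then the characters of `true`). The exact atom encoding has changed between OTP releases, so the snippet prints the sizes rather than hard-coding them.

```elixir
# Compare the external term format (ETF) size of the two MapSet stub values.
empty_list_bin = :erlang.term_to_binary([])
true_bin = :erlang.term_to_binary(true)

# [] is the version byte (131) followed by the NIL_EXT tag (106): 2 bytes total.
IO.inspect(empty_list_bin, label: "[]")

# true is encoded as an atom: version byte, atom tag, length, then "true".
IO.inspect(true_bin, label: "true")

saved = byte_size(true_bin) - byte_size(empty_list_bin)
IO.puts("bytes saved per stub value: #{saved}")
```

In the 2017 transcript above, the ETF gap works out to 6 bytes per entry (11,999,241 minus 5,999,241 bytes over 1,000,000 entries). Newer OTP releases may encode short atoms more compactly, so the per-entry gap there may be slightly smaller.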
dorgan
December 22, 2021, 11:20pm
Tangential fun fact: in Erlang, the empty list `[]` is internally called nil, and it has its own position in the Erlang term ordering:
https://www.erlang.org/doc/reference_manual/expressions.html#term-comparisons
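A quick way to see this from Elixir, as a small sketch: Elixir's `nil` is simply the atom `:nil`, which is a different term from Erlang's nil (the empty list). Sorting a list of mixed terms shows where each lands in the term order, which is number < atom < reference < fun < port < pid < tuple < map < nil < list < bitstring.

```elixir
# Elixir's nil is the atom :nil, while Erlang's "nil" is the empty list [].
IO.inspect(is_atom(nil), label: "Elixir nil is an atom")
IO.inspect(is_list([]), label: "[] is a list")

# Enum.sort/1 uses the standard term order, so mixed terms sort by type:
# number < atom < ... < tuple < map < nil ([]) < non-empty list < bitstring
mixed = [[0], "", [], %{}, nil, {}, 1]
IO.inspect(Enum.sort(mixed))
# → [1, nil, {}, %{}, [], [0], ""]
```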
I would prioritize clarity first and performance second here, since the difference is negligible in most cases. If a column can either have a value or no value, use `nil` to mean “no value”. If you have a collection of things and that collection can be empty, use `[]`.
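A minimal sketch of that convention as a plain Elixir struct; the `Profile` module and its fields are hypothetical. The optional scalar defaults to `nil`, so "no value" is explicit, while the collection defaults to `[]`, so code that iterates over it needs no `nil` check. In an Ecto schema the collection field would be declared along the lines of `field :tags, {:array, :string}, default: []`.

```elixir
defmodule Profile do
  # Hypothetical record: `nickname` may be absent, `tags` is a collection.
  defstruct nickname: nil, tags: []

  # "No value" is explicit: the caller decides what nil means.
  def display_name(%__MODULE__{nickname: nil}), do: "(no nickname)"
  def display_name(%__MODULE__{nickname: nickname}), do: nickname

  # An empty collection needs no special case: Enum works on [] directly.
  def tag_count(%__MODULE__{tags: tags}), do: Enum.count(tags)
end

profile = %Profile{}
IO.puts(Profile.display_name(profile))
IO.puts(Profile.tag_count(profile))
```

Had `tags` defaulted to `nil`, `Enum.count/1` would raise `Protocol.UndefinedError`, and every caller would need its own `nil` guard.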
al2o3cr
December 23, 2021, 2:33am
If you’ve got a JSONB column or a nullable array column, using `nil` is going to cause the usual SQL `NULL` behaviors (for example, comparisons against `NULL` yield `NULL` rather than true or false), while using `[]` will not.
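If you want such a column to hold `[]` rather than `NULL`, one option is to normalize attributes on the Elixir side before writing them. The sketch below is hypothetical: `Normalize.collections/2` is not part of Ecto, and it operates on a plain attribute map that you would then pass to your changeset.

```elixir
defmodule Normalize do
  # Hypothetical helper: for each given key, replace a missing or nil value
  # with [], so an array/JSONB column receives an empty list instead of NULL.
  def collections(attrs, keys) when is_map(attrs) and is_list(keys) do
    Enum.reduce(keys, attrs, fn key, acc ->
      case Map.get(acc, key) do
        nil -> Map.put(acc, key, [])
        _other -> acc
      end
    end)
  end
end

attrs = %{name: "widget", tags: nil}
IO.inspect(Normalize.collections(attrs, [:tags]))
# → %{name: "widget", tags: []}
```

Values that are already lists pass through unchanged; only `nil` (or an absent key) is rewritten.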