Ecto.Schema/Struct with lots (100) keys ... would this affect performance/memory usage?

chanon · April 3, 2019, 3:38pm

I am designing my datamodel, and one of the tables will have almost 100 columns.

I want to use Ecto.Schema to make some things simpler.

I won’t be querying all 100 columns at once. Maybe only 5-10 at a time. I’m wondering:

If I query only 5-10 columns, will elixir be using only enough memory for those 5-10 fields for the structs? Or will each struct instance have 100 fields with values set to nil? (Thus using lots more memory than if I made a query that didn’t use Ecto.Schema and returned maps.)
Is there any other performance impact of doing using Ecto.Schema and structs in general with so many fields that I should be aware of?

LostKobrakai · April 3, 2019, 3:43pm

I’d suggest using multiple schemas, where each only holds the fields it needs from the database table.

idi527 · April 3, 2019, 4:04pm

Yes. But the difference should be insignificant (in absolute terms). You can check with :erts_debug.size/1 probably, which would show how much is stored on heap.

iex(1)> :erts_debug.size(%{})
4 # in heap words, and a word is 8 bytes in 64-bit implementation, so an empty map is 32 bytes
iex(2)> :erts_debug.size(%{map: 1, with: 2, few: 3, elements: 4})
12 # 96 bytes
iex(4)> 1..100 |> Enum.map(fn i -> {i, nil} end) |> Enum.into(%{}) |> :erts_debug.size()
358 # 2864 bytes

Please check out Erlang -- Advanced for a more accurate explanation.

Other than just the keys, there is also some cost associated with schema metadata, but that should also be a non-issue since ecto 3.

But having multiple smaller schemas would still be a good idea from the operational point of view – fewer error prone selects, probably more descriptive names (e.g. PurchasedItem with ten fields instead of Item with a hundred fields).

Qqwy · April 3, 2019, 4:59pm

Can you tell us more about your data model? The fact that you will have a table with roughly 100 columns seems slightly concerning and, while there are of course exceptions where this might be the best way, it might also instead point at a flaw in the data model construction.

chanon · April 4, 2019, 2:38am

Thank you everyone. Regarding the data model, it is to support user defined fields. I’m still not 100% sure about it, but I don’t want to use EAV and jsonb isn’t good for storing lots of single fields that update often.

Since there does seem to be a performance/memory impact, I guess I’ll not use Ecto.Schema for the columns for user defined fields.

Qqwy · April 4, 2019, 5:49am

The main question, I think, is how these user-defined fields are supposed to be queried?

chanon · April 4, 2019, 7:27am

Will depend on user defined views where they can select the fields to display, filter and sort by.

LostKobrakai · April 4, 2019, 8:55am

If columns are user selectable I’d look into the schemaless options ecto provides, like schemaless changesets or using plain tables in queries and load data into runtime representations using Repo.load.

pedromvieira · April 4, 2019, 10:13am

Also you can use a single JSONB field in Postgres, allong ‘normal’ fields, to hold any schemaless data. The query syntax will be different, but the benefits are great, including indexing the whole thing.
In Elixir, this will be showed as a map.

chanon · April 4, 2019, 11:27am

Yes, I initially was going to use JSONB … but then I found out that whenever a single value inside a JSONB changes, to update it PostgresQL has to read the whole thing and write a new version of the whole thing back.

So I thought that would be bad for performance if there are a lot of fields and the data changes often.

Thinking about it, I’m still not sure which way to go.

Hmm… I guess I’ll just write some code to generate some test data for each approach and see which is best.

OvermindDL1 · April 4, 2019, 2:27pm

Are updates really that often? Even 100 entries aren’t that much. Could just use another table linking back to that one with fields of a key and data as well. I think you can update a specific JSONB sub-entry via an update query though too?

Qqwy · April 4, 2019, 4:56pm

And another thing to consider: How often does a user want all 100 fields at the same time? If they only want at most ten at average, (besides as @OvermindDL1pointing out that ‘100’ itself is not that much at all), it is even faster.

It might make the most sense to build a simple solution first, and measure its actual speed in practice before deciding if a more complicated solution needs to be considered.

nsweeting · April 4, 2019, 6:16pm

Couldnt each “field and value” just be a single record that belongs to the parent record? Seems a lot easier in my mind.

chanon · April 4, 2019, 6:22pm

Yes, everyone has given some good ideas.

I will do a benchmark to test performance for each way … (EAV, jsonb, lots of columns) so I have solid data instead of just assumptions.

dimitarvp · April 4, 2019, 6:39pm

I would go with what @LostKobrakai suggested – have one DB table but make several Ecto schema modules that use subset of its fields. That would be most intuitive for you – and easier to work with. Might be a little extra effort at the start but it’s well worth it.

Or you can ditch schemas altogether and just make several helper modules that do schemaless queries and return plain old structs/maps with just the fields you need (although in terms of effort that might be more coding than just making Ecto schemata). This would also probably alleviate your worries with memory usage – although a struct with 100 keys isn’t really that much in any language nowadays.