Why does `embeds_many` enforce a default of `[]`?

Today I noticed that when defining an `embeds_many`, the field on a new struct defaults to `[]`, which is confusing since:

  1. You can have jsonb[] columns with a default of NULL
  2. embeds_one has a default of nil
  3. You can’t define your own default!

I ask because I have code that expected it to behave like other fields, and also because `{}` in Postgres is twice as expensive as a NULL value:

`ecto/schema.ex` at master, elixir-ecto/ecto on GitHub.

Looks like it was committed 7 years ago, in “Remove containers in favor of strategies” (elixir-ecto/ecto@7f052d9 on GitHub).

I don’t see how this makes the default for `embeds_many` confusing. What else could this value be? A list of nothing is an empty list, no?

I get what you mean: nil could say “there is no value here”, but only if you think of the `embeds_many` field itself as the value. I think of the values as the things being embedded.

Using a list over nil probably saves us from nil checks all over the place. Would you prefer to check if it’s nil before you try to operate on it every time?
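A small Elixir illustration of the nil-check point, using plain maps instead of Ecto structs so it runs without dependencies. `OrderLines` and the `:qty` key are hypothetical names for this sketch: with a `[]` default the `Enum` pipeline needs no special case, while a nullable field forces an extra clause.

```elixir
defmodule OrderLines do
  # Hypothetical helper: sums the :qty of each embedded line item.
  # With a [] default, an "empty" record needs no special case:
  # Enum.map/2 and Enum.sum/1 both accept an empty list.
  def total_qty(lines) when is_list(lines) do
    lines
    |> Enum.map(& &1.qty)
    |> Enum.sum()
  end

  # If the field could also be nil, every caller (or helper) needs an
  # extra clause like this one, because Enum functions raise on nil.
  def total_qty_nullable(nil), do: 0
  def total_qty_nullable(lines), do: total_qty(lines)
end
```

For example, `OrderLines.total_qty([])` returns `0` and `OrderLines.total_qty([%{qty: 1}, %{qty: 2}])` returns `3`.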

In every database I can find, the default for a list (array) column is NULL.

It’s also the default for every other field, and for one of the two `embeds_*` macros (`embeds_one`).

This stands out against both common design and common expectations.

Beyond the fact that it actually does impact the database (again, double the space cost), there is a massive difference between “this column has no value” and “this column is an empty list”.

You could submit a proposal to the mailing list to make the default configurable and see what they say: https://groups.google.com/g/elixir-ecto.

Changing the default for everyone probably isn’t a good idea because it will break existing code.

I’m not sure why it is an empty list, but it might have something to do with the fact that embeds and associations are treated very similarly. When you preload an association, you get an empty list when none are found. That makes sense because the result of the preload is defined as a list of associated records, not a nullable column.

I’ll post there next. I wanted to do it here in case there was something I was missing.

I would argue this is the only correct default. `embeds_many` represents associated records, of which there can be zero or more.
The number of records in NULL would be undefined, which would be incorrect.

Embeds are modeled to align closely with assocs, and for assocs an empty set doesn’t mean there’s a column somewhere set to NULL; it means there are no rows at all. It makes sense to default that field to an empty list because, as mentioned before, it’s much cleaner not to check for both nil and an empty list everywhere.

I can see it both ways: thinking about it as a nullable column with a complex datatype or as denormalized associations.

This is it. :slight_smile:

Ok, but embeds_many implies a collection of stuffs. I would say that if you have a complex type that might be null or a list, then you have one embed that might be null or a list, not a collection of things that might be in the list.

I’m not a NOT NULL radical à la C. J. Date, and if you do want a complex type holding a maybe-list, model it as such (and it’s still probably not great). But if you call something `embeds_many` or `has_many`, it implies a countable thing, and NULL is not countable.

Then again, OP stated:

Which is plainly wrong.

Ok, but embeds_many implies a collection of stuffs. I would say that if you have a complex type that might be null or a list, then you have one embed that might be null or a list, not a collection of things that might be in the list.

That’s a good point. I guess there’s a bit of tension if someone wants to take advantage of embedded schemas for a list type but also wants to treat it as a nullable column.

There is sometimes value in allowing both NULL and empty list. The former might signal the data is missing while the latter might signal the data is complete and there is nothing to report.

I just want to say that I really enjoyed this discussion. The argument that “`embeds_many` should behave as closely as possible to a normal `has_many`” is very strong and was enough to make me rethink my data model decision.

That said, nil and an empty list really are two different things, conceptually. Imagine a scenario where you want to model an optional whitelist of some sort:

  • nil would mean “I’m not using the whitelist; allow everyone in”
  • [] would mean “I don’t want to let anyone in” (no one is on the list)
  • ["foo"] would mean “I only want ‘foo’ to be allowed in”
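The three whitelist states map onto a two-clause function. This is a sketch in plain Elixir; `Whitelist.allowed?/2` is a hypothetical name, and ids are plain strings rather than embedded structs.

```elixir
defmodule Whitelist do
  # Hypothetical three-state check:
  #   nil     -> whitelist not in use, allow everyone
  #   []      -> whitelist in use but empty, allow no one
  #   ["foo"] -> only listed ids are allowed
  def allowed?(nil, _id), do: true
  def allowed?(ids, id) when is_list(ids), do: id in ids
end
```

If the list lived in an `embeds_many` field, the `nil` clause would never match on a freshly built struct, because the field defaults to `[]`; that is exactly the tension in this thread.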

I now see that you could only get this data model with an embedded structure of some sort since, as this thread points out, a `has_many` has no concept of null; otherwise you’d have to add a second field, `is_using_whitelist` (which avoids a “magic nil”, which, to be fair, is a smell in what I was hoping to do).
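For comparison, the explicit-flag alternative might look like this. `is_using_whitelist` comes from the post; the `FlaggedWhitelist` struct and its function are hypothetical. Because the list itself never needs to be nil, it fits the `[]` default of `embeds_many`.

```elixir
defmodule FlaggedWhitelist do
  # Hypothetical record shape: an explicit boolean flag plus a list that
  # is never nil, so "no one allowed" and "feature off" stay distinct
  # without relying on a magic nil.
  defstruct is_using_whitelist: false, whitelist: []

  # Flag off: the whitelist is ignored and everyone is allowed.
  def allowed?(%__MODULE__{is_using_whitelist: false}, _id), do: true

  # Flag on: only ids present in the list are allowed ([] allows no one).
  def allowed?(%__MODULE__{is_using_whitelist: true, whitelist: ids}, id),
    do: id in ids
end
```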

I’m not gonna die on that hill, but, to me, the data you’re suggesting is something like Maybe[OkList]. So you basically have two inhabitants of your type: Some(maybe_empty_list) and None (null).
Personally I’d model that as an embed one.
Again, to me `embeds_many` suggests a list, and I’m fine with it being either an empty list or a non-empty one. I’m a bit dense, so I don’t really know what null would mean in that list context.

Oh, I love that… so the data model would be something like:

record.whitelist → nil # whitelist is nil; allow all ids
record.whitelist → %{ids: ["foo"]} # whitelist has 1 id; only allow it
record.whitelist → %{ids: []} # whitelist has no ids; don't allow anyone

I like that a lot, actually (provided you really need the optional whitelist feature).
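One way to sketch the embed-one wrapper in plain Elixir, so it runs without Ecto. In a real schema this would presumably be an `embeds_one :whitelist, WhitelistConfig` on the parent and an embedded schema with `field :ids, {:array, :string}, default: []`; here both are stood in for by `defstruct`, and `WhitelistConfig` and `GuardedRecord` are hypothetical names.

```elixir
defmodule WhitelistConfig do
  # Stand-in for the embedded schema behind the wrapper.
  # `ids` is never nil; an empty list means "let no one in".
  defstruct ids: []
end

defmodule GuardedRecord do
  # Hypothetical parent record: whitelist is either nil (feature off)
  # or a %WhitelistConfig{}, i.e. the Maybe-of-list shape from the thread.
  defstruct whitelist: nil

  def allowed?(%__MODULE__{whitelist: nil}, _id), do: true

  def allowed?(%__MODULE__{whitelist: %WhitelistConfig{ids: ids}}, id),
    do: id in ids
end
```

Because `embeds_one` defaults to `nil`, the “feature off” state comes for free, while the inner `ids` list keeps the usual empty-list semantics.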

I just stumbled upon this problem in another situation.

I’m adding a new field to a system that has been running for a long time. Users upload sets of data and then run a series of five intensive computations on subsets of that data in a prescribed order, with manual adjustments in between. The new field will track the most recent ‘n’ calculations performed on each subset, giving users some historical context. It will also record the last completed step, so that calculations that are not valid in the current state can be blocked.

The complication is that we cannot tell which calculations were already run on subsets created before this field existed. So we need a strategy that makes the transition seamless:

  • nil: pre-existing subsets are left as NULL by the migration, and any calculation is permitted.
  • []: new subsets created after the migration start with this default, which permits only the first calculation.
  • [%{…}, %{…}]: the last calculation is read from the head of the list, and the next allowed step is decided from it.
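The three cases can be sketched as a gate function plus a helper that keeps the last `n` runs. This assumes the steps are numbered 1 to 5 in their prescribed order and that each history entry is a map with a `:step` key, newest first. `CalcGate` and its functions are hypothetical, and the “only the next step” rule is an assumption, since the post just says the code “responds accordingly”.

```elixir
defmodule CalcGate do
  # Hypothetical sketch: the five steps are numbered 1..5 in their
  # prescribed order, and a history is a list of maps with a :step key,
  # newest run first.
  @first_step 1

  # nil: subset predates the new field and its past runs are unknown,
  # so any calculation is permitted.
  def allowed?(nil, _step), do: true

  # []: new subset with no runs yet; only the first step is permitted.
  def allowed?([], step), do: step == @first_step

  # Otherwise read the newest run from the head of the list. Allowing
  # only the following step is an assumed rule for illustration.
  def allowed?([%{step: last} | _older], step), do: step == last + 1

  # Prepends a run and keeps the n most recent ones. A nil history
  # (pre-existing subset) becomes a fresh list on its first new run.
  def record(history, run, n) when is_integer(n) and n > 0 do
    Enum.take([run | (history || [])], n)
  end
end
```

The design only works if `nil` and `[]` stay distinct in storage; if the field were forced to `[]`, the first two cases would collapse into one.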

Hence, in our scenario, the absence of data is significant, and an empty list does not adequately convey this.

I believe that similar to a number that could be 0 or null, or a string that might be “” or null, an array should also be permitted to be [] or null.

Null is neither a number, a string, nor a list. Rather, it represents the lack of information in a field. And I think that all (or most) databases, such as Postgres, allow null in array fields for this very reason.