I have a good use case currently with Ecto schemas: I need to extend one schema with vector (text embedding) fields that I do not want to interact with through the regular schema. And I do not want to repeat all the regular fields and the primary key / timestamp type attributes on the extended schema.
I am afraid Ecto schemas muddy the waters: since they define structs, do you want to copy all struct fields? What about the database? Are you going to copy the database table structure as well?
@dimitarvp mentioned the XY problem before. You should describe the problem and outline the different solutions you have considered. Perhaps extension does achieve the best trade-offs. But I am afraid a paragraph describing a possible use case is not enough context.
This is what I meant by schema inheritance:
```elixir
defmodule MyApp.MySchema do
  import Ecto.Changeset
  use Ecto.Schema

  @primary_key {:id, :binary_id, autogenerate: true}
  @timestamps_opts [type: :utc_datetime_usec]

  schema "my_table" do
    field :name, :string
    field :age, :integer
    # ...
    # ...
    # lots of fields
    timestamps()
  end
end

defmodule MyApp.MySchemaWithEmbeddings do
  import Ecto.Changeset
  use Ecto.Schema

  schema "my_table" do
    # Proposed macro (does not exist in Ecto): would copy every field of MyApp.MySchema.
    copy_all_fields_from MyApp.MySchema
    field :some_embedding_1, Pgvector.Ecto.Vector
    field :some_embedding_2, Pgvector.Ecto.Vector
    field :some_embedding_3, Pgvector.Ecto.Vector
    field :some_float_1, :float
    field :some_float_2, :float
    field :some_float_3, :float
  end
end
```
Basically, the base schema is used often and most fields are useful, so we just select all of them, like `from(s in MySchema)`.
But when working with embeddings, which are vectors of 512 floats, we may not want to select all that data every time, or handle it in the main changeset function, etc.
So having a schema that extends another one could be useful: we could work with the embeddings and related data only when needed, which is just a small feature of the app.
Now, I guess writing that `copy_all_fields_from` macro will not be hard, and I do not mind copying the `@primary_key` or `@timestamps_opts` attributes. But maybe having an `extend_schema` macro instead of the `schema` macro could be nice.
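A rough sketch of such a `copy_all_fields_from` macro, assuming Ecto 3.x and that only plain persisted fields need copying. `MyApp.SchemaCopy` and its `:except` option are hypothetical names, and `{:array, :float}` stands in for `Pgvector.Ecto.Vector` so the script only depends on `:ecto`. The macro reads the source schema through its `__schema__/2` reflection at compile time, which is why the source module has to be compiled first.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule MyApp.SchemaCopy do
  # Hypothetical helper (not part of Ecto); call it inside a `schema` block.
  # Re-declares each persisted, non-primary-key field of `source` with the same type.
  # Not handled: parameterized types (e.g. Ecto.Enum), :source/:default options, associations.
  defmacro copy_all_fields_from(source, opts \\ []) do
    source = Macro.expand(source, __CALLER__)
    skip = source.__schema__(:primary_key) ++ Keyword.get(opts, :except, [])

    fields =
      for name <- source.__schema__(:fields) -- skip do
        type = source.__schema__(:type, name)
        quote do: field(unquote(name), unquote(Macro.escape(type)))
      end

    {:__block__, [], fields}
  end
end

defmodule MyApp.MySchema do
  use Ecto.Schema
  @primary_key {:id, :binary_id, autogenerate: true}
  @timestamps_opts [type: :utc_datetime_usec]

  schema "my_table" do
    field :name, :string
    field :age, :integer
    timestamps()
  end
end

defmodule MyApp.MySchemaWithEmbeddings do
  use Ecto.Schema
  import MyApp.SchemaCopy
  @primary_key {:id, :binary_id, autogenerate: true}
  @timestamps_opts [type: :utc_datetime_usec]

  schema "my_table" do
    # Skip the timestamps so timestamps() below keeps its autogenerate behaviour.
    copy_all_fields_from MyApp.MySchema, except: [:inserted_at, :updated_at]
    # {:array, :float} is a stand-in for Pgvector.Ecto.Vector.
    field :some_embedding_1, {:array, :float}
    field :some_float_1, :float
    timestamps()
  end
end
```

Copied as plain fields, `inserted_at`/`updated_at` would lose their autogeneration, hence the `:except` option plus a second `timestamps()` call.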
Thanks for the context. I think this is a separate discussion: as opposed to inheritance, which inherits data and behaviour, you just want to have the same schema fields. For this reason, I moved it to a separate thread.
I can think of two other alternatives:
- Have the embeddings in a separate table, which you treat as a `has_one`
- Have the embeddings in the same table, but model them in Ecto using an association that points to itself
The options above have downsides, as they need additional queries to load data, so `copy_all_fields_from` (and potentially `extend_schema` in the future) do not sound like bad options.
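The second alternative could be sketched like this, assuming Ecto 3.x. `MyApp.MySchemaEmbeddings` and `MyApp.Queries` are hypothetical names, and `{:array, :float}` stands in for `Pgvector.Ecto.Vector`. Both schemas map to `my_table`; the `has_one` uses the shared `:id` as its foreign key, so the "associated" record is really the other half of the same row. The join-based preload is one way to fetch both halves in a single query when they are needed together.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule MyApp.MySchemaEmbeddings do
  use Ecto.Schema
  # Same table and primary key as MyApp.MySchema, but only the heavy columns.
  @primary_key {:id, :binary_id, autogenerate: false}
  schema "my_table" do
    # {:array, :float} is a stand-in for Pgvector.Ecto.Vector.
    field :some_embedding_1, {:array, :float}
    field :some_float_1, :float
  end
end

defmodule MyApp.MySchema do
  use Ecto.Schema
  @primary_key {:id, :binary_id, autogenerate: true}
  schema "my_table" do
    field :name, :string
    # Points back at the same row: the related "foreign key" is the shared :id.
    has_one :embeddings, MyApp.MySchemaEmbeddings, foreign_key: :id
  end
end

defmodule MyApp.Queries do
  import Ecto.Query
  alias MyApp.MySchema

  # Separate preload: a second query against my_table fetches the embedding columns.
  def with_embeddings_preloaded, do: from(s in MySchema, preload: :embeddings)

  # Join + preload: one query returns both halves of each row.
  def with_embeddings_joined do
    from s in MySchema,
      join: e in assoc(s, :embeddings),
      preload: [embeddings: e]
  end
end
```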
PS: I think you can change the title and tags in this thread but, if you cannot, let me know and I can do it!
One thing you could do is make a virtual field for the vector data, then only load it when you need it.
EDIT: oops, sorry @josevalim, I responded right after you split the thread.
The title and tags look fine.
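A related option, if you are on Ecto 3.10 or newer (which, to my knowledge, added the `:load_in_query` field option): keep the embedding as a regular persisted field but leave it out of `select: s`, then merge it in explicitly when needed. This swaps the virtual field for a different Ecto mechanism; `with_embeddings/1` is a hypothetical helper and `{:array, :float}` stands in for `Pgvector.Ecto.Vector`.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule MyApp.MySchema do
  use Ecto.Schema
  import Ecto.Query

  @primary_key {:id, :binary_id, autogenerate: true}
  schema "my_table" do
    field :name, :string
    # Persisted, but skipped when the whole struct is selected (from s in MySchema).
    field :some_embedding_1, {:array, :float}, load_in_query: false
  end

  # Hypothetical helper: opt back in to the heavy column for one query only.
  def with_embeddings(query \\ __MODULE__) do
    from s in query, select_merge: %{some_embedding_1: s.some_embedding_1}
  end
end
```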
Yeah, I am not ready to test yet. I had only worked on the maths and related parts in a separate demo codebase inherited from coworkers, and we will now try to make it work in the real app.
The self-pointing association looks cool. I guess the performance hit should be minimal, and I am not sure everyone would be happy with a macro that forces everyone to learn macros and what `__schema__/1` is.
Thank you!
We will decide as a team, but I'll update this topic if anyone is interested.
That's a very interesting idea! More than once I tried modelling a recurring set of fields on a schema without resorting to a separate database table or an embedded schema that has to reside in a jsonb column.
For example: many schemas in an application I am working on need to be associated with a mail address (the analog kind: street, city, etc.). I resorted to using associations and a separate table for each of those schemas (`users` got a sibling table `users_address`, `reader` got a sibling table `reader_address`, and so on, with FKs between them). I used the "custom source" option when referencing the Address struct (`has_one :address, {"users_address", Address}`) to override which table to use each time I associated a schema with an Address, to avoid them all being in the same table (which might not be a problem after all, I guess).
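The setup described above follows the custom-source (polymorphic) association pattern from the Ecto docs: one `Address` schema whose table is chosen per association. A minimal sketch, assuming Ecto 3.x; the `assoc_id` foreign key, the address fields, and the `"abstract table: addresses"` source name are illustrative assumptions.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule MyApp.Address do
  use Ecto.Schema
  # Never queried directly: each owner supplies its own table via {"table", Address}.
  schema "abstract table: addresses" do
    # Points at the owning row (users.id, readers.id, ...).
    field :assoc_id, :integer
    field :street, :string
    field :city, :string
  end
end

defmodule MyApp.User do
  use Ecto.Schema
  schema "users" do
    field :name, :string
    has_one :address, {"users_address", MyApp.Address}, foreign_key: :assoc_id
  end
end

defmodule MyApp.Reader do
  use Ecto.Schema
  schema "readers" do
    field :name, :string
    has_one :address, {"reader_address", MyApp.Address}, foreign_key: :assoc_id
  end
end
```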
Embedding the address didn't seem right at the time. I needed to be able to search on parts of the address, for example (I know that a lot is possible with jsonb columns, but I didn't know much about that can of worms at the time).
When I was setting this up, I wished I could reuse an Address schema and have its data in the table of the parent schema. This would have been the best way to model the data from the perspective of the database.
Another alternative I thought about was doing something like `timestamps/1` to "inject" the set of fields that model an address into each schema that needs it. But then an address would still seem like a bag of fields, without a "home" of some kind. I'm sure this alternative has other downsides too.
The "self-referencing" association seems like a good alternative. I understand that it has the downside of an additional query, because Ecto doesn't actually know that it can get both schemas' data in one query. Maybe that's something Ecto could improve by making this pattern a first-class citizen. Would there be other downsides?
We could optimize some cases, such as joins, but preloads are always, by definition, separate queries. But that may not be a problem, given the whole intent is that they may be loaded in different places?
I admit that in my address example the Address assoc is not needed in most cases. We're just fine preloading it in the scenarios where we actually need the Address.
But I don't think it would be a bad idea to load an assoc stored in the same table by default, as there is not much downside to having it there in case you need it. The only downsides I can think of are the larger memory footprint and the time required to load the data from the database into the Ecto struct. That might make a difference, though.
I guess this approach would blur the distinction between fields and assocs a little: fields are obviously part of a schema and are loaded by default (iirc you can opt out of this default loading behavior), while regular assocs have to be preloaded explicitly.
Old-school macros can help here too, no?
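```elixir
defmodule FooFields do
  # The macro lives in its own module; callers must `require` it.
  defmacro foo_fields() do
    quote do
      # Example shared fields (placeholder names).
      field :shared_1, :string
      field :shared_2, :string
      field :shared_3, :string
    end
  end
end

defmodule Foo do
  use Ecto.Schema
  require FooFields
  schema "table" do
    FooFields.foo_fields()
    field :foo_only, :string
  end
end

defmodule Bar do
  use Ecto.Schema
  require FooFields
  schema "table" do
    FooFields.foo_fields()
    field :bar_only, :string
  end
end
```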
I just realized my response is semi-nonsense. It could work, but it is probably more trouble than it is worth, especially compared to other answers here.
If it isn't a hard requirement that they be separate schemas, using different selects is a pretty simple solution.
```elixir
defmodule MyApp.MySchema do
  import Ecto.Query
  # ...

  @embedding_fields [
    :some_embedding_1,
    :some_embedding_2,
    :some_embedding_3
  ]

  # Selects every persisted field except the (large) embedding columns.
  def without_embeddings(query) do
    from query,
      select: ^(__schema__(:fields) -- @embedding_fields)
  end
end

defmodule MyApp.MyContext do
  alias MyApp.{MySchema, Repo}

  def get_schema(id) do
    MySchema
    |> MySchema.without_embeddings()
    |> Repo.get(id)
  end

  def get_schema_with_embeddings(id) do
    Repo.get(MySchema, id)
  end
end
```
Of course, if your consumer code checks for the presence of the embedding fields, this isn't going to work.
Doh, of course! My favorite answer so far.
This is my favorite part of Elixir! Because the compiler is just executing Elixir code, you get all kinds of benefits that most other languages need "structural" things like inheritance for.
Want to share fields between structs?
```elixir
defstruct [:a, :b] ++ Something.shared_fields()
# elsewhere
defstruct [:c, :d] ++ Something.shared_fields()
```
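Spelled out as a complete script, the struct-field idea could look like this; `Something.shared_fields/0`, its field names, and the `First`/`Second` modules are hypothetical. It works because `defstruct` receives an ordinary list evaluated at compile time, so `Something` only has to be compiled before the modules that call it.

```elixir
defmodule Something do
  # Plain function, evaluated at compile time by the modules below.
  def shared_fields, do: [:inserted_by, :notes]
end

defmodule First do
  defstruct [:a, :b] ++ Something.shared_fields()
end

defmodule Second do
  defstruct [:c, :d] ++ Something.shared_fields()
end
```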
Want to share functions between multiple modules?
```elixir
defmacro shared_functions() do
  quote do
    def function() do
      # ...
    end
  end
end
```
The Elixir compiler gives you all the tools for code reuse, with no need to adopt any confusing patterns. All you need to learn is how macros work, and the sky's the limit.
(Obviously, you already know these things.)
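A runnable version of the shared-functions idea; `SharedFunctions`, `greet/0`, and the `Cat`/`Dog` modules are hypothetical. Callers must `require` the macro module before invoking the macro, and `__MODULE__` inside the `quote` expands in the caller, so each module gets its own copy of the function that knows which module it lives in.

```elixir
defmodule SharedFunctions do
  defmacro shared_functions do
    quote do
      # Injected into each caller; __MODULE__ refers to the calling module.
      def greet, do: "hello from #{inspect(__MODULE__)}"
    end
  end
end

defmodule Cat do
  require SharedFunctions
  SharedFunctions.shared_functions()
end

defmodule Dog do
  require SharedFunctions
  SharedFunctions.shared_functions()
end
```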
Thanks for making the best programming language of all time
Exactly what I was going to say, and I was wondering if I was missing something basic about how Ecto works that made that difficult. I think it's much more in the spirit of Elixir for devs to roll their own macro for these cases, using whatever naming conventions/API they like best.
If the macro can be defined in the same module as the main schema, then yes; otherwise I'd rather not add indirection. Actual inheritance "feels" more straightforward.
On mobile right now, but iirc the schema block is evaluated at compile time.
I don't think it can be defined in the same module, primarily because modules cannot call macros they define from their own module body. I'd personally consider that a good thing.
Yeah, well, just define two modules in one file.
Right, of course
```elixir
defmodule Vehicle.Fields do
  defmacro fields() do
    quote do
      field :capacity, :integer
    end
  end
end

defmodule Vehicle do
  use Ecto.Schema
  require Vehicle.Fields

  schema "schema" do
    Vehicle.Fields.fields()
  end
end

defmodule Boat do
  use Ecto.Schema
  require Vehicle.Fields

  schema "schema" do
    Vehicle.Fields.fields()
    field :wheel_count, :integer
  end
end
```
I've personally dealt with a lot of clarity issues that can arise from the implicit aspects here. Just throwing out a potential alternative: what if you wrote something that used schema introspection to verify this information instead of injecting it?
```elixir
defmodule Vehicle.Fields do
  @required_fields [
    capacity: :integer
  ]

  defmacro __using__(_) do
    quote do
      @after_compile Vehicle.Fields
    end
  end

  # Runs right after the using module is compiled, so its __schema__/2 reflection is available.
  def __after_compile__(env, _bytecode) do
    for {field, required_type} <- @required_fields do
      type = env.module.__schema__(:type, field)

      if is_nil(type) do
        raise "Must define the field `#{inspect(field)}` on #{inspect(env.module)}"
      end

      if type != required_type do
        raise "The field `#{inspect(field)}` on #{inspect(env.module)} must be of type " <>
                "#{inspect(required_type)}, got: #{inspect(type)}"
      end
    end
  end
end

defmodule Vehicle do
  use Ecto.Schema
  use Vehicle.Fields

  schema "schema" do
    field :capacity, :integer
  end
end

defmodule Boat do
  use Ecto.Schema
  use Vehicle.Fields

  schema "schema" do
    field :wheel_count, :integer
  end
end
```
That last module definition would then yield:

```
** (RuntimeError) Must define the field `:capacity` on Boat
    iex:31: anonymous fn/3 in Vehicle.Fields.__after_compile__/2
```
While true, it's also a tool with drawbacks. And now that I think of it, those drawbacks mainly consist of others not knowing macros... so we should educate everyone!
I do. But I doubt I would use a macro for it. The lack of visibility, some tools struggling with it, and the available alternatives are, for me, reasons to do a bit more manual work instead of writing a new macro.
For libs it's great, though, as users don't have to update their imports and the lib maintainer can change them without worrying about the "migration guide".
Your own tool might make the last reason obsolete, though.
Interesting. The only pushback I'd have is that this validation could also be done in a unit test. The code is basically asserting that the fields exist, and that may not need to be checked on every compilation pass (imagine this in a library: the check would also run when the lib is installed and compiled, which is probably not the best timing).
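Moving the check into the test suite, as suggested, could look like this sketch. It reuses the same `__schema__/2` reflection as the `@after_compile` version; the `SharedFieldsTest` module, the table names, and the `Vehicle`/`Boat` schemas redefined here (with `capacity` added to `Boat` so the suite passes) are assumptions for illustration.

```elixir
Mix.install([{:ecto, "~> 3.10"}])
ExUnit.start()

defmodule Vehicle do
  use Ecto.Schema
  schema "vehicles" do
    field :capacity, :integer
  end
end

defmodule Boat do
  use Ecto.Schema
  schema "boats" do
    field :capacity, :integer
    field :wheel_count, :integer
  end
end

defmodule SharedFieldsTest do
  use ExUnit.Case, async: true

  @required_fields [capacity: :integer]

  # One generated test per schema; unquote/1 injects the module into each test body.
  for schema <- [Vehicle, Boat] do
    test "#{inspect(schema)} defines the shared vehicle fields" do
      for {field, type} <- @required_fields do
        assert unquote(schema).__schema__(:type, field) == type,
               "expected #{inspect(unquote(schema))}.#{field} to be #{inspect(type)}"
      end
    end
  end
end
```

Dropping `field :capacity` from `Boat` makes that test fail at `mix test` time instead of raising during every compilation.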