Ecto and preloading most recent record from association

hubertlepicki · May 6, 2016, 8:03am

So let’s say I have Post that has_many PostVersions. I want to load all posts and their top versions only with a single query. I don’t want to preload all PostVersions because there can be hundreds of them.

Any ideas on how can I do that? I am using Ecto 2.0.

dgamidov · May 6, 2016, 5:41pm

Hello!

@hubertlepicki, here is my example.

It is 1 query from Elixir side:

Repo.preload (Repo.all Post), post_version: from(pv in PostVersion, distinct: pv.post_id, order_by: [desc: pv.updated_at])

, but I think there will be always 2 queries from DB side, because you first should get most young post version for each post (from table post_versions), than get all posts (from table posts) with preloaded youngest post version.

priv/repo/migrations/create_post.exs

defmodule Puma.Repo.Migrations.CreatePost do
  use Ecto.Migration    

  def change do
    create table(:posts) do
      add :name, :string    

      timestamps
    end
    create unique_index(:posts, [:name])    

  end
end

priv/repo/migrations/create_post_version.exs

defmodule Puma.Repo.Migrations.CreatePostVersion do
  use Ecto.Migration  

  def change do
    create table(:post_versions) do
      add :name, :string
      add :post_id, references(:posts, on_delete: :nothing) 

      timestamps
    end
    create index(:post_versions, [:post_id])  

  end
end

mix ecto.migrate

web/models/post.ex

defmodule Puma.Post do
  use Puma.Web, :model    

  schema "posts" do
    field :name, :string
    has_many :post_version, Puma.PostVersion
    
    timestamps
  end   

  @doc """
  Builds a changeset based on the `struct` and `params`.
  """
  def changeset(struct, params \\ %{}) do
    struct
    |> cast(params, [:name])
    |> validate_required([:name])
    |> unique_constraint(:name)
  end
end

web/models/post_version.ex

defmodule Puma.PostVersion do
  use Puma.Web, :model    

  schema "post_versions" do
    field :name, :string
    belongs_to :post, Puma.Post   

    timestamps
  end   

  @doc """
  Builds a changeset based on the `struct` and `params`.
  """
  def changeset(struct, params \\ %{}) do
    struct
    |> cast(params, [:name])
    |> validate_required([:name])
  end
end

priv/repo/migrations/seeds.exs

import Ecto.Query   

alias Puma.{Repo, Post, PostVersion}    

for post <- ~w(first second third) do
  Repo.get_by(Post, name: post) || Repo.insert!(%Post{name: post})
end   

query = from p in Post, select: p.id
posts_ids = Repo.all query    

for post_id <- posts_ids do
  for post_version <- ~w(1 2 3 4 5) do
    Repo.insert!(%PostVersion{post_id: post_id, name: post_version})
    :timer.sleep(1000)
  end
end

mix run priv/repo/migrations/seeds.exs

iex -S mix

iex(1)> alias Puma.{Repo, Post, PostVersion}
[nil, nil, nil]

iex(2)> import Ecto.Query
nil

iex(3)> Repo.preload (Repo.all Post), post_version: from(pv in PostVersion, distinct: pv.post_id, order_by: [desc: pv.updated_at])
[debug] QUERY OK db=2.6ms
SELECT p0."id", p0."name", p0."inserted_at", p0."updated_at" FROM "posts" AS p0 []
[debug] QUERY OK db=4.4ms decode=0.1ms
SELECT DISTINCT ON (p0."post_id") p0."id", p0."name", p0."post_id", p0."inserted_at", p0."updated_at", p0."post_id" FROM "post_versions" AS p0 WHERE (p0."post_id" = ANY($1)) ORDER BY p0."post_id", p0."post_id", p0."updated_at" DESC [[9, 8, 7]]
[%Puma.Post{__meta__: #Ecto.Schema.Metadata<:loaded>, id: 7,
  inserted_at: #Ecto.DateTime<2016-05-06 16:37:56>, name: "first",
  post_version: [%Puma.PostVersion{__meta__: #Ecto.Schema.Metadata<:loaded>,
    id: 35, inserted_at: #Ecto.DateTime<2016-05-06 16:38:00>, name: "5",
    post: #Ecto.Association.NotLoaded<association :post is not loaded>,
    post_id: 7, updated_at: #Ecto.DateTime<2016-05-06 16:38:00>}],
  updated_at: #Ecto.DateTime<2016-05-06 16:37:56>},
 %Puma.Post{__meta__: #Ecto.Schema.Metadata<:loaded>, id: 8,
  inserted_at: #Ecto.DateTime<2016-05-06 16:37:56>, name: "second",
  post_version: [%Puma.PostVersion{__meta__: #Ecto.Schema.Metadata<:loaded>,
    id: 40, inserted_at: #Ecto.DateTime<2016-05-06 16:38:05>, name: "5",
    post: #Ecto.Association.NotLoaded<association :post is not loaded>,
    post_id: 8, updated_at: #Ecto.DateTime<2016-05-06 16:38:05>}],
  updated_at: #Ecto.DateTime<2016-05-06 16:37:56>},
 %Puma.Post{__meta__: #Ecto.Schema.Metadata<:loaded>, id: 9,
  inserted_at: #Ecto.DateTime<2016-05-06 16:37:56>, name: "third",
  post_version: [%Puma.PostVersion{__meta__: #Ecto.Schema.Metadata<:loaded>,
    id: 45, inserted_at: #Ecto.DateTime<2016-05-06 16:38:10>, name: "5",
    post: #Ecto.Association.NotLoaded<association :post is not loaded>,
    post_id: 9, updated_at: #Ecto.DateTime<2016-05-06 16:38:10>}],
  updated_at: #Ecto.DateTime<2016-05-06 16:37:56>}]

hubertlepicki · May 6, 2016, 6:41pm

Well yeah, I could do that, but I don’t want to.

It looks like with some gymnastics I can write a “schemaless query” to do so.

Here’s an article by @josevalim on the subject:
http://blog.plataformatec.com.br/2016/05/ectos-insert_all-and-schemaless-queries/

But to be fair, I gave up. I had dealt with this sort of crap for seven years using ActiveRecord. Enough is enough. I am still in early stages of development on the project and it’s a good opportunity to upgrade to Moebius.

dgamidov · May 6, 2016, 6:43pm

It is not connected with Ecto, it is just how relation databases store data.

josevalim · May 6, 2016, 7:00pm

Well yeah, I could do that, but I don’t want to.

So someone takes all the trouble of writing an answer and your reply is “I don’t want to”? Can you at least clarify what is wrong with his response? It includes everything you need to make it work:

def with_top_versions(query) do
  post_versions = from pv in PostVersions, limit: 10
  from query, preload: [post_versions: post_versions]
end

Post |> Post.with_top_versions

I had dealt with this sort of crap for seven years using ActiveRecord. Enough is enough.

And I have dealt with this sort of feedback for years as well. If using Moebius is going to redirect this sort of feedback elsewhere, then please move on.

hubertlepicki · May 6, 2016, 7:01pm

no, @dgamidov, this is not the fault of relational databases. It is our fault, the programmers, for trying to implement the idea of ORM, and reimplementing it in every new environment we find ourselves in. The concept is familiar, it is easy to get grasp on and is deeply flawed IMHO. I am really thankful for great reply, but I landed on a conclusion this is going into wrong direction entirely.

Now it is time for someone to jump on the subject saying that “Ecto is not ORM”. Well, it is the same principle. Instead of returning objects, it returns structs, and you write some business logic in modules instead of classes.

I am landing on a conclusion that we don’t need all of this. Or maybe I should rephrase: we need all the elements of an ORM, but not bundled as an ORM. Let me quickly explain.

We need something to keep our relational databases in sync when we develop and when we release incremental updates to staging / production. Something like rake db:migrate or `mix ecto.migrate.
We need something that will map, perform coercion and allow us to validate incoming data, whether these are called “form objects” or “commands” does not really matter. This is ActiveModel or Ecto::Schema and validations system.
We need something to perform SQL queries, ensure we don’t easily make dumb SQL injection error, and return data converted to native types of our language. This is Ecto.Query and friends (Repo)
We need something to insert, update, delete records (Repo)

The ORM will provide you an extra feature that you don’t really need, or more precisely: you should not use. I am talking about mapping your database columns to your domain model entities.

hubertlepicki · May 6, 2016, 7:02pm

@josevalim sorry, I was just composing more in depth reply, wrote the above quickly while child was crying

josevalim · May 6, 2016, 7:14pm

Exactly. You need to map the database data to data structures in your language. In Elixir, this means a map or a tuple or a struct. You don’t need to make such structs your “domain model entities”. It is just data but if you treat it as your domain model entity, then it will be your domain model entity.

I am completely OK with taking the blame regarding Ecto 1.0. We did push folks to the wrong direction. But Ecto 2.0 no longer puts you in this place (and soon Phoenix generators will reflect that). If you are using structs as your domain models, then it will be more on you than on me, since you have everything to stop doing that (including using Moebius!).

Thanks for composing a more structured reply.

hubertlepicki · May 6, 2016, 7:29pm

@josevalim I don’t think you are to blame, not at all. You haven’t invented the concept, but implemented something that was already recognized and familiar. And it works great in most cases. And the fact that you are actively seeking advise from Nick, Piotr etc. is also great, they have some good ideas.

I am looking at Moebius because there’s plenty in it that I like. It’s closer to raw SQL, which I like. I actually plan to write some queries in raw SQL. I don’t mind it, I consider it a feature. When you take any language or DSL that compiles to SQL, you have to compile the queries in your head before you can undersand what they do. It’s extra layer of indirection, and apparently I am not so smart because when I look at say Arel, LINQ, my brain literally hurts.

There’s also plenty I don’t like there. I don’t believe in the whole unstructured data thing, this can be useful but I’ll avoid it. There’s only one worse thing than legacy code: legacy data in your database. If it’s unstructured legacy data… well, you can expect some real fun updating it. There’s nothing to handle database migrations either.

I guess what I am trying to say is that it will never be “one size fits all”. My 2c would be to break Ecto down to micro libraries, so they can be easily replaced with components that fit other people. One library for migrating database, another for querying data, third one for type conversions and validations.

josevalim · May 6, 2016, 7:38pm

We do have a responsibility though to talk about the cases where it does not work and I have failed to do so in the past.

brightball · June 1, 2016, 2:32pm

I agree with your take on breaking things down completely. Specifically, migrations can and should be totally separate from any Objects used to actually interface with the database. I like the Moebius approach a lot because I’m a big believer in all of the features in PostgreSQL and I don’t like an interface layer obstructing that too much.

I’m a big, big fan of the changeset approach in Ecto but probably my absolute biggest pain point with any ORM historically is manually having to map database fields to schemas and rehash what their types are. In ruby the schema_associations and schema_validations gems were a big help in this regard. I’d love to see Ecto just load the schema from the database unless I’ve specifically overridden it somewhere.

It’s all a tradeoff. People who don’t like dealing with databases as much as me probably feel differently.

A completely optimal interface, IMO, would be to have scripts that were able to generate Ecto schemas from different types of definitions, such as database schemas, XSD/WSDL, Thrift, etc. At the point that you can extract schema details that are already defined, cast and pre-validate (field length, int size, enum values, etc) is that point that you truly streamline the development experience by DRYing your schema definition.

michalmuskala · June 2, 2016, 11:14am

Since that discussion originally happened there were some changes in Ecto
2. I talked about most of them in my ElixirConf talk.

The gist is: you can use most (if not all) features of Ecto without any
schemas - inserting, updating, deleting, changesets, types, casting, etc.

While this is a great option, the conveniences for the cases where all you
have is a simple crud and ORM-like approach fits well are maintained. It’s
not an either/or choice - you can have both!

hubertlepicki · June 2, 2016, 11:28am

@michalmuskala you mean what’s referred to “schemaless queries”? Yes, this is good option for us who shiver when they see schema definition, types conversion, format validation, business logic validation and triggers for business logic cramped all into one module

michalmuskala · June 2, 2016, 12:22pm

Yes, schemaless queries are one part.

The other is changesets without schemas -

github.com

elixir-ecto/ecto/blob/master/lib/ecto/changeset.ex#L287-L293


errors: [{atom, error}],
constraints: [constraint],
validations: [{atom, term}],
filters: %{atom => term},
action: action,
types: nil | %{atom => Ecto.Type.t}}

The choice to cramp all of this in one module is entirely yours, though.
Ecto does not force it on you. You are perfectly fine to have one module
for doing the data manipulations (casting, validations, etc - you can split
those as well), other for handling business rules and yet another one for
putting things together and interfacing with data source.
In fact that’s how we’re using Ecto in our application and I’d say it works
quite well. I hope I’ll be able to write a bit more on that soon. That’s
probably the main issue right now - lack of resources for using ecto
differently than in simple CRUD interfaces. I hope the new Ecto 2 ebook
will have some examples as well.

karolsluszniak · June 8, 2016, 2:39pm

@josevalim @michalmuskala You’ve done an amazing work on Ecto v2. I really love the direction you’re going with it. I’ve decided to explore it further, taking the example of model callbacks, and wrote article about it to make one more place for everyone to catch up on what’s coming and to better emphasize how much it means:

Model callbacks in Phoenix, Ecto and Rails

I hope everything I’ve written stays in line with the concept behind Ecto 2 and the work you’re currently doing on it. Please comment in case it doesn’t I definitely plan to explore Phoenix and Ecto further from (my) perspective of a Rails dev.

andre1sk · June 9, 2016, 4:39am

Usefulness of “ORM” is highly specific to a given project. Using Ecto or Moebius or only using stored procedures and simply calling them is subject to your project needs. You can add latest_version to Posts table and populate it via trigger on PostVersions.