Failing to use `union` and also preload or select associated data

So I am trying to write a latest_edited_or_published_by_author/2 function that gets the n most recently updated stories or story drafts. I have a query for the stories and query for the story drafts that I concatenate with union.

  def latest_edited_or_published_by_author(%User{} = user, amount \\ 8) do
    story_query =
      from s in Story,
        join: u in assoc(s, :authors),
        where: u.id == ^user.id,
        select: %{id: s.id, title: s.title, draft?: false}

    story_draft_query =
      from s in StoryDraft,
        join: u in assoc(s, :authors),
        where: u.id == ^user.id,
        select: %{id: s.id, title: s.title, draft?: true}

    union_query =
      from q in story_query,
        union: ^story_draft_query,
        order_by: fragment("updated_at ASC"),
        limit: (^amount)

    union_query
    |> Repo.all()
  end

Here is an example of what the function returns:

[
  %{
    draft?: true,
    id: "...",
    title: "Title"
  },
  %{
    draft?: false,
    id: "...",
    title: "Title"
  },
  %{
    draft?: false,
    id: "...",
    title: "Title"
  },
  %{
    draft?: false,
    id: "...",
    title: "Title"
  }
]

All good so far, but there is a catch. I also need to return a list of author data for each story/story draft. There is a many-to-many relation between authors and stories, as well as between authors and story drafts. In both cases a join table is used to establish the many-to-many relation.

Here is an example of what I need to get:

[
  %{
    draft?: true,
    id: "...",
    title: "Title",
    authors: [%{id: ..., name: ...}, %{id: ..., name: ...}]
  },
  %{
    draft?: false,
    id: "...",
    title: "Title",
    authors: [%{id: ..., name: ...}, %{id: ..., name: ...}]
  },
  %{
    draft?: false,
    id: "...",
    title: "Title",
    authors: [%{id: ..., name: ...}, %{id: ..., name: ...}]
  },
  %{
    draft?: false,
    id: "...",
    title: "Title",
    authors: [%{id: ..., name: ...}, %{id: ..., name: ...}]
  }
]

Questions:

  1. How do I add the author data?
  2. I am also interested to know whether you generally advice returning a list of structs from the database, or a list of maps (or some other non-struct data) with the data you’ll use. I am not sure I am foreseeing all the consequences of doing one over the other.

Failed attempts:

  1. Using fragment(“array_agg(?)”, …) in combination with joins to piece together the list of authors. Also tried json_agg.
  2. Returning structs from the union anyway, so I can use Ecto.preload. Union, however, only returned one type of struct (either all Story or all StoryDraft).

Edit: The Story and StoryDraft schemas have many fields in common, but also necessarily have unique fields.

2 Likes

Correction. This should have been: ... order_by: fragment("updated_at DESC"), ...

I got most of the way there with.

def latest_edited_or_published_by_author(%User{} = user, amount \\ 8) do
  story_query =
    from s in Story,
      join: a in assoc(s, :authors),
      where: a.id == ^user.id,
      group_by: s.id,
      select: %{
        id: s.id,
        title: s.title,
        draft?: false,
        authors: fragment(
          "array_agg(jsonb_build_object('id', ?, 'username', ?, 'verified', ?))",
          a.id,
          a.username,
          a.verified
        )
      }

  story_draft_query =
    from sd in StoryDraft,
      join: a in assoc(sd, :authors),
      where: a.id == ^user.id,
      group_by: sd.id,
      select: %{
        id: sd.id,
        title: sd.title,
        draft?: true,
        authors: fragment(
          "array_agg(jsonb_build_object('id', ?, 'username', ?, 'verified', ?))",
          a.id,
          a.username,
          a.verified
        )
      }

  union_query =
    from q in story_query,
      union: ^story_draft_query,
      order_by: fragment("updated_at DESC"),
      limit: ^amount

  Repo.all(union_query)

end

This successfully puts author data in a list of maps, like is intended. The maps use strings as keys, not atoms, which is not ideal. But I believe I might be able to fix that somehow with type/2 from Ecto. Any ideas are very welcome, of course :slight_smile:.

Or this :stuck_out_tongue: .

I definitely prefer returning structs over maps.

Can you explain your decision to separate stories from drafts at the database level (assuming you’ve done so)? Since you’re obviously needing to mix the two together in some circumstances, that seems to me to be a good indication that things would be much easier if they were unified — I would likely use a nil “published at” timestamp to indicate “draft” state.

2 Likes

I initially wanted to store stories and story drafts in one database table. Using a draft_content field (as I have seen suggested in resources) did not make sense to me for my use case, compared to the alternatives.

Having a published_at field in the table seemed a nice option. But the user needs to be able to draft in parallel with having a story published. So, imagine first drafting a story, than publishing it, than drafting more, while other users can still read the published version of your story. Also, I wanted to make use of an Ecto one to one relationship between story and story draft. Using seperate tables in that case seemed a good fit.

1 Like

Because the benefits of having a consistent data structure are greater than the disadvantage of sending more data from back to frontend? Minimizing data footprint was my main reason for not returning structs, but I have this uneasy suspicion that I am lacking the insight to make a better founded decision :face_with_peeking_eye:.

Oh, I see. To me that sounds like versioned content, and I would lean into that. There are a number of ways to model that, but something like stories would have a “published at” field and a “current version ID”, story versions table would have a “story ID” field

1 Like

Very much the former. I would need to prove that my application saw noticeable benefit before I worried about the latter.

I’m all for the story/draft separation and do it myself.

One of the reasons is certainly as @coen.bakker mentioned which is to be able to edit a draft of a currently published story. @jswanner: amen to versioned content! It does add complexity that you might not need, though, and I’d argue it goes beyond that. A published story has attributes/associations that a draft does not. Comments and likes are the main ones. If we separate these entities then we can avoid all the conditionals that creep into various functions. With one entity that might be published we always have to ask if it is. With two entities this goes away and I feel that is more powerful than it may seem.

On the other topic, I’ve been following along with these posts and didn’t have quite the wherewithal to respond this weekend, but I absolutely agree with @jswanner that you should just return structs unless you have a really good reason not to. Personal, I would not say that minimizing data over the wire is probably not a good reason—while I really sympathize with the sentiment, I think it’s almost certainly negligible in this scenario.

This definitely ties into other discussions going on about the difficulty of learning Phoenix, but if you really want to have have smaller slices of certain tables, then create new structs in different contexts that have a subset of the fields you need. You can also create table views at the db level but that is taking it even further.

Can you clarify the exact conditions you want to meet a little further please?
For example the structure seems to be:

Create new “Story” using many-to-many with users stored as a list?
Create new “StoryDraft” that stores the story_id and can be updated by any of the users? Presumably also stores the users as a list.
The above have independent updated_at values as they are in different tables

When you return the results, you are searching through the list of users, and only returning the latest entries for for both tables, regardless of which user last modified them.

Is this correct?

If so the most difficult part about this for me seems to be dealing with the list of authors which I assume are “users” with user id values. I don’t really touch joins or fragments that often, but I would try something along these lines

def recently_updated(user, limit \\ %{}) do
  query =
    from s in Story,
    join: sd in StoryDraft,
    on: fragment("? = ANY(?)", ^user.id, sd.users),
    where: s.id == ss.id,
    order_by: [desc: s.updated_at],
    limit: ^limit,
    distinct: true,
    preload: [:users]

  Repo.all(query)
end

Again, this will probably not work as is, and was written assuming you store the “authors” in a list as “users”

Also I changed the order_by to just normal elixir instead of a fragment and added distinct to stop you getting duplicate results if a story and draft are both within the limit. Not sure if its needed or not.

    order_by: [desc: s.updated_at],

If its a field in the database and a straight forward asc or desc you don’t need to use a fragement and can just write it like this

I added a preload of users under the assumption you have a field name “users” storing the user ID’s as a list. I don’t normally do this, so not sure if it works. Normally :user would preload a single user.

Hopefully something helps.

1 Like

Here is some more context.

The idea is that a user (i.e., author) can create story drafts. Later those can be published. Publishing deletes the StoryDraft and creates a Story. When published, the users can make edits to the published story. In that case a new StoryDraft will be created by copying a large subset of the fields of the published story. If the user commits to the edits made in the new StoryDraft, the published story gets updated and the StoryDraft gets deleted.

stories schema

  schema "stories" do
    field :title, :string, default: "Untitled Story"
    ...

    has_one :story_draft, StoryDraft

    many_to_many(:authors, User,
      join_through: "authors_stories",
      join_keys: [story_id: :id, author_id: :id],
      on_replace: :delete
    )

    field :published_at, :utc_datetime
    timestamps()
  end

story_draft schema

  schema "story_drafts" do
    field :title, :string, default: "Untitled Story"
    ...

    belongs_to :story, Story
    many_to_many(:authors, User,
      join_through: "authors_story_drafts",
      join_keys: [story_draft_id: :id, author_id: :id],
      on_replace: :delete
    )

    timestamps()
  end

I am making changes to my schemas, however, because some improvements are necessary. But this is the status quo.

The code snippet from my original post returned a list of maps. I have now changed this to the database returning a list of structs.

  def last_updated_stories_and_drafts_by_author(%User{} = user, amount \\ 8) do
    story_query =
      from s in Story,
        join: u in assoc(s, :authors),
        where: u.id == ^user.id,
        select: [
          :id,
          :title,
          :updated_at,
          :cover_image,
          :published_at
        ]

    story_draft_query =
      from sd in StoryDraft,
        join: u in assoc(sd, :authors),
        where: u.id == ^user.id,
        select: %Story{
          id: sd.id,
          title: sd.title,
          updated_at: sd.updated_at,
          cover_image: sd.cover_image,
          published_at: nil
        }

    authors_query =
      from a in User,
        select: [:id, :username, :verified]

    union_query =
      from q in story_query,
        union: ^story_draft_query,
        order_by: fragment("updated_at DESC"),
        limit: ^amount,
        preload: [authors: ^authors_query]

    Repo.all(union_query)
  end

As you can see, I used %Story{} as the return struct in the story_draft_query. That works, but feels a bit hacky. The Story struct is a superset, let’s say, of StoryDraft. But that is not enforced in any way (e.g., a marco).

I believe I ran into an error when combining union with Ecto’s order_by. That’s why I used the fragment to order.

This is where I got more information about that error: Ecto 3: union/union_all + order_by gives "(undefined_table)" Postgres error · Issue #2825 · elixir-ecto/ecto · GitHub.

That also means I cannot pattern match on the struct (i.e., inside a HEEx markup of a LiveView) to determine whether a particular item is a story or a story draft. That was my initial strategy. So something like:

<%= for s in @latest_updated %>
  <StoryComponent :if={s.__struct__ == MyApp.Story}/>
  <StoryDraftComponent :if={s.__struct__ == MyApp.StoryDraft}/>
<% end %>

That seemed like a nice approach to me. But now I have to use one of the fields of the structs to determine whether an item is a story or a story draft. For example, with if-statements like so:

<%= for s in @latest_updated %>
  <% if s.type == :story do %>
    <StoryComponent/>
  <% if s.type == :draft %>
    <StoryDraftComponent/>
  <% end %>
<% end %>

From what I understand a Ecto union that returns a list of structs will returns only one struct type. So this:

   union_query =
      from q in story_query,
        union: ^story_draft_query,
        order_by: fragment("updated_at DESC"),
        limit: ^amount,
        preload: [authors: ^authors_query]

Returns a list of %Story{} (just like the story_query), even if the story_draft_query would return some other struct.

You can use the same :if syntax with the condition on the fields: <StoryComponent :if={s.type == :story} />

For sure! I gladly make use of that normally. :slight_smile: