Avoiding skipped data in cursor-based pagination over sorted data

I made an example repo to test this:

I am experiencing a headache while implementing cursor-based pagination over sorted data. How do you all handle this scenario?

I am building a blogging platform service as an exercise and want to show trending posts ordered by their total_post_views. Since there are too many posts to return at once, I decided to introduce cursor-based pagination.

However, total_post_views is updated periodically (maybe once a day).

If an update happens while a user is navigating (that is, a query arrives after an update):

{
  posts(first: 5, after: "YXJyYXljb25uZWN0aW9uOjQ=", orderBy: {field: "total_post_views", sortOrder: "desc"}) {
    edges {
      node {
        name
        total_post_views
      }
      cursor
    }
  }
}

then there is a possibility of skipping (never fetching) some posts whose total_post_views have changed dramatically.

For example:

Before the update (ordered by total_post_views, desc):
posts = [%{id: 10, total_post_views: 1000}, %{id: 9, total_post_views: 900}, ..., %{id: 1, total_post_views: 100}]

If I update post 1's total_post_views to 950, then after the update (ordered by total_post_views, desc):
posts = [%{id: 10, ...}, %{id: 1, total_post_views: 950}, ..., %{id: 2, total_post_views: 200}]

If a user's cursor points at the 5th item ("YXJyYXljb25uZWN0aW9uOjQ=" above decodes to "arrayconnection:4", a zero-based offset), the next page will be posts = [6, 5, 4, 3, 2].

After a uniq merge on the frontend, the user now has only [10, 9, 8, 7, 6, 5, 4, 3, 2]:
post 1 is skipped.
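
To make the failure mode concrete, here is a small Elixir simulation of the example above (the list contents and page size are made up, and the cursor is treated as a plain offset):

# simulating an offset cursor against a re-sorted list
before_update = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
after_update  = [10, 1, 9, 8, 7, 6, 5, 4, 3, 2]  # post 1 jumped to 950 views

page_1 = Enum.slice(before_update, 0, 5)  # [10, 9, 8, 7, 6], cursor now at offset 4
page_2 = Enum.slice(after_update, 5, 5)   # [6, 5, 4, 3, 2]

Enum.uniq(page_1 ++ page_2)  # [10, 9, 8, 7, 6, 5, 4, 3, 2] – post 1 is gone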

:wave:

How do you want to handle a change in total_post_views?

Do you want to continue iterating over a snapshot of the data as it was before the update? That’s what I usually do, but it’s a bit memory-intensive, since the snapshot is kept in memory either in the database or in your application.

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
|--- 1 page ----|-- 2 page ---|
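
In other words, a toy sketch of this first option (the list is made up): every page is served from the ordering as it was frozen at the first query, so nothing shifts underneath the user:

snapshot = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]  # frozen at the first query
Enum.slice(snapshot, 0, 5)  # page 1: [10, 9, 8, 7, 6]
Enum.slice(snapshot, 5, 5)  # page 2: [5, 4, 3, 2, 1] – nothing skipped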

Or do you want an iteration algorithm that keeps looking back for new entries rated higher than the current cursor, which would then be included at the top of the next page? That might be a bit more involved. It probably doesn’t need to keep anything in memory, but it would require more processing for the additional database queries.

[10, 9, 8, 7, 6, 1, 5, 4, 3, 2]
|--- 1 page ----|-- 2 page ---|
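
A rough Ecto sketch of this second option (Post, Repo, and the session-tracked seen_ids are all assumptions, not code from the thread): each page is a normal keyset query, plus a look-back query for rows that have climbed above the cursor since the user passed it:

import Ecto.Query

# hypothetical keyset page: rows ranked at or below the cursor
def next_page(cursor_views, cursor_id, page_size) do
  from(p in Post,
    where:
      p.total_post_views < ^cursor_views or
        (p.total_post_views == ^cursor_views and p.id < ^cursor_id),
    order_by: [desc: p.total_post_views, desc: p.id],
    limit: ^page_size
  )
  |> Repo.all()
end

# look back: rows now ranked above the cursor that the client hasn't seen;
# these get prepended to the next page, as in the [..., 6, 1, 5, ...] diagram
def climbed_above(cursor_views, seen_ids) do
  from(p in Post,
    where: p.total_post_views >= ^cursor_views and p.id not in ^seen_ids,
    order_by: [desc: p.total_post_views]
  )
  |> Repo.all()
end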

Hey, thanks for your reply :smiley:

Actually, I had only one approach in mind (the former) and was lost looking for a solution. After seeing the second option in your answer, that one looks really nice too. It is hard to decide between the two.

In my situation there will be a lot of posts (they will actually be crawled data), and the trending_posts query will have complex filters and subqueries, so I am worried about making a snapshot of all of the results. But I still want to go with the first option for the UX.

Could you please give me more advice about it?

Thanks.

So there are two ways that I know of.

The first is a snapshot at the database level:


# this probably won't work

# in the genserver
def init(your_paginated_query) do
  # Repo.stream/1 is lazy, so building the stream here is fine,
  # but it can only be enumerated inside a transaction
  {:ok, Repo.stream(your_paginated_query)}
end

def handle_call({:page_after, _cursor}, _from, stream) do
  result =
    Repo.transaction(fn ->
      # do something with the `stream` here, e.g. take the next page off it
    end)

  {:reply, result, stream}
end

Or maybe just create a snapshot with Repo.query at the beginning of the user session and release it at the end. Note that creating too many transactions on a database might significantly hurt its performance due to locks (not sure if that applies here). You might also run out of checked-out Ecto (Postgrex) connections/processes if you use long-running streams …

Also, I’ve found Using Ecto to run a long-running multi-process transaction, which might be somewhat relevant, although I think what’s needed in this case is a long-running single-process transaction.


The second is a “snapshot” in your application (I used this one for a paginated chatbot’s search feature), which would load some arbitrary number of posts into memory (can be a genserver per HTTP session) and serve “pagination” requests from there. For the chatbot I loaded about 10 pages of content into the genserver (look at your users’ access patterns, you might need to load more), which had a data structure like
@type page :: [SearchResult.t]
@type paged_snapshot :: %{
  current_page: page,
  pages_before: [page],
  pages_after: [page]
}

@spec next(paged_snapshot) :: paged_snapshot
defp next(%{pages_after: []} = snapshot), do: snapshot # or request a new snapshot

defp next(%{pages_after: [new_current_page | pages_after]} = snapshot) do
  %{
    snapshot
    | pages_after: pages_after,
      pages_before: [snapshot.current_page | snapshot.pages_before],
      current_page: new_current_page
  }
end

# similar for prev(snapshot)

as its state and was alive only for the duration of the user’s interaction with it (each interaction prolonged its “life” up to a certain limit). You can also create these snapshots only around the time when the “total_post_views” is about to change.
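
As a usage sketch, that structure could live in a per-session GenServer along these lines (the module name, timeout, and API are all made up):

# hypothetical per-session wrapper around the paged_snapshot above
defmodule PagedSnapshot.Server do
  use GenServer

  @idle_timeout :timer.minutes(10)

  def start_link(snapshot), do: GenServer.start_link(__MODULE__, snapshot)

  def next_page(pid), do: GenServer.call(pid, :next)

  @impl true
  def init(snapshot), do: {:ok, snapshot, @idle_timeout}

  @impl true
  def handle_call(:next, _from, snapshot) do
    snapshot = next(snapshot)
    # each interaction prolongs the server's life, as described above
    {:reply, snapshot.current_page, snapshot, @idle_timeout}
  end

  @impl true
  def handle_info(:timeout, snapshot), do: {:stop, :normal, snapshot}

  # next/1 (and prev/1) as defined earlier in this post
  defp next(%{pages_after: []} = snapshot), do: snapshot

  defp next(%{pages_after: [new | rest]} = snapshot) do
    %{
      snapshot
      | pages_after: rest,
        pages_before: [snapshot.current_page | snapshot.pages_before],
        current_page: new
    }
  end
end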


Another way would be to create materialized views of the posts sorted by total_post_views, and keep two of these views in the database while total_post_views is being updated.
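
A rough sketch of that via raw SQL from Ecto (the view name and columns are assumptions):

# hypothetical: a frozen, pre-sorted copy of the trending posts
Repo.query!("""
CREATE MATERIALIZED VIEW trending_posts_v1 AS
SELECT id, name, total_post_views,
       row_number() OVER (ORDER BY total_post_views DESC) AS rank
FROM posts
""")

# when total_post_views is recalculated, build trending_posts_v2,
# point new sessions at it, and drop trending_posts_v1 once the old
# sessions finish – that's the "keep two views" part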


If you were using event sourcing, you could have a custom index, functioning like a materialized view, that continuously keeps the client updated with respect to the contents it is looking at. I’m not sure how you’d do this within the current Elixir ecosystem, but the concept goes like this:

Your server would continuously listen for changes to your source system, in this case all post views, and keep a running tally of your top 500 posts (500 is arbitrary; pick a number that fits your top-X optimized index).

Your client can use JavaScript + WebSockets to register interest in a particular range. The server records this interest and publishes updates to your client as they happen. Your client then updates its current range view with the updates from the server.

I think this solves the problems you’re having with DB-cursor-based solutions.

However you now need to:

  • Find a way to get a continuous stream of updates from your source
  • Bootstrap your initial index
  • Develop a way to continuously process the update stream and update your index
  • Develop a mechanism for asking for the current value range in your index (0-20, 20-40, etc.)
  • Develop a mechanism for subscribing to changes within your current interest range (20-40)
  • Develop a mechanism to transmit changes to your client
  • Develop a mechanism for the client to dynamically receive updates and update the UI

This sounds like a lot of steps, and it is. However, the steps are pretty straightforward, and the implementation ought to be covered by an ecosystem of libraries. I’m not familiar enough with the Elixir ecosystem to tell you what they are; I’d love for someone else to chime in on the thread with implementation pointers.
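
In Elixir terms, here is a minimal sketch of a few of these steps, assuming Phoenix.PubSub for the subscription part (the module names, topic, and API are all made up):

# hypothetical in-memory top-N index fed by a stream of view events
defmodule TrendingIndex do
  use GenServer

  @top_n 500

  def start_link(_), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)

  # continuously process the update stream
  def record_view(post_id), do: GenServer.cast(__MODULE__, {:view, post_id})

  # ask for the current value range in the index (e.g. 0-20)
  def range(from, to), do: GenServer.call(__MODULE__, {:range, from, to})

  @impl true
  def init(tallies), do: {:ok, tallies}

  @impl true
  def handle_cast({:view, post_id}, tallies) do
    tallies = Map.update(tallies, post_id, 1, &(&1 + 1))
    # publish the change so interested clients can update their range view
    Phoenix.PubSub.broadcast(MyApp.PubSub, "trending", {:updated, post_id})
    {:noreply, tallies}
  end

  @impl true
  def handle_call({:range, from, to}, _from, tallies) do
    top =
      tallies
      |> Enum.sort_by(fn {_id, views} -> -views end)
      |> Enum.take(@top_n)
      |> Enum.slice(from..to)

    {:reply, top, tallies}
  end
end

# a client process (e.g. a Phoenix channel) registers interest like this:
Phoenix.PubSub.subscribe(MyApp.PubSub, "trending")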


You might be overthinking this. Practically every site on the net is happy to give you a next page of articles that includes one or more articles from the previous page – if articles have been prepended to the top of the feed (and you move from newest to oldest with every page). It’s slightly annoying, but people figure it out very quickly.

The added cost of defending against this looks to be too much effort and a potential drain on server resources.
