Pagination: which library? (Phoenix 1.4/Ecto 3)

Hello,

Still new to Phoenix I wonder what’s the best way to do pagination (or even better infinite scrolling).

Can you recommend me a library that works with Phoenix 1.4-RC3 (which uses Ecto 3 if I am correct) and is easy enough to use for a beginner? Is there a de-facto standard library in the elixir world?

I hope you do not mind such questions.

Thanks!

3 Likes

@Nefcairon My favorite way is to use Repo.stream/2 in Websocket connection (see Channels guide). Since you decide to use WebSocket you will have 2-way communication which means you can query anything and return results one by one in really small and fast messages whenever you want to. This solution is also really well scalable. No matter if you have 100 or 100_000_000 entries returned from query you still only fetch and sends exactly same count of entries at a time (just only number of server messages is changed).

If you want to paginate query you can use ecto's Query API, but before you decide to use Ecto.Query.limit/3 make sure you have read why it’s considered as bad practice in some cases:

For more information please see What is the best approach for fetching large amount of records from postgresql with ecto topic.

11 Likes

For pagination where I need to show page numbers I usually use scrivener_ecto. For infinite scrolling I use https://github.com/duffelhq/paginator.

9 Likes

Could you please explain this approach in further details for 100 million records?

Number does not matter since you are using Stream.

Here we have few assumptions:

  1. WebSocket (or other bi-directional communication way) is required as well as Phoenix channel API (or equivalent)
  2. Since we are using Channel we can assign any data for each WebSocket connection
  3. Since we are using Ecto.Repo.stream/2 we have only one database row at a time in memory.
  4. Each WebSocket connection have it’s own process

From this we can create scenario like:

  1. Client request GET /posts

  2. Server sends basic HTML DOM with a JavaScript code

  3. /posts is list so we are automatically appending table element

  4. Client is requesting all Post from database

  5. Server is generating per-client query id (for simplicity let’s say it’s counter) - example return: {"id": "query1", "status": "OK"} (which means that query with id 1 is valid and therefore will be send).

  6. Client is setting to table property id with value query1 and property data-model with value post

  7. Server is calling Ecto.Repo.stream/2 in background and in Stream.each/2 it’s sending data like: {data: {…}, "id": "query1", "type": "append"}.

    Note: You can optionally use Stream.chunk_every/2 before Stream.each/2 if you want to send more than one database row at a time.

  8. Client receives such response and it’s creating tr DOM element for it with id: query1-#{resonse.data.id}

  9. If there is created, deleted or updated any Post then server could send something like: {data: {…}, "model": "post", type: "delete|update"}

  10. Client accepts such event and it’s looking for table[data-model="post"] > tbody > tr[id^="query"][id$="-#{response.data.id}"].

With that we are sending always same amount of data for each client request at a time. Later it’s possible to limit number of requests per client (using Channel assigns).

4 Likes

Thank you for such a clear answer.

I see that you’re mixing pagination with live updates in order to provide a single convenient interface for a client via Channels. This approach and transport are great, but I’m wondering whether Repo.stream/2 is suited for pagination?
If I got it right, Repo.stream/2 uses database’s internal cursors to efficiently select millions of records one by one or in chunks, but at the price of taking up a connection from the pool till it’s done. So the more records and clients you have, the faster you run out of available connections. Since clients don’t need all those millions of records, you have to apply a limit with Enum.take/2 and keep the last record’s unique identifier AKA external cursor (typically a combination of ID + timestamp) on the client side or in the Channel’s state for the next page request.
A plain query with Query.limit/3 is simpler and cheaper than Repo.stream/2 for this case, isn’t it?

2 Likes

It’s why I said it’s example scenario. Instead of sending one row right after another you can request 30 rows at a time and wait until client sends request like: {"id": "query1", "type": "next", "value": 30} and then you just need to send back next 30 rows etc. There is lots of scenarios depends on use case which could be created using such API.