Fetch data from multiple pages

How do I make my code find out the number of pages available in advance, and then start several tasks to fetch those pages concurrently?

Example: example.com/api?id=1, example.com/api?id=2, and so on, fetching the response from the server until a page returns an empty array: "[]".
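To make the goal concrete, here is a minimal sketch (not anyone's actual code) assuming a hypothetical `fetch_page/1` that GETs example.com/api?id=<page> and returns the decoded list for that page. It fetches pages in concurrent batches and stops once a batch contains an empty page:

```elixir
# Minimal sketch. fetch_page/1 is hypothetical: it should GET
# example.com/api?id=<page> and return the decoded list ("[]" becomes []).
defmodule Pager do
  @batch_size 10

  # Fetch pages in concurrent batches; stop once a batch contains an empty page.
  def fetch_all(fetch_page, from \\ 1, acc \\ []) when is_function(fetch_page, 1) do
    batch =
      from..(from + @batch_size - 1)
      |> Task.async_stream(fetch_page, max_concurrency: @batch_size, timeout: 15_000)
      |> Enum.map(fn {:ok, rows} -> rows end)

    acc = acc ++ Enum.concat(batch)

    if Enum.any?(batch, &(&1 == [])) do
      acc
    else
      fetch_all(fetch_page, from + @batch_size, acc)
    end
  end
end
```

Calling `Pager.fetch_all(&fetch_page/1)` would then return all rows up to the first empty page.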

Hey cipher, welcome! Any good programming task starts with breaking the problem down into simpler parts. The first part is: how do you fetch a single page? What have you tried so far?


I’ve already requested a single page and organized all the information I want to handle; now I just need to request several pages.

The request was to look at the code you have written…

If you know how to fetch a page, you should be able to fetch one given a page param. That gives you an offset and a limit.

And the total count will give you the number of pages.
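Roughly, that arithmetic looks like this (the module name and page size below are made up for illustration):

```elixir
# Rough illustration of the page -> offset/limit and count -> pages arithmetic.
# The module name and page size are made up for the example.
defmodule Paging do
  @limit 50

  # Offset of the first row on a given 1-based page.
  def offset(page), do: (page - 1) * @limit

  # Total number of pages for `count` rows (ceiling division).
  def total_pages(count), do: div(count + @limit - 1, @limit)
end
```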

I am also doubtful about the use of Tasks to preload data when a cache could speed up db access.

Fetching data is already concurrent because you have a pool of workers in charge of Repo access.

Here is the code.

I’ll try to solve it by passing a parameter, but from what I’ve seen there are about 10 thousand pages. Would that be the best way?

It was not clear the data comes from an external API…

I have implemented a GenStage pipeline with HTTPoison to get this kind of concurrency.

But my latest web scrape used Finch to download paintings/metadata concurrently from WikiMedia.
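A simplified sketch of that kind of Finch setup (the module names and the URL list are placeholders, not the actual scraper):

```elixir
# Stripped-down sketch of concurrent downloads with Finch; names and the
# URL list are placeholders, not the original scraper.
defmodule Downloader do
  def run(urls) do
    # Finch is normally started under the application's supervision tree;
    # it is started inline here only to keep the example self-contained.
    {:ok, _} = Finch.start_link(name: MyFinch)

    urls
    |> Task.async_stream(&download/1, max_concurrency: 8, timeout: 30_000)
    |> Enum.map(fn {:ok, result} -> result end)
  end

  defp download(url) do
    Finch.build(:get, url)
    |> Finch.request(MyFinch)
  end
end
```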


Ok, what have you tried so far?

There is this old post I remember liking, though it might be a bit dated by now…

Maybe you can get some inspiration from it. It uses poolboy and HTTPoison.

Because the problem is not really being concurrent, but managing that concurrency.

You don’t want to flood the host with 10,000 concurrent requests.
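For instance, `Task.async_stream/3` with `:max_concurrency` bounds how many requests are in flight at once. A rough sketch, where `fetch_page/1` is just a placeholder for the real HTTP call:

```elixir
# Sketch: even with ~10_000 known pages, :max_concurrency caps the number of
# requests in flight, so the host never sees more than 20 at a time.
defmodule BoundedFetch do
  # Hypothetical stand-in for the real HTTP call (HTTPoison, Finch, ...).
  def fetch_page(_page), do: []

  def run do
    1..10_000
    |> Task.async_stream(&fetch_page/1,
      max_concurrency: 20,
      timeout: 30_000,
      on_timeout: :kill_task
    )
    |> Enum.flat_map(fn
      {:ok, rows} -> rows
      {:exit, _reason} -> []  # a page that timed out; drop or retry it
    end)
  end
end
```

A poolboy pool, as in the linked post, achieves the same bounding with long-lived workers instead of short-lived tasks.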
