cipher
July 30, 2021, 7:25pm
1
How do I make my code determine the number of pages available in advance, and then start several tasks to fetch those pages concurrently?
Example: example.com/api?id=1; example.com/api?id=2; …
I request pages and read the responses from the server until a page returns an empty array: “[]”.
Hey cipher, welcome! Any good programming task starts with breaking the problem down into simpler parts. The first part is: how do you fetch a single page? What have you tried so far?
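For reference, a single-page fetch could look roughly like this (a minimal sketch assuming HTTPoison and Jason as dependencies; the module name and the endpoint shape are placeholders based on your example, not your actual code):

```elixir
# Minimal single-page fetch, assuming HTTPoison and Jason are available.
# The base URL and module name are placeholders taken from the example above.
defmodule PageFetcher do
  @base_url "https://example.com/api"

  # Fetch one page by id and decode its JSON body into Elixir terms.
  def fetch_page(id) do
    case HTTPoison.get("#{@base_url}?id=#{id}") do
      {:ok, %HTTPoison.Response{status_code: 200, body: body}} ->
        Jason.decode(body)

      {:ok, %HTTPoison.Response{status_code: status}} ->
        {:error, {:unexpected_status, status}}

      {:error, %HTTPoison.Error{reason: reason}} ->
        {:error, reason}
    end
  end
end
```

Once something like that works for one id, the concurrency question becomes how to call it for many ids at once.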
1 Like
cipher
July 30, 2021, 9:13pm
3
I’ve already requested a single page and organized and separated all the information I want to handle; now I only need to request several pages.
The request was to see the code You have written…
If You know how to fetch a page, You should be able to fetch one given a page param. That param should give You an offset and a limit,
and a count will give You the total number of pages.
I am also doubtful about the use of Tasks to preload data when a cache could speed up DB access.
Fetching data is already concurrent because You have a pool of workers in charge of Repo access.
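To show what I mean by page param, offset and limit, here is a rough sketch assuming the data lives in a local database behind Ecto (MyApp.Repo, the Item schema and the page size are placeholders, not code from this thread):

```elixir
# Sketch of page-number pagination over a local Repo, assuming Ecto.
# MyApp.Repo, the Item schema, and @per_page are hypothetical placeholders.
defmodule MyApp.Pagination do
  import Ecto.Query
  alias MyApp.{Repo, Item}

  @per_page 50

  # One page of rows: the page param translates into offset/limit.
  def page(number) when number > 0 do
    offset = (number - 1) * @per_page

    Item
    |> order_by(asc: :id)
    |> limit(^@per_page)
    |> offset(^offset)
    |> Repo.all()
  end

  # Total number of pages, derived from the row count.
  def total_pages do
    count = Repo.aggregate(Item, :count, :id)
    div(count + @per_page - 1, @per_page)
  end
end
```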
cipher
July 30, 2021, 10:00pm
5
Here is the code.
I’ll try to solve it by passing a parameter, but from what I’ve seen there are about 10 thousand pages. Would that be the best way?
It was not clear that the data comes from an external API…
I have implemented a GenStage pipeline with HTTPoison to get this kind of concurrency.
But my latest web scraping project used Finch to download paintings/metadata concurrently from WikiMedia.
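Not the full GenStage pipeline, but the Finch side of such a scraper can look roughly like this (a sketch, not my actual code; MyFinch and the module name are placeholders, and the Finch pool must be started under the application supervisor):

```elixir
# Sketch of a single Finch request, assuming a Finch pool named MyFinch
# is started under the application supervisor, e.g. {Finch, name: MyFinch}.
# The module name is a placeholder.
defmodule WikiFetcher do
  def fetch(url) do
    request = Finch.build(:get, url)

    case Finch.request(request, MyFinch) do
      {:ok, %Finch.Response{status: 200, body: body}} -> {:ok, body}
      {:ok, %Finch.Response{status: status}} -> {:error, {:status, status}}
      {:error, reason} -> {:error, reason}
    end
  end
end
```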
1 Like
Ok, what have you tried so far?
There is this old post; I remember I liked it, but it may be a bit dated by now…
Maybe You can get some inspiration from it. It uses poolboy and HTTPoison.
The problem is not really being concurrent, but managing that concurrency:
You don’t want to flood the host with 10,000 concurrent requests.
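If You do not want to pull in poolboy, Task.async_stream can also cap the number of requests in flight (a sketch; PageFetcher.fetch_page/1 is the hypothetical single-page fetch sketched earlier in the thread, and the concurrency cap is arbitrary):

```elixir
# Sketch of bounded concurrency with Task.async_stream instead of poolboy.
# PageFetcher.fetch_page/1 is the hypothetical single-page fetch from above,
# and max_concurrency: 10 is an arbitrary, host-friendly cap.
defmodule BoundedFetch do
  def fetch_all(total_pages) do
    1..total_pages
    |> Task.async_stream(&PageFetcher.fetch_page/1,
      max_concurrency: 10,
      timeout: 30_000,
      on_timeout: :kill_task
    )
    |> Enum.flat_map(fn
      {:ok, {:ok, page}} -> [page]
      _error_or_timeout -> []
    end)
  end
end
```

With a cap like that, only ten requests hit the host at any one time, no matter how many pages there are.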
3 Likes