Using Task for HTTP requests

Hello, I have the following problem:

I need to make HTTP requests to a site with thousands of pages, and stop requesting only when a response returns {_page, []}.

I was able to write the code without concurrency, using Tesla for the requests:

def fetch(page) do
   {:ok, response} = get("/api/res", query: [page: page])
   {page, response.body["res"]}
end

def fetchAll(curr \\ 1, list \\ []) do
  result = fetch(curr)
  case result do
    {_page, []} ->
      list
    {curr, nil} ->
      fetchAll(curr, list)
    {_page, res} ->
      curr = curr + 1
      fetchAll(curr, list ++ res)
  end
end

However, I would like to use Task to speed this up, but I don't know how to do it.

If the number of pages were known, it would be easy to implement with Task.async_stream:

1..numberPages
|> Task.async_stream(fn page -> fetch(page) end, ordered: false,
max_concurrency: System.schedulers_online() * 3)
|> Enum.reduce([], fn {:ok, {_page, res}}, j -> res ++ j end)

But since the number of pages depends on what the requests return, I don't know how to do this.

My first thought was Stream.reduce_while, halting when you receive {_page, []}, but it doesn't seem like Stream.reduce_while exists (only Enum.reduce_while/3 does). Maybe Task.async_stream followed by Stream.take_while/2 would work.
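A minimal sketch of that idea, not tested against a real API. The module name PagedFetch and the fake_fetch/2 helper are made up for illustration; in real code you would pass your Tesla-based fetch/1 instead. It assumes every page before the first empty one returns a list (the nil/retry case from the question is not handled here) and relies on the default ordered: true, so results come back in page order and take_while stops at the first empty page.

```elixir
defmodule PagedFetch do
  # Hypothetical stand-in for the Tesla-based fetch/1:
  # pages 1..last_page return one item, later pages return [].
  def fake_fetch(page, last_page) do
    if page <= last_page, do: {page, ["item-#{page}"]}, else: {page, []}
  end

  # fetch_fun takes a page number and returns {page, list}.
  def fetch_all(fetch_fun, max_concurrency \\ System.schedulers_online() * 3) do
    # Infinite stream of page numbers: 1, 2, 3, ...
    Stream.iterate(1, &(&1 + 1))
    # Results are emitted in page order because ordered: true is the default.
    |> Task.async_stream(fetch_fun, max_concurrency: max_concurrency)
    |> Stream.map(fn {:ok, result} -> result end)
    # Stop consuming at the first empty page.
    |> Stream.take_while(fn {_page, res} -> res != [] end)
    |> Enum.flat_map(fn {_page, res} -> res end)
  end
end

# Example with the fake fetcher, pretending the site has 5 pages:
items = PagedFetch.fetch_all(&PagedFetch.fake_fetch(&1, 5), 4)
```

Because tasks are started ahead of the consumer, a few requests past the last page may already be in flight when the stream halts.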

You can also send requests in chunks and stop when you get {_page, []}. That means you'd "overshoot" and make requests to pages that don't exist, but that's probably not a big problem.
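Roughly, the chunked version could look like the sketch below. ChunkedFetch and chunk_size are names invented for this example, and fetch_fun stands for your own fetch/1; as in the question, it is assumed to return {page, list}, with [] once you are past the last page. The nil/retry case is left out.

```elixir
defmodule ChunkedFetch do
  # fetch_fun takes a page number and returns {page, list}.
  def fetch_all(fetch_fun, chunk_size \\ 20) do
    # First page of each chunk: 1, 1 + chunk_size, 1 + 2 * chunk_size, ...
    Stream.iterate(1, &(&1 + chunk_size))
    |> Enum.reduce_while([], fn first, acc ->
      # Fetch one whole chunk concurrently, results in page order.
      results =
        first..(first + chunk_size - 1)
        |> Task.async_stream(fetch_fun, max_concurrency: chunk_size)
        |> Enum.map(fn {:ok, result} -> result end)

      # Keep the pages before the first empty one.
      {full, rest} = Enum.split_while(results, fn {_page, res} -> res != [] end)
      items = Enum.flat_map(full, fn {_page, res} -> res end)

      # An empty page inside the chunk means we reached the end.
      if rest == [], do: {:cont, acc ++ items}, else: {:halt, acc ++ items}
    end)
  end
end
```

At most chunk_size - 1 requests go to pages that don't exist, which is the "overshoot" mentioned above.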


You could try to find the last cursor first with a binary search: try page 1, then multiply the cursor by two until it returns []: 1, 2, 4, 8, 16, 32, 64, etc. Say you find that cursor 512 returned pages but 1024 returned []. Then you try in the middle: 768. If it returns [], the last page is between 512 and 768; if not, the last page is between 768 and 1024. You continue by trying the new middle each time until you find the last cursor.

Then you can fetch all your pages from 1 to last cursor in an async stream.
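Here is a rough sketch of that search. LastPage and has_items are names made up for the example: has_items is a function that takes a page number and returns true when that page comes back non-empty, for instance by calling your fetch/1 and checking the result. It assumes the pages are contiguous, that is, every page up to the last one has items and every page after it returns [].

```elixir
defmodule LastPage do
  # Returns the number of the last non-empty page, or 0 if page 1 is empty.
  def find(has_items) do
    if has_items.(1) do
      {lo, hi} = grow(has_items, 1)
      bisect(has_items, lo, hi)
    else
      0
    end
  end

  # Double the cursor until a page is empty.
  # Returns {lo, hi} where lo is known non-empty and hi is known empty.
  defp grow(has_items, lo) do
    hi = lo * 2
    if has_items.(hi), do: grow(has_items, hi), else: {lo, hi}
  end

  # Narrow the gap between lo (non-empty) and hi (empty) until they are adjacent.
  defp bisect(_has_items, lo, hi) when hi - lo <= 1, do: lo

  defp bisect(has_items, lo, hi) do
    mid = div(lo + hi, 2)

    if has_items.(mid),
      do: bisect(has_items, mid, hi),
      else: bisect(has_items, lo, mid)
  end
end

# Example with a fake site that has 700 pages:
last = LastPage.find(fn page -> page <= 700 end)
```

With the last page known, 1..last can be piped into Task.async_stream as in the question. The search costs on the order of 2 * log2(last) sequential requests before the concurrent fetch starts.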