What's the best way to repeatedly and concurrently hit a website for scraping purposes?

You can do it with Task

https://hexdocs.pm/elixir/Task.html
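If you don't need a worker pool, a minimal sketch of concurrent fetching with plain `Task` could look like the following (not from the original post; the URLs and the use of `HTTPoison` here are illustrative assumptions, and `Task.async_stream/3` requires Elixir 1.4+):

```elixir
urls = ["https://example.com/a", "https://example.com/b"]

# Task.async_stream/3 caps concurrency for you, so you don't
# hammer the remote server with one process per URL at once.
results =
  urls
  |> Task.async_stream(fn url -> HTTPoison.get(url) end,
       max_concurrency: 5, timeout: 30_000)
  |> Enum.to_list()
```

Each element of `results` is `{:ok, value}` (or `{:exit, reason}` if a task died), where `value` is whatever the fetch function returned.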

I use this together with poolboy to avoid DDoSing the external server.

e.g.:

mix.exs

[
  {:poolboy, "~> 1.5"},
  {:httpoison, "~> 0.11"},
  {:floki, "~> 0.14"}
]
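For the pool itself, you start poolboy under your supervision tree. A minimal sketch, assuming a worker module named `Scraper.Worker` (that name, the pool sizes, and the application module are assumptions, not part of the original post):

```elixir
defmodule Scraper.Application do
  use Application

  def start(_type, _args) do
    pool_opts = [
      name: {:local, :worker},        # matches the :worker pool name used in :poolboy.transaction/3
      worker_module: Scraper.Worker,  # hypothetical GenServer that does the downloading
      size: 5,                        # number of long-lived workers
      max_overflow: 2                 # extra temporary workers allowed under load
    ]

    children = [
      :poolboy.child_spec(:worker, pool_opts)
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: Scraper.Supervisor)
  end
end
```

The `size`/`max_overflow` pair is how you bound concurrency against the target site: at most `size + max_overflow` requests run at once.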

Sample code

@genserver_call_timeout 1_000_000
@task_async_timeout 1_000_000

tasks =
  Enum.map(list, fn {link, filename} = _tuple ->
    Task.async(fn ->
      :poolboy.transaction(
        :worker,
        &GenServer.call(&1, {:download, link, filename}, @genserver_call_timeout),
        @task_async_timeout
      )
    end)
  end)

results = Enum.map(tasks, fn task -> Task.await(task, @task_async_timeout) end)

You need to write your own worker module, and you need to complete the rest of the code yourself — the snippet above is not a full working example.
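A worker for that `{:download, link, filename}` call could be sketched like this (the module name `Scraper.Worker` and the fetch-and-save details are my assumptions; poolboy only requires that the worker module export `start_link/1`):

```elixir
defmodule Scraper.Worker do
  use GenServer

  # poolboy calls start_link/1 with the worker args configured in the pool.
  def start_link(_args), do: GenServer.start_link(__MODULE__, nil)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:download, link, filename}, _from, state) do
    # HTTPoison.get!/1 raises on failure; a production worker would
    # pattern-match on {:ok, resp} / {:error, reason} from HTTPoison.get/1
    # instead of letting the worker crash.
    %HTTPoison.Response{body: body} = HTTPoison.get!(link)
    File.write!(filename, body)
    {:reply, {:ok, filename}, state}
  end
end
```

Because each download runs inside `:poolboy.transaction/3`, the worker is checked back into the pool as soon as the call returns, so long lists of links are processed with bounded concurrency.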
