I usually use HTTPoison with Floki. There has been some discussion about it
You can replace Floki with
For any interaction with a database (PostgreSQL by default), you can use Ecto.
So my approach would be (roughly) like this:
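defmodule Crawler do
  # Article is assumed to be an Ecto schema with a :contents field
  def crawl!(url) do
    # HTTPoison.Response exposes the HTTP status as :status_code
    %HTTPoison.Response{body: body, status_code: 200} = HTTPoison.get!(url)
    html = Floki.parse_document!(body)
    # Floki.find/2 returns HTML nodes; Floki.text/1 extracts their text
    contents = html |> Floki.find("article") |> Floki.text() # or whatever you are interested in
    # see ecto docs to understand what Repo does
    Crawler.Repo.insert!(%Article{contents: contents})

    # Floki.attribute/3 returns a flat list of href values for all anchors
    urls = Floki.attribute(html, "a", "href")
    # spawn more tasks to crawl other pages, or keep crawling in the current process
    urls
  end
end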
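
The last comment ("spawn more tasks to crawl other pages") could be sketched roughly as follows, using only the standard library. The module name Crawler.Links and the functions absolute_urls/2 and crawl_all/3 are hypothetical names made up for this example, and the concurrency limit and timeout are arbitrary assumptions, not recommendations.

```elixir
defmodule Crawler.Links do
  # Hypothetical helper module, not part of HTTPoison, Floki, or Ecto.

  # Resolve hrefs against the URL of the page they came from, keep only
  # http(s) links, strip fragments, and drop duplicates.
  def absolute_urls(hrefs, base_url) do
    hrefs
    |> Enum.map(fn href ->
      uri = URI.merge(base_url, href)
      %{uri | fragment: nil}
    end)
    |> Enum.filter(fn uri -> uri.scheme in ["http", "https"] end)
    |> Enum.map(&URI.to_string/1)
    |> Enum.uniq()
  end

  # Run crawl_fun on every URL in separate tasks, at most max_concurrency
  # at a time. Returns a list of {:ok, result} or {:exit, reason} tuples
  # in the same order as the input URLs.
  def crawl_all(urls, crawl_fun, max_concurrency \\ 4) do
    urls
    |> Task.async_stream(crawl_fun,
      max_concurrency: max_concurrency,
      timeout: 30_000,
      on_timeout: :kill_task
    )
    |> Enum.to_list()
  end
end
```

With the module above you could call something like Crawler.Links.crawl_all(Crawler.Links.absolute_urls(urls, url), &Crawler.crawl!/1). Note that Task.async_stream links the tasks to the caller, so a crawl! that raises will take the calling process down with it; if that is not what you want, Task.Supervisor.async_stream_nolink is one alternative. A real crawler would also need to remember which URLs it has already visited.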