Crawler - easy web crawling / scraping powered by GenStage

IRLeif · September 2, 2018, 5:27pm

Thank you, @fredwu, for creating this library and for writing clear documentation. The high-level architecture diagram was particularly useful for a beginner like me to get a good overview.

I’m planning on trying out your library for my first real Elixir project.

One thing I’m wondering about is whether it would be feasible to assign a different IP address to each crawler via some kind of proxy or VPN service, such as Tor, TorGuard or NordVPN.

When scraping, I want to respect each site’s robots.txt. I’m also thinking about using SchedEx to trigger scraping at night (during low-traffic hours), to be mindful of the target sites’ performance.