ElixirConf 2019: Crawling The Web With Elixir - Adam Mokan

by @adammokan

Crawling the web is something that a large number of people do, but few people really want to talk about. I feel there is not enough knowledge sharing on this topic, and I want to share my experience from the past decade of crawling at scale.

We will look at how I used Elixir to orchestrate a pool of distributed, dynamic headless crawler nodes and go over the things I got wrong, how I resolved them, and more.

Even if you have no interest in crawling the web, I've learned over the years that knowing how to crawl the web resiliently has a lot in common with large-scale data integration against third-party APIs.

General awareness of OTP, GenStage, distributed systems, headless browser APIs, and Amazon Web Services is a plus.



Thank you @axelson - I appreciate the plug!

I know the talk itself is light on deep technical detail (that was intentional, given the subject matter and my circumstances). But if anyone has specific questions, feel free to reach out and I may be able to go into more depth.
