Dear Elixir community,
After a year of development, bug fixes, and improvements, we are proud to share the release of Crawly 0.10.0 with you.
We have dedicated a lot of time and knowledge to building a fast and feature-rich web scraping framework. To be absolutely honest, I have to say that I took many ideas from another popular web scraping framework, Scrapy (Python), as I had previously worked with the Scrapy core team.
Crawly is already reported to be used in production (some deployments with really long-running crawls); however, we have not yet reached a stable version (hopefully that will happen within a couple of releases).
To describe how Crawly differs from other known Elixir scraping frameworks, I will list the features which I believe make it stand out:
- Documentation - we have spent an enormous amount of time and effort to build great and clear versioned documentation!
- Rate limiting
- Robots.txt support
- Requests and Items validators
- Automatic duplications filtering
- Automatic cookies management (allows bypassing login pages and cookie-based regional filtering)
- Browser rendering (with the help of Splash)
- Retries support
- Proxies support
- HTTP API
- Visual jobs management dashboard which allows operating multiple Crawly nodes at the same time (experimental); see it deployed on a demo EC2 micro instance: http://18.104.22.168/
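To give a feel for how the framework is used, here is a minimal spider sketch. It follows the `Crawly.Spider` behaviour (`base_url/0`, `init/0`, `parse_item/1`); the module name, target site, and CSS selectors are illustrative assumptions, not part of this release announcement:

```elixir
defmodule ExampleSpider do
  use Crawly.Spider

  # The base URL restricts which links the crawler will follow.
  @impl Crawly.Spider
  def base_url(), do: "https://example.com"

  # init/0 returns the starting requests for the crawl.
  @impl Crawly.Spider
  def init(), do: [start_urls: ["https://example.com/articles"]]

  # parse_item/1 receives a fetched response and returns extracted
  # items plus follow-up requests.
  @impl Crawly.Spider
  def parse_item(response) do
    {:ok, document} = Floki.parse_document(response.body)

    # Hypothetical selectors for a page listing articles.
    items =
      document
      |> Floki.find("article")
      |> Enum.map(fn article ->
        %{
          title: article |> Floki.find("h2") |> Floki.text(),
          url: article |> Floki.attribute("a", "href") |> List.first()
        }
      end)

    %Crawly.ParsedItem{items: items, requests: []}
  end
end
```

A crawl would then typically be started with `Crawly.Engine.start_spider(ExampleSpider)`; the pipelines configured in `config.exs` handle validation, deduplication, and storage of the extracted items.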
We hope you find it useful!
If you have a suggestion or a production use case you’re happy for us to share, please get in touch.