Some updates on my crawling system in Elixir now that I'm at the 90 day mark of production. (see https://elixirforum.com/t/what-elixir-related-stuff-are-you-doing/113/115?u=adammokan for context)
- The "clocking" mechanism I put in place has had no issues so far. It has actually helped a lot in troubleshooting the live system, since I can adjust the clock rate at runtime. That lets me slow things down and really watch state changes without stopping the system entirely, or pause the system in its tracks and inspect a specific process. I think handling the processes this way turned out really well, at least for my use case, where I do a lot of post-processing after scraping and have around 40 or so state changes per job.
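To make the idea concrete, here is a minimal sketch of what a clock-driven worker like this could look like. This is not the author's actual code - the module name, state names, and API are all hypothetical - but it shows the core trick: the process advances its state machine on a self-scheduled tick, so the tick interval can be changed (or the process paused) at runtime.

```elixir
defmodule Crawler.ClockedWorker do
  # Hypothetical sketch of a clock-driven worker process.
  # Module, function, and state names are illustrative only.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  # Runtime controls: change the clock rate, pause in place, or resume.
  def set_rate(pid, ms), do: GenServer.cast(pid, {:set_rate, ms})
  def pause(pid), do: GenServer.cast(pid, :pause)
  def resume(pid), do: GenServer.cast(pid, :resume)

  def init(opts) do
    rate = Keyword.get(opts, :rate_ms, 1_000)
    schedule_tick(rate)
    {:ok, %{rate_ms: rate, step: :idle, paused: false}}
  end

  def handle_cast({:set_rate, ms}, state), do: {:noreply, %{state | rate_ms: ms}}

  # Pausing simply stops scheduling ticks; the process (and its state)
  # stays alive and can be inspected with :sys.get_state/1.
  def handle_cast(:pause, state), do: {:noreply, %{state | paused: true}}

  def handle_cast(:resume, state) do
    schedule_tick(state.rate_ms)
    {:noreply, %{state | paused: false}}
  end

  # Each tick advances the state machine one step and schedules the next tick.
  def handle_info(:tick, %{paused: true} = state), do: {:noreply, state}

  def handle_info(:tick, state) do
    schedule_tick(state.rate_ms)
    {:noreply, %{state | step: advance(state.step)}}
  end

  # A toy three-step cycle standing in for the ~40 real state changes.
  defp advance(:idle), do: :fetching
  defp advance(:fetching), do: :post_processing
  defp advance(:post_processing), do: :idle

  defp schedule_tick(ms), do: Process.send_after(self(), :tick, ms)
end
```

Because every transition happens on a tick, slowing the clock to something human-scale (say, one tick per second) makes the live state changes easy to watch from an attached IEx session.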
- I now have 12 distributed nodes running in production and will likely scale out to about 20 by April or May.
- My current numbers are 20,554,562 crawls with 23,754 data errors. A "data error" means something went wrong during the crawl/post-processing cycle, but it does not represent data loss or anything like that. These 23k items still sent a response, just one that was less than ideal. At the end of the day, it's well under our SLA.
- I don't have hard numbers here, but I think I've only had two legitimate "crashes". One was due to some poor logging I put in a portion of code that caused my message queue to grow too large; I paused the system but got impatient and just killed it rather than wait for the logging message queue to burn down. The other time, I added some new state transitions, managed to misspell them, and forgot to add a default fall-through handle_info clause for the no-match scenario. Both situations were human error on my part, and they happened in the middle of the day while I was working on the system - that is better than 4am on a Sunday.
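For anyone who hasn't hit this one yet: a GenServer crashes with a FunctionClauseError when a message arrives that no handle_info clause matches. A final catch-all clause turns that crash into a log line. This sketch uses a made-up module name to show the shape of the fix:

```elixir
defmodule Crawler.Job do
  # Hypothetical module illustrating the catch-all handle_info lesson.
  use GenServer
  require Logger

  def init(state), do: {:ok, state}

  # Known state transitions match explicitly...
  def handle_info({:transition, :fetching}, state),
    do: {:noreply, %{state | step: :fetching}}

  def handle_info({:transition, :post_processing}, state),
    do: {:noreply, %{state | step: :post_processing}}

  # ...and the final catch-all keeps a misspelled or unknown message
  # from crashing the process. Log it and move on.
  def handle_info(msg, state) do
    Logger.warning("unexpected message: #{inspect(msg)}")
    {:noreply, state}
  end
end
```

Without that last clause, sending `{:transition, :post_procesing}` (note the typo) would take the whole process down instead of producing a warning you can grep for.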
I'll never really be done with this system as it will continue to grow, but I'm at the point where I sleep at night without worrying about this portion of my tech stack at all. I think that is a plus. I still deal with issues on our old crawlers, but as I migrate each of them over to Elixir, life is getting easier for sure.
At the end of the day, my goal was stability and resiliency in a domain (crawling/scraping) where those are tough to achieve. I think I'm moving in the right direction.
Financially, the Elixir system has cut costs for this particular type of crawl job compared to our legacy system, both in the amount of hardware used and in the throughput achieved. Beyond that, data quality has improved quite a bit. Elixir isn't the real hero on data quality, but thanks to things like pattern matching it let me write a lot more code around data validation in far fewer lines than another language would have. (I was lazy and writing crappy code before.)
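As a small illustration of the pattern-matching point (the field names here are invented, not from my system): each clause of a validation function can match one acceptable shape of a scraped record, with guards on the values, and everything else falls through to a single error clause - no nested conditionals needed.

```elixir
defmodule Crawler.Validate do
  # Hypothetical validation sketch; record shape and field names are made up.

  # A record is valid only if it has a positive numeric price
  # and a non-empty string title.
  def check(%{"price" => price, "title" => title})
      when is_number(price) and price > 0 and is_binary(title) and title != "" do
    {:ok, %{price: price, title: title}}
  end

  # Anything that doesn't match the clause above is rejected with context.
  def check(other), do: {:error, {:invalid_record, other}}
end
```

Adding another acceptable shape later is just another `check/1` clause, which is a big part of why the validation code stays short.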
Hopefully this inspires someone to give it a go if you find yourself maintaining a legacy system with a lot of moving parts like mine. I'm not saying Elixir will be a life saver for you, but in my case it's been a positive change.
Start small and experiment is my advice.