So, I’ve been toying with the idea of building application monitoring tools with Elixir. I can’t decide whether pulling in the entire BEAM is worth it just to manage other (non-Elixir) processes.
I know there are already some great cloud-based tools for monitoring and process management, but I want to build some pretty custom logic that depends on exits, stderr messages, etc.
A great tool that I already use is PM2, which is built with Node.js. However, I don’t have the ability to add custom logic, and even if I did… if a bug finds its way into my custom logic, Node.js isn’t fault-tolerant.
I’m not sure if many problems exist like this, but I am essentially trying to monitor blockchain nodes… Downtime is incredibly expensive because the nodes have to “catch up” on the blocks that were created while they were down. There’s a whole slew of different custom logic triggers depending on the blockchain software’s behaviour. I hate to bring up blockchain because it’s such a buzzword, but I think the problem could apply to any other log-based distributed system (monitoring Kafka nodes, for example).
Personally, I’d add a simple /health style checkpoint to your app, expose it to something like collectd using its curl plugin for an up/down type check, and funnel the results into http://riemann.io/. I’ve been using this setup for a long time, across multiple customers, and it’s super flexible and not overly hard to get your head around. You can set up multiple Riemann servers that monitor each other if required, and you can get collectd to send data to multiple Riemann instances as well. Cool kids these days would probably go for Prometheus, but I’ve not tried that myself.
Maybe my phrasing can be improved. I need the same functionality as an Elixir supervisor, except I need it for non-Elixir applications.
Having different callbacks run on failure (depending on the reason) is critical. Having automatic restarts, or triggering the process on a different server, is also essential. I know that Elixir does a very good job of this when monitoring a distributed cluster of other Elixir applications. I’m trying to figure out if it’s overkill to use that functionality on “external” (external to the BEAM) applications.
The BEAM has a pretty big memory footprint compared to applications like Nagios, but I need to do more than just monitor.
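To make the "different callbacks depending on the reason" idea concrete, here is a minimal sketch of the decision step as a pure function. Everything in it is an assumption for illustration: the module name `ExitPolicy`, the action tuples, and the stderr text `"database corrupt"` are made up, and real triggers would depend on the software being supervised. The only outside fact it relies on is that an Erlang port opened with `:exit_status` reports a process killed by a signal on Unix as 128 + signal number, so SIGKILL shows up as 137.

```elixir
defmodule ExitPolicy do
  @moduledoc """
  Hypothetical policy: maps an external program's exit status and the last
  stderr text seen to an action the caller should take.
  """

  # Exit status 0 is treated as a deliberate, clean shutdown.
  def decide(0, _stderr_tail), do: :ignore

  def decide(status, stderr_tail) when is_integer(status) and is_binary(stderr_tail) do
    cond do
      # Made-up log message: restarting locally would not help,
      # so ask for the process to be started elsewhere.
      String.contains?(stderr_tail, "database corrupt") ->
        {:failover, :resync_required}

      # 128 + 9: the process was killed with SIGKILL (for example by the OOM killer).
      status == 137 ->
        {:restart, :killed}

      # Any other non-zero status is treated as an ordinary crash.
      true ->
        {:restart, :crashed}
    end
  end
end
```

Keeping the decision separate from the process that owns the port means the rules can be unit tested without spawning anything, and whichever process receives the exit can act on the returned tuple.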
Haha, I’m fairly new to Elixir, but I actually found Port easier to use than Porcelain. It’s quite probable that I just didn’t know what I was doing before getting Port to work. I have a little PoC that works really well: I can kill a Node.js process and it gets restarted, and the restart also works if the Elixir process dies.
I’m still a little concerned that running the whole BEAM VM is overkill though. For a Node.js process it definitely is, as it’s using more memory than my little PoC.
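For anyone following along, a Port-based wrapper along these lines might look roughly like the sketch below. This is not the PoC itself, just one way to do it under stated assumptions: the module name `ExternalWorker` and the `{cmd, args}` argument shape are made up. The idea is that a GenServer owns the port, and when the external program exits with a non-zero status the GenServer stops with an abnormal reason, so an ordinary Supervisor restarts it (and therefore the program).

```elixir
defmodule ExternalWorker do
  # :transient means a supervisor restarts this worker only after an abnormal exit.
  use GenServer, restart: :transient

  # `cmd` is an executable name or path, `args` a list of strings (both placeholders).
  def start_link({cmd, args}) do
    GenServer.start_link(__MODULE__, {cmd, args})
  end

  @impl true
  def init({cmd, args}) do
    case System.find_executable(cmd) do
      nil ->
        {:stop, {:executable_not_found, cmd}}

      path ->
        port =
          Port.open({:spawn_executable, path}, [
            :binary,
            :exit_status,
            # Merge stderr into the data messages so error output is visible.
            :stderr_to_stdout,
            args: args
          ])

        {:ok, %{port: port, last_output: ""}}
    end
  end

  @impl true
  # Output from the external program; only the most recent chunk is kept.
  def handle_info({port, {:data, data}}, %{port: port} = state) do
    {:noreply, %{state | last_output: data}}
  end

  # Clean exit: stop normally, so a :transient worker is not restarted.
  def handle_info({port, {:exit_status, 0}}, %{port: port} = state) do
    {:stop, :normal, state}
  end

  # Non-zero exit: stop abnormally so the supervisor starts a fresh worker.
  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    {:stop, {:external_exit, status}, state}
  end

  # Ignore anything else rather than crashing on an unexpected message.
  def handle_info(_msg, state), do: {:noreply, state}
end
```

It could be started with something like `Supervisor.start_link([{ExternalWorker, {"node", ["app.js"]}}], strategy: :one_for_one)`, where `app.js` is a placeholder. One caveat worth checking in your own setup: when the owning Elixir process dies, the port is closed, which closes the external program's stdin but does not by itself kill it, so a program that never reads stdin may keep running alongside the restarted copy.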