Distributed Health Check

Hi guys.

So I have an idea, mostly as a proof of concept so I can learn more about distributed applications. It’s basically a master/slave setup for health checks across deployment regions.

Each region has an app reporter that receives heartbeats from the apps running in the same region and reports back to a master in another region. The reporters do not know about each other; each one only knows its master. I’m aggregating health checks.
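
To make the reporter's job concrete, here is a minimal sketch of the bookkeeping it might do, assuming heartbeats are tracked as "last seen" timestamps and an app counts as down once it has been silent longer than a timeout. The module name, the summary shape, and the app and region names are all made up for illustration; none of it comes from the spec mentioned in this thread.

```elixir
defmodule ReporterSketch do
  # Hypothetical module: tracks the last heartbeat per app and builds the
  # summary a reporter could send to its master.

  # Record a heartbeat for `app` received at `now_ms` (milliseconds).
  def heartbeat(state, app, now_ms), do: Map.put(state, app, now_ms)

  # An app is :up if its last heartbeat is at most `timeout_ms` old.
  def summary(state, region, now_ms, timeout_ms) do
    apps =
      Map.new(state, fn {app, last_seen} ->
        status = if now_ms - last_seen <= timeout_ms, do: :up, else: :down
        {app, status}
      end)

    %{region: region, apps: apps}
  end
end

# Example: "billing" has been silent for 9 s, "search" for 0.5 s.
state =
  %{}
  |> ReporterSketch.heartbeat("billing", 1_000)
  |> ReporterSketch.heartbeat("search", 9_500)

IO.inspect(ReporterSketch.summary(state, "eu-west", 10_000, 5_000))
```

With a 5 s timeout, "billing" is reported as down and "search" as up. The master would then only need to merge one such summary per region.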

It’s all based on HTTP APIs, with a schema based on this spec.

Is this a valid idea, or is it total bs to implement such a system? I’m open to opinions.

If it is something you could see yourself using, then it is absolutely valid. I believe most people use something separate (SaaS) to monitor the health of their applications. The only limitation I see is that if you use distributed Erlang features like :rpc.multicall or something similar, it will only really be useful for Elixir/Erlang applications. I am actually in the process of making a health-check type of thing for appdoctor.io (nothing there now, still about 4 months away from production testing), and I find that doing multi-region testing in Elixir is really easy and fun with the rpc functions. Basically, my approach is to use :rpc.multicall to make the request from multiple nodes in different regions. I can then report the results all in one “call”.
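
Roughly, the multicall approach could look like the sketch below. `:rpc.multicall/5` is the real Erlang function and returns `{results, bad_nodes}`; the module name and `local_check/0` are placeholders, and the module would have to be loaded on every node being called.

```elixir
defmodule MultiRegionCheck do
  # Hypothetical check executed on each node. A real one might make an HTTP
  # request to the target app from that node's region.
  def local_check, do: {node(), :ok}

  # Runs local_check/0 on all given nodes in a single call. Nodes that could
  # not be reached come back separately in the second tuple element.
  def run(nodes, timeout_ms \\ 5_000) do
    {results, bad_nodes} =
      :rpc.multicall(nodes, __MODULE__, :local_check, [], timeout_ms)

    %{results: results, unreachable: bad_nodes}
  end
end

# With no cluster connected, Node.list() is empty and only the local node runs.
IO.inspect(MultiRegionCheck.run([node() | Node.list()]))
```

Note that a check which crashes on a reachable node still shows up in `results`, as a `{:badrpc, reason}` entry, so both lists need looking at.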

Circling back: if it’s something you will use, then I am sure others may use it as well!

The reporter and master would be written in Elixir and will probably use the built-in rpc. The apps will either report their health in the heartbeat, or the reporter will POST against the app, so I can keep the apps implementation agnostic. I don’t think I will use it, I’m just doing it for fun. Thanks!

Also, if you are just concerned with an app’s health, you can subscribe to nodedown events. If you are going to use rpc and a node is down, the health check will fail anyway.

http://erlang.org/doc/man/net_kernel.html#monitor_nodes-1

If a node goes down, you could fall back to a POST to check whether it’s really down or you just have a split-brain issue.
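
Something along these lines, as a rough sketch only. `:net_kernel.monitor_nodes/1` is the real call from the link above and delivers `{:nodeup, node}` and `{:nodedown, node}` messages to the subscribing process. The `NodeWatcher` module and the `probe` function are assumptions: `probe` stands in for whatever fallback request (for example the POST) you use, and should return true if the app answered.

```elixir
defmodule NodeWatcher do
  use GenServer

  # `probe` is a hypothetical 1-arity function that performs the fallback
  # check for a node and returns true if the app still answered.
  def start_link(probe), do: GenServer.start_link(__MODULE__, probe)

  # The node dropped out of the cluster. If the app still answers the
  # fallback check, it is probably a network partition, not a dead app.
  def classify(true), do: :partitioned
  def classify(false), do: :down

  @impl true
  def init(probe) do
    # Subscribe this process to nodeup/nodedown messages.
    # The return value is ignored in this sketch.
    _ = :net_kernel.monitor_nodes(true)
    {:ok, probe}
  end

  @impl true
  def handle_info({:nodedown, node}, probe) do
    IO.inspect({node, classify(probe.(node))}, label: "nodedown")
    {:noreply, probe}
  end

  # Ignore nodeup and anything else.
  def handle_info(_other, probe), do: {:noreply, probe}
end
```

Starting it with `NodeWatcher.start_link(fn _node -> false end)` treats every nodedown as a real outage; a real probe would make the HTTP request and return whether it succeeded.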

Nice, I didn’t know about nodedown. I’ll look into adding it to the reporters.