I think it’s quite common to send request metrics to a time series database like Prometheus or InfluxDB. In Elixir, there is the handy prometheus_plugs package that collects request metrics and stores them in the prometheus_ex registry, from where you can expose them to e.g. Prometheus via a scrape endpoint.
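For reference, the wiring is only a few lines. It looks roughly like this, going by the prometheus_plugs README (a sketch; module names and options may differ between versions):

```elixir
# Pattern per the prometheus_plugs README: you define your own
# exporter and instrumenter modules via the provided macros.
defmodule MyApp.MetricsExporter do
  use Prometheus.PlugExporter
end

defmodule MyApp.MetricsInstrumenter do
  use Prometheus.PlugPipelineInstrumenter
end

# At application start, declare the metrics once:
#   MyApp.MetricsExporter.setup()
#   MyApp.MetricsInstrumenter.setup()
#
# Then in the endpoint / router pipeline:
#   plug MyApp.MetricsExporter       # serves GET /metrics for scraping
#   plug MyApp.MetricsInstrumenter   # records a counter + duration histogram
```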
We have a small Elixir service at work that, for most of the time, ran only on a bare IP address and had no subdomain of its own. It also ran on a non-default port (7000 in this case). Yet I got a ton of edgy crawler requests that tried to access .env files, gitconfigs, WordPress logins, Bitcoin wallets, you name it. (I can make the list of requested paths public if someone is interested.)
Since the default behavior of prometheus_plugs (and I guess this is also true for implementations in other languages) is to just expose metrics (counter + histogram) for all these 404 requests, you end up with a ton of time series. My Prometheus keeps data for only 7 days, and right now I have time series for 108 distinct paths with a 404 response.
Just yesterday, I ran a query in Grafana over a time span of only 1 day and it froze the entire VPS.
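The query itself was nothing special; the problem is that anything grouped by the path label fans out into one series per recorded path. It had roughly this shape (a representative sketch with assumed metric and label names, not my exact query):

```
# p95 latency per path. With a hundred-plus crawler-generated 404 paths,
# this fans out into one series per path and histogram bucket.
histogram_quantile(0.95,
  sum(rate(http_request_duration_microseconds_bucket[5m])) by (le, path)
)
```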
For now, the solution is to exclude requests with a 404 status from the collected metrics.
But this is a good example of how easily someone can DoS your system in a way you wouldn’t guess right away.
A quick Google search didn’t turn up anything useful. It seems like nobody thinks, or at least talks, about this. There is general advice not to put highly dynamic values into time series labels/tags, e.g. user IDs. But storing request metrics, including the path, is a very common use case.
I also cannot think of any sane solution. Just not storing metrics for 404 requests (or even the whole 4xx range) seems wrong, as they still carry very useful information.
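One middle ground would be to keep counting 404s but collapse the path label for them, so every unknown path lands in a single shared time series. A minimal sketch of a hand-rolled plug doing that (using the prometheus_ex Counter API; the metric name, labels, and the "unmatched" placeholder are all illustrative):

```elixir
defmodule MyApp.SafeMetricsPlug do
  @behaviour Plug

  use Prometheus.Metric

  # Call once at application start (illustrative metric/label names).
  def setup do
    Counter.declare(
      name: :http_requests_total,
      help: "Total HTTP requests.",
      labels: [:method, :path, :status]
    )
  end

  @impl Plug
  def init(opts), do: opts

  @impl Plug
  def call(conn, _opts) do
    Plug.Conn.register_before_send(conn, fn conn ->
      # Collapse the path label for 404s so arbitrary crawler paths
      # share one time series instead of each minting a new one.
      path = if conn.status == 404, do: "unmatched", else: conn.request_path

      Counter.inc(
        name: :http_requests_total,
        labels: [conn.method, path, conn.status]
      )

      conn
    end)
  end
end
```

This way you still see the 404 volume and can alert on it, but a crawler can no longer create new time series just by inventing paths.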
I am interested in the community’s thoughts on this. It is not an Elixir-specific topic; it affects every system in any language.