I’ve been building my own personal project https://morphic.pro and I’m at a place where I want to start collecting page hits and other metrics about where the request came from and so forth. So it’s time to add some analytics to my personal site.
I’ve used Google Analytics many times in the past, and overall it’s a wonderful tool. That said, I feel we are living in a time where I trust Google far less than I used to. I’m not going to go full tinfoil hat right now, but my intuition says I should avoid Google.
I’ve also used Mixpanel, and again it’s a wonderful tool, but the same reasons I don’t want to share my data with Google apply to sharing my data with anyone.
So that leads me to constantly think, why do I want to outsource this solution? I mean, Elixir/Phoenix is more than capable of recording these requests in a way I can build a nice dashboard for.
And since I think it would be fun and a good showcase of my skill sets I keep going back to this idea.
So before I just jump in and start writing code, I wanted to see what other people’s experiences have been when dealing with analytics in their own projects. Did you try your own homegrown solution only to find you went back to Google anyway? Did you hit any performance walls that made it infeasible to do your own thing?
“We don’t collect personal or invasive data about your users, nor do we use cookies, meaning you don’t have to show pesky notices to users about cookie tracking. We’re GDPR compliant too.”
Nice, this is really at the core of why I don’t want to use many of the big box solutions. I will have to give this a look!
While I love their motto, their price tag is a little steep for a personal project. At $14/month for their cheapest plan, it’s 3x my hosting cost at the moment. It’s a shame they don’t offer something a little more affordable.
Starting to look at Plug.Conn.req_headers and realizing that everything I care about is already in that.
My thought at this point is to make a plug that just asynchronously writes each request into Postgres. Maybe I use Mnesia as a buffer and create some worker that migrates the records to Postgres.
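For anyone following along, here is a rough sketch of the "buffer plus worker" idea. To keep it runnable on its own it swaps Mnesia for a plain GenServer holding a list in memory, and it swaps Postgres for a `flush_fun` callback. `HitBuffer`, `record/2`, `flush/1`, `flush_fun` and `max` are all made-up names for illustration, not anything from Plug or Phoenix.

```elixir
defmodule HitBuffer do
  use GenServer

  # Hypothetical module: buffers hits in process memory and hands them to
  # `flush_fun` in batches. In a real app `flush_fun` might wrap a database
  # insert; here it is just a function supplied by the caller.

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts)
  end

  # Asynchronous: the caller (for example a plug) does not wait for storage.
  def record(pid, hit), do: GenServer.cast(pid, {:record, hit})

  # Synchronous flush, handy for tests and for shutdown.
  def flush(pid), do: GenServer.call(pid, :flush)

  @impl true
  def init(opts) do
    state = %{
      flush_fun: Keyword.fetch!(opts, :flush_fun),
      max: Keyword.get(opts, :max, 100),
      hits: []
    }

    {:ok, state}
  end

  @impl true
  def handle_cast({:record, hit}, state) do
    state = %{state | hits: [hit | state.hits]}

    # Flush as soon as the buffer reaches the configured batch size.
    if length(state.hits) >= state.max do
      {:noreply, do_flush(state)}
    else
      {:noreply, state}
    end
  end

  @impl true
  def handle_call(:flush, _from, state) do
    {:reply, :ok, do_flush(state)}
  end

  # Nothing buffered, nothing to do.
  defp do_flush(%{hits: []} = state), do: state

  # Hits are prepended on arrival, so reverse to restore arrival order.
  defp do_flush(state) do
    state.flush_fun.(Enum.reverse(state.hits))
    %{state | hits: []}
  end
end

# Usage: print each batch instead of writing it to a database.
{:ok, buffer} = HitBuffer.start_link(flush_fun: &IO.inspect/1, max: 2)
HitBuffer.record(buffer, %{request_path: "/"})
HitBuffer.record(buffer, %{request_path: "/blog"})
HitBuffer.flush(buffer)
```

The trade-off with an in-memory buffer like this is that unflushed hits are lost if the process crashes, which is probably acceptable for page-hit counts on a personal site but is exactly the gap a disk-backed buffer would close.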
But be aware that just because everything is there doesn’t necessarily mean you’re allowed to store it indefinitely like that. E.g. IP addresses are personal data under the GDPR. Given a timestamp and the IP, there are means of linking it back to a telephone line and therefore likely a person, even if those means are likely not available to you.
All I’m interested in are the basics: the user agent, the sec-fetch-user, the referer, and the request_path.
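As a minimal sketch of pulling out just those fields: the function below takes a list of `{name, value}` tuples, which is the shape `conn.req_headers` has in Plug (with header names lowercased), plus the request path. `HitFields` and `extract/2` are hypothetical names, and the example deliberately avoids depending on Plug so it runs in plain Elixir.

```elixir
defmodule HitFields do
  # Hypothetical helper, not part of Plug or Phoenix.
  # Only these headers are kept; everything else is dropped.
  @wanted ~w(user-agent sec-fetch-user referer)

  # `req_headers` is a list of {name, value} tuples with lowercased names,
  # the same shape as `conn.req_headers`.
  def extract(req_headers, request_path) do
    headers = Map.new(req_headers)

    @wanted
    |> Map.new(fn name -> {name, Map.get(headers, name)} end)
    |> Map.put("request_path", request_path)
  end
end

headers = [
  {"user-agent", "Mozilla/5.0"},
  {"referer", "https://example.com/"},
  {"cookie", "ignored"}
]

hit = HitFields.extract(headers, "/blog")
IO.inspect(hit)
```

Headers that were not sent come back as `nil`, so every recorded hit has the same keys.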
Maybe I use the IP to look up some geo and then I drop the IP on the floor. That’s less than what Plausible is doing. I think I will be OK with GDPR. Still, thanks for the heads up. I don’t really even care about bounce rate.
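Roughly what I mean by dropping the IP on the floor, as a sketch only: the lookup is passed in as a function, and the one used here is a fake stand-in, since I haven’t picked a GeoIP source. `HitGeo`, `anonymize/2` and the `:remote_ip` / `:country` keys are made-up names for illustration.

```elixir
defmodule HitGeo do
  # Hypothetical helper. `lookup` is any function from an IP tuple
  # (the shape of `conn.remote_ip`, e.g. {203, 0, 113, 7}) to a country code.
  def anonymize(hit, lookup) do
    # Remove the IP from the hit so it is never stored.
    {ip, rest} = Map.pop(hit, :remote_ip)
    country = if ip, do: lookup.(ip), else: nil
    Map.put(rest, :country, country)
  end
end

# Stand-in lookup for illustration only; a real one would query a GeoIP database.
fake_lookup = fn _ip -> "SE" end

hit = %{remote_ip: {203, 0, 113, 7}, request_path: "/"}
IO.inspect(HitGeo.anonymize(hit, fake_lookup))
```

Only the derived country and the path survive; the IP itself never reaches storage.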
Thanks for mentioning Plausible @LostKobrakai. I’ve been working on it for about a year now.
I’ve thought a lot about offering a self-hosted solution. To be clear, the license allows you to self-host and I’m not stopping anyone. When I say that Plausible does not offer a self-hosted solution, I mean that I have not taken on the responsibility of making it easy to install and upgrade. If you want to grab the code and run it, by all means go ahead.
I’m wary of offering a self-hosted solution because:
a) The product is still early stage and it would take away from my time focusing on new features
b) It’s likely that the infrastructure will evolve to a point where self-hosting is just not realistic
For instance, I’m currently moving from Postgres to ClickHouse as the database so that I can accommodate sites with >10m pageviews per month. In the future, it might be necessary to add Kafka to the data pipeline to better handle big traffic spikes etc.
There’s a tension between making the product easy to self-host vs making it cost-efficient for large customers and a free tier. For example, Matomo uses a standard MySQL database and it’s very easy to self-host. On the other hand, their hosted service is crazy expensive. With 10m monthly pageviews you’re looking at €849 per month, and that only includes 6 months data retention.
If they built a more streamlined data pipeline, it would make their product much harder to self-host but it would bring the cost of the hosted service down.
Hope that clarifies things a bit. I’m not ruling out a self-hosted version in the future, potentially with different database and ingestion adapters to simplify the infra requirements for self-hosting. However, at the moment I’m focusing on the core business and especially scaling up to larger customers.