---
date: 2022-01-22 13:02:31+00:00
description: Replacing Google Analytics with something more open
draft: false
post_meta:
  - date
preview: /social/d99987a03bb6db8f237c2156c8f057990d689c18b18a37b708d8754c268a27c7.png
resources:
  - name: feature
    src: images/feature.jpg
tags:
  - meta
  - open-source
  - privacy
  - 100DaysToOffload
title: Privacy Respecting Analytics
type: posts
url: /2022/01/22/privacy-respecting-analytics
---

This is a somewhat clichéd meta post about blogging, and my 4th post in this year's run at the #100DaysToOffload challenge. See the full series here.

Google Analytics made headlines this week after Austria's data protection authority ruled that the way it works constitutes a breach of the GDPR. I've been using Google Analytics on my site for a little while, mainly out of laziness: the Hugo template I use has built-in support for it if you paste your tracking ID into the site configuration. I'd been thinking about swapping to a self-hosted analytics package for a while, and this ruling was the final prompt that made me do it.
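For context, that built-in support really is just one line of site configuration, something along these lines (the exact key name has changed across Hugo versions, and the ID below is a placeholder):

```toml
# config.toml -- the tracking ID here is a placeholder, not a real property
googleAnalytics = "UA-123456789-1"
```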

I've seen lots of noise from the tech community on this subject this week, including plenty of posts on Hacker News, and a few open-source alternatives were mentioned. I wanted something easy to use and lightweight to run so that it doesn't slow down or bloat my site.

## Do I even need client-side analytics?

Google Analytics and many other analytics packages work by including a script in your website that your visitors' browsers run. This means that visitors may not be tracked if they disable JavaScript, use an ad-blocking plugin, or use an old browser. It also puts the burden on the user to download and run extra code, which is a little unfair if the script is massive and bloated.
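As a concrete example, the embed code for a client-side tracker is usually a single script tag. This one is modelled on umami's snippet, with a placeholder domain and website ID:

```html
<!-- modelled on umami's embed snippet: domain and website ID are placeholders -->
<script async defer
        data-website-id="00000000-0000-0000-0000-000000000000"
        src="https://analytics.example.com/umami.js"></script>
```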

The old-fashioned approach, which has been around for donkey's years, is server-side log analysis with packages like AWStats. These tools read your server logs and scan them for information about who visited, what browser they were using, and so on. No additional bandwidth or compute is required of the visitor, and their visit is logged regardless of whether they have JavaScript turned off. Although this might sound creepy, the only real information we get comes from the browser's User-Agent header, which a really paranoid visitor could manipulate with a browser plugin like this one.
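To make that concrete, here's a minimal Python sketch of the core of what a log-based tool does, assuming the standard "combined" access log format (the example log line is made up):

```python
import re

# Regex for the common "combined" access log format: IP, timestamp,
# request line, status, size, referrer, user agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# A made-up example line in combined log format.
line = ('203.0.113.7 - - [22/Jan/2022:13:02:31 +0000] '
        '"GET /2022/01/22/privacy-respecting-analytics HTTP/1.1" '
        '200 4711 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/96.0"')

match = LOG_PATTERN.match(line)
if match:
    print(match.group("path"))        # which page was visited
    print(match.group("user_agent"))  # the only real browser info available
```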

As far as I can tell, the main advantage of client-side packages over server-side ones is that you get slightly more granular information about which visitors are humans versus bots, among other things. There's a really good blog post about this from one of the client-side providers here (although they obviously have a vested interest in showing off how brilliant client-side packages can be).

## Choosing a package

There are a few options around. I've tried Matomo (née Piwik) before but found it to be pretty heavyweight. A few people recommended Plausible, but I read through this blog post and in the end I opted to run umami like the author did. I decided to run it in Docker using [their docker-compose configuration](https://github.com/mikecao/umami/blob/master/docker-compose.yml). It took me about 5 minutes to stand it up and add a new subdomain to my Caddy server. It's using about 100-120MB of RAM, plus some overhead for the MySQL server. Personally, I feel like this is a really good use case for SQLite, but support for it is not yet merged into umami.
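The subdomain part was only a couple of lines of Caddy configuration. A minimal sketch, assuming Caddy v2, a placeholder domain, and umami listening on its default port 3000:

```
# Caddyfile -- analytics.example.com is a placeholder domain
analytics.example.com {
    reverse_proxy localhost:3000
}
```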

## Final Thoughts

For something ostensibly simple, there are a lot of options and trade-offs to think about when it comes to analytics, and it's easy to get analysis paralysis. For now I'm sticking with Umami, and I've also got AWStats running on my shared hosting account, so I might compare the numbers from the two in a few months' time.