Kai Ito
Privacy respecting analytics using GoAccess

Since I bought my own domain and started renting a server to host this blog, I've been meaning to add some kind of analytics. But in the interest of user privacy, and of keeping the amount of data loaded by the client to an absolute minimum, I kept putting off adding a well-known analytics solution. I knew from the beginning that under no circumstances did I want to add Google Analytics. However, I had been wondering about Matomo. The big advantage of Matomo is that it's self-hostable and open source, so you can be sure that all collected data stays exactly where it should be and isn't sold on to third parties and advertisers. But Matomo is still an extra client-side library that needs to be sent to the client, and that goes against my mission of keeping this site to an absolute minimum size.

Discovering GoAccess

After researching alternatives to Google Analytics and Matomo, I eventually stumbled across GoAccess. What makes GoAccess interesting is that it generates relatively detailed analytics based purely on the access logs of a web server such as Apache or, in my case, Traefik. It's written in C and features both a terminal interface and a web interface. It's designed to be used by piping the contents of access.log into the GoAccess binary and providing any number of switches to customize the output, such as which log format you're sending it and how to resolve geolocation from IP addresses.
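To give a feel for what "analytics purely from access logs" means, here is a minimal Python sketch that counts hits per path and per status code from a few COMBINED-format lines. It is only an illustration of the idea, not how GoAccess itself is implemented; the regular expression, the `summarize` helper and the sample log lines are all made up for this example.

```python
import re
from collections import Counter

# COMBINED format: host ident user [time] "request" status size "referer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Count hits per requested path and per status code."""
    paths, statuses = Counter(), Counter()
    for line in lines:
        match = COMBINED_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the format
        paths[match["path"]] += 1
        statuses[match["status"]] += 1
    return paths, statuses


# Hypothetical sample entries using documentation-only IP ranges.
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2021:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2021:13:55:40 +0000] "GET /posts/goaccess HTTP/1.1" 200 5120 "https://www.example.com/" "Mozilla/5.0"',
    '198.51.100.23 - - [10/Oct/2021:13:56:02 +0000] "GET / HTTP/1.1" 200 2326 "-" "curl/7.79.1"',
    '198.51.100.23 - - [10/Oct/2021:13:56:05 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/7.79.1"',
]

paths, statuses = summarize(SAMPLE_LOG)
print(paths.most_common(2))
print(dict(statuses))
```

Everything here is derived from lines the server already writes, so nothing extra has to be shipped to the visitor's browser.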

Initial thoughts

What I was most interested in, though, was being able to view the HTML report remotely, as if it were a live site that continuously receives data. GoAccess does support this, by offering a WebSocket connection that pushes new access log entries to the client. This turned out to be quite difficult to configure correctly, as most of the documentation and tutorials focus on one-time parsing. Reading through the documentation, I found mention of enabling real-time HTML, as well as of setting a WebSocket URL. But it was exceptionally difficult to find any information on the combination of switches needed to keep a service running that would display this HTML report with live updates.

I had the additional complexity of wanting to host it within a Docker container behind my reverse proxy. There is an official Docker image for GoAccess, which I pulled. But even the documentation for the image doesn't make it clear how to run it the way I wanted. The biggest shock was when I looked at the image's Dockerfile and discovered that its CMD simply invokes the --help output. After reading more of the documentation, I learned that even with the real-time HTML report and WebSocket URL, GoAccess does nothing to serve the HTML file over any kind of server. There is also a --daemon switch, but I couldn't work out what it was supposed to accomplish. My Docker container always simply exited immediately.
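To make the live-report idea a bit more concrete: a real-time mode has to notice lines appended to the access log and hand only those to whatever pushes updates to the browser. The Python sketch below shows that polling step in isolation. It is an assumption-laden illustration, not GoAccess's actual mechanism; the `read_new_lines` helper and the file contents are invented for the example.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (complete lines appended since offset, new offset)."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        chunk = handle.read()
    # Only consume complete lines; a partially written line waits for the next poll.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "access.log")
    with open(log_path, "w", encoding="utf-8") as log:
        log.write("first request\n")
    first_batch, offset = read_new_lines(log_path, 0)

    with open(log_path, "a", encoding="utf-8") as log:
        log.write("second request\nthird requ")  # last line is still incomplete
    second_batch, offset = read_new_lines(log_path, offset)

print(first_batch, second_batch)
```

A long-running service would call something like this in a loop and forward each batch over its WebSocket connection, which is why the process has to stay alive instead of exiting after a single pass.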

GoAccess Setup

After reading through lots of GitHub issues and other blog posts, I finally figured out the right GoAccess configuration to run it in a Docker container and serve a real-time report. Configuration can be done via command line arguments or via a goaccess.conf file. In my setup, I opted for a combination of the two. Within the docker-compose file where I set up the container, I pass in two arguments, namely --no-global-config and --config-file=/goaccess/data/goaccess.conf, to override the container's default --help command and to tell it where to find my configuration file. The configuration file itself is mounted via a Docker volume and looks like this:

log-format COMBINED
log-file /logs/access.log
output /goaccess/report/index.html
anonymize-ip true
real-time-html true
ws-url wss://www.example.com:443/ws
origin https://www.example.com
geoip-database /goaccess/geoIp/GeoLite2-City.mmdb

The two most important values in this configuration file are real-time-html and ws-url. It was the combination of these two that finally started GoAccess as a running service in Docker that continuously monitors the access.log file for additions. Unfortunately, it only updates a single HTML file and makes no attempt to serve it over an HTTP connection, so you will need another web server to actually serve the HTML report. Nginx is perfect for this use case, as it's one of the most popular reverse proxies and can serve static assets as well. The awkward part of my setup is that I'm using Traefik as my reverse proxy, and it can't serve static assets by design.

Another issue I ran into was that, according to the Traefik documentation, access logs are written in the Common Log Format, which GoAccess understands. However, when I first parsed the access logs with the GoAccess log-format set to COMMON, various graphs didn't populate properly. It turns out that GoAccess actually wants the Apache COMBINED log format, which in this particular case matches the Traefik logs.

Lastly, I wanted to get geolocation working. This requires a free GeoIP database from MaxMind in the .mmdb format. Fortunately, the GoAccess Docker image is already built with support for this database format, and you simply need to supply your own database file. After that, it was simply a matter of configuring Nginx to pass through the WebSocket URL and having Traefik point a certain route to this Nginx container.
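On the COMMON versus COMBINED point: the two formats share the same leading fields, and COMBINED appends the referer and the user agent as two extra quoted fields. The Python sketch below detects which of the two a line matches. It is a simplified illustration with invented sample lines and a hypothetical `parse_line` helper, not GoAccess's parser. One plausible reading of the half-empty report is that the panels depending on those two trailing fields had nothing to work with under COMMON.

```python
import re

# Fields shared by both formats: host ident user [time] "request" status size
COMMON_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)
# COMBINED appends two quoted fields: referer and user agent.
COMBINED_TAIL_RE = re.compile(r' "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"')


def parse_line(line):
    """Parse a log line and report whether it is COMMON or COMBINED."""
    head = COMMON_RE.match(line)
    if head is None:
        return None
    fields = head.groupdict()
    # Look for the two extra quoted fields right after the COMMON part.
    tail = COMBINED_TAIL_RE.match(line, head.end())
    if tail is not None:
        fields.update(tail.groupdict())
    fields["format"] = "COMBINED" if tail is not None else "COMMON"
    return fields


# Hypothetical sample lines.
common_line = '203.0.113.7 - - [10/Oct/2021:13:55:36 +0000] "GET / HTTP/1.1" 200 2326'
combined_line = common_line + ' "https://www.example.com/" "Mozilla/5.0"'

for sample in (common_line, combined_line):
    parsed = parse_line(sample)
    print(parsed["format"], parsed.get("referer"), parsed.get("agent"))
```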

Nginx Setup

The Nginx setup is nothing special, and should be simple to understand for anyone that has ever configured Nginx before:

server {
    listen 80;

    location / {
        root /data/www/goaccess;
    }

    location /ws {
        proxy_pass http://GOACCESS_IP_ADDRESS:7890/ws;
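        # WebSocket proxying needs HTTP/1.1 plus the Upgrade and Connection headers.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        # A single Connection header; a second "keep-alive" value would conflict with it.
        proxy_set_header Connection "upgrade";
        proxy_pass_request_headers on;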
    }
}

The root location refers to where the HTML file is located, which in my case is shared via a Docker volume mount. The /ws route is the pass-through to the GoAccess WebSocket server, which delivers live updates to the report page. You might notice that there is no configuration for SSL certificates here. That's because my Traefik container acts as the public-facing reverse proxy, and TLS termination happens there.

A better idea in general might be to create a new Docker image that contains both GoAccess and a correctly configured Nginx, and it seems several people have already tried this. I haven't experimented with these images yet, but I'll most likely try to get one of them working, since this two-container setup feels rather annoying and inefficient.