The Privacy control point: unbundling Google Analytics

There are big problems with Google Analytics. It's also very convenient. Let's look for the privacy control point.

Pim Nauts, Founder

There’s a lot of buzz surrounding Google Analytics lately, with many a (European) Data Protection Authority taking a hard line on its usage. Core issue: Analytics feeds into a vast network of Google services with a real risk of personal identification regardless of consent or settings.

So we have an entire market (vendors and organisations alike) chasing the silver bullet that will give Google Analytics the checkmark again ✅. We get a lot of questions about it (“you guys do privacy and data, right?”). And while STRM is closely related to the data dimension, we do something more specific than the package that tools like Analytics offer. That might be a great fit if you have deep data AND privacy needs, provided you know which function it can replace.

We also believe waiting for the silver bullet may turn out to be expensive time lost. The reason: Analytics as a product is a bundle of value functions. Privacy, and so the control point you should look at, is in the base layer that the other value functions build upon. And that is exactly what you shouldn’t expect Google to give up.

All in all, this might be an important opportunity to take control for reasons beyond privacy pressure and reduce strategic lock-in.

That does require a change of perspective.

Lend a hand

The issue with external pressures like these is the messaging is often only focused on what you can’t do. We have been vocal about offering concrete guidance to the market instead of just a bunch of njets. We feel that’s missing from the conversation right now.

We wanted to offer some additional understanding and guidance (albeit high-level) by examining whether there’s a setup that allows you to drive more value with lower dependence and better compliance.

The good news: there is. The bad news is it comes with a lot of work.

But first, the question of why the entire world is on Google Analytics anyway.

Why Google Analytics won web analytics

When the web was still mostly pure HTML and CSS (the code languages to create and style documents on the internet), owners of websites discovered it was interesting to see how much traffic their sites received. Parsing the log files of the servers that hosted websites was a great way to get this data. A log line tells you which resource (a webpage) was served to which IP address, and some additional context like how the visitor identified itself (e.g. a user on Netscape Navigator).

Server logs (still) give you a lot of data

This was done by the websites themselves. It was first party data collection.
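To make the log-parsing idea concrete, here is a minimal sketch in Python. It assumes the server writes the widely used "combined" log format; the sample line, the field names and the `parse_log_line` helper are illustrative assumptions, not taken from any particular setup.

```python
import re
from typing import Optional

# Assumed layout (combined log format):
# ip identity user [timestamp] "method path protocol" status size "referrer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up example line: one page served to one IP address.
sample = (
    '203.0.113.7 - - [10/Oct/2000:13:55:36 -0700] '
    '"GET /index.html HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)

hit = parse_log_line(sample)
print(hit["ip"], hit["path"], hit["user_agent"])
```

Even this small record already contains an IP address and a user agent, which is why the privacy question starts at the moment of collection.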

From that information grew an interest in collecting more data than just counting hits to a specific page. We wanted to know where traffic originated from, how users navigated websites, where they left and where they left for: we started tracking our visitors. Cookies were a great way to store information on a personal computer to enable collecting more data. JavaScript (a coding language that works in your browser) was a great way to enable observations of what happened on a page after the page was served (and so collect rich information beyond the log lines of the webserver).

But data collection is a lot of work. And data is just… data. We wanted insights! So there was space for web analytics tools that leveraged these building blocks to listen in on your traffic, collect data from it, refine it into usable form, and offer visualisations to convey insights the owners of websites were looking for.

If it’s free you’re the product

To obtain these rich(er) insights, site owners had roughly two choices: pay for all the work involved in upcycling data into insights (either in a product or on payroll), or trade the data for the insights they wanted. Trading data was the cheapest option for site owners looking to learn how to expand their reach, which they could often monetise directly as their reach increased.

And so the web analytics tool that could extract most value from all this data collection to pay for its development could re-invest more of that value into building a better tool, invest in strategies to expand the area of the web they covered, driving more value they could re-invest into better usability and integration, which drove more value to… you get it ♻️ .

That tool was Google Analytics. To which many website owners happily donated data as it could help them grow faster (for I have sinned too, Father).

Over time it gave Google a high resolution radar over the internet, even on websites not part of its ad network. Google Analytics won because Google was best positioned to monetise all that data through its ad clicks. That’s not a lot of incentive to solve an issue like potential identification, now is it?

The cliché holds: if it’s free, you’re the product.

Unbundling Google Analytics

And so here we are today, with half the internet using Google Analytics and trading visitor privacy for ease of use because it’s cheap and valuable. But Google Analytics grew in feature richness, businesses wanted to collect more data on customers, and larger parts of “the business” became digital. What was originally “web analytics” data increasingly became… organizational data. Analytics data ended up in critical reporting, CRMs and even other data products.

Now don’t take this the wrong way: Google Analytics is good software (save for privacy and potentially phoning home visitor identities). It packs a level of feature and insight readiness that is very hard to beat, brings clear value, and it requires an advanced data capability to outgrow it (the point where “web analytics” starts to become “product analytics”). It is easy to acquire too, much like drive-through fast food is quicker than hosting a fine dining event.

But it is also different (data) functions packaged as one tool.

That is interesting from the privacy perspective, because it offers clues for solving the issues your organization is facing without having to find a way to make Analytics itself compliant.

Google Analytics = data + metrics + dashboards + integrations

When you peel the onion on an analytics tool like Google’s, it’s actually four different layers of business value. From top to bottom:

  • Integrations to bring data as signal from or into other systems, for instance search engines, advertising tools, or even CRMs;
  • Dashboards, that help to quickly comprehend underlying data (I considered calling this layer “insight”, but a graph by itself doesn’t deliver that like magic);
  • Metrics, like turning all visits and check-out completions into an average conversion rate;
  • Data itself. Like the log lines we saw before, and the data infrastructure enabling this (trackers and cookies!).

All supported by a good amount of context and knowledge about growing traffic and increasing conversions applied to metrics and visualizations.
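The step from the data layer to the metrics layer can be sketched in a few lines of Python. The event shape, the event type names and the definition of conversion rate used here (sessions with a completed check-out divided by sessions with a visit) are assumptions for illustration; your own definition may differ, which is exactly the point of owning this layer.

```python
from typing import Dict, Iterable


def conversion_rate(events: Iterable[Dict[str, str]]) -> float:
    """Share of visiting sessions that also completed a check-out."""
    events = list(events)
    visits = {e["session"] for e in events if e["type"] == "visit"}
    completed = {e["session"] for e in events if e["type"] == "checkout_completed"}
    if not visits:
        return 0.0
    # Only count completions that belong to a session we actually saw visiting.
    return len(completed & visits) / len(visits)


# Made-up raw events: four sessions visit, one of them checks out.
raw_events = [
    {"session": "s1", "type": "visit"},
    {"session": "s2", "type": "visit"},
    {"session": "s2", "type": "checkout_completed"},
    {"session": "s3", "type": "visit"},
    {"session": "s4", "type": "visit"},
]

print(conversion_rate(raw_events))  # 0.25
```

A dashboard would plot this number over time, and an integration would push it (or the underlying segments) into another system.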

The data layer is the privacy control point

So the perception of value of a tool like Google Analytics focuses higher “up in the pyramid” (e.g. bring GA data into your advertising campaigns). The dirty lower-level work of cleansing, aggregating and interpreting data is tucked away behind a pretty dashboard. That explains why you might have a hard time explaining to your organisation you shouldn’t just sit through the storm and wait for the lobbyists to figure a way out.

Well, here’s your argument: the privacy control point is in those lower layers. The point nobody seems to consider unless they deeply care about data quality. If you don’t want to depend on or wait for Google changing the way it makes tens of billions a year, the starting point for a strategy to de-risk GA is collecting your own data.

It is also a very strategic argument, because it means Google is in control of what your data means if Analytics in any way occupies a place in your data flows!

(for that part at least)

Build the capability, reduce the risk

If we unbundle Google Analytics, the best place to start is to take control of your data through first-party data collection. That also requires and positions you to take control of core privacy concerns, like how data was collected and how its purpose is retained throughout the data lifecycle. There really is no other way if you want to design privacy into data collection practices.

Now, moving away from a single line of code in a site header to collecting first-party events is a lot more work and a hefty investment (easily racking up a few FTE of data and analytics engineers). You will need to cleanse bots yourself. You will need to account for data drift. You will need to make sure your metrics are calibrated. Operationalize all that. And that’s… work.
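To give a feel for one of these chores, here is a deliberately naive sketch of bot cleansing in Python. It assumes each hit carries a `user_agent` field and that a short list of substrings is enough to flag automated traffic; both are simplifying assumptions, and real bot detection needs far more than this.

```python
from typing import Dict, List

# Assumed, incomplete list of substrings that often appear in crawler user agents.
BOT_MARKERS = ("bot", "crawler", "spider", "headless")


def is_probable_bot(user_agent: str) -> bool:
    """Flag empty user agents and those containing a known marker."""
    ua = user_agent.lower()
    return not ua or any(marker in ua for marker in BOT_MARKERS)


def drop_bots(hits: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the hits that do not look automated."""
    return [h for h in hits if not is_probable_bot(h.get("user_agent", ""))]


# Made-up hits: one crawler, one regular browser, one without a user agent.
hits = [
    {"path": "/", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
    {"path": "/pricing", "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"},
    {"path": "/", "user_agent": ""},
]

human_hits = drop_bots(hits)
print(len(human_hits))  # 1
```

Every rule like this has to be maintained, monitored and explained to the people reading your metrics, which is where the FTEs go.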

But it is like going on a healthy diet instead of fast food on doctor’s orders: you have to change the lifestyle, but it will get you further in life. The benefits go way beyond the pressing current privacy issues: set against the business value you already extract from data and the additional value you can unlock, building capability in the separate layers is often a good business case! After all, when a third party controls what your data means they also control the limits of what you can do with it.

Mapping out the layers to alternatives, the picture would be something along the lines of:

  • Integrations to bring back your data to e.g. advertising systems or CRMs with a Reverse ETL tool like Grouparoo;
  • Dashboards through Redash, Mozaik or even Grafana;
  • Metrics, which you will have to define yourself, for instance with the open source MetricFlow project;
  • First-party data collection with a “shift-left” approach to retain and consider the purpose of data collection all the way from collection to consumption.
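What “retaining the purpose from collection to consumption” could look like in code is sketched below in Python. This is a hypothetical illustration, not STRM’s API: the `Event` class, the `FIELD_PURPOSES` mapping, the purpose names and the `view_for` helper are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

# Hypothetical contract, defined at collection time: which field may serve which purpose.
FIELD_PURPOSES: Dict[str, FrozenSet[str]] = {
    "page": frozenset({"analytics", "marketing"}),
    "session_id": frozenset({"analytics"}),
    "email": frozenset({"marketing"}),
}


@dataclass(frozen=True)
class Event:
    fields: Dict[str, str]
    consented_purposes: FrozenSet[str]  # recorded together with the data itself


def view_for(purpose: str, events: List[Event]) -> List[Dict[str, str]]:
    """Return only the events and fields a consumer may use for this purpose."""
    views = []
    for event in events:
        if purpose not in event.consented_purposes:
            continue  # this visitor did not consent to the purpose at all
        views.append({
            name: value
            for name, value in event.fields.items()
            if purpose in FIELD_PURPOSES.get(name, frozenset())
        })
    return views


# Made-up events: the first visitor consented to analytics only.
events = [
    Event({"page": "/pricing", "session_id": "s1", "email": "a@example.com"},
          frozenset({"analytics"})),
    Event({"page": "/home", "session_id": "s2", "email": "b@example.com"},
          frozenset({"analytics", "marketing"})),
]

print(view_for("marketing", events))  # [{'page': '/home', 'email': 'b@example.com'}]
```

The design choice is that purpose and consent travel with each record, so a downstream metric, dashboard or integration cannot pick up data it was never meant to see.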

May I suggest putting this thinking on the menu for your next discussions on Google Analytics?

PS We’re hiring!

Care for privacy and want to help in building the fast food of privacy infrastructure? Come build STRM to help data teams deliver data products without sacrificing privacy in the process. We are hiring!

Decrease risk and cost, increase speed: encode privacy inside data with STRM.