Google Analytics has a powerful way of measuring cross-device behaviour. Adding a user ID to your data collection gives you access to reports like device overlap, device paths and device attribution. But it has some limits. In this post, I’ll dive into those limits and discuss ways to overcome them.
How Google Analytics measures Cross Device
Google allows you to identify users over different devices by adding a user ID to your data collection. This will give you three types of reports:
- Device Overlap: a report that shows you the overlap of the three device categories desktop, tablet and mobile.
- Device paths: what device paths occur the most before a transaction.
- Device attribution: the attribution of the three device categories. How much revenue do users who had their first session on mobile generate on other devices?
These are some powerful reports, but they have their limitations.
Where Cross Device in Google Analytics falls short
First of all, it will only add the three reports mentioned above. The source attribution models aren't affected by the user ID because Google uses the client ID for modelling. Secondly, it will only include sessions where a user logs in. Every session without a login is left out of the view.
Google Analytics Cross Device Model. Blue: sessions that are not included in cross-device reports. Green: sessions that are included in cross-device.
It’s weird that Google made it work like this, as it tracks returning users within two years based on the auto-generated client ID (cid) by default. How hard is it to attribute a single client ID to a user ID once the user logs in, whether it’s the first, second, or fifteenth session of that user? Just like the session unification feature that you can optionally enable, a feature like user unification would be great. There are ways to overcome this, e.g. by manually storing the user ID in a cookie as soon as the user logs in. This way you’ll be able to use it when a user returns without logging in (and without removing cookies). But again, it’s weird that GA doesn't offer a solution for this problem by default.
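The cookie workaround could look roughly like the sketch below. It is a server-side Python illustration, not the implementation we use; the cookie name `known_user_id` and the function `resolve_user_id` are made up for this example, and the two-year lifetime simply mirrors the lifetime of the client ID cookie.

```python
from __future__ import annotations

from http.cookies import SimpleCookie
from typing import Optional

COOKIE_NAME = "known_user_id"        # hypothetical cookie name
TWO_YEARS = 2 * 365 * 24 * 60 * 60   # cookie lifetime in seconds


def resolve_user_id(
    cookie_header: str, logged_in_user_id: Optional[str]
) -> tuple[Optional[str], Optional[str]]:
    """Return (user ID to track with, Set-Cookie value or None)."""
    if logged_in_user_id is not None:
        # The user just logged in: remember the user ID for later visits.
        cookie = SimpleCookie()
        cookie[COOKIE_NAME] = logged_in_user_id
        cookie[COOKIE_NAME]["max-age"] = TWO_YEARS
        cookie[COOKIE_NAME]["path"] = "/"
        return logged_in_user_id, cookie[COOKIE_NAME].OutputString()

    # No login in this visit: fall back to the stored user ID, if any.
    stored = SimpleCookie()
    stored.load(cookie_header)
    if COOKIE_NAME in stored:
        return stored[COOKIE_NAME].value, None
    return None, None
```

A returning visitor who does not log in but still has the cookie gets the same user ID as before; a visitor without the cookie stays anonymous.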
How to trick Google Analytics into extra insights
Not having cross-device source attribution in Google Analytics cross-device views got me thinking: is there a way to add these insights to my GA reports? Luckily there is, and the solution has always been out there. It has to do with Google Analytics’ last non-direct click model. It's the default model in Google Analytics’ standard reports and it works like this:
- A user lands on a page
- Does the user have a source (referral, organic, or UTM tagging)?
  - Yes: attribute the session to the source.
  - No: Did the user have a source in the past 6 months?
    - Yes: attribute the session to that source.
    - No: attribute the session to direct traffic.
The important part is the ‘Did the user have a source in the past 6 months?’ question. Google looks back 6 months based on your client ID. If a client ID appeared before, it will look up the last known source of your previous session. The client ID is the key to the extra insights.
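To make the decision tree concrete, here is a minimal Python sketch of a last non-direct click model. It is my own simplification, not Google's code: the `Session` class and `attribute_sessions` function are invented for illustration, and the six-month lookback is approximated as 183 days.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "6 months" approximated as 183 days.
LOOKBACK = timedelta(days=183)


@dataclass
class Session:
    client_id: str
    started_at: datetime
    source: Optional[str]  # None means the session arrived without a source


def attribute_sessions(sessions: list[Session]) -> list[str]:
    """Return the attributed source of each session, in chronological order."""
    last_known: dict[str, tuple[str, datetime]] = {}
    attributed = []
    for session in sorted(sessions, key=lambda s: s.started_at):
        if session.source is not None:
            # The session has its own source: use it and remember it.
            last_known[session.client_id] = (session.source, session.started_at)
            attributed.append(session.source)
            continue
        # No source: look up the last known source for this client ID.
        previous = last_known.get(session.client_id)
        if previous and session.started_at - previous[1] <= LOOKBACK:
            attributed.append(previous[0])
        else:
            attributed.append("direct")
    return attributed
```

Note that the lookup is keyed on `client_id` only. Two devices of the same person have two client IDs, so a source seen on one device never carries over to the other.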
Setting a custom client ID for cross-device source attribution
The tracking code of Google’s Universal Analytics tracker allows you to set a custom client ID. By setting it to the value of your cross-device identifier (e.g. a user ID), you’ll trick Google Analytics into using that value for all of its calculations. This will automatically add cross-device attribution models to your reports. Assuming that a user is logged in during all of their sessions, a direct session on a desktop will be attributed to the source of an earlier mobile session (if the user had a mobile session with a source).
Google Analytics custom cross-device solution. Blue: sessions without custom cross-device modelling. Green: sessions with custom cross-device modelling.
The downside of this trick is that changing the client ID to the value of the user ID in the middle of a session triggers a new session. After the login (session three in the image above), Google Analytics attributes activity to a new client ID (the value changed from the auto-generated client ID to the manually set user ID). So for every first login, we’ll measure an extra direct session and a new user.
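Both the benefit and the downside can be shown with a toy model in Python. This is a sketch under simplifying assumptions: the lookback window is ignored, and the identifiers (`mobile-cookie`, `user-456`, and so on) and the function `last_non_direct` are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def last_non_direct(sessions: list[tuple[str, Optional[str]]]) -> list[str]:
    """Attribute (client_id, source) pairs, given in chronological order."""
    last_source: dict[str, str] = {}
    result = []
    for client_id, source in sessions:
        if source is not None:
            last_source[client_id] = source
        result.append(last_source.get(client_id, "direct"))
    return result


# One logged-in user on two devices: (device cookie ID, user ID, source).
visits = [
    ("mobile-cookie", "user-456", "google / cpc"),
    ("desktop-cookie", "user-456", None),
]
default_ids = [(cookie, source) for cookie, _, source in visits]
custom_ids = [(user, source) for _, user, source in visits]

print(last_non_direct(default_ids))  # ['google / cpc', 'direct']
print(last_non_direct(custom_ids))   # ['google / cpc', 'google / cpc']

# The downside: a visit starts on the auto-generated ID and switches to the
# user ID at the first login, so the rest of the visit looks like a new,
# direct visitor.
first_login = [("desktop-cookie", "newsletter / email"), ("user-789", None)]
print(last_non_direct(first_login))  # ['newsletter / email', 'direct']
```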
A technical note
Setting a custom client ID has a high impact on your data collection. I strongly advise you to set up a secondary tracker to collect this data. Also, set a custom cookie name for the secondary tracker to leave the original _ga cookie untouched.
The data of the custom client ID data collection
We’ve had this trick running for a couple of months now for one of our clients and it works. Let’s look at a frequently returning traffic source, Google AdWords, or google CPC:
Graph with added AdWords sessions by setting a custom client ID.
As you can see, the new way of collecting data attributes 10% to 25% more sessions to AdWords. There are also sources that show the opposite effect. A good example is the password reset source:
Graph with added password reset sessions by setting a custom client ID.
What’s interesting here is that the custom client ID has a negative impact on sessions attributed to this source. Why? Maybe it's good to take a minute to think this one through (or just continue for the answer).
Users who reset their password often log in right after the reset. This changes their client ID and triggers a new session. Without the solution, Google would attribute all direct sessions after the reset to the password reset source. With the client ID overwrite, the new client ID has no link to the password reset source, so fewer sessions are attributed to that source.
Our final graph shows you the impact on direct traffic. As discussed, every first login triggers a new session from a new user. Because of this, you would expect a peak in direct traffic.
Graph with added direct sessions by setting a custom client ID.
As you can see, direct got way more sessions than before, over 5 times as many. This is something you’ll have to take into account when implementing the client ID trick.
Where do we go from here
The biggest challenge with this solution is reducing the number of direct sessions. There are several tricks for this:
- Add the user ID as a parameter to all outgoing emails to existing clients. This way, every session that starts from an email will start with the cross-device client ID.
- Store source information in a cookie and attribute the new sessions of a first login to the stored UTM tagging by code.
Though these fixes reduce the number of direct sessions, they don’t fix everything. We will still have an increase in sessions and users.
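Both tricks can be sketched in a few lines of Python. Treat this as an illustration under assumptions rather than our production code: the query parameter `uid` and both function names are made up, and the second function expresses the hit in Measurement Protocol terms (`cid`, `cs`, `cm`, `cn`) instead of the browser tracker, purely to keep the example short.

```python
from __future__ import annotations

from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def tag_email_link(url: str, user_id: str, param: str = "uid") -> str:
    """Append the user ID to a link in an outgoing email."""
    parts = urlparse(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, user_id))  # "uid" is a hypothetical parameter name
    return urlunparse(parts._replace(query=urlencode(query)))


def first_login_hit(user_id: str, stored_utm: dict[str, str]) -> dict[str, str]:
    """Build the first hit after a login, reusing the stored UTM values."""
    hit = {"cid": user_id, "t": "pageview"}
    # Map the stored UTM keys to Measurement Protocol campaign parameters.
    mapping = {"utm_source": "cs", "utm_medium": "cm", "utm_campaign": "cn"}
    for utm_key, mp_key in mapping.items():
        if stored_utm.get(utm_key):
            hit[mp_key] = stored_utm[utm_key]
    return hit


print(tag_email_link("https://example.com/offers?utm_source=newsletter", "456"))
print(first_login_hit("456", {"utm_source": "newsletter", "utm_medium": "email"}))
```

With the campaign values attached to the first hit under the new client ID, the extra session is no longer counted as direct. It is still an extra session from a new user.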
Leaving Google Analytics behind for a custom model approach
The best solution may be dropping Google Analytics altogether and starting to collect raw data for a custom model. We currently use Snowplow to collect raw event-level data for our data science department. Event-level data collectors also generate and collect values similar to a client ID and allow you to add custom data such as a user ID. The power of raw event-level data is that you’re completely free to model the data to match your needs. This also applies to cross-device analysis.
Custom cross-device model based on event level data. Blue: sessions. Green: sessions with user ID. Faded green: sessions with retroactively applied user IDs.
If we look at the image above, we see that only sessions 3 and 6 have a user ID available. But with custom modelling, we can create a model that knows that client ID 123 is the same user as the one with user ID 456. This model will apply the user ID retroactively. If client ID 123 has had 40 sessions over the past 3 years, and the user logs in for the first time today, the model will connect those 40 sessions to that one user ID.
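A minimal Python sketch of such retroactive stitching is shown below. It assumes each session is a dictionary with a `client_id` and an optional `user_id`, and that a client ID belongs to one user (the most recent login wins); the function name `stitch_user_ids` is invented for this example and says nothing about how Snowplow itself models data.

```python
from __future__ import annotations

from typing import Optional

SessionRow = dict[str, Optional[str]]


def stitch_user_ids(sessions: list[SessionRow]) -> list[SessionRow]:
    """Apply known user IDs to every session of the same client ID."""
    # First pass: learn which client ID belongs to which user ID.
    client_to_user: dict[str, str] = {}
    for session in sessions:
        user_id = session.get("user_id")
        if user_id:
            client_to_user[session["client_id"]] = user_id

    # Second pass: fill in the user ID, also for sessions before the login.
    return [
        {**s, "user_id": s.get("user_id") or client_to_user.get(s["client_id"])}
        for s in sessions
    ]


sessions = [
    {"client_id": "123", "user_id": None},   # anonymous visit
    {"client_id": "123", "user_id": "456"},  # first login
    {"client_id": "789", "user_id": None},   # never logged in
]
print(stitch_user_ids(sessions))
```

The first session gets user ID 456 after the fact, while client ID 789 stays anonymous until that visitor logs in.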
Google Analytics’ cross-device reports are powerful but fall short in some areas. There are ways to get around this, but as long as Google doesn't change its model, the custom solution will never be perfect. So for now, it’s best to use Google Analytics for your general cross-device analysis. As soon as advanced cross-device modelling is needed, I would advise you to look at a custom data solution.