Did you know that data sampling in Google Analytics 4 is far less common than in Universal Analytics? Sampling still happens, but GA4 offers sites, web shops and apps with a lot of visitors more reliable reports. More reliable reports mean less risky data-driven decisions. That sounds promising…
This article will help you get a better view of GA4 data sampling and its importance for your business-critical decisions.
What is data sampling?
Data sampling is a statistical analysis method that doesn’t use all available data to understand the bigger picture. The advantage of using a small set of data is that this approach works faster. As a downside, sampled reports are statistically less reliable than an analysis based on a complete dataset.
Before we dive into GA4, let me take you on a small journey to understand why data sampling is not a bad thing in general.
In 2021, our Data Driven team conducted research to find out how many websites had already migrated to Google Analytics 4. We had two options:
- Visit every single website and check it manually. (A scanning tool would be faster.)
- Survey 460 marketers and ask them if they had migrated to GA4.
The second approach is a good example of data sampling. We composed a small data set and drew conclusions from it in a broader sense.
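To get a feel for how much certainty a sample of that size buys, here is a minimal Python sketch of the textbook margin-of-error formula for a proportion. It assumes a simple random sample, a 95% confidence level (z = 1.96) and the worst-case proportion of 50%. The survey's actual sampling method is not described here, so treat the outcome as an illustration and not as a property of our research. The function name `margin_of_error` is made up for this example.

```python
import math


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion from a simple random sample.

    Assumptions (not taken from the survey itself): simple random sampling,
    a large population, and a normal approximation with z = 1.96 (95%).
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)


# Worst case (proportion = 0.5) for a sample of 460 respondents.
moe = margin_of_error(460)
print(f"Margin of error: {moe:.1%}")  # about 4.6 percentage points
```

Under those assumptions, a result measured in a sample of 460 would typically sit within roughly 4.6 percentage points of the true value. That is the trade-off of sampling in a nutshell: much less work, slightly less certainty.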
Although the results are from 2021, many organizations are still reluctant to switch to GA4.
Let’s find out if data sampling is a good argument to switch to GA4.
Do Universal Analytics and GA4 sample data?
Both Universal Analytics and Google Analytics 4 sample data, but there are huge differences in how and when they do it. UA can make it hard to see whether the data in a report is sampled. It also uses session limits to generate reports and, on top of that, a hit limit can bring data collection to an abrupt end.
A couple of lines above, I stated that sampling data is not bad per se. You absolutely don’t have to count every single person on earth to have an idea about how big the world population is.
In Google Analytics, however, data sampling rightly worries website owners and marketers.
After all, Google built its empire with data. And yet, its free website analytics tool doesn’t collect or use all data to generate reports.
That doesn’t seem fair, does it?
Especially for heavy traffic websites, sampling can be a problem. Let’s have a closer look at how GA4 is an upgrade in that field.
How do you know if your data is sampled in GA4?
The moment your report is based on sampled data, GA4 shows an orange triangle icon. When your report is not sampled, you see a green check mark.
You can see a brief explanation when you click on the small arrow pointing down. A report generated without sampled data shows this:
In UA, the icons are different, but you can switch between Faster response and Greater precision.
That neat feature is no longer available in GA4. But on the other hand, UA sometimes applies data sampling implicitly.
You can, for instance, filter out all sorts of data in the Views of your Universal Analytics property.
The screenshot below shows an extreme example with an empty report based on all available data… for that particular View.
Not all website owners and marketers are aware that the UA reports they are looking at can be based on a small selection of data. Since GA4 doesn’t have views, that’s not an issue any longer.
But making decisions based on filtered reports while assuming they use 100% of the data? That’s risky business.
Let’s look at another sampling problem in Universal Analytics.
Hit limit in GA4
In contrast to UA, GA4 doesn’t have a hit limit. This means that GA4 doesn’t stop collecting data after a certain point, as can be the case in UA.
UA’s limit of 10 million hits per month may seem like a lot and, in all honesty, UA has never caused many sampling headaches for owners of smaller websites and web shops.
Heavy traffic site owners, on the other hand, received an email from Google Analytics when the hit threshold was reached or crossed.
There are workarounds, like the (expensive) paid Google Analytics 360. But since GA4 doesn’t have any hit limitations, this makes me curious about what Google will have in store for its new paid web analytics tool.
Session limits in UA reports
Universal Analytics has a limit for creating reports. The moment the selected date range contains more than 500K sessions, the data is sampled.
Smaller websites and blogs can be affected when UA reports are generated for a period of several years.
For popular websites and ecommerce sites, those limitations were horrible for shorter periods too.
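A rough way to picture the effect of the 500K session limit is the simplified Python model below. It is an assumption made for illustration, not Google's actual sampling algorithm: it pretends UA keeps about 500K sessions of the selected date range and scales the sampled counts back up. The function names are hypothetical.

```python
def estimated_sampling_rate(sessions_in_range: int, session_limit: int = 500_000) -> float:
    """Share of sessions a report would be based on in this simplified model.

    Below the limit the report uses all sessions (rate 1.0); above it,
    the model assumes roughly `session_limit` sessions are kept.
    """
    if sessions_in_range < 0:
        raise ValueError("sessions_in_range must not be negative")
    if sessions_in_range <= session_limit:
        return 1.0
    return session_limit / sessions_in_range


def scale_up(sampled_count: float, sampling_rate: float) -> float:
    """Extrapolate a count measured in the sample to the full date range."""
    if not 0.0 < sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be in (0, 1]")
    return sampled_count / sampling_rate


# A date range with 2 million sessions: the report rests on about 25% of them.
rate = estimated_sampling_rate(2_000_000)
print(f"Report based on {rate:.0%} of sessions")

# 1,200 conversions seen in the sample are reported as an estimate of 4,800.
print(scale_up(1_200, rate))
```

The longer the date range or the busier the site, the smaller the share of sessions behind each number, and the more every reported figure is an extrapolation.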
Switching to GA4 with no hit and session limits is an upgrade for data lovers. Most of the reports are generated with all the data.
Ouch.
You read it right. Most of the reports.
Sampled reports are not completely a thing of the past. Your data still needs to meet certain criteria.
Thresholds in Google Analytics 4
When there is too little data to create a report or an exploration, GA4 samples your data. Again, the orange warning notifies you when this is the case.
This is the opposite of UA, where sampling took place with too much data.
Especially for websites with low volumes, this situation can last a while. But this is nothing to worry about.
Once you have enough website users, that problem is gone.
I mean, almost gone…
Unsampled default reports
GA4 doesn’t sample data for the standard reports, which you can find in the menu on the left under Reports.
This includes:
- Real-time
- Life-cycle
- Acquisition report
- Engagement
- Monetization report
- Retention
The demographics user report is an exception. The reason it can contain sampled data is to protect the privacy of users.
Sampled data is, just like cookieless tracking, an ethical choice. When there is not enough data, it would be easier to identify visitors in GA4 by their devices, User-ID or even user-generated content fields.
Exploration reports
Normally GA4 explorations are also unsampled. You can find them in the menu on the left.
At this moment, there are 7 different explorations in GA4:
- (Blank)
- Free-Form
- Funnel exploration
- Path exploration
- Segment overlap
- Cohort exploration
- User lifetime
The moment you add dimensions to your reports, data sampling can occur.
As with default reports, this can be due to a shortage of data or to privacy-related data.
How to work around sampled reports?
In GA4, you can work around sampled reports by linking GA4 to BigQuery.
You will have access to all the raw data. With tools such as Data Studio, you can then generate 100% unsampled reports.
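As an illustration, the Python sketch below builds a SQL query against the GA4 BigQuery export, which stores one `events_YYYYMMDD` table per day. The project ID `my-project` and the dataset `analytics_123456789` are placeholders, and the helper `build_daily_event_query` is a hypothetical name. The snippet only produces the query text; running it would require your own BigQuery project and a client such as `google-cloud-bigquery`.

```python
import re
from datetime import date

# Conservative check so the placeholders cannot inject SQL into the query.
_IDENTIFIER = re.compile(r"^[A-Za-z0-9_\-]+$")


def build_daily_event_query(project_id: str, dataset_id: str, start: date, end: date) -> str:
    """Return SQL counting events and users per day from the GA4 export tables."""
    if end < start:
        raise ValueError("end must not be before start")
    for name in (project_id, dataset_id):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected characters in identifier: {name!r}")
    first, last = start.strftime("%Y%m%d"), end.strftime("%Y%m%d")
    return "\n".join([
        "SELECT",
        "  event_date,",
        "  event_name,",
        "  COUNT(*) AS event_count,",
        "  COUNT(DISTINCT user_pseudo_id) AS users",
        f"FROM `{project_id}.{dataset_id}.events_*`",
        # _TABLE_SUFFIX limits the wildcard to the daily tables in range.
        f"WHERE _TABLE_SUFFIX BETWEEN '{first}' AND '{last}'",
        "GROUP BY event_date, event_name",
        "ORDER BY event_date, event_count DESC",
    ])


# Placeholder project and dataset names; replace them with your own.
query = build_daily_event_query(
    "my-project", "analytics_123456789", date(2023, 1, 1), date(2023, 1, 31)
)
print(query)
```

Because the query runs on the exported raw events, the counts it returns are not subject to sampling in the GA4 interface.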
Jeff, the founder of Data Driven U, explains more about this in our Google Analytics 4 alternatives series and on our YouTube channel.
Conclusion: is GA4 more reliable than UA because of the differences in data sampling?
When it comes to data sampling, Google Analytics 4 scores better than Universal Analytics.
The absence of hit limits and session limits in GA4 ensures that the reports and explorations are (most of the time) based on 100% of the data.
That’s more reliable than Universal Analytics reports, which were based on a percentage of available data once certain thresholds and limits were crossed.
Also, UA Views can be confusing. They filter out data and not all Universal Analytics users are aware of this.
Websites with lots of visitors profit most from GA4’s unsampled reports. For smaller websites, GA4 is more likely to sample data to protect the privacy of users.
100% of the Data Driven team members hope this information will help you make better business decisions. One of these can be to switch to GA4 today. We composed a free GA4 migration guide to help you do this smoothly.