Silktide uses sampling, a common statistical technique, to approximate some of its data. This article explains how Silktide uses it and why.
What is sampling?
Sampling is a technique used to estimate findings when examining all of the data is not practical.
For example, it is common to poll a sample of people who might vote in an election. As it isn't practical to ask every single voter, you might ask a sample – say 1 in 10,000 voters – and then multiply the answers you get by 10,000 to estimate how the whole population might vote.
Done correctly, sampling is reliable and consistent with the ‘real’ data.
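To illustrate the arithmetic, here is a minimal sketch of the polling example above in Python. The names and numbers are illustrative only; this is not Silktide's code.

```python
import random

def estimate_total(population, sample_rate):
    """Estimate the number of True values in `population` by
    inspecting only a random sample, then scaling the count up
    by the inverse of the sampling rate."""
    sample_size = max(1, round(len(population) * sample_rate))
    sample = random.sample(population, sample_size)
    matches = sum(1 for voted_yes in sample if voted_yes)
    return round(matches / sample_rate)

# Simulate 1,000,000 voters, ~40% of whom would vote "yes".
voters = [random.random() < 0.40 for _ in range(1_000_000)]

# Ask 1 in 1,000 voters and scale up; the estimate typically lands
# within a few percent of the true count (~400,000).
print(f"Estimated yes votes: {estimate_total(voters, 1 / 1_000):,}")
```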
Why does Silktide use sampling?
If you are reviewing a very large amount of analytics data – e.g. tens or hundreds of millions of views – the calculations required to analyze it can become impractically slow.
Most users would consider it unacceptable to wait 30 seconds or longer for key metrics to appear. By using sampling, Silktide is able to maintain consistent performance, regardless of how large your analytics data becomes.
How does Silktide use sampling?
Silktide only uses sampling when the data you are looking at is extremely large, and only where sampling would not be expected to introduce an error of 1% or greater.
Typically, this means that if you are looking at over 5 million events within a given filter (e.g. your date range), Silktide will limit its sample to no more than 5 million records.
Depending on the context, Silktide may use fewer records where a smaller sample is still statistically reliable. For example, to display a heatmap, a sample of no more than 20,000 records is used.
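As a sketch of how a sampling cap like this can work in principle, the Python below counts matching events, sampling down to a fixed cap and scaling the result back up when the data set is larger than the cap. The function names and structure are hypothetical, not Silktide's actual implementation. The worst-case standard error of an estimated proportion, sqrt(p(1-p)/n), also shows why caps of this size keep the expected error well under 1%.

```python
import math
import random

MAX_SAMPLE = 5_000_000    # general cap described above
HEATMAP_SAMPLE = 20_000   # smaller cap used for heatmaps

def sampled_count(events, predicate, cap=MAX_SAMPLE):
    """Count the events matching `predicate`, sampling down to
    `cap` records when the filtered data set exceeds the cap."""
    if len(events) <= cap:
        # Small enough to count exactly: no sampling needed.
        return sum(1 for e in events if predicate(e))
    sample = random.sample(events, cap)
    matches = sum(1 for e in sample if predicate(e))
    # Scale the sampled count back up to the full data set.
    return round(matches * len(events) / cap)

def worst_case_error(sample_size):
    """Worst-case standard error of an estimated proportion,
    sqrt(p * (1 - p) / n), which is maximized at p = 0.5."""
    return math.sqrt(0.25 / sample_size)

# ~0.02% for the 5 million cap and ~0.35% for the 20,000 heatmap
# cap, both comfortably below a 1% expected error.
print(f"{worst_case_error(MAX_SAMPLE):.4%}")
print(f"{worst_case_error(HEATMAP_SAMPLE):.4%}")
```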
History
Sampling was added to Silktide Analytics on 29 May 2024.