‘Probabilistic’ and ‘deterministic’ are two words that get thrown around a lot in the digital ad industry, especially in this age of hype for artificial intelligence – but not everyone necessarily understands exactly what these terms mean.
Machine’s Technical Director, Robert Pawlowicz, is here to explain the difference, how each applies to fraud detection, and the benefits and drawbacks of each approach. He answers those questions one at a time to help you understand where install fraud comes from.
Probabilistic and deterministic are mathematical terms. Without getting into the philosophical weeds of epistemic versus ontic models, a deterministic approach relies purely on solid, measurable data and produces a single outcome, whereas a probabilistic approach produces a distribution of possible outcomes, weighted by their likelihood.
To put that in more concrete terms, let’s look at how each approach might be used to detect app fraud.
How probabilistic & deterministic models detect fraud
A probabilistic model might use a time-to-install metric: how much time passes between a click and an install. The reasoning is that, if a real human sees an advert and is influenced by it, then most installs will follow shortly afterwards, with a nice smooth distribution as that influence degrades over time. You can expect to see perhaps 75% of installs occur in the first hour, and 95% within the same day.
You can take that predictable distribution and line it up with your actual time-to-install measurements to see how well they match up.
If the distribution of installs is relatively flat, with about as many installs coming in the seventh hour – or even the seventy-second hour – as in the first, then it’s highly unlikely that there is any genuine influence between the click and the install. You can be fairly sure no ad has been shown to a real person.
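As a rough illustration of the comparison described above, the sketch below measures what share of installs arrived within the first hour and the first day, then flags a source whose shares fall well short of the article's ballpark figures of 75% and 95%. The function name, the `tolerance` margin and the `min_installs` floor are all hypothetical choices made for this example, not values from any real fraud-detection product.

```python
from dataclasses import dataclass

# Ballpark benchmarks quoted in the article; a real campaign would
# need baselines measured from its own trusted traffic.
EXPECTED_FIRST_HOUR = 0.75
EXPECTED_FIRST_DAY = 0.95


@dataclass
class TimeToInstallReport:
    first_hour_share: float
    first_day_share: float
    suspicious: bool


def check_time_to_install(hours_to_install, tolerance=0.25, min_installs=200):
    """Compare observed click-to-install delays (in hours) with the expected shape.

    `tolerance` and `min_installs` are arbitrary illustrative defaults.
    """
    n = len(hours_to_install)
    if n < min_installs:
        raise ValueError(f"need at least {min_installs} installs, got {n}")

    first_hour = sum(1 for h in hours_to_install if h <= 1) / n
    first_day = sum(1 for h in hours_to_install if h <= 24) / n

    # A flat distribution shows up as far too few early installs.
    suspicious = (
        first_hour < EXPECTED_FIRST_HOUR - tolerance
        or first_day < EXPECTED_FIRST_DAY - tolerance
    )
    return TimeToInstallReport(first_hour, first_day, suspicious)


# Made-up sample data: one source that front-loads installs,
# and one that spreads them evenly over 72 hours.
front_loaded = [0.2] * 160 + [5.0] * 30 + [40.0] * 10
flat = [(i % 72) + 0.5 for i in range(720)]

print(check_time_to_install(front_loaded))
print(check_time_to_install(flat))
```

Note that the result is only a warning flag: a source can fail this check for innocent reasons, which is why the threshold is a judgment call rather than a proof.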
Meanwhile, the deterministic method of identifying the link between click and install – or, more to the point, the lack thereof – relies entirely on black-and-white data.
If the hardware profile of the device that clicked on the ad doesn’t match the hardware at the point of install, then it’s irrefutable that these are two different devices.
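A minimal sketch of that check might look like the following. The article does not say which hardware attributes are compared, so the fields in `HardwareProfile` are assumptions picked for illustration, as is the function name. The function returns the attributes that differ between click and install, so a non-empty result is a concrete, itemised mismatch rather than a score.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class HardwareProfile:
    # Hypothetical attributes; chosen because they should not change
    # on a single physical device between click and install.
    device_model: str
    cpu_architecture: str
    screen_width_px: int
    screen_height_px: int


def mismatched_fields(click: HardwareProfile, install: HardwareProfile) -> list[str]:
    """Return the names of hardware attributes that differ between the two profiles."""
    return [
        f.name
        for f in fields(HardwareProfile)
        if getattr(click, f.name) != getattr(install, f.name)
    ]


# Made-up example profiles.
click = HardwareProfile("Pixel 7", "arm64", 1080, 2400)
install = HardwareProfile("Galaxy S10", "arm64", 1440, 3040)

differences = mismatched_fields(click, install)
if differences:
    print("Different devices; mismatched attributes:", differences)
```

The check only works in one direction: a mismatch shows the click and the install came from different hardware, while a match does not by itself prove they came from the same device.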
The pros and cons of probabilistic & deterministic
Both methods are effective, but they have different strengths.
A probabilistic model can work with poor-quality data – in the above example, you just need the time of click and install. However, it requires a huge data set to establish a pattern. If you don’t have hundreds of installs to examine, then the distribution won’t be at all reliable. The data that a probabilistic method gives you isn’t especially concrete.
By definition, a probabilistic method is working with chances of false negatives and positives, meaning you risk missing fraudulent installs or throwing out the good with the bad.
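To see why sample size matters so much, here is a back-of-the-envelope sketch of how uncertain a measured first-hour share is at different volumes. It assumes the textbook normal approximation for a proportion and reuses the article's rough 75% figure; the function name is hypothetical and this is not a description of how any particular vendor sizes its samples.

```python
import math


def share_margin_of_error(n_installs, expected_share=0.75, z=1.96):
    """Approximate 95% margin of error for an observed share of installs.

    Uses the normal approximation: z * sqrt(p * (1 - p) / n).
    """
    if n_installs <= 0:
        raise ValueError("n_installs must be positive")
    return z * math.sqrt(expected_share * (1 - expected_share) / n_installs)


for n in (20, 100, 500):
    print(n, round(share_margin_of_error(n), 3))
```

With 20 installs the margin is roughly plus or minus 19 percentage points, so a genuine source could easily look suspicious; with 500 installs it shrinks to roughly plus or minus 4 points.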
Deterministic requires more in-depth data, but because it trades in certainties – ‘the device clicking the ad is not the same as the device installing the app’ – it produces much more actionable evidence when it comes to disputing fraudulent installs with suppliers.
Deterministic approaches provide the kind of solid proof that is vital if you’re trying to get your money back.
Probabilistic can be very good for warning you when something’s not right, so you can focus in on problem areas. But if you want to block fraud in advance, you need to be completely confident you’re not going to cause any collateral damage by blocking legitimate installs – and that means deterministic is the only way.
When it comes to combating app install fraud, probabilistic is a hammer – a useful tool, but not ideal for precision tasks. But deterministic? It’s a scalpel.