Deriving sample count for A/B testing
Key assumptions for type 1 and type 2 errors
Assuming we allow a 5% risk of concluding there was a change when there wasn't one (a false positive here could have real business impact): significance level α = 0.05.
And we allow a 20% risk of concluding there was no significant change when there actually was one (so we don't burn too much ad budget on the test): statistical power 1−β = 0.8 (80%).
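As a minimal sketch, the α and power chosen above translate into the z-scores used in the formulas below (variable names here are illustrative, not from any particular library):

```python
from statistics import NormalDist

alpha = 0.05  # significance level (two-sided test)
power = 0.80  # 1 - beta

# z-score for a two-sided test at significance alpha
z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
# z-score corresponding to the desired power
z_beta = NormalDist().inv_cdf(power)

print(round(z_alpha, 2), round(z_beta, 2))  # 1.96 0.84
```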
p - the conversion rate we use to compute the standard deviation σ = √(p(1−p))
n - landing page clicks or other desired actions (per variant)
So the minimum detectable effect (MDE) is

MDE = (z₁₋α/₂ + z_β) · σ · √(2/n), where σ = √(p(1−p))
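A quick sketch of this relationship in code, assuming the standard two-proportion formulation (the `mde` helper is hypothetical, written for illustration):

```python
from math import sqrt
from statistics import NormalDist

def mde(p, n, alpha=0.05, power=0.80):
    """Absolute minimum detectable effect for a two-sided test,
    given baseline conversion rate p and n samples per variant."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    sigma = sqrt(p * (1 - p))  # per-observation standard deviation
    return (z_alpha + z_beta) * sigma * sqrt(2 / n)

# More samples -> smaller detectable effect
print(round(mde(p=0.05, n=7457), 4))  # 0.01
```

Note the inverse-square-root shape: to halve the MDE you need four times as many samples.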
Deriving sample count for a given MDE
Let's say we don't care about changes smaller than 20% relative, for example going from 5% CR to 6% CR. MDE is an absolute value (e.g. old CR minus new CR), so in our case the MDE is 20% of 5%, i.e. 1 percentage point (0.01). Solving this equation for the number of samples n:

n = 2 · (z₁₋α/₂ + z_β)² · p(1−p) / MDE²
So if we thought we could make a 5% CR landing page at least 20% better, we would need to bring roughly 7,500 clicks to each variant (about 15,000 in total).
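The numbers above can be checked with a short script (the `samples_per_variant` helper is hypothetical, just solving the MDE formula for n):

```python
from math import ceil
from statistics import NormalDist

def samples_per_variant(p, mde, alpha=0.05, power=0.80):
    """Samples needed per variant to detect an absolute effect
    of size mde at baseline conversion rate p."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * p * (1 - p) / mde ** 2)

# 5% baseline CR, detecting a 20% relative lift (1 percentage point)
n = samples_per_variant(p=0.05, mde=0.01)
print(n)  # 7457 clicks per variant
```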
What do I do with this?
Literally set auto-rules on your ad networks to run tests with these numbers, if the networks' built-in A/B tests are poorly implemented or not available at all.
Also, a little secret: if you run an A/B test like that with two new ads, it won't be correct; to understand why, read this article. And in some cases you don't care which variant is better at all; here is why.
Calculator
Finally, you don't have to compute this manually every time: we created a calculator just for you.