A/B Test Calculator
Determine the right sample size for your A/B tests with statistical confidence.
Test Parameters
Fill in your experiment settings, then click Calculate.
Randomized experiment: two independent groups compared against each other.
Use Zα/2 — detects both positive and negative differences. Requires larger sample.
Your control group’s current conversion rate (p₁). Range: 0.1%–50%.
Smallest relative improvement worth detecting. 20% on 5% baseline → target 6%.
How certain you want to be the result is real. 95% = 5% false-positive risk.
Chance of detecting a true effect. Higher power = larger sample needed.
Check Your Power
Already running a test? Enter your current sample size to see your detection probability.
Run the calculator above first to set your test parameters (baseline rate, MDE, confidence, test type).
Find Minimum Detectable Effect
Already have a fixed sample size? Find the smallest relative lift you can detect with the power and confidence set above.
Run the calculator above first to set your test parameters (baseline rate, confidence, power, test type).
What do these numbers mean?
The sample size is the minimum number of users each variation needs before results are statistically meaningful. Power tells you how likely you are to detect a real improvement. 80% power = 20% chance of missing a true effect. 95% confidence = 5% chance of a false positive.
Two-sample A/B test
Used when you randomize traffic into two independent groups (A and B). Both variances — null and alternative — are estimated from the data. This is the standard A/B test setup.
One-sample vs. historical baseline
Used when you compare a new measured rate against a fixed historical value (p₀). Only one random sample is collected. The null variance uses p₀; the alternative uses the expected p₁.
Two-sided test
Detects any difference (positive or negative). Uses Zα/2 — divides the significance budget across both tails. Requires a larger sample but guards against unexpected harm.
One-sided test
Only tests for improvement in the expected direction. Uses Zα — puts all significance into one tail. Requires a smaller sample but cannot detect harm if the change backfires.