A/B Test Calculator

Determine the right sample size for your A/B tests with statistical confidence.

Test Parameters

Fill in your experiment settings, then click Calculate.

Test Type

Randomized experiment: two independent groups compared against each other.

Hypothesis Direction

Uses Zα/2: detects both positive and negative differences. Requires a larger sample than a one-sided test.

Baseline Conversion Rate (%)

Your control group’s current conversion rate (p₁). Range: 0.1%–50%.

Minimum Detectable Effect (% relative lift)

Smallest relative improvement worth detecting. Example: a 20% lift on a 5% baseline means a target rate of 6%.
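If you want to check the relative-lift arithmetic yourself, it can be sketched as below. The function name `target_rate` is only illustrative and is not part of the calculator.

```python
def target_rate(baseline_pct: float, relative_lift_pct: float) -> float:
    """Convert a baseline rate and a relative lift (both in percent)
    into the absolute target conversion rate, also in percent."""
    return baseline_pct * (1 + relative_lift_pct / 100)


# A 20% relative lift on a 5% baseline: the target is 6%, one point higher.
print(round(target_rate(5, 20), 4))
```

Note that the lift is relative: 20% here means 1.2 times the baseline, not 20 percentage points.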

Confidence Level

How certain you want to be that a detected difference is not due to chance. 95% confidence = 5% false-positive risk when there is no true effect.

Statistical Power

Chance of detecting a true effect. Higher power = larger sample needed.

Check Your Power

Already running a test? Enter your current sample size to see your detection probability.

Run the calculator above first to set your test parameters (baseline rate, MDE, confidence, test type).

Users per variation
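The power check can be approximated by hand. The sketch below assumes a two-group test and the usual normal approximation with a pooled null variance; the calculator's exact formula is not stated on this page, so results may differ slightly. The name `achieved_power` and its parameters are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def achieved_power(n, p1, rel_lift, confidence=0.95, two_sided=True):
    """Approximate power with n users per variation (assumed formula).

    p1 is the baseline rate and rel_lift the relative lift, both as fractions.
    """
    p2 = p1 * (1 + rel_lift)                  # target rate in the variant
    alpha = 1 - confidence
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)  # critical value for the test

    p_bar = (p1 + p2) / 2                     # pooled rate under the null
    sd_null = sqrt(2 * p_bar * (1 - p_bar))
    sd_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))

    # Ignores the tiny chance of rejecting in the wrong direction.
    z = (abs(p2 - p1) * sqrt(n) - z_alpha * sd_null) / sd_alt
    return NormalDist().cdf(z)


# 5% baseline, 20% relative lift, 2,000 users per variation so far.
print(round(achieved_power(2000, 0.05, 0.20), 3))
```

A low value means the test is still likely to miss a real effect of that size at the current sample.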

Find Minimum Detectable Effect

Already have a fixed sample size? Find the smallest relative lift you can detect with the power and confidence set above.

Run the calculator above first to set your test parameters (baseline rate, confidence, power, test type).

Users per variation
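Finding the minimum detectable effect for a fixed sample can be treated as a root search: raise the lift until the power reaches your target. The sketch below uses bisection over the same assumed two-group normal approximation; `min_detectable_lift` and `power_at` are hypothetical names, not the calculator's own code.

```python
from math import sqrt
from statistics import NormalDist


def power_at(n, p1, rel_lift, z_alpha):
    """Approximate power for a given relative lift (assumed formula)."""
    p2 = p1 * (1 + rel_lift)
    p_bar = (p1 + p2) / 2
    sd_null = sqrt(2 * p_bar * (1 - p_bar))
    sd_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return NormalDist().cdf(((p2 - p1) * sqrt(n) - z_alpha * sd_null) / sd_alt)


def min_detectable_lift(n, p1, confidence=0.95, power=0.80, two_sided=True):
    """Smallest relative lift (as a fraction) detectable with n users per
    variation, or None if even the largest allowed lift falls short."""
    alpha = 1 - confidence
    z_alpha = NormalDist().inv_cdf(1 - (alpha / 2 if two_sided else alpha))

    lo, hi = 0.0, (0.999 - p1) / p1   # keep the target rate below 100%
    if power_at(n, p1, hi, z_alpha) < power:
        return None
    for _ in range(60):               # bisection: halve the bracket each pass
        mid = (lo + hi) / 2
        if power_at(n, p1, mid, z_alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi


lift = min_detectable_lift(8158, 0.05)
print(f"{lift:.1%}")
```

Bisection is used here only because it is simple and robust; any one-dimensional root finder would do.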

What do these numbers mean?

The sample size is the minimum number of users each variation needs for the test to reach the confidence and power you chose. Power is the probability of detecting a real improvement at least as large as your minimum detectable effect: 80% power = 20% chance of missing a true effect of that size. 95% confidence = 5% chance of a false positive when there is no real difference.

Two-sample A/B test

Used when you randomize traffic into two independent groups (A and B). Both variances, null and alternative, are computed from the two groups’ conversion rates rather than from a fixed reference value. This is the standard A/B test setup.
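As a rough illustration of the two-sample case, here is a minimal sketch of a common normal-approximation formula, in which the null variance uses the pooled rate of both groups and the alternative variance uses each group's own rate. This is an assumption about the method, since the exact formula is not given on this page; `two_sample_size` is an illustrative name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_sample_size(p1, rel_lift, confidence=0.95, power=0.80, two_sided=True):
    """Users per variation for a two-group test of proportions (a sketch).

    p1 is the baseline rate and rel_lift the relative lift, both as fractions.
    """
    p2 = p1 * (1 + rel_lift)                  # target rate in the variant
    alpha = 1 - confidence
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)

    p_bar = (p1 + p2) / 2                     # pooled rate under the null
    sd_null = sqrt(2 * p_bar * (1 - p_bar))
    sd_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))

    n = ((z_alpha * sd_null + z_beta * sd_alt) / (p2 - p1)) ** 2
    return ceil(n)                            # round up to whole users


# 5% baseline, 20% relative lift, 95% confidence, 80% power, two-sided.
print(two_sample_size(0.05, 0.20))
```

The result is per variation, so the total traffic needed is twice that figure.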

One-sample vs. historical baseline

Used when you compare a new measured rate against a fixed historical value (p₀). Only one random sample is collected. The null variance uses p₀; the alternative uses the expected p₁.
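The one-sample case can be sketched the same way, with the null variance taken from p₀ and the alternative variance from the expected p₁, as described above. The formula is the standard normal approximation and is assumed here rather than taken from the calculator; `one_sample_size` is an illustrative name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def one_sample_size(p0, p1, confidence=0.95, power=0.80, two_sided=True):
    """Users needed to compare one sample against a fixed historical rate.

    p0 is the historical rate and p1 the expected new rate, as fractions.
    """
    alpha = 1 - confidence
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)

    sd_null = sqrt(p0 * (1 - p0))   # variance under the null uses p0
    sd_alt = sqrt(p1 * (1 - p1))    # variance under the alternative uses p1

    n = ((z_alpha * sd_null + z_beta * sd_alt) / (p1 - p0)) ** 2
    return ceil(n)                  # round up to whole users


# Historical rate 5%, expected new rate 6%.
print(one_sample_size(0.05, 0.06))
```

Because only one group is sampled and p₀ is treated as known, this needs fewer users than the two-sample design for the same rates.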

Two-sided test

Detects any difference (positive or negative). Uses Zα/2 — divides the significance budget across both tails. Requires a larger sample but guards against unexpected harm.

One-sided test

Only tests for improvement in the expected direction. Uses Zα — puts all significance into one tail. Requires a smaller sample but cannot detect harm if the change backfires.
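To see why the one-sided test needs a smaller sample, compare the two critical values directly. This short sketch uses Python's standard library; `critical_z` is an illustrative name.

```python
from statistics import NormalDist


def critical_z(confidence=0.95, two_sided=True):
    """Critical value of the standard normal for a given confidence level."""
    alpha = 1 - confidence
    tail = alpha / 2 if two_sided else alpha  # split alpha only if two-sided
    return NormalDist().inv_cdf(1 - tail)


# At 95% confidence: about 1.960 two-sided versus about 1.645 one-sided.
print(round(critical_z(0.95, two_sided=True), 3))
print(round(critical_z(0.95, two_sided=False), 3))
```

The sample size grows with the square of the combined critical values, so the smaller one-sided value translates into noticeably fewer users.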