You're setting yourself up for disappointment if you peek at your A/B tests

Peeking (checking for significance in an A/B test before collecting enough samples for the desired power) is generally discouraged because it inflates the Type I error rate.

However, in most real experiments the null hypothesis is unlikely to be exactly true: unless we are randomizing into two groups without changing anything, the variants almost always differ at least slightly. The main concern with peeking is therefore not an inflated Type I error rate but the risk of overestimating effects. Effects detected by peeking are unlikely to generalize and often exaggerate our impact, which leads to disappointment.

This phenomenon, known as the Winner’s Curse, is a common pitfall in data analysis. If you need to explain to stakeholders why they should avoid peeking, it’s more effective to focus on this practical consequence than on the statistical rationale.
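To make this concrete, here is a minimal Monte Carlo sketch (my own illustration, not from the discussion above). It assumes a two-proportion z-test on conversion rates, a true lift of one percentage point over a 10% baseline, a peek every 1,000 users per arm, and an uncorrected two-sided alpha of 0.05 at every look:

```python
import numpy as np

rng = np.random.default_rng(0)

P_CONTROL = 0.10      # baseline conversion rate (assumed)
TRUE_LIFT = 0.01      # true absolute lift (assumed)
N_PER_ARM = 10_000    # planned sample size per arm
PEEK_EVERY = 1_000    # interim looks at these sample sizes
Z_CRIT = 1.96         # two-sided alpha = 0.05, no correction for peeking
N_SIMS = 5_000

def z_stat(successes_a, n_a, successes_b, n_b):
    """Two-proportion z-statistic with pooled variance."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se if se > 0 else 0.0

winner_lifts = []  # estimated lift at the first significant peek
for _ in range(N_SIMS):
    a = rng.random(N_PER_ARM) < P_CONTROL              # control conversions
    b = rng.random(N_PER_ARM) < P_CONTROL + TRUE_LIFT  # treatment conversions
    for n in range(PEEK_EVERY, N_PER_ARM + 1, PEEK_EVERY):
        if abs(z_stat(a[:n].sum(), n, b[:n].sum(), n)) > Z_CRIT:
            winner_lifts.append(b[:n].mean() - a[:n].mean())
            break

print(f"true lift:                  {TRUE_LIFT:.4f}")
print(f"mean lift among 'winners':  {np.mean(winner_lifts):.4f}")
print(f"runs significant at a peek: {len(winner_lifts) / N_SIMS:.1%}")
```

With these numbers, the earliest possible “win” requires an observed lift of roughly 2.6 percentage points, so the average estimate among winners lands far above the true one-point effect.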


It’s like glancing at the answers before you’ve finished a test. It might feel like a lucky break, but often it is just that: luck. Early successes can trick us into thinking an idea is better than it actually is. Rather than fixating on whether our initial guess was correct, we should ask whether the effect will hold up over time.


Sure. Peeking makes you more likely to overestimate the effect size: random fluctuations can make early results suggest a larger effect than actually exists, and stopping at the first significant result selects for exactly those fluctuations.

Therefore, any effects found this way are probably overstated and may not hold up if the experiment is rerun or continued to a larger sample. The “winning” variant, the one that appears to show a substantial effect, is frequently not as strong as it first looks. This overestimation is known as the Winner’s Curse.
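Continuing the same assumed setup as the sketch above (two-proportion z-test, 10% baseline, true one-point lift), this sketch shows the “rerun or run larger” point directly: condition on a significant result at a single early peek, then let the experiment continue to the planned sample size and re-estimate the lift. The early estimate shrinks back toward the truth:

```python
import numpy as np

rng = np.random.default_rng(1)

P_CONTROL, TRUE_LIFT = 0.10, 0.01   # same assumed setup as above
N_PER_ARM, PEEK_AT = 10_000, 1_000  # one early peek (assumed)
Z_CRIT, N_SIMS = 1.96, 20_000

early_lifts, final_lifts = [], []
for _ in range(N_SIMS):
    a = rng.random(N_PER_ARM) < P_CONTROL
    b = rng.random(N_PER_ARM) < P_CONTROL + TRUE_LIFT
    p_a, p_b = a[:PEEK_AT].mean(), b[:PEEK_AT].mean()
    p_pool = (p_a + p_b) / 2            # equal n per arm, so this pools exactly
    se = np.sqrt(2 * p_pool * (1 - p_pool) / PEEK_AT)
    if se > 0 and abs(p_b - p_a) / se > Z_CRIT:  # "significant" at the peek
        early_lifts.append(p_b - p_a)            # estimate when we peeked
        final_lifts.append(b.mean() - a.mean())  # estimate at full sample size

print(f"true lift:                {TRUE_LIFT:.4f}")
print(f"mean lift at early peek:  {np.mean(early_lifts):.4f}")
print(f"mean lift at full sample: {np.mean(final_lifts):.4f}")
```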