How Can I Prove That My Version B Will Be Better?

You’ve got enough visits to start an A/B testing experiment, but now you’re wondering how that calculation works exactly. How do I know if my Version B will end up being better than my Version A, with numbers.

The Three Factors

Let’s break A/B Testing apart into its three factors.

  1. Number of visits (sample size): The more visitors you have, the easier it is to see if there’s a difference, with confidence (see significance below).
  2. How well your version A is doing (baseline conversion rate): Right now, if you don’t change anything about your site, how well is the thing you’re tracking doing? That newsletter sign-up button, how often is it pressed compared to the total unique visits to that page? Is it 3%? Lower?
  3. How well you think you can improve it (percentage improvement): Do you think you can improve your baseline conversion rate (factor #2) by 20%? 30%? 50%?

There’s a useful relationship between factors #1 and #3. The more you think you can improve your page’s conversion, the fewer visitors you need to prove it.

But how do you prove your version B will be better?

A Significant Difference

Although you can’t prove version B will be better, you can be pretty sure there is a significant difference between the two versions.

That significance is measured by something called the p-value. A value of 0.05 for a p-value is what we’re normally looking for, and it means the following:

p-value

A p-value of 0.05 means that there’s only a 5% likelihood that the difference between version A and version B was caused by something other than the difference you introduced.

Measuring significance is about ruling out other factors much more than it is about proving a factor. To get a more precise version of that explanation, here’s a link to the Wikipedia article on the p-value to dig further.

To get a p-value of 0.05 (or lower, of course, the lower the better), you need to run the math. And that math takes into account the three factors in this way:

  1. Increases in the number of visits (sample size) reduces the p-value (gives you more confidence there’s a significant difference);
  2. A lower percentage for how well your version A is doing (baselines conversion rate) increases the p-value (adding uncertainty, because there are too few conversions to make our calculations from);
  3. Increases in how well you think you can improve it (percentage improvement) reduces the p-value (your improvement will be easier to demonstrate).

Calculating Whether Version B Will Bring A Significant Difference

Given those three factors, how do you calculate whether the percentage improvement will be significant?

For the details on how that calculation is done, here’s a link to an interactive notepad you can modify and see the calculations in action. Here’s how it looks after you follow that link:

Screenshot of the interactive notepad

Note: The calculations used here will be a little different than the ones used in the previous article, namely because the previous article’s tool (Optimizely) derives its p-value calculations from data from their own service.

Next week, we’ll be answering the question: “How Do I Increase Conversion Rates Without Being a Jerk?”

Stay Sharp!



@pascallaliberte

Get articles like this one, delivered on Friday.

To learn to sharpen your own stuff.

Coming in the next few weeks, for example, we’ll be covering about ways to confidently present a service or a product without being generic, and how that helps your visitor go from “I’m not sure”, to “yes, this, now”.

And here's a list of the past articles to get a sense of what you'll get.

Plus, receive a link to a video of a presentation I gave explaining the Jobs-To-Be-Done theory

Another option: on Twitter (@pascallaliberte), you'll get notified of new articles just the same, just a few days later.
With the email list, you get it first.