
How Long Does it Take to Get Accurate A/B Test Results?

It’s no surprise that A/B testing headlines is all the rage these days — A/B testing is a proven method of increasing conversions, and it teaches editors how to write better headlines. But because stories don’t live forever, you need to be using the right kind of headline testing tool: one that measures engagement after the click.

We’d all love it if we could instantly know if one headline does better than another at increasing conversions. But the reality is that an A/B test, just like any other experiment, needs to run long enough in order to give us reliable results — sometimes longer than the seconds for which headlines are relevant.

Let’s look at what it means for an experiment to be accurate, and then at what we can do to make A/B tests both quick and accurate.

Any reliable experiment needs to have these two properties:

  1. Statistical significance: if two headlines actually perform about the same, there is only a low probability that we mistake statistical noise for a real difference in conversions (a false positive).
  2. Statistical power: if there really is a big difference in headline conversions, there is a high probability that we detect it.
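These two properties can be seen in a quick simulation. The sketch below is my own illustration, not from the article: it runs many simulated A/B tests using a standard two-proportion z-test at a 5% significance level. When the headlines are identical, it flags a difference only about 5% of the time; when one headline is 50% better, it detects the difference most of the time.

```python
# Monte Carlo illustration of significance and power for an A/B test
# (assumes a pooled two-proportion z-test; numbers are illustrative).
import math
import random

random.seed(0)

Z_CRIT = 1.96       # two-sided threshold for a 5% significance level
N_PER_ARM = 2000    # visitors shown each headline in one simulated test


def z_stat(clicks_a, clicks_b, n):
    """Two-proportion z-statistic with a pooled variance estimate."""
    pa, pb = clicks_a / n, clicks_b / n
    pooled = (clicks_a + clicks_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return abs(pa - pb) / se if se > 0 else 0.0


def run_tests(ctr_a, ctr_b, trials=1000):
    """Fraction of simulated A/B tests that declare a difference."""
    hits = 0
    for _ in range(trials):
        a = sum(random.random() < ctr_a for _ in range(N_PER_ARM))
        b = sum(random.random() < ctr_b for _ in range(N_PER_ARM))
        if z_stat(a, b, N_PER_ARM) > Z_CRIT:
            hits += 1
    return hits / trials


false_positive_rate = run_tests(0.05, 0.05)   # identical headlines
detection_rate = run_tests(0.05, 0.075)       # one headline 50% better

print(f"false positives: {false_positive_rate:.1%}")  # close to 5%
print(f"detections:      {detection_rate:.1%}")       # high
```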

Suppose a headline has a 5% click-through rate (CTR) on a landing page that gets 694 visitors per hour. If this pageview rate is constant, this equates to about half a million monthly pageviews. The hypothesis we want to test is whether a new headline that we wrote performs at least 50% better or worse than our baseline — that is, if it has a CTR of less than 2.5% or greater than 7.5%.

We’ll design an experiment with a 5% significance level and 90% statistical power. If our hypothesis is false, we’ll get a false positive only 5% of the time, which means that 95% of the time we’ll reach the right answer. If our hypothesis is true, our experiment detects the difference 90% of the time.

In the scenario above, we would have to run our A/B test for about 5 hours before we get enough trials to end our experiment [1].
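For readers who want to reproduce that figure, here is a minimal sketch using the standard two-proportion sample-size formula — the same kind of calculation as Evan Miller’s calculator in reference [1]. Exact results vary slightly with the formula variant, so treat this as a ballpark check rather than the article’s exact method.

```python
# Approximate sample size for the scenario above: 5% baseline CTR,
# detecting a headline that is 50% better (7.5% CTR), at a 5%
# significance level with 90% power.
import math

Z_ALPHA = 1.96   # two-sided test at a 5% significance level
Z_BETA = 1.2816  # 90% statistical power


def sample_size_per_arm(p_base, p_alt):
    """Visitors needed per headline to tell p_base and p_alt apart."""
    delta = abs(p_alt - p_base)
    p_bar = (p_base + p_alt) / 2
    term_null = Z_ALPHA * math.sqrt(2 * p_bar * (1 - p_bar))
    term_alt = Z_BETA * math.sqrt(
        p_base * (1 - p_base) + p_alt * (1 - p_alt))
    return ((term_null + term_alt) / delta) ** 2


# Traffic is split evenly between the two headlines.
total_visitors = 2 * sample_size_per_arm(0.05, 0.075)
hours = total_visitors / 694
print(f"{total_visitors:.0f} visitors, about {hours:.1f} hours")
```

With these constants the formula gives roughly 3,900 visitors in total, or between five and six hours at 694 visitors per hour — consistent with the article’s estimate.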


Three factors influence how quickly an experiment can achieve statistical significance and statistical power:

  1. Landing page traffic
    The more visitors your landing page gets, the faster you accumulate subjects for your experiment. The time needed to complete an A/B test is inversely proportional to your traffic. For instance, if our landing page gets 1,388 visitors per hour (one million monthly pageviews) instead of 694, the time needed drops by half, to about two hours and 30 minutes.
  2. Base click-through rate
    Headlines that appear prominently and above the fold tend to have better click-through rates, which means we quickly converge to precise estimates of click-through rates. Suppose we change our baseline CTR from 5% to 10%. Keeping our 1,388 visitors per hour, our experiment time decreases again to about one hour and 11 minutes.
  3. Difference in headline performance
    If two headlines perform similarly, it’ll take more trials to be sure that the small differences we’re observing aren’t just noise. Suppose that we think that our new headline is going to be either much better or much worse than our baseline. We modify our hypothesis to ask whether a new headline that we wrote performs at least 75% better or worse than our baseline. Keeping our 1,388 visitors per hour and our baseline CTR of 10%, we see that our experiment time decreases by half yet again to 32 minutes.
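The three factors above can be sketched with the same kind of sample-size arithmetic. The function below is my own illustration using the standard two-proportion formula; the article’s slightly different timings presumably come from a different formula variant, so expect numbers in the same ballpark rather than exact matches.

```python
# How test duration responds to traffic, baseline CTR, and effect size
# (illustrative two-proportion sample-size arithmetic, 5% significance
# level, 90% power; not the article's exact calculator).
import math

Z_ALPHA, Z_BETA = 1.96, 1.2816


def hours_needed(base_ctr, relative_effect, visitors_per_hour):
    """Hours to detect a headline that is relative_effect better."""
    p_alt = base_ctr * (1 + relative_effect)
    delta = p_alt - base_ctr
    p_bar = (base_ctr + p_alt) / 2
    n_per_arm = ((Z_ALPHA * math.sqrt(2 * p_bar * (1 - p_bar))
                  + Z_BETA * math.sqrt(base_ctr * (1 - base_ctr)
                                       + p_alt * (1 - p_alt)))
                 / delta) ** 2
    return 2 * n_per_arm / visitors_per_hour  # two headlines split traffic


scenarios = [
    ("baseline (5% CTR, 50% effect, 694/hr)", 0.05, 0.50, 694),
    ("double the traffic",                    0.05, 0.50, 1388),
    ("10% base CTR",                          0.10, 0.50, 1388),
    ("75% effect size",                       0.10, 0.75, 1388),
]
for name, ctr, effect, rate in scenarios:
    print(f"{name:40s} {hours_needed(ctr, effect, rate):.2f} hours")
```

Each change roughly halves the duration, mirroring the progression in the article: about five and a half hours at baseline, under three with double the traffic, well under two at a 10% base CTR, and under an hour with a 75% effect size.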

What does this mean for me?

  1. If your landing page gets more than a million pageviews a month, you’ll be able to reliably A/B test your top headlines before your stories get old – you’ll most likely get results within a couple of hours. With more traffic, you’ll be able to test less prominent headlines on your page and reduce the amount of time needed for your A/B tests.
  2. If your site gets fewer than a million pageviews a month, there’s still hope! Because click-through rates and the differences between your headlines are also major factors in determining the speed of A/B tests, A/B testing might still work for you.
  3. On a typical landing page, your top-performing headlines above the fold might have a 5–7% click-through rate. As you scroll down the page, headline CTRs tend to drop below 1% for the average link. Unless you have a massive amount of traffic (more than 10 million monthly pageviews) or two headlines that are drastically different from each other, you’re probably going to wait more than a few hours before you get results on headlines below the fold.

But even more critical than the statistical building blocks of headline testing are the questions it poses. You’ve got the great content — but are people seeing it? Are you driving readers to your most engaging content — the content that will keep them coming back again and again? To give your headlines a winning chance, you need the right measurement tools.

Chartbeat Engaged Headline Testing helps you get to know your audience on a deeper level and promote your content with headlines that you know are more likely to grab — and keep — readers’ attention. This tool identifies not only the headlines that are being clicked on, but also the ones that lead to engagement with your actual content. Using our Quality Clicks metric, the testing tool tallies the clicks that correspond to a user actively engaging with your content for at least 15 seconds after clicking the headline.
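As a concrete sketch of the idea — with the caveat that the event fields here are invented for illustration and are not Chartbeat’s actual API or data model — a quality-click tally along these lines might look like:

```python
# Hypothetical illustration of the Quality Clicks idea: count a click
# only if the reader stays engaged for at least 15 seconds afterward.
# The event structure and field names below are invented, not Chartbeat's.
QUALITY_THRESHOLD_S = 15


def quality_clicks(click_events):
    """Tally clicks whose post-click engaged time meets the threshold."""
    return sum(1 for e in click_events
               if e["engaged_seconds_after_click"] >= QUALITY_THRESHOLD_S)


clicks = [
    {"headline": "A", "engaged_seconds_after_click": 42},
    {"headline": "A", "engaged_seconds_after_click": 3},   # bounce
    {"headline": "B", "engaged_seconds_after_click": 15},
]
print(quality_clicks(clicks))  # → 2
```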

For more on headline testing best practices and Engaged Headline Testing, check out How to Use Headline Testing to Hook and Hold Readers.

References
1. http://www.evanmiller.org/ab-testing/sample-size.html
