Getting Started with A/B Testing

For an expert in Marketing, the term “A/B testing” goes hand-in-hand with “Optimization”, “Improving the KPIs of the website” and “Increased sales revenue for retailers”. The term is used so often, and means so many things to so many people, that it’s hard to find a beginner’s view on the topic. My esteemed colleague Sigrid already talked about A/B testing and how easy it is to start these experiments with CoreMedia Content Cloud in her well-read blog post “Stop Watching the Crystal Ball”. But how do I actually start A/B testing? What is it? How can I use this tool properly to reach my goals?

This blog post focuses on exactly these questions. Consider it an introduction to A/B testing. If you are already an advanced user or even an expert on this matter, please don’t overlook the last paragraph.

What is A/B Testing?

Let’s start with a few terms. A nice glossary from our friends at Dynamic Yield can be found here.

What is A/B testing? In general, it’s a research method for website owners to understand and improve the user experience (UX). Website owners run so-called experiments, testing a baseline (“A”) against a variation (“B”). The test itself focuses on a specific hypothesis and aims to reach a certain goal. For a retailer, the goal is typically to increase sales on the brand site, improve interactions with the community or get more newsletter sign-ups.

The test framework randomly adjusts the delivery of the website to show either version A or B to the user. Usually, each content variation (A and B) is assigned a weight, and the weights should sum to 1. Classically, a standard A/B test uses a 50-50 split, meaning the test framework renders the baseline content around 50% of the time and the alternative variant around 50% of the time.

However, advanced testers can also adjust the weights for more specific needs. Since the assignment is random, a perfect 50-50 split won’t be achieved at all times during the test. It fluctuates but will settle eventually around the expected weight distribution. Remember the coin-flip experiment back in school? You usually don’t end up with heads, tails, heads, tails, heads, tails, etc. Instead, you have heads, heads, heads, tails, heads, tails, tails, heads, etc. Statistically, over time, you will get closer and closer to an even 50-50 split.

Figure 1: Randomly assigned variations are not a perfect split throughout the whole experiment but will level to the allocated weight distribution over time.

Another option is to include more variants. These are called B1, B2, B3, and so on, and in this case, the test is called an “A/B/n Test”. Just like an A/B test, an A/B/n test focuses on one specific content piece and - even more important - on one hypothesis.
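If you like to see this in code, here is a minimal Python sketch of weighted random assignment, for a classic 50-50 A/B test and for an A/B/n test. It is only an illustration and not how any particular test framework works internally. The function names, the example weights for the A/B/n case and the fixed seed are my own assumptions.

```python
import random
from collections import Counter


def assign_variant(weights, rng):
    """Pick one variant name at random according to its weight."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1, got {total}")
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def observed_shares(weights, visitors, seed=42):
    """Simulate `visitors` assignments and return each variant's observed share."""
    rng = random.Random(seed)  # fixed seed (an assumption) only to make the sketch repeatable
    counts = Counter(assign_variant(weights, rng) for _ in range(visitors))
    return {name: counts[name] / visitors for name in weights}


if __name__ == "__main__":
    ab_test = {"A": 0.5, "B": 0.5}                # classic 50-50 split
    abn_test = {"A": 0.4, "B1": 0.3, "B2": 0.3}   # hypothetical A/B/n weights
    for visitors in (10, 100, 10_000):
        # With few visitors the split fluctuates; with many it settles near the weights.
        print(visitors, observed_shares(ab_test, visitors))
    print(observed_shares(abn_test, 10_000))
```

Running it with a growing number of visitors shows the coin-flip effect from above: the observed shares wobble for small numbers and move toward the assigned weights as the numbers grow.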

What to Test?

A test should focus on a specific hypothesis, phrased as a simple question such as: “How likely is it that variation B of my banner will get more clicks than my baseline A?” To get to this question, start with the overall goal and determine what you would like to improve.

For example, your big goal could be for your customers to buy more of your products. Other examples could be “I want my customers to interact more actively with the website” or “I want my employees to read my important news in the intranet portal as quickly as possible” – it doesn’t always have to be focused on retail or sales. Next, think about how to achieve that. To stick with the example above, you might want to improve how often your customers visit the Product Detail Page (PDP) and read about your product, thinking this will increase purchases. That is your hypothesis! An assumption.

Based on this, you define more specifically what exactly will provide the desired outcome. For example, you could test variations of the banner pointing to the product. Or try different images. Or try an animated image vs. a static image. Test different claims and test different Call-To-Action (CTA) labels. There are so many ways that could lead you to the desired outcome. Picking something to start the testing journey is the hardest part because you want to test variations that have a chance to make a difference, and for a beginner it’s hard to determine where exactly the issue lies. So, as a recommendation, start with something small while you get familiar with A/B testing, and don’t try everything at once. Make yourself familiar with the procedure, learn, find a good length or test count for your audience, and understand how to read the results first. Over time, you will get a better understanding of which changes have a higher chance of impact. It’s about getting experience and then expanding your proficiency over time.

Duration of a Test

Another important aspect: take your time and give your users time as well. Don’t plan your test for only a short period of time. And even worse, don’t stop a running test early! Consider your audience and when and how often your visitors see the variations. Remember the last election? Early results seemed to hint at one winner, but in the end, the opponent won. This usually happens because some groups of people can vote early in the day while others have to go to work first and vote in the evening. And some voted by mail, with those votes counted either ahead of or together with the votes from in-person voters.

Whatever the reason is, similar effects might be true for your audience! Maybe some customers visit your website repeatedly during the week, others only on the weekend. Maybe some use your services early in the month, some at the end of the month. Typically, you have already identified customer segments that include these kinds of behavioral attributes. Make sure that your test period is long enough to give all your customer segments a chance to take part in the test. If you can’t wait that long, consider these effects when analyzing the results. Don’t shy away from a test period of 2-4 weeks or longer. However, if you don’t have the time - e.g., you test for a news magazine or something similarly fast-paced - consider Multi-Armed Bandit tests.
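To make the planning a bit more tangible, here is a small Python sketch that turns a traffic estimate into a test length. It is a rough helper under my own assumptions, not a statistical sample-size calculation: the number of visitors you want per variant is something you have to bring yourself, traffic is assumed to be split evenly, and rounding up to full weeks (with a two-week minimum) is simply one way to make sure weekday and weekend visitors are both covered.

```python
import math


def planned_duration_days(daily_visitors, needed_per_variant, variants=2, min_weeks=2):
    """Return a test length in days, rounded up to whole weeks.

    All parameters are hypothetical planning inputs:
    daily_visitors      - average visitors per day who can see the test
    needed_per_variant  - how many visitors you want each variant to receive
    variants            - number of variants, baseline included
    min_weeks           - lower bound, so weekly patterns are seen at least this often
    """
    if daily_visitors <= 0:
        raise ValueError("daily_visitors must be positive")
    days_for_traffic = math.ceil(needed_per_variant * variants / daily_visitors)
    weeks = max(min_weeks, math.ceil(days_for_traffic / 7))
    return weeks * 7


if __name__ == "__main__":
    # 1,200 visitors a day, 10,000 visitors wanted per variant, baseline plus one variant
    print(planned_duration_days(1200, 10_000))
```

With these example inputs the traffic alone would need 17 days, so the helper rounds up to three full weeks.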

Measure the Results and Draw Conclusions

In the end, you want to measure the impact the change has. Your analytics framework is key to making the actual measurements. How often was the CTA button clicked? How often did the user spend more than 10 seconds on the PDP? How many users scrolled below the fold? How many actually purchased the item? It is data galore!

Interpreting the collected data is one of the trickiest aspects of A/B testing. All the testing is for nothing if you can’t conclude anything. Always go back to your detailed hypothesis. Did the hypothesis turn out to be true or false? Don’t be afraid to say the hypothesis was wrong. That doesn’t mean your test failed. Quite the opposite! You learned that this change did not have the impact you expected. It means that, for the next test, you will use a different lever to reach your goal.

So, what is an outcome of the test that I can identify as a true result? Instinctively, you might say “A higher click-through rate for one variation defines a clear winner, right?” And yes, if your test is set up properly, this might be true! But what if the baseline and the variant have the same click-through rate (CTR)? Does it mean they are equally good? Well, maybe! Consider this: both banners have the same click-through rate, so the users reach the PDP at almost the same rate. But what now? Your goal is to increase sales. Potentially, users who saw and clicked the variant purchase the product less often than the baseline users. Maybe they clicked on the fancy CTA button out of curiosity or reflex, whereas the baseline users were truly interested in the product. So, your first measure, the click-through rate, looked about the same, but mixing in the second measure, the actual number of purchases, gives you the actual winner!

Figure 2: Built-in feedback for click-through rates with CoreMedia Content Cloud Experience Feedback Hub - directly integrated into the editorial interface for immediate feedback to the content creator.
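Here is a minimal Python sketch of the two-measure comparison described above. The numbers are invented purely for illustration and do not come from a real test. The class and function names are my own and not part of any analytics tool.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    impressions: int  # how often the banner was shown
    clicks: int       # how often the CTA was clicked
    purchases: int    # how many of those visits ended in a purchase

    @property
    def click_through_rate(self):
        return self.clicks / self.impressions

    @property
    def purchases_per_impression(self):
        return self.purchases / self.impressions


def leader(stats, metric):
    """Return the variant name with the highest value for the given metric."""
    return max(stats, key=lambda name: getattr(stats[name], metric))


# Invented example numbers: almost the same CTR, very different purchase behavior.
EXAMPLE = {
    "A": VariantStats(impressions=10_000, clicks=500, purchases=50),
    "B": VariantStats(impressions=10_000, clicks=510, purchases=31),
}

if __name__ == "__main__":
    for name, s in EXAMPLE.items():
        print(name, f"CTR={s.click_through_rate:.2%}",
              f"purchases/impression={s.purchases_per_impression:.2%}")
    print("leader by CTR:", leader(EXAMPLE, "click_through_rate"))
    print("leader by purchases:", leader(EXAMPLE, "purchases_per_impression"))
```

In this made-up case the variant is slightly ahead on clicks, but the baseline is clearly ahead on purchases, which is the measure that matches the goal.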

By randomly assigning the baseline and the variant options, you have a good chance of getting a fair result for each test. If you dig a bit deeper, there are a lot more things to consider regarding the test and the interpretation of the result. For example, is the random assignment really evenly distributed across the different customer segments? What if you test a banner using female vs. male models for a unisex product, but your customer base is 80% female and only 20% male? If the teaser with the female model has a higher click-through rate, that can simply be tied to the disproportionately high share of female customers. At the same time, you might lose a few of your male buyers. You might want to think about personalizing your content and showing the female model banner version to your female customers, and the male model version to your male customers.
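The segment effect is easier to see with numbers. The following Python sketch breaks the click-through rate down by customer segment. The rows are invented to mirror the 80/20 example above and are not real measurements. The field names and the helper function are my own assumptions.

```python
from collections import defaultdict

# Invented example rows: 80% of the impressions come from female customers.
ROWS = [
    {"segment": "female", "variant": "female_model", "impressions": 4000, "clicks": 240},
    {"segment": "female", "variant": "male_model", "impressions": 4000, "clicks": 160},
    {"segment": "male", "variant": "female_model", "impressions": 1000, "clicks": 20},
    {"segment": "male", "variant": "male_model", "impressions": 1000, "clicks": 60},
]


def ctr_by(rows, key):
    """Sum impressions and clicks per group (given by key(row)) and return the CTR per group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [impressions, clicks]
    for row in rows:
        group = key(row)
        totals[group][0] += row["impressions"]
        totals[group][1] += row["clicks"]
    return {group: clicks / impressions for group, (impressions, clicks) in totals.items()}


if __name__ == "__main__":
    print("overall:", ctr_by(ROWS, lambda r: r["variant"]))
    print("per segment:", ctr_by(ROWS, lambda r: (r["segment"], r["variant"])))
```

In this made-up data set the female model banner wins overall, yet the male model banner performs better within the male segment. That is the kind of pattern that suggests personalization instead of a single winner.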

Have I mentioned that reading the results is the hardest part? Don’t get frustrated! Over time, you will gain experience, and you and your Marketing team will build a strategy on what to test and what to measure to arrive at truly comparable results. Allow yourself and the team time to learn!

Multivariate Testing

You may have noticed how quickly “simple” A/B tests and their interpretation can get complex. So far, we focused on one change to one item. But what about changing multiple items at the same time? Let me introduce you to “multivariate testing”. Here, the test doesn’t focus on one piece of content in different variations, but rather on multiple pieces that together make up the test – still working towards one hypothesis, one goal. For example, while you might test a banner as a simple A/B test in two different rendering variants or two different CTA labels, a multivariate test would consider this banner and two other banners visible on the same page or other pages. These tests are more complex and the results even more complicated to understand – which one of the changes truly made the impact? Also, read a bit more about the complex nature of multivariate tests here. I recommend tackling the multivariate test approach after you are fully comfortable with classic A/B tests and ready to build upon the knowledge you gained with those.
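One reason multivariate tests get complicated so quickly is the number of combinations. Here is a short Python sketch that lists every combination for three banners with two variants each. It assumes a so-called full-factorial setup, where every combination is shown to some visitors. The banner names are hypothetical.

```python
from itertools import product


def all_combinations(elements):
    """Return every combination of variants, one dict per combination."""
    names = list(elements)
    return [dict(zip(names, combo)) for combo in product(*(elements[n] for n in names))]


# Hypothetical page elements, each with a baseline ("A") and one variation ("B").
BANNERS = {
    "hero_banner": ["A", "B"],
    "sidebar_banner": ["A", "B"],
    "footer_banner": ["A", "B"],
}

if __name__ == "__main__":
    combos = all_combinations(BANNERS)
    print(len(combos), "combinations to compare")
    for combo in combos:
        print(combo)
```

Three banners with two variants each already give 2 x 2 x 2 = 8 combinations, and your visitors are spread across all of them. That is part of why these tests need more traffic and more patience.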

Getting Comfortable with A/B Testing

There is only one way to get proficient with the topic of A/B testing: learning step by step! Maybe you could run your first test as an A/A test. Wait! What? A + A… Baseline and Baseline? Would that help anyone? It has a few advantages. For one, it is a simple and uncomplicated way to familiarize yourself and the team with testing procedures. But it also allows you to test your A/B testing framework. Does the random distribution of the variants work properly? Is my analytics framework measuring properly? Besides the simple fact that it helps you get used to the testing process, it validates your setup. This is especially recommended if you introduce a new tool or framework.
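One way to check whether the random distribution works properly is to compare the visitor counts per variant with the split you configured. The Python sketch below uses a chi-square goodness-of-fit test for two groups. That is my choice of method for illustration and not something a specific tool prescribes. The function name and the example counts are invented.

```python
import math


def split_check(count_a, count_b, expected_share_a=0.5):
    """Compare observed visitor counts with the configured split.

    Returns the chi-square statistic and its p-value (1 degree of freedom).
    A very small p-value suggests the split is off and the setup deserves a look.
    """
    total = count_a + count_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((count_a - expected_a) ** 2 / expected_a
            + (count_b - expected_b) ** 2 / expected_b)
    # For 1 degree of freedom, the chi-square tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    print(split_check(5050, 4950))  # small deviation, plausible for a fair 50-50 split
    print(split_check(5500, 4500))  # large deviation, worth investigating
```

A slightly uneven split like 5,050 to 4,950 is perfectly normal for random assignment. A split like 5,500 to 4,500 would be very unlikely by chance and hints at a problem in the assignment or the tracking.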

Furthermore, how about the procedure itself? How long do you need to run the tests? How many clicks, how many weeks? Remember the fluctuation I described above of the random distribution of your variants? You can experience similar effects with the actual behavior of your users. A lot of experts recommend “No peeking!” during the tests. That means: Be patient! Wait for the full test run to be finished before starting to analyze the results. The point is to not conclude too early. Remember my examples from above? It’s a common error to see a clear leader at a certain time in the test, who then ends up being the loser at the end.
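If you want to see this effect for yourself without touching a live test, you can simulate it. The Python sketch below simulates an A/A-style test in which both variants have exactly the same true click rate and records which one is ahead at the end of each day. All numbers (28 days, 200 visitors per day, a 5% click rate, the seed) are assumptions I picked for the example.

```python
import random


def leader_by_day(days, visitors_per_day, click_rate_a, click_rate_b, seed=7):
    """Simulate a test and return who leads on cumulative CTR after each day."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    views = {"A": 0, "B": 0}
    clicks = {"A": 0, "B": 0}
    rates = {"A": click_rate_a, "B": click_rate_b}
    leaders = []
    for _ in range(days):
        for _ in range(visitors_per_day):
            variant = "A" if rng.random() < 0.5 else "B"  # 50-50 random assignment
            views[variant] += 1
            if rng.random() < rates[variant]:
                clicks[variant] += 1
        ctr = {v: (clicks[v] / views[v] if views[v] else 0.0) for v in views}
        if ctr["A"] > ctr["B"]:
            leaders.append("A")
        elif ctr["B"] > ctr["A"]:
            leaders.append("B")
        else:
            leaders.append("tie")
    return leaders


if __name__ == "__main__":
    # Both variants have the same true click rate, so any "leader" is pure chance.
    print(leader_by_day(days=28, visitors_per_day=200, click_rate_a=0.05, click_rate_b=0.05))
```

Since the two variants are identical here, every lead you see in the output is noise. Try a few different seeds and watch how often the early leader is not the one ahead on the last day.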

In my opinion, if you are new to A/B testing, go ahead and peek, but don’t interfere! Peeking can give you the chance to learn a few things along the journey. For example, you can see how the variants “race to the finish line”. Maybe that helps you understand your audience better. Maybe you can identify patterns for different customer segments. Everything that helps you to understand your customers better will help you in your ultimate goal, even if it doesn’t influence the specific test at hand.

Conclusion and More Advanced Considerations

There is a lot more to say about A/B testing: more terms, more things to consider, more things to tweak, more possibilities! Our partners at Dynamic Yield are experts in that field and are happy to provide further insights and deep knowledge. For example, have a look at their very detailed and comprehensive A/B Testing & Optimization Course. The course explains all the aspects, all the things to consider, the various types of tests, how to read the results and so much more!

And of course, CoreMedia Content Cloud can integrate with A/B/n testing and personalization engines such as Dynamic Yield. Even if you’re a beginner in A/B testing, give CoreMedia Content Cloud and our out-of-the-box capabilities a try. If you are an advanced tester or even an expert, have a look at these testing and personalization engines – knowing you can integrate these with CoreMedia Content Cloud as your best-of-breed CMS, as well!

If you are interested in a demo on how marketers can work with CoreMedia and A/B/n Testing, please contact us. I look forward to speaking with you!


About the Author

Ulrike Heidler is Director of Sales Engineering Americas for CoreMedia, based in the US, with more than a decade of consulting experience. A self-professed nerd, she is a dedicated technology enthusiast and a fan of the creative possibilities it offers – particularly around content management. Her specialties include client architecture, agile project management, CMS solutions and providing colorful commentary on sporting events ;)