Testing the functionality of various elements using A/B tests is now common practice for most website developers and operators. If sufficient traffic is available, this test procedure quickly reveals whether scenario A is more successful than scenario B. There are many obstacles that can be encountered during the planning phase as well as during the test phase and final evaluation. Here are the most common statistical errors and how you can avoid them:

The biggest mistakes in A/B test planning

Even before you’ve started the test, you might have already set yourself up for failure if your set-up is based on unexamined assumptions.

Error 1: foregoing a hypothesis and playing it by ear

Probably the worst mistake that can be made in the preparation stage is to forego a hypothesis and simply hope that one of the variants you’re testing will turn out to be the right one. Although a larger number of randomly selected test variants increases the chance of finding a winner, there’s also the chance that this winner won’t actually improve the web project. Even with a single variant, a test at the usual 5 percent significance level will show a significant optimization in 5 percent of cases even though in reality no optimization has taken place. The more variants that are used, the more likely such an alpha error becomes – there’s a 14% chance with 3 different test objects, and 34% with 8 different variants. If you don’t decide on a hypothesis beforehand, you won’t know what kind of optimization the winner is responsible for. If, on the other hand, you start from the hypothesis that enlarging a button will lead to an increase in conversions, you can properly classify the subsequent result. In summary: A/B testing should by no means be left to chance – always work hypothesis-driven and with a limited number of variants. If you also use tools such as Optimizely, which correct for the increased error rate, nothing will stand in the way of successful testing.
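The figures above follow directly from the significance level: if each variant is compared against the original at a 5% level, the chance that at least one comparison produces a false positive grows with every additional variant. A minimal sketch of that calculation:

```python
# Chance of at least one false positive (alpha error) when comparing
# several variants against the original, each at a 5% significance level.
def family_wise_error(num_variants, alpha=0.05):
    return 1 - (1 - alpha) ** num_variants

for k in (1, 3, 8):
    print(f"{k} variant(s): {family_wise_error(k):.0%} chance of a false winner")
```

This reproduces the article’s figures: roughly 14% with 3 variants and 34% with 8.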

Error 2: determining the incorrect indicators for a test variant’s success

Key Performance Indicators (KPIs), which are crucial to your project, also play an important role in A/B testing and shouldn’t be neglected. While increased page views and clicks already count as valuable conversions for blogs or news portals, for online stores these figures are no more than a positive trend. Key indicators such as orders, returns, sales, or profits are significantly more important for stores. Because these are harder to measure, A/B tests that use a main KPI such as absolute profit take more effort to set up. In return, they predict success far more reliably than tests that only record whether a product has been placed in the shopping cart – after all, the customer might never end up buying the product in the cart.

It is therefore important to find the appropriate values. However, you shouldn’t choose too many different ones. Limit yourself to the essential factors and keep the predefined hypothesis in mind. This reduces the risk of presuming a lasting increase when it’s actually just a coincidental one with no lasting effect.

Error 3: categorically ruling out multivariate testing

In some cases when preparing A/B tests, you might want to test several elements in the variants. This isn’t really feasible with a simple A/B test, which is why multivariate testing is used as an alternative. This concept is often rejected because multivariate tests are considered too complex and inaccurate, even though, used correctly, they could be the optimal solution to the aforementioned problem. With the right tools, the various test pages are not only quickly created, but also easy to analyze. With a little practice, you can work out the difference that an individually modified component makes – but your web project first needs enough traffic. Since the chance of declaring the wrong winner increases with the number of test variants used, it’s recommended to limit your choice to a pre-selection when using this method. To be certain that a potentially better version actually surpasses the original, you can validate the result afterwards with an A/B test. However, the probability of an alpha error occurring is still 5%.
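To see why multivariate tests demand so much traffic, it helps to count the page versions a full-factorial test generates: the product of the options per element, each version needing its own share of visitors. A small sketch with hypothetical element counts:

```python
# Full-factorial multivariate test: the number of page versions is the
# product of the options per element, and each version needs its own
# share of traffic -- which is why a pre-selection is recommended.
def required_versions(options_per_element):
    total = 1
    for n in options_per_element:
        total *= n
    return total

# Hypothetical example: 3 headlines x 2 button colours x 2 images
versions = required_versions([3, 2, 2])
print(versions, "page versions")               # 12 versions
print(f"Traffic share per version: {1 / versions:.1%}")
```

Twelve versions means each one receives only about 8% of the traffic, so reaching a reliable sample size per version takes correspondingly longer.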

Statistical problems during the test process

If the test is online and all relevant data is being recorded as desired, it would be fair to believe that nothing else stands in the way of successful A/B testing. Impatience and misjudgments often mean this isn’t the case, so make sure you avoid these typical errors.

Error 4: stopping the test process prematurely

Being able to read detailed statistics during the test proves very useful, but it often leads to premature conclusions, with users even terminating tests too soon in extreme cases. In principle, each test requires a minimum sample size, since the results usually vary greatly at the beginning. In addition, the longer the test phase lasts, the higher the validity, since random values become noticeable and can then be excluded. If you stop the test too early, you run the risk of getting a completely wrong picture of how the variant is performing and classifying it as far better or worse than it really is. Since it’s not easy to determine the optimal test duration, there are various tools, such as the A/B test duration calculator from VWO, that can help you with the calculation. There are, of course, very good reasons for ending a test prematurely – for example, when a variant is performing so badly that it could jeopardize your economic interests.
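Duration calculators like VWO’s are essentially sample-size formulas. The sketch below uses the standard two-proportion approximation (5% two-sided significance, 80% power); the baseline conversion rate, lift, and daily traffic are hypothetical inputs, not values from the article:

```python
from math import ceil

# Rough per-variant sample size for detecting a relative lift in the
# conversion rate (two-sided alpha = 0.05, power = 80%) -- a simplified
# version of what A/B test duration calculators compute.
def sample_size(base_rate, lift):
    z_alpha, z_beta = 1.96, 0.8416          # 5% significance, 80% power
    p2 = base_rate * (1 + lift)
    variance = base_rate * (1 - base_rate) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - base_rate) ** 2)

n = sample_size(base_rate=0.03, lift=0.20)   # detect a 20% relative lift
print(n, "visitors per variant")
daily_visitors = 1000                        # hypothetical traffic
print(ceil(2 * n / daily_visitors), "days minimum for a two-variant test")
```

Note how quickly the requirement grows: halving the detectable lift roughly quadruples the sample needed, which is why small sites need very long test periods.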

Error 5: using modern test processes in order to shorten the test length

It is no secret that various A/B testing tools work with methods to keep the error rate as low as possible across the variants used. The Bayesian method, which is used by Optimizely and Visual Website Optimizer, promises test results even if the minimum sample size hasn’t yet been reached. But if you use results from such an early stage for your evaluation, you can run into statistical problems: on the one hand, this method is based on your own estimates regarding a variant’s success, and on the other hand, the Bayesian method cannot identify early random outliers as such.
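The instability of early Bayesian readings can be illustrated with a Monte Carlo sketch. This is a generic Beta-Binomial model with uniform priors, not Optimizely’s or VWO’s actual implementation, and the conversion counts are invented for illustration:

```python
import random

# Monte Carlo estimate of P(variant B beats A) under a simple Bayesian
# model with Beta(1, 1) priors. Early, small samples give weak and
# unstable probabilities -- which is why early readings mislead.
def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=42):
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += b > a
    return wins / draws

# Same 25% relative lift in both cases, very different certainty:
print(prob_b_beats_a(4, 100, 5, 100))            # small sample: weak evidence
print(prob_b_beats_a(400, 10_000, 500, 10_000))  # large sample: near-certain
```

With only 100 visitors per variant the model cannot distinguish a real lift from noise, even though the observed rates are identical to the large-sample case.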

Common errors when analyzing A/B test results

Finding suitable KPIs, formulating hypotheses, and ultimately organizing and carrying out the A/B test is challenging enough. However, the real challenge awaits when it comes to analyzing the collected values and using them to make your web project more successful. This is the part where even professionals can make mistakes – but at least make sure you avoid the ones that are easy to avoid, such as these:

Error 6: only relying on the results of the testing tool

The testing tool doesn’t just help you start the test and visualize the collected data; it also provides detailed information about whether a variant has made an improvement and how much it would affect the conversion rate. In addition, a variant is declared the winner. However, these tools cannot measure KPIs such as absolute sales or returns, so you have to incorporate the corresponding external data. If the results don’t meet your expectations, it might also be worth taking a look at the separate results of your web analysis program, which usually provides a much more detailed overview of user behavior.

Inspecting individual data points is the only way to identify rogue values and filter them out of the overall result. The following example illustrates why this can be a decisive criterion for avoiding a wrong assumption: the tool has shown that variant A is the optimal version since it achieved the best results. However, closer examination reveals that this is down to a single user’s purchase – a user who happens to be a B2B customer. If you remove this purchase from the statistics, variant B suddenly shows the best result.

The same example can be applied to the shopping cart, the order rate, or various other KPIs. In each of these cases, you will notice that extreme values can strongly influence the average and that false conclusions can quickly arise from this.
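The effect of such an outlier on the average is easy to reproduce. A sketch with invented order values mirroring the B2B example above:

```python
from statistics import mean

# Hypothetical order values: one B2B customer's bulk purchase
# inflates variant A's average order value.
variant_a = [40, 55, 48, 52, 2500]   # last order: the single B2B outlier
variant_b = [70, 85, 78, 90, 82]

print(mean(variant_a))               # A looks like the clear winner
print(mean(variant_b))

without_outlier = [v for v in variant_a if v < 1000]
print(mean(without_outlier))         # without the B2B order, B wins
```

One order out of five completely reverses the ranking, which is why the average alone is not enough without inspecting the underlying data.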

Error 7: segmenting the results too much

The detailed verification of the A/B testing data in combination with external data sources opens up many more options. It’s particularly common to assign results to individually defined user groups. This is how you can find out how users of a particular age group, a particular region, or a particular browser responded to a particular variant. The problem: the more segments you compare, the higher the chance of error.

For this reason, you should make sure that the chosen groups are highly relevant to your test concept and make up a representative part of the overall user base. For example, if you’re only examining males under 30 who access your site via tablet and only visit on weekends, you’re covering a sample that doesn’t represent the entire audience. If you plan to segment the results of an A/B test in advance, you should also set a correspondingly longer test period.
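The error inflation from comparing many segments follows the same arithmetic as testing many variants, and a simple Bonferroni correction counteracts it. A sketch, with the segment count chosen purely for illustration:

```python
# Testing many segments at a 5% level makes it likely that at least one
# shows a "significant" difference purely by chance. A Bonferroni
# correction keeps the overall error rate near 5%.
def chance_of_false_positive(num_segments, alpha=0.05):
    return 1 - (1 - alpha) ** num_segments

def bonferroni_alpha(num_segments, alpha=0.05):
    return alpha / num_segments

segments = 10   # e.g. age group x region x browser combinations
print(f"{chance_of_false_positive(segments):.0%} chance of a spurious winner")
print(f"Test each segment at {bonferroni_alpha(segments):.3f} instead of 0.05")
```

With ten segments the chance of at least one spurious "winner" is already around 40%, so a stricter per-segment threshold (here 0.005) is needed to stay honest.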

Error 8: questioning the success due to vague calculations

To illustrate the extent to which switching to a new variant will affect the future conversion rate, A/B test results are often used as the basis for concrete calculations. This may be an effective means for presentation purposes, but such prognoses aren’t really reliable due to the many influences involved. The results of A/B tests only provide information about short-term changes in user behavior; long-term effects, such as the impact on customer satisfaction, are not measurable within the short test period – so assuming that a measured growth will continue is premature. In addition, there are influences such as seasonal fluctuations, supply shortages, changes in the product range, changes in the customer base, or technical problems that can’t be captured by A/B testing.

It’s important to keep a cool head regarding statistical problems and wrong assumptions when carrying out and analyzing a website’s usability test. Drawing conclusions too early could leave you disappointed with the subsequent live results, even though the optimized version of your project actually works quite well. Only with cautious future prognoses and a clean, well-thought-out working method during the analysis will you be able to evaluate and interpret A/B test results properly.
