How to Read Your A/B Test Results
The Algolia A/B testing feature helps you set up an A/B test to assess whether your search strategy is successful. Reading the results and trends from your test might seem straightforward, but there are several things to bear in mind when interpreting results.
Understanding how A/B testing works
How results are computed
A/B test results are computed from the Click-through rate and the Conversion rate, based on the events you send to the Insights API. Because of this, the results can differ from what you see in your business data.
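As a rough illustration of the idea, the sketch below derives both rates from per-variant counts. The `VariantCounts` type, its field names, and the choice of tracked searches as the denominator are assumptions made for this example; they aren't taken from the Insights API or from how the dashboard computes its figures.

```python
from dataclasses import dataclass


@dataclass
class VariantCounts:
    """Hypothetical per-variant counts, aggregated from your own event data."""
    tracked_searches: int
    searches_with_click: int
    searches_with_conversion: int


def rate(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when there is no traffic."""
    return numerator / denominator if denominator else 0.0


def summarize(counts: VariantCounts) -> dict:
    """Compute click-through and conversion rates for one variant."""
    return {
        "click_through_rate": rate(counts.searches_with_click, counts.tracked_searches),
        "conversion_rate": rate(counts.searches_with_conversion, counts.tracked_searches),
    }


if __name__ == "__main__":
    variant_a = VariantCounts(tracked_searches=1000, searches_with_click=250,
                              searches_with_conversion=40)
    print(summarize(variant_a))
```

If your business data defines conversion differently (for example, per order rather than per search), the two figures won't match, which is the kind of difference described above.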
Looking at your business data
As Algolia A/B testing is specific to search, the focus is on search data. It’s best to look at the A/B test results in the light of your own business data, to make your own assessment of the impact of your test. For instance, you can cross-reference the search data with your revenue data, compute your own version of conversion, or look at custom metrics.
When to interpret what
The A/B testing interface in the Algolia dashboard displays data starting from the second day of your test. Even if some numbers are showing, it doesn’t mean it’s time for you to draw conclusions on the test itself. Here’s what you should be looking at before you can interpret the A/B test results.
Is the data significant?
Before you interpret results and draw conclusions on what steps to take next, you want to make sure that the numbers are significant. The significance score helps you assess whether the results are the product of chance or an accurate representation of how your users behave.
When significance is higher than 95%, you can be confident that the difference between variants reflects how your users behave rather than chance. You shouldn’t draw any conclusion from a result if the significance is below 95%.
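To make the 95% threshold more concrete, here is a minimal sketch of one common way to score the difference between two rates: a textbook two-sided, two-proportion z-test. This is an illustration only. How the dashboard computes its significance score isn't described here, and the function name and parameters are assumptions.

```python
import math


def two_proportion_confidence(success_a: int, total_a: int,
                              success_b: int, total_b: int) -> float:
    """Return 1 - p_value of a two-sided, two-proportion z-test.

    A textbook test used for illustration; it is not necessarily the
    method behind the dashboard's significance score.
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both variants need at least one observation")
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled rate under the hypothesis that both variants behave the same.
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 0.0  # No variation at all, so nothing can be concluded.
    z = (p_b - p_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # Two-sided p-value.
    return 1 - p_value


if __name__ == "__main__":
    confidence = two_proportion_confidence(1000, 10_000, 1300, 10_000)
    print(f"confidence: {confidence:.4f}", "significant" if confidence > 0.95 else "keep waiting")
```

With the same relative difference but far less traffic, the score from this sketch drops well below 95%, which is why small tests rarely allow a conclusion.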
Is there enough data?
The larger your sample size, the better. You should run A/B tests for at least two business cycles and wait for the end of the test to draw conclusions. Experience shows that results are most reliable when the combined traffic of both variants is over 100,000 searches. If your overall A/B test traffic is below this, take the results with a grain of salt.
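The two rules of thumb above can be written as a simple check. This is a sketch under stated assumptions: the function name, its parameters, and the default 7-day business cycle are hypothetical, and the 100,000 figure is treated as the combined search count of both variants.

```python
MIN_TOTAL_SEARCHES = 100_000   # Combined searches across both variants.
MIN_BUSINESS_CYCLES = 2        # Minimum test duration, in business cycles.


def enough_data(searches_a: int, searches_b: int, days_run: int,
                business_cycle_days: int = 7) -> bool:
    """Return True when both the traffic and duration guidelines are met.

    business_cycle_days is an assumption: use the length of your own cycle.
    """
    total_searches = searches_a + searches_b
    long_enough = days_run >= MIN_BUSINESS_CYCLES * business_cycle_days
    return total_searches > MIN_TOTAL_SEARCHES and long_enough


if __name__ == "__main__":
    print(enough_data(60_000, 55_000, days_run=14))  # Both guidelines met.
    print(enough_data(60_000, 55_000, days_run=10))  # Test stopped too early.
```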
Is the split off?
When setting up your A/B test, you assign a percentage of search traffic to each variant (by default 50/50). The expected traffic split should be reflected in the displayed user count. For most A/B test configurations, the search count for each variant should match the traffic split. If there’s a noticeable discrepancy, there’s probably an issue.
For example, with a 50/50 split, you might end up with 800,000 searches on one side and 120,000 on the other. If you see a difference higher than 20% from the expected number, the results aren’t reliable. In such scenarios, investigate your A/B test setup to understand what’s going on. The discrepancy could be due to outliers, such as internal IP addresses or external bots skewing the results, or to a back-end implementation that doesn’t send all the data.
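The check above can be sketched in a few lines. The function names are hypothetical, and the 20% rule is interpreted here as the relative deviation of each variant's search count from its expected count, which is one reasonable reading rather than a documented formula.

```python
MAX_RELATIVE_DEVIATION = 0.20  # The 20% guideline, read as a relative deviation.


def split_deviation(searches_a: int, searches_b: int,
                    expected_share_a: float = 0.5) -> float:
    """Return the largest relative deviation from the expected search counts."""
    if not 0 < expected_share_a < 1:
        raise ValueError("expected_share_a must be strictly between 0 and 1")
    total = searches_a + searches_b
    if total <= 0:
        raise ValueError("no searches recorded")
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    return max(abs(searches_a - expected_a) / expected_a,
               abs(searches_b - expected_b) / expected_b)


def split_is_off(searches_a: int, searches_b: int,
                 expected_share_a: float = 0.5) -> bool:
    """Return True when the observed split deviates by more than 20%."""
    return split_deviation(searches_a, searches_b, expected_share_a) > MAX_RELATIVE_DEVIATION


if __name__ == "__main__":
    # The example from the text: a 50/50 test with 800,000 vs. 120,000 searches.
    print(f"{split_deviation(800_000, 120_000):.1%}", split_is_off(800_000, 120_000))
```

For the example above, each variant was expected to receive 460,000 searches, so both sides deviate by about 74%, far beyond the 20% guideline.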
If you believe something’s wrong with your A/B test results, there are some checks to identify what could be the root cause.
Analytics and events implementation
A/B tests rely on Click and Conversion events, so you need to make sure you’ve implemented them properly.
For example, you might want to check the following:
- Are you capturing both Click and Conversion events?
- Are there enough events on popular searches?
- Are there any errors in the Insights API Logs within the Monitoring section of the Algolia dashboard?
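The first check in the list can be approximated on your own side before looking at the dashboard. The sketch below assumes you keep a local record of the events you send, as a list of dictionaries with an `eventType` field set to `click` or `conversion`. The logging format and the function name are assumptions made for this example.

```python
from collections import Counter

REQUIRED_EVENT_TYPES = ("click", "conversion")


def missing_event_types(events: list) -> list:
    """Return the required event types that never appear in the sent events.

    `events` is a hypothetical local record of what your application sent,
    one dictionary per event, each with an "eventType" key.
    """
    counts = Counter(event.get("eventType") for event in events)
    return [event_type for event_type in REQUIRED_EVENT_TYPES
            if counts[event_type] == 0]


if __name__ == "__main__":
    sent_events = [
        {"eventType": "click", "eventName": "Product Clicked"},
        {"eventType": "click", "eventName": "Product Clicked"},
    ]
    # Only click events were sent, so conversion is reported as missing.
    print(missing_event_types(sent_events))
```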
Sales or holidays (such as Black Friday) can have an impact on your test, with more out-of-stock items for example. If you’re seeing unexpected results, you can check whether you’ve been conducting the A/B test during a special period.
A/B test in AI Re-ranking context
If you have launched an A/B test to evaluate AI Re-Ranking’s impact, there are a few additional considerations:
- Make sure to launch the A/B test through the AI Re-ranking interface. If that’s not the case, AI Re-ranking must be enabled for the index being tested.
- If using a replica, ensure Re-ranking is having an impact: you should have queries re-ranked for this index. If not, change its events source index to the primary index so that AI Re-ranking applies.
- Check whether Personalization is also enabled for the index you’re testing. The current analysis isn’t optimal when Personalization is enabled: it counts traffic towards AI Re-Ranking even though AI Re-Ranking probably doesn’t kick in for a given percentage of the traffic. As soon as a Personalization user profile is detected, AI Re-Ranking no longer has any impact.
- The current AI Re-Ranking model isn’t optimized for certain use cases, such as marketplaces with short-lived items. If this is your use case, you may see unsatisfactory A/B test results.
- Are you using distinct? If so, items are grouped, so AI Re-Ranking has a lesser impact.
The data is no longer significant
The significance computation involves ratios and there’s no guarantee for it to increase. After a certain point, gathering more data may affect the outcome and make it less trustworthy. Typically, seasonality effects could change the conversion rate at the last minute. Your data could be significant at some point, and no longer significant some days later.
Click-through rate is going up and Conversion rate is going down
Click-through rate going up and Conversion rate going down probably means that top search results don’t convert. Potential causes can be that the product is out-of-stock, uses the wrong picture or a misleading description, has unavailable sizes, etc. In such cases, you can use filtering to eliminate unavailable products, fix the relevance implementation to prioritize certain items, or clean up your data.
If possible, also look at your own business intelligence metrics to confirm or refute the results you’re seeing. If you’re looking at revenue, for instance, what’s the impact of the A/B test in the end?
Click-through and Conversion rates are both going down
You can reach the point where you have been through all the possible checks to see if there’s something off with your A/B test and Insights API implementation, and nothing seems to explain the results. It’s possible that the settings you are testing aren’t a good strategy for your search in the end. You should keep iterating to find a winning strategy for your own use case. You can also reach out to your Customer Success Manager, or send an email to email@example.com, for further help.