By Nick Rennard | Analytics, SEM, Video | April 5, 2017

Analyzing AB Split Tests

Hello Fellow Advertisers! I’ve done blogs in the past about AB split tests, but this one takes it to a more granular level. We’ll be reviewing a very complicated test between 4 different ads, and I’ll be explaining how I personally analyze and weight each column to determine overall performance. Enjoy!

Full Transcript:

Hello, everybody, and welcome to another episode of my video blog series. I’m your host, Nick Rennard. Today, we’re going to be talking about AB split testing, but we’re going to be going a little bit deeper. We get a lot of questions from clients on how to analyze the AB split tests that we’re running, because whenever we pitch AB split testing to a client, they’re like, “Oh, yeah, we definitely want to be running that.” It’s definitely a best practice for us to be running AB split tests at any given time.

The problem is I feel that where most people fall short on AB split tests is in actually analyzing the data correctly. We run these tests for a certain amount of time, a week, a month, five months, whatever, a quarter, it doesn’t matter. Then once we have all this data accumulated, what do we do with it? Are we looking at the right metrics? Are we sending people to the right ads in order to get the highest quality scores, best click-through rates, best conversion rates? I’ll be going through a test that I actually created for people that we interview to become technicians with our company.

It’s one of the few tests that we’re trying out to get people to talk us through it and to see how they respond to relatively difficult questions. The idea behind this test: this isn’t a real AB split test. It’s something where I fudged the numbers a little bit. It’s meant to be an extremely difficult AB split test to determine. I’m going to be giving you the answers essentially, but also talking you through why some of these variables are very difficult to analyze.

I will say that if you are running your own AB split tests, most of them are not nearly this complicated. It’s definitely the most complicated one I’ve ever seen, but I’ll be talking you through why and what you should be analyzing here.

Here’s an example AB split test. We’ll just start at the very top here. Date range is 3/8 to 3/14. This split test has been running for about a week. Date range is always something you want to take into consideration, especially if you’re a seasonal business. If you make all of your sales in Q4 because of the holidays, then a split test that ran between December 1st and December 21st is going to look a lot different, because you’re probably going to be making a lot of sales in that period, as opposed to a split test that was running in June.

I think that’s pretty straightforward, but even in this situation here where we’re looking from 3/8 to 3/14, that’s still a pretty small date range. I mean, it’s only one week. When you are reviewing split test, there’s two variables that you need to take into consideration for statistical significance. I can’t preach enough how important it is to have statistical significance before you make a decision.

If you don’t have statistical significance, it does not matter what the numbers say. The cost per conversion could be $0.50 or the cost per conversion could be $1,000. It wouldn’t matter. If there isn’t statistically significant data, then you’re making a decision just based on theory, saying, “Oh, well, this one looks better.” If there isn’t statistically significant data proving that it is actually better, then it’s just a bad call.

Statistical significance, I’ve said that word 20 times now, is extremely important. This seven-day period, if you’re looking back over the last seven days at your AB split tests, maybe you’re getting enough traffic where that’s enough time to start reviewing the data. I’m not saying it isn’t statistically significant, but it is a relatively short period of time that you would want to take into consideration.
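
To make "statistically significant" a bit more concrete, here is a minimal sketch of one common way to check it: a two-proportion z-test on click-through rate. This is not necessarily the method used in the video, and the click and impression counts below are hypothetical, chosen only to show the same 10% versus 8% gap at two different traffic levels.

```python
import math

def two_proportion_z_test(clicks_a, impressions_a, clicks_b, impressions_b):
    """Return (z, two-sided p-value) for the gap between two click-through rates."""
    rate_a = clicks_a / impressions_a
    rate_b = clicks_b / impressions_b
    # Pooled rate under the assumption that both ads perform the same.
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal CDF, computed with erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: 10% vs 8% click-through rate at low and high volume.
z_small, p_small = two_proportion_z_test(20, 200, 16, 200)
z_large, p_large = two_proportion_z_test(2000, 20000, 1600, 20000)

print(f"small sample: z={z_small:.2f}, p={p_small:.3f}")
print(f"large sample: z={z_large:.2f}, p={p_large:.2g}")
```

With a few hundred impressions the gap could easily be noise; with tens of thousands, the same gap is very unlikely to be chance. That is the reason a short date range deserves extra caution.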

It’s actually extremely important to note here: any one of these variables, if we’re looking at the date range or any of these columns or the ad copies themselves, none of these are going to be 100% telltale of whether this ad is doing good or bad. It’s not like we just look at one column and we’re like, “Yup, average position is better here, so we’re going to go with that one.” Everything is weighted. There are going to be variables in here that, depending on how you’re talking about it or looking at it, are going to have more weight than others.

If you were to look at this and I told you that this ad generated you 202 clicks, that’s good to know because that’s a decent number of clicks and that gives us some insights into how statistically significant this data is. That does tell us something, but then if we look at the cost per conversion, it’s $51.73, that tells us something else entirely. That tells us more on the actual performance of the ad.

If we didn’t have both of those variables here, if we only knew the cost per conversion was $51.73, I wouldn’t have any data telling me that that number is statistically significant. That number alone is useless. If I look at the 202 clicks and you tell me that a certain ad got 202 clicks, I’ll be like, “Okay. Great. Did it generate revenue? Did it get clicked on a lot? How are people responding to it? How is the user engagement?” There are a lot of unanswered questions if you’re only looking at one variable.

It’s more about being able to take all of these variables in at once, understand what they mean, digest them, and then make a decision based on those weightings, because you can rank the metrics in terms of highest levels of importance. A lot of the time, it is a judgment call, and certain technicians or clients or whoever might make a different decision than someone else or value something different than someone else. It is important to be able to take in all of these and not just tunnel vision on one variable and be like, “Yup, that’s how we determine our … That’s our only KPI.” If you only have one KPI, you’re probably doing something wrong here. That was a lot of time talking about date range, but I think those concepts are important.

The next thing to note here is that AB split test means that we’re testing one ad versus another ad. This is a test of four ads, which is a little bit unusual. It’s part of the test to see if they ask why we’re running four ads. That’s a great question because if we’re running four ads, we’re spreading our data more thinly across all of those. It’s going to take longer for us to generate statistical significance.

Best practice is usually to run two, maybe three ads. Google did just release a, I don’t know, it was not White Paper. They released a statement or whatever saying that the best click-through rates and best conversion rates that they’ve seen have been with campaigns that run three ads. I would still advocate two. Three is fine, but again, you’re just spreading your data a little more thinly. I think both are perfectly fine.

The advantage of three is that if someone looks up your ad multiple times, then they have more opportunities to see fresh content. There’s definitely some value in running more than two, but if you go more than three, then you’re probably flooding yourself or not flooding yourself, you’re spreading yourself too thin on your data. That’s something to note about, the four different ads that we’re running here.

Clicks. Clicks is interesting here. Say this were your AB split test, for your company, and we have these four ads running. Right off the bat, just looking … I’ll take impressions into account here too. If we’re looking at the number of clicks and impressions on each of these ads, you can see that the second ad and the fourth ad, I call these A, B, C and D, so ad B and ad D, have barely any clicks at all.

Now, granted these have only been running a week, if we let this run for longer, we’ll get more data, but 21 clicks or 18 clicks, that’s just not statistically significant for me. I’m not even looking at any of these other variables, just looking at clicks and impressions. I can already tell you that I’m not going to be advocating … If I were presenting this to a client, there’s no way in hell I’d be advocating that we run these two ads.

Now, these ads, hold on, these ads may be good, but I’m not going to tell you that either of these two ads are your best ad. The reason is because we don’t have statistically significant data to tell us that because you can see here, let’s jump over to cost per conversion here. You can see that the cost per conversion on these two ads is actually a lot better than these other two ads. Since we don’t have statistically significant data, my recommendation in this case just looking at the clicks and the cost per conversions would be that these ads are showing some promise, so we can let them continue to run and see if this data continues over a longer period of time.

Then once these ads have been running for a month or two or three and maybe we have more like 200 or 300 or 400 clicks, a much larger number, say even 1,000 clicks, then we can make a decision on these numbers. If this cost per conversion still stays the same at 1,000 clicks, then that’s when we can be confident like, “Okay. This ad is outperforming the others,” or “This ad is outperforming the others.”

Clicks and impressions, they don’t tell you anything about how much revenue the ad is generating for you, but they do tell you a lot about statistical significance. If you don’t have that, you don’t have anything. That is important.
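
One way to see why 21 clicks is not enough is to put a confidence interval around a rate measured on that few clicks. The sketch below uses the Wilson score interval, which is my choice of technique rather than something named in the video. It assumes 4 conversions on 21 clicks for one of the low-traffic ads and 18 conversions on 202 clicks for ad A; the pairing of those figures is inferred from the numbers mentioned in the transcript, not read off the sheet.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a rate such as conversions per click."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width

# Assumed pairings: 4 conversions on 21 clicks, 18 conversions on 202 clicks.
low_small, high_small = wilson_interval(4, 21)
low_large, high_large = wilson_interval(18, 202)

print(f"21 clicks:  {low_small:.1%} to {high_small:.1%}")
print(f"202 clicks: {low_large:.1%} to {high_large:.1%}")
```

Under these assumptions the low-traffic ad's conversion rate could plausibly sit anywhere from roughly 8% to 40%, while the 202-click ad is pinned down to roughly 6% to 14%. The small-sample ad may well be better, but the data cannot say so yet.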

Cost, I’m not going to really take cost into consideration here. Cost is along the same lines as clicks. The cost per click is more what I’m going to be focused on. Cost is just telling us how much money we’ve thrown at it. If we spent $931, yeah, it’s like clicks where it’s just telling us how much data we have. We’ve invested $2600 into this ad, whereas we’ve only invested $151 into this one. We’re definitely going to have more statistically significant data on the first one, assuming the cost per clicks are relatively equal.

Let’s talk about cost per click. Cost per click is very important. The reason that cost per click is very important is that cost per click is going to fluctuate. Say all of these ads are running in the same ad group with the same keywords and all other variables are the same, and one of the ads is giving you a cost per click of 3.99 and the other one is giving you a cost per click of 7.21. Keep in mind that a better quality score lowers your cost per click.

If these are the same keywords, but the only difference is that one of the ads is getting a better quality score and so it’s lowering the cost per click, then we definitely want to be favoring the one that has a lower cost per click for multiple reasons. First of all, the cost per click is lower. This one’s almost half as much as that one. If we’re spending $2600, by favoring this ad with half the cost per click, we’re going to get almost twice as many clicks with our spend, which says a lot.
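
The "almost twice as many clicks" point is simple division. Here is a quick sketch using the $2600 spend and the two cost-per-click figures mentioned above; the function name is just illustrative.

```python
def clicks_for_budget(budget, cost_per_click):
    """Approximate number of clicks a budget buys at a given cost per click."""
    return budget / cost_per_click

budget = 2600.00
clicks_low_cpc = clicks_for_budget(budget, 3.99)
clicks_high_cpc = clicks_for_budget(budget, 7.21)
ratio = clicks_low_cpc / clicks_high_cpc

print(f"At $3.99 per click: about {clicks_low_cpc:.0f} clicks")
print(f"At $7.21 per click: about {clicks_high_cpc:.0f} clicks")
print(f"The cheaper ad buys {ratio:.2f}x as many clicks")
```

That works out to about 652 clicks versus about 361, or roughly 1.8 times as many clicks for the same spend.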

The other note is that if your cost per click is 7.21, that means your quality score is going to be lower. Over a longer period of time, what Google is telling you is that the ad copy that you have here isn’t lining up with your keywords or your landing pages. The relevance is just bad. That’s what it’s telling you. There is something to determine by the cost per click. It tells you a lot about the quality scores.

Now, if there’s a very small difference in the cost per click, let’s say one is $3 and one is $3.03, it’s not really enough for us to be like, “Oh, well, that one’s drastically better.” If you do notice a big difference like 7.21 versus 3.99, I don’t know, even 3.99 and 4.61, I’d start thinking that was fishy because that’s roughly a 15% difference, but you get the idea. Cost per click tells us a bit about quality scores. It tells us a bit about how much bang for our buck we’re getting and how relevant the ad copy that we wrote is. All right.
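
For anyone who wants the percentage math on those cost-per-click pairs, a small sketch (the helper name is hypothetical):

```python
def percent_higher(higher, lower):
    """How much larger `higher` is than `lower`, as a percentage of `lower`."""
    return (higher - lower) / lower * 100

# The three cost-per-click comparisons mentioned in the transcript.
for lower, higher in [(3.00, 3.03), (3.99, 4.61), (3.99, 7.21)]:
    print(f"${higher:.2f} is {percent_higher(higher, lower):.1f}% higher than ${lower:.2f}")
```

The gaps come out to about 1%, 15.5% and 80.7% respectively, which lines up with treating the first as noise and the other two as worth a closer look.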

We skipped over click-through rate. Click-through rate, I’m going to tell you right now, is definitely the most important variable in this spreadsheet. That doesn’t mean that you only look at click-through rate. You still have to take the other stuff into consideration, but click-through rate is definitely the most highly weighted thing on this spreadsheet. The reason that’s true is because click-through rate is the most dominant variable within Google’s algorithm for determining quality scores.

If you favor the ad that has a higher click-through rate, say this ad has a 12.5% click-through rate and this one has a 7.26% click-through rate, this 12.5% over a longer period of time is going to get a much higher quality score. The reason is that Google is a corporation. They’re a business. They’re trying to make money. They reward you for writing ads that get higher click-through rates by giving you higher quality scores, and the reason is that they make more money off of ads that have higher click-through rates.

They call it relevance. The real reason that they do it is for money, but there is something to be said about an ad that has higher click-through rate. If it’s attracting people’s attention, if it’s getting them to click on it, then there is something to be said about it being better. Click-through rate, very, very important.

Again, we can’t just look at click-through rate, because if we look at this particular example, you can see that the two ads that we already deemed as being statistically insignificant, which are Ad B and Ad D, also have the highest click-through rates. That’s part of the confusion of this question here. You wouldn’t want to favor one of these ads because we don’t have enough data yet, but the click-through rates are looking very favorable. That’s the reason why we can’t just look at click-through rate, because in this case, we just don’t have the data to tell us that that’s actually true over a long period of time. All right.

Moving on. Average position: I will say that a lot of people focus on average position too much. We hear this a lot: “I always want to be in the number one position. I always want to have 100% impression share.” The truth is that the number one ad position is most of the time not the best position to be in. Ad slots two and three are usually much more effective because they’re cheaper. We can bid less aggressively. There’s always going to be somebody that’s overbidding on keywords, so we can let them. There’s no real reason to fight them for the number one slot.

It’s much better to try and pick an ad where instead of paying 7.21 to be in the average position number one, I would much rather pay 3.99 to be in the average position number two. There’s really not a big difference in terms of which one actually converts better. In a lot of cases, ad position two and three will actually convert better than position number one, which is weird, but it’s true.

You want to take this with a grain of salt. Usually when I look at average position, I’m looking at it more in terms of cost per click, determining how is that affecting my ad rank because if I saw that the average position on this ad was 3.4, but the average position on this one was 1.2, that’s a really big difference there. That tells us a lot about how Google is ranking us. Going back to the quality scores, it’s telling us about the quality scores.

Average position, I would say that on this sheet, it’s probably one of the lowest in importance. We’ll go ahead and start talking about conversion data. The conversions column is confusing. The reason is that you might look at this one and be like, “Oh, well, these two only had four conversions. Those are terrible because this one had 18 and this one had 41.” That’s 100% not true. The reason is that we have different quantities of data on all four of these ads.

This one spent $2600, so it better have 10 times more conversions than a campaign that only spent $151. The conversions themselves are going to be more dependent on how much you’ve actually spent on these. Conversion data itself, yes, it’s important and it’s good to take into account, and if you do have two ads that have been running about the same amount, then that variable is perfectly fine to determine whether or not an ad is better than another. A much better variable than looking at the conversions themselves is to look at cost per conversion.

I almost always say that a variable is more indicative of performance, so it’s less of a vanity metric and more of a performance metric if it has a denominator in it. If you’re dividing it by something, that means that you’re comparing it to something, which usually makes a variable better. That’s the case of cost per conversion where we can look at the cost per conversion. If I tell you that, “Well, this ad gets its leads for $26, but this ad gets its leads for $65,” you’d choose the $26 ad every single time, right?
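
The "metrics with a denominator" idea can be sketched as a small data structure that derives the ratio metrics from the raw counts. The figures for ad A below are partly back-calculated: 202 clicks, 18 conversions, a 7.26% click-through rate and a $51.73 cost per conversion come from the transcript, while the impression count and the exact spend are assumptions chosen to be consistent with those numbers.

```python
from dataclasses import dataclass

@dataclass
class AdStats:
    impressions: int
    clicks: int
    cost: float
    conversions: int

    @property
    def click_through_rate(self) -> float:
        return self.clicks / self.impressions

    @property
    def cost_per_click(self) -> float:
        return self.cost / self.clicks

    @property
    def cost_per_conversion(self) -> float:
        return self.cost / self.conversions

# Impressions and exact cost are assumed, back-calculated from the quoted ratios.
ad_a = AdStats(impressions=2782, clicks=202, cost=931.14, conversions=18)

print(f"CTR: {ad_a.click_through_rate:.2%}")
print(f"Cost per click: ${ad_a.cost_per_click:.2f}")
print(f"Cost per conversion: ${ad_a.cost_per_conversion:.2f}")
```

The raw columns (impressions, clicks, cost, conversions) mostly describe how much data there is; the derived ratios are the ones that can be compared across ads with very different spend.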

That’s why cost per conversion is great: we don’t really have to look at a lot of this other data. We can just look at this metric, like click-through rate, where we just look at that metric and it’s just intuitive. It tells us which one is doing better in terms of this highly weighted variable.

Now, we already said that click-through rate is the most important variable to be looking at. I would say that cost per conversion and conversion value divided by cost are very similar. We’ll go over that second one in a bit. I would say that cost per conversion is the second most important variable.

You can see how a lot of these are confusing. Let’s look at ads A and C here. Ad A has a click-through rate of 7.26% and then the other one is 5.37. This would definitely favor ad A. Then you look at cost per conversion, it’s still favoring ad A. Then if you look at conversion value divided by cost, which we’ll go over here in a second, it’s actually favoring ad C. You can even look at the cost per click. The cost per click is even favoring ad C as well.

That’s why this question is actually quite confusing: these two ads are saying a lot of different things and they’re going in different directions. That’s the point of this test: it’s not supposed to have a right or wrong answer. It’s supposed to get you thinking about each one of these variables and it’s supposed to get you talking about how you weight those, because if you were just to look at this and be like, “Well, this one had 41 conversions, so it’s the best,” you’d just be flat wrong, because overall, this one is actually one of the worst.

It has a really bad cost per conversion. It has a decent conversion value divided by cost. It does have the best cost per click. The click-through rate is actually the worst overall, which is our most important metric. We do have statistically significant data on it, which is really nice, but you can see how by just looking at one variable here, it doesn’t tell the whole story. That’s a little bit about cost per conversion. It’s pretty simple to analyze. It’s just lower is better, but yeah.

We’ll talk a little bit about conversion value divided by cost now. The way conversion values work in AdWords is that a lot of companies, especially larger companies, will have more than one conversion point on their site. They’ll have contact forms, quote forms, phone calls, maybe chat popups, newsletter signups, blog signups. I mean, the list just goes on and on and on. What we like to do is assign values to each one of those conversions.

A free trial, for example, if someone is signing up for a 90-day trial with your product, that’s going to be worth a lot more than someone who opened up the chat box and said, “Can I get your phone number?” or something like that. We would place a lot more weight on certain conversion points than we would on others, which is nice because we can assign values. Say we decide that a phone call is worth $100 to us, whereas a trial is worth $300 to us. That matters when we have those different conversion values and we’re lumping all these conversions together.

If these 41 conversions are a combination of, let’s use that phone call and the free trial example, if all 41 of these are phone calls, then that’s $4100 in revenue. If all 18 of these conversions are free trials, then that’s 18 times 300, so $5400 in revenue. You can see the difference there: even though it has fewer than half the number of conversions, the conversion weight is a lot higher in that example.
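
The phone call versus free trial arithmetic can be sketched as a weighted sum. The $100 and $300 values come from the example above; the dictionary keys and function names are hypothetical, and the $1,000 spend used for the value-to-cost ratio is a made-up figure, not a number from the sheet.

```python
# Value assigned to each conversion type, per the example in the transcript.
CONVERSION_VALUES = {"phone_call": 100, "free_trial": 300}

def total_conversion_value(counts):
    """Sum the assigned value of every conversion, by type."""
    return sum(CONVERSION_VALUES[kind] * count for kind, count in counts.items())

def value_per_cost(total_value, cost):
    """Conversion value divided by cost: higher is better."""
    return total_value / cost

all_calls = total_conversion_value({"phone_call": 41})
all_trials = total_conversion_value({"free_trial": 18})

print(all_calls)   # 4100
print(all_trials)  # 5400

# Hypothetical equal spend of $1,000 on each, just to show the ratio.
print(value_per_cost(all_calls, 1000.0))   # 4.1
print(value_per_cost(all_trials, 1000.0))  # 5.4
```

Fewer conversions can still mean more value once each conversion type carries its own weight, which is why the raw conversion count and cost per conversion can point the wrong way.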

You can see here where it also gets confusing. If we’re looking at ads A and C here again, the cost per conversion on ad A is a bit better than ad C. It’s about $14 better. Then you look at the conversion value divided by cost and it’s drastically favoring ad C. This ad C, while it’s generating fewer conversions for total spend overall, the weight of those conversions is actually much higher.

If we’re just looking at conversion data, the cost per conversion is actually very misleading and we would want to be favoring conversion values here because it’s generating more qualified leads for us.

I think I talked about pretty much everything here. That should give you some good insights into … This is obviously a very complicated example. I would say that in most AB split tests that I do, the two variables that I look at are click-through rate and then cost per conversion. If those are lining up, then it’s a pretty easy decision. If those are pretty close, we definitely want to take conversion value into consideration along with cost per conversion.

Past that, I guess the thing that I forgot: I’d first make sure that I have statistical significance. I would have to make sure the number of clicks and impressions and spend is enough to be statistically significant. After that, I would look at click-through rate and conversion data. After that, I would probably check in on the cost per click and average position to get an idea of how good the quality scores are that Google is giving me on these ads. Then after you review all of those variables, you should be able to tell a pretty good story on which one is the winner and which one is the loser.
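
The review order above can be sketched roughly in code. This is a simplification: the real decision is a weighted judgment call, not a strict sort. The 200-click cutoff is a hypothetical threshold, and the ad figures below are illustrative stand-ins, only partly drawn from numbers mentioned in the transcript.

```python
MIN_CLICKS = 200  # hypothetical cutoff for "enough data to judge"

# Illustrative figures; several values are assumed rather than taken from the sheet.
ads = [
    {"name": "A", "clicks": 202, "ctr": 0.0726, "cost_per_conversion": 51.73},
    {"name": "B", "clicks": 21, "ctr": 0.1250, "cost_per_conversion": 26.00},
    {"name": "C", "clicks": 650, "ctr": 0.0537, "cost_per_conversion": 65.00},
    {"name": "D", "clicks": 18, "ctr": 0.1100, "cost_per_conversion": 30.00},
]

def review(ads, min_clicks=MIN_CLICKS):
    """Step 1: set aside ads without enough data. Step 2: rank the rest."""
    enough_data = [ad for ad in ads if ad["clicks"] >= min_clicks]
    keep_testing = [ad for ad in ads if ad["clicks"] < min_clicks]
    # Higher click-through rate first, then lower cost per conversion.
    ranked = sorted(enough_data, key=lambda ad: (-ad["ctr"], ad["cost_per_conversion"]))
    return ranked, keep_testing

ranked, keep_testing = review(ads)
print("Ranked:", [ad["name"] for ad in ranked])
print("Keep testing:", [ad["name"] for ad in keep_testing])
```

Conversion value divided by cost, cost per click and average position would then be used to confirm or challenge that ranking, as described above.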

Anyways, I hope that gives you some insights into the variables that we look at when we do AB split tests. Feel free to leave any questions in the comments. Thanks for watching and I’ll see you guys in my next video blog.
