Conductrics Blog

New Release of Conductrics!

Posted on August 14, 2012 by conductricsblog

We are super excited to announce the release of our updated version of Conductrics! After getting feedback from both developers and digital analysts, we have given Conductrics a complete refresh. Here are a few of the major improvements:

Faster

We have ported much of the system to Node.js, ensuring that mPath is superfast and can easily scale to any sized customer’s needs.

Easier

We have rewritten the API to make it even easier to get up and running in only a few minutes. In addition, we have opened up all of the reporting via a new report API. Of course, you can still use Conductrics reporting UI get insights from your projects, but with the new report API you have full flexibility in what and how to display your data.

Smarter

Speaking of reports, we will be rolling out a set of advanced reporting tools. First up is our new Targeting Impact report. Conductrics has always maintained internal predictive models about the effectiveness of your optimization options across all targeting features and segments. Unfortunately, all of this useful information was really only accessible to the Conductrics’ internal optimization processor –where it is used it to make the best selections for each user. With the new Targeting Impact report, you can get access to this rich information via an intuitive graphical reporting tool. Now you can discover at a glance which optimization efforts work best for each type of user.

Accessible

Best of all, we will be offering offering a free developer account! We want to make using Conductrics as open and as transparent as possible so that you can evaluate it fully. Go ahead, just sign up and get access to the complete Conductrics platform which includes the following:

AB and multivariate Testing via API, with all of the statistical testing reporting you need to evaluate your tests.
Automated Optimization – great for when have time sensitive online campaigns, or if you want user targeting, let Conductrics optimize for you using its built-in machine learning methods.
Complete reporting – you will get access to all of the reporting, so that the business analytics team members can poke around and see all of the insights they can get from using Conductrics.

So go ahead and sign up and if you have any questions or suggestions just reach out to us – we would love to hear from you.

Posted in Uncategorized | Leave a comment

Balancing Earning with Learning: Bandits and Adaptive Optimization

Posted on August 17, 2012 by conductricsblog

While AB Testing is currently the default method for optimizing online conversions, there are frameworks other than AB testing for selecting between competing optimization options. One of the main alternative approaches is known as the Multi-Armed Bandits problem. Solutions to the so called bandit problems have several nice features that often make them a good choice for your optimization problems.

This post looks a little long, what am I going to learn?

You will learn how you might be able to squeeze even more from your optimization efforts by using alternatives to AB Testing. Additionally, you should have a basic handle on what the multi-armed bandit problems it and how it relates to your optimization goals.

Why Bandits?

A bandit is another name a slot machine, and online conversion optimization is in some ways analogous to the task of how to pick between slot machine with different payoffs. So think about walking into a casino, and you have reason to believe that different machines have different payoffs.

If you knew beforehand which machine had the highest payoff, you would play that one exclusively to maximize your expected return from playing. The difficulty is that you don’t know which machine is best beforehand, so you have to deduce which slot machine has the highest pay off. To do this, you are going to have to try them all out in order to get a sense of the expected return from playing each machine.

But how to do this? Run an AB style experiment?

You could do that, but remember; each time you play a poor performing machine, you lower your take that you walk out of the Casino with that night. In order to maximize how much money you walk out of the casino with, you will have to be efficient with how you collect your data.

Learn and Earn

The solutions to Bandit style problems try to account for this data acquisition cost. They do it by trying to balance using the current best guess about the performance of each option and value of collecting more data to improve upon these guesses.

This tradeoff between using currently available information and collecting more data is often referred to as explore exploit problem, but some like to call it earning while learning. You need to both learn in order to figure out what works and what doesn’t, but to earn; you take advantage of what you have learned. This is what I really like about the Bandit way of looking at the problem, it highlights that collecting data has a real cost, in terms of opportunities lost.

Unlike AB or multivariate testing, which all use basically the same methodology (apologies to Fisher, Neyman, and Jeffreys), there is no one standard approach for Bandit solutions, but they all try to incorporate this notion of balancing exploration with exploitation. Technically, these are problems that try to minimize what’s known as regret.

Regret

Regret is the difference between your actual payoff and the payoff you would have collected had you played the optimal (best) options at every opportunity.

This can get a little confusing, so to make this clearer, let’s compare AB Testing to one of the simpler bandit algorithms. While there are many different approaches to solving Bandits, one of the simplest is the value weighted selection approach, where the frequency that each optimization option is selected is based on how much it is currently estimated to be worth. So if we have two options, ‘A’ and ‘B’, and ‘A’ has been performing better, we weight the probability of selecting ‘A’ higher than ‘B’. We still randomly select between the two, but we give ‘A’ a better chance of being selected.

First lets take a look at how AB Testing selects during learning compared our Bandit method. For our example we consider a problem where we are trying to optimize with three possible options; ‘A’,’B’, and ‘C’.

In AB Testing, we see that each time we make a selection, each of the options gets the same chance to be selected.

In the Bandit approach, this isn’t the case. The probabilities for being selected at each turn depend on the current best guess off how valuable each option is. In the above case, Option A has the highest current estimated value, and so is getting selected more often than either options B or C, and likewise, option B has a higher estimated value than option C.

Since the selection probabilities are based on the current best estimates for each option, the Bandit selection probabilities can change over time as we update our value estimates for each option. Let’s imagine we set up an optimization project to try the out three possible options.

Unbeknownst to us at the start, les say that the top performing option is worth $2.00, the next best, $1.00 and the worst option only $0.50. Let’s see how Bandit’s continuous Learn and Earn balancing act looks compared to AB Testing in the graph below.

With the AB Testing approach we try out each of the three options with equal proportions until we run our test at week 5, and then select the option with the highest value. So we have effectively selected the best option 44% of the time over the 6 week period (33% of the time through week 5, and 100% at week 6), and the medium and lowest performing options 28% of time each (33% of the time through week 5, and 0% at week 6). If we the weight these selection proportions by the actual value of each option, we get an expect yield of about $1.31 per user over the 6 weeks (44%*$2 + %28*$1 + %28*$0.50=$1.31).

Now lets look at the Bandit approach. Remember, the bandit attempts to use what it knows about each option from the very beginning, so it continuously updates the probabilities that it will select each option throughout the optimization project.

In the above chart we can see that with each new week, the bandit reduces how often it selects the lower performing options and increases how often if selects the highest performing option.

So, over the 6 week period the bandit approach has selected the best option 72% of the time, the medium option 16% of the time, and the lowest performer on 12% of the time. If we weight by the average value of each option we get an expected yield of about $1.66 per user. In the case the bandit would have returned a 27% lift over the AB Testing approach – ($1.66-$1.31)/$1.31 – during the learning time of the project.

The following chart show the same idea in a slightly differently. Here we can see what the average value of each user will be during out optimization project.

For the first 5 weeks, the AB Test has an expected yield of $1.17 per user (33%*$2.00 + 33%*$1.00 + 33%*$0.50), and then an expected return of $2.00 once Option C has been selected.

The Bandit, on the other hand, starts at $1.17, like the AB Test, but then increases the average value continuously throughout the project. So you get more payoffs while learning.

In fact, the area under the bandit value curve and above the AB Testing value curve is the total extra payoff we get by using the bandit. If we averaged around 1,000 visits per week, we would take in $4,666 while learning with AB Testing but $6,810 with Bandit learning – a lift of 46%.

Of course, the longer you extend your project out for, the smaller this incremental payout is of the total. This means that for short lived projects, like online campaigns, are particularly well suited for adaptive bandit methods – since they are active for a limited time.

Why Should you Care?

There really are three main reasons to prefer Bandits over AB Testing

Earn while you Learn. Data collection is a cost, and we ought to try to do it as efficiently as possible. Using a bandit style approach lets us at least consider these costs while we run our optimization projects.
Automation – the bandit approach is a natural way to formulate the online optimization problems for automating the selection optimization with machine learning. This is especially true when applying user targeting, as correct AB tests becomes much more complicated in this situation.
A changing world – Bandit approaches are useful when you have an optimization problem where the best options can change over time. By letting the bandit method always leave some chance to select poorer performing option, you give it the opportunity to ‘reconsider’ the option effectiveness. If there is evidence that it is performing better than first thought, it will be automatically be promoted and used more often. It provides a working framework for swapping out low performing options with fresh options, in a continuous process.

Of course, there is no guarantee that using a Bandit method will perform better than running an AB Test. However, there is also no guarantee that using either a Bandit or AB Test selection approach is necessary going to be better than just guessing – but on average it should be.

To wrap up, Bandits are a continuous learning method for selecting between competing options. They weight the competing selections between what currently seems to work best and the potential benefits of collecting more information to improve future results.

If you haven’t yet – come sign up for a Conductrics account today at Conductrics

Posted in Uncategorized | 2 Comments

Intelligent Agents

Posted on August 15, 2012 by conductricsblog

If you have taken a look at the Conductrics API, or our UI (if you haven’t please do), you may have noticed that we use the term agent to describe our learning projects.

Why use an Agent? Because amazingly, Optimization, AB & Multivariate Testing, Behavioral Targeting, Attribution, Predictive Analytics, LTV … can all be recast as components of a simple, yet powerful framework borrowed from the field of Artificial Intelligence, the intelligent agent.

Of course we can’t take credit for intelligent agents. The IA approach is used as the guiding principle in Russell and Norvig’s excellent AI text Artificial Intelligence: A Modern Approach – it’s an awesome book, and I recommend anyone who wants to learn more to go get a copy or check out their online AI course.

I’m in Marketing, why should I care about any of this?

Well, personally, I have found that by thinking about analytics problems as intelligent agents, I am able to instantly see how each of the concepts listed above are related and apply them most effectively individually or in concert. Intelligent Agents are a great way to organize your analytics tool box, letting you grab the right tool at the right time. Additionally, since the conceptual focus of an agent is to figure out what action to take, the approach is goal/action rather than data collection/reporting oriented.

The Intelligent Agent

So what is an intelligent agent? You can think of an agent as being an autonomous entity, like a robot, that takes actions in an environment in order to achieve some sort of goal. If that sounds simple, it is, but don’t let that fool you into thinking that it is not very powerful.

Example: Roomba

An example of an agent is the Roomba – a robot for vacuuming floors. The Roombas environment is the room/floor it is trying to clean. It wants to clean the floor as quickly as possible. Since it doesn’t come with an internal map of your room, it needs to use sensors to observe bits of information about the room that it can use to build an internal model of the room. To do this it takes some time at first to learn the outline of the room in order to figure out the most efficient way to clean.

The Roomba learning the best path to clean a room is similar, at least conceptually, to your marketing application trying to find the best approach to convert your visitors on your site’s or app’s goals.

The Basics

Lets take a look at a basic components of the intelligent agent and its environment, and walk through the major elements.

First off, we have both the agent, on the left, and its environment, on the right hand side. You can think of the environment as where the agent ‘lives’ and goes about its business of trying to achieve its goals. The Roomba lives in your room. Your web app lives in the environment that is made up of your users.

What are Goals and Rewards?
The goals are what the agent wants to achieve, what it is striving to do. Often, agents are set up so that the goals have a value.

When the agent achieves a goal, it gets a reward based on the value of the goal. So if the goal of the agent is to increase online sales, the reward might be the value of the sale.

Given that the agent has a set of goals and allowable actions, the agent’s task is to learn what actions to take given its observations of the environment – so what it ‘sees’, ‘hears’, ‘feels’, etc. Assuming the agent is trying to maximize the total value of its goals over time, then it needs to select the action that maximizes this value, based on its observations.

So how does the agent determine how to act based on what it observes? The agent accomplishes this by taking the following basic steps:

Observe the environment to determine its current situation. You can think of this as data collection.
Refer to its internal model of the environment to select an action from the collection of allowable actions.
Take an action.
Observe of the environment again to determine its new situation. So, another round of data collection.
Evaluate the ‘goodness’ of its new situation – did it reach a goal, if not, does it seem closer or further away from reaching a goal then before it took the past action.
Update its internal model on how taking that action ‘moved’ it in the environment and if it helped it get or get closer to a goal. This is the learning step.

By repeating this process, the agent’s internal model of how the environment responses to each action continuously improves and better approximates each actions actual impact.

This is exactly how Conductrics works behind the scenes to go about optimizing your applications. The Conductrics agent ‘observes’ it world by receiving API calls from your application – so information about location, referrer etc.

In a similar vein, the Conductrics agent takes actions by returning information back to your application, with instructions about what the application should with the user.

When the user converts on one of the goals, a separate call is made back to the Conductrics server with the goal information, which is then used to update the internal models.

Over time, Conductrics learns, and applies, the best course of action for each visitor to your application.

What is neat is that even if you don’t use our Conductrics, intelligent agents are a great framework to arrange your thoughts when solving your online optimization problems.

If you want to learn more please come sign up for a Conductrics account today at Conductrics