When A/B testing is not recommended because regulatory requirements or technical limitations prevent us from setting up a controlled experiment, we can still quickly implement a new feature and measure its effects in a data-driven way. In such cases, we use the back-door adjustment method, a type of causal inference, to measure pre-post effects. This type of pre-post analysis is useful because implementing metrics tracking and making a data-driven decision take the same or less analytical effort as in a typical A/B test. Because no test setup is required, this analysis can be used when we have to release new features quickly and as an alternative to slower testing methods. Here we explain how back-door adjustments enable unbiased pre-post analysis and how we set up these analyses at DoorDash.
Which features go live without experimentation
While data-driven experimentation ensures that the impact of new features is proven before they are presented to customers, we still want to be able to fast-track some features that fix existing bugs or poor user experiences. For example, when our Global Search product team detected a critical bug on DoorDash’s mobile web platform that coincided with a drop in key product metrics, the normal development and experimentation cycle was too slow to prevent a negative customer experience. Because we prioritize a positive customer experience, we opted to fix the issue right away. We still wanted, however, to use pre-post analysis to measure the fix’s impact.
Typically, a simple pre-post analysis is heavily biased, because other factors can move the same metrics during the same period and a simple before-and-after comparison cannot remove the bias those factors introduce. These confounding factors, such as seasonality, competitor moves, new marketing campaigns, and new product launches, can change how users interact with our product in much the same way a feature improvement or bug fix does.
Even if a simple pre-post comparison gives us the direction of a metric lift, it does not give us a confidence level for that lift. In other words, we don’t know how likely it is that the same fix, applied again, would produce the same metric improvement. Another key advantage of back-door adjustment over another causal analysis method, difference-in-differences, is that it does not require a parallel trends assumption. That assumption requires that, absent any change, the difference between the “treatment” and “control” groups remains constant over time. In scenarios such as a bug fix, the treatment group (the group that receives the fix) does not necessarily follow a parallel metric trend, because the bug itself already distorts the treatment group’s trend. So we decided to measure the impact of the bug fix with a trustworthy pre-post approach that blocks the back-door paths through which other factors might affect our metrics.
Understanding the back-door adjustment
A back-door adjustment is a causal analysis that measures the effect of one factor, treatment X, on another factor, outcome Y, by adjusting for measured confounders Z. The relationships among the treatment, outcome, and confounding variables are shown in the causal graph in Figure 1 below. In our real DoorDash example, in which we fixed a bug on the mobile web, we want to measure how the fix affects success metrics, for instance the mobile web platform’s conversion rate. Other simultaneous factors, Z, can also affect the success metrics, such as DoorDash’s new marketing campaigns, a new product, or other feature launches. We need to block the paths through which these other factors affect metrics so that we read only the impact of the bug fix. Back-door paths do not carry the causal effect of the product change on metric lifts, so by blocking them we get a clean, high-confidence read of the treatment’s impact. For context, a causal association between two variables exists when a change in one produces a change in the other.
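For reference, the textbook back-door adjustment formula from Pearl’s causal framework expresses the effect of intervening on X using only observable quantities, provided Z blocks every back-door path from X to Y. Under an additional assumption of a linear model with no treatment-covariate interaction, the adjusted effect reduces to the coefficient on X in a regression that also includes Z; in the bug fix example, X would be a 0/1 indicator for the post-fix period. This is the general statement of the method, not a description of DoorDash’s internal implementation.

```latex
\begin{aligned}
P\big(Y \mid \mathrm{do}(X = x)\big) &= \sum_{z} P(Y \mid X = x,\, Z = z)\, P(Z = z) \\
\mathbb{E}[Y \mid \mathrm{do}(X = 1)] - \mathbb{E}[Y \mid \mathrm{do}(X = 0)] &= \tau
\quad \text{if } Y = \alpha + \tau X + \beta^{\top} Z + \varepsilon,\ \mathbb{E}[\varepsilon \mid X, Z] = 0
\end{aligned}
```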
Adding covariates is a common and trustworthy way to block the back-door paths opened by confounding variables. Covariates affect the outcome, in our case the metric result, but are not themselves of interest in the study. In this example, we believe that special events, holidays, or other feature changes are confounding variables, but we cannot quantify them directly through metrics. Nonetheless, we are confident that most of their impact is reflected in metric changes on other platforms. So, to measure the impact of the bug fix on the mobile web, we can add covariates such as the conversion rates on the mobile apps and desktop over the same time period. These covariates help block the paths from the confounding variables, Z. This causal analysis provides more accurate results than a simple pre-post comparison, and it gives a confidence interval around the point estimate, the metric lift, so that we can make data-driven decisions.
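A minimal sketch of why the covariate helps, using simulated numbers rather than DoorDash data: a shared shock (standing in for a campaign or seasonal effect) lifts every platform in the post-period, so the naive pre-post difference on mobile web absorbs it. Adding another platform’s concurrent conversion rate as a covariate and taking the OLS coefficient on the post indicator recovers the fix’s effect. The baseline rates, effect size, shock size, and noise level are all illustrative assumptions, and the function names are hypothetical.

```python
import math
import random
import statistics


def simulate(days_per_period=14, effect=0.01, shock=0.02, seed=7):
    """Simulated daily conversion rates; every parameter here is an illustrative assumption."""
    rng = random.Random(seed)
    post, web, app = [], [], []
    for day in range(2 * days_per_period):
        t = 1 if day >= days_per_period else 0  # 0 = pre-period (control), 1 = post-period (treatment)
        # Confounder shared by all platforms: weekly seasonality plus a shock that starts with the post-period
        common = 0.005 * math.sin(2 * math.pi * day / 7) + shock * t
        post.append(t)
        web.append(0.10 + effect * t + common + rng.gauss(0, 0.0002))  # mobile web receives the fix
        app.append(0.12 + common + rng.gauss(0, 0.0002))  # another platform: same confounder, no fix
    return post, web, app


def naive_pre_post(post, y):
    """Simple pre-post difference in means."""
    pre = [v for t, v in zip(post, y) if t == 0]
    after = [v for t, v in zip(post, y) if t == 1]
    return statistics.fmean(after) - statistics.fmean(pre)


def adjusted_pre_post(post, y, z):
    """OLS coefficient on the post indicator in y ~ 1 + post + z, via the one-covariate closed form."""
    groups = {0: [], 1: []}
    for t, yi, zi in zip(post, y, z):
        groups[t].append((yi, zi))
    means = {t: (statistics.fmean(p[0] for p in g), statistics.fmean(p[1] for p in g)) for t, g in groups.items()}
    # Pooled within-period slope of y on the covariate
    sxy = sum((yi - means[t][0]) * (zi - means[t][1]) for t, g in groups.items() for yi, zi in g)
    sxx = sum((zi - means[t][1]) ** 2 for t, g in groups.items() for _, zi in g)
    beta = sxy / sxx
    # Raw pre-post difference minus the part explained by the covariate's own pre-post shift
    return (means[1][0] - means[0][0]) - beta * (means[1][1] - means[0][1])


post, web, app = simulate()
print(f"naive lift:    {naive_pre_post(post, web):.4f}")  # effect plus the shared shock
print(f"adjusted lift: {adjusted_pre_post(post, web, app):.4f}")  # close to the simulated effect
```

Because the regression includes an intercept and the post indicator, the coefficient on the post indicator equals the raw difference in means minus the covariate slope times the covariate’s own pre-post shift, which is why the closed form above matches a full OLS fit.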
How to implement a back-door adjustment analysis
Given the robustness of the back-door adjustment method, how do we design the experiment?
First, we need to prepare data to measure key metrics both before and after the change. The pre- and post-periods for a bug fix must cover windows of the same length: in this case, 14 days of mobile web conversion rate data before the bug fix and 14 days after the fix reached production. We also need data to calculate covariates. In the same example, we can use the conversion rates from the iOS, Android, and desktop platforms, because these metrics block the back-door paths of confounding factors that affect all platforms at the same time. Because the mobile apps and web platforms are affected by the same external changes, such as a product launch on all platforms or a seasonal effect, we can use metrics from the other platforms to reduce bias.
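A minimal sketch of this data-preparation step, assuming daily metrics are available as a mapping from platform name to `{date: conversion rate}`. The input shape, the platform names, and the `build_windows` helper are hypothetical. The sketch enforces two things from the text: both windows have the same length (14 days here), and every covariate platform has a value on every day the target metric does. Excluding the day of the change itself is an added assumption, since a mid-day rollout would mix pre and post data.

```python
from datetime import date, timedelta


def build_windows(daily, target, covariates, change_day, days=14):
    """Return (day, post, target_value, covariate_values) rows for equal-length pre/post windows.

    `daily` maps platform name -> {date: conversion rate}; this shape and the names are hypothetical.
    The change day itself is excluded (an assumption: a mid-day rollout would mix pre and post data).
    """
    pre_days = [change_day - timedelta(days=d) for d in range(days, 0, -1)]
    post_days = [change_day + timedelta(days=d) for d in range(1, days + 1)]
    rows = []
    for post, window in ((0, pre_days), (1, post_days)):  # 0 = control (pre), 1 = treatment (post)
        for day in window:
            # Every platform must report on every day, or the covariates cannot adjust that day
            missing = [p for p in [target, *covariates] if day not in daily.get(p, {})]
            if missing:
                raise ValueError(f"{day}: no data for {missing}")
            rows.append((day, post, daily[target][day], [daily[p][day] for p in covariates]))
    return rows


# Usage with placeholder values; the date and platform names are arbitrary
change = date(2022, 1, 15)
span = [change + timedelta(days=d) for d in range(-14, 15)]
platforms = ["mobile_web", "ios", "android", "desktop"]
daily = {p: {d: 0.1 for d in span} for p in platforms}
rows = build_windows(daily, "mobile_web", ["ios", "android", "desktop"], change)
print(len(rows), sum(r[1] for r in rows))  # 28 14
```

Failing loudly on a missing covariate day, rather than silently dropping it, keeps the pre and post windows the same length.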
For a typical controlled experiment, we’d set up control and treatment data for the same metrics, such as conversion rate, using an experiment tracking tool. To implement a pre-post analysis in the experiment platform, we configure the metrics, label the pre-period data as the “control” group and the post-period data as the “treatment” group, and then add covariates for variance reduction and debiasing. Figure 2 below shows the implementation of a back-door adjustment for this bug fix example: the mobile web platform’s conversion rate in the 14 days before the bug fix is the control, the 14 days following the fix are the treatment, and the conversion rates on the other platforms serve as the covariates.
When we use back-door adjustment analysis, we can read the metric impact in almost the same way as in a controlled experiment. The metric lift is still the treatment metric value minus the control metric value, and we calculate the confidence interval and p-value the same way we do for a controlled experiment. The only difference is that, instead of comparing concurrent control and treatment groups, we measure the pre-post difference with variance reduction.
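The read-out can be sketched as an ordinary least squares regression of the daily mobile web metric on an intercept, a post indicator (0 for the pre-period control, 1 for the post-period treatment), and the other platforms’ metrics as covariates; the lift is the coefficient on the post indicator. This is a minimal pure-Python sketch on simulated data, not DoorDash’s experimentation platform, and the function names and numbers are hypothetical. It assumes homoskedastic errors and uses a normal approximation for the confidence interval and p-value; with only 28 daily points, a t-distribution with n − k degrees of freedom would be slightly more conservative.

```python
import math
import random
from statistics import NormalDist


def invert(a):
    """Invert a small square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    return [row[n:] for row in m]


def pre_post_lift(post, y, covariates, alpha=0.05):
    """OLS of y on [1, post, covariates...]; returns (lift, se, ci, p_value) for the post coefficient."""
    x = [[1.0, float(t), *zs] for t, zs in zip(post, covariates)]
    n, k = len(x), len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    inv = invert(xtx)
    coef = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(x, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance (homoskedastic assumption)
    lift = coef[1]  # adjusted treatment (post) minus control (pre)
    se = math.sqrt(sigma2 * inv[1][1])
    z = NormalDist().inv_cdf(1 - alpha / 2)  # normal approximation
    p_value = 2 * (1 - NormalDist().cdf(abs(lift / se)))
    return lift, se, (lift - z * se, lift + z * se), p_value


def simulate(days_per_period=14, effect=0.01, shock=0.02, seed=11):
    """Simulated mobile web metric plus three other-platform covariates (illustrative numbers only)."""
    rng = random.Random(seed)
    post, web, covs = [], [], []
    for day in range(2 * days_per_period):
        t = 1 if day >= days_per_period else 0
        common = 0.005 * math.sin(2 * math.pi * day / 7) + shock * t  # confounder hitting every platform
        post.append(t)
        web.append(0.10 + effect * t + common + rng.gauss(0, 0.0003))
        # Stand-ins for iOS, Android, and desktop: same confounder, no fix
        covs.append([base + common + rng.gauss(0, 0.0005) for base in (0.12, 0.11, 0.09)])
    return post, web, covs


post, web, covs = simulate()
lift, se, (lo, hi), p = pre_post_lift(post, web, covs)
print(f"lift={lift:.4f} se={se:.4f} 95% CI=({lo:.4f}, {hi:.4f}) p={p:.3g}")
```

A production version would more likely use a statistics library’s regression routines and robust standard errors; the hand-rolled solver here only keeps the example dependency-free.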
Future improvement opportunities
Given the benefits of the back-door adjustment, why not replace all A/B tests with it? Unfortunately, there are a number of limitations to a back-door adjustment, including:
- We can’t identify all confounders. Sometimes, we don’t know what the confounding variables are or we can’t capture all major confounders.
- It can be hard to choose the right set of covariates and to validate the impact of the chosen covariates.
There are two things we can do about confounders we have not identified. First, we can brainstorm potential confounding effects before measurement and form strong hypotheses about them. This helps because, in practice, there can be more than one back-door path, and the more confounders we identify, the more back-door paths we can block. Second, we can use advanced methods such as instrumental variables or regression discontinuity design to obtain an unbiased estimate even when we cannot block all the back-door paths. If we can find a good instrument, the instrumental variables method can still give unbiased estimates even when some confounding variables are unknown. Similarly, with regression discontinuity design, if we can find a running variable with a cutoff, we can obtain a high-confidence estimate even when we know only some of the confounding variables, or none of them.
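To illustrate the instrumental variables idea, here is a minimal sketch of the simplest IV estimator: the ratio of covariances, which for a single binary instrument equals the Wald ratio and matches two-stage least squares. The setup is a simulated assumption, not a DoorDash analysis: an unobserved confounder `u` drives both the treatment and the outcome, so the naive regression slope is biased, while an instrument that shifts the treatment but has no direct path to the outcome recovers the effect. Finding such an instrument in practice is the hard part and is not shown.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20_000, effect=2.0, seed=3):
    """Units with an unobserved confounder u; all coefficients are illustrative assumptions."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # unobserved confounder: affects both x and y
        zi = 1 if rng.random() < 0.5 else 0  # instrument: shifts x, has no direct path to y
        xi = 1.0 * zi + 0.8 * u + rng.gauss(0, 1)  # treatment
        yi = effect * xi + 1.5 * u + rng.gauss(0, 1)  # outcome
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
naive = cov(x, y) / cov(x, x)  # OLS slope of y on x: biased upward by u
iv = cov(z, y) / cov(z, x)  # IV estimate (Wald ratio for a binary instrument)
print(f"naive={naive:.2f} iv={iv:.2f} (simulated effect 2.0)")
```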
To validate that the covariates are strong, we can use regression models to short-list covariates and remove disturbing signals, also known as non-confounding variables. The regression model can also show how much of the metric’s variance the covariates explain.
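One simple way to short-list covariates with a regression model, sketched under assumptions: using pre-period days only (so the fix itself cannot influence the check), regress the target metric on each candidate and keep candidates whose R² clears a threshold. The 0.5 threshold, the candidate names, and the data are hypothetical. With a single regressor, R² is the squared Pearson correlation; a fuller version would fit all candidates jointly and check the incremental variance each one explains.

```python
import math


def r_squared(y, x):
    """R² of a one-regressor OLS fit of y on x (the squared Pearson correlation)."""
    my, mx = sum(y) / len(y), sum(x) / len(x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def shortlist(target_pre, candidates_pre, threshold=0.5):
    """Keep candidates whose pre-period R² against the target clears a (hypothetical) threshold."""
    scores = {name: r_squared(target_pre, series) for name, series in candidates_pre.items()}
    keep = sorted((name for name, s in scores.items() if s >= threshold), key=lambda name: -scores[name])
    return keep, scores


# Illustrative pre-period data: 14 days sharing a weekly pattern, plus one unrelated series
days = range(14)
season = [0.005 * math.sin(2 * math.pi * d / 7) for d in days]
web = [0.10 + s for s in season]
candidates = {
    "ios": [0.12 + s + 0.0005 * (-1) ** d for d, s in zip(days, season)],
    "android": [0.11 + s + 0.001 * (-1) ** d for d, s in zip(days, season)],
    "unrelated_metric": [float(d % 3) for d in days],
}
keep, scores = shortlist(web, candidates)
print(keep)  # ['ios', 'android']
```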
With these limitations in mind, we are ready to ship emergency features as needed and use the back-door adjustment to measure their impact after the fact.
When controlled experiments are too expensive or simply impossible, we can use the back-door adjustment to measure metric impact with high confidence. The existing experimentation platform at DoorDash makes this approach easy to implement. We plan to invest more in analytics use cases of the back-door adjustment and to improve the experiment platform so that it can easily identify high-quality covariates.
I would like to acknowledge Jessica Zhang, my manager, for supporting and mentoring me on this analysis project and reviewing drafts of this post. I would like to thank the experimentation platform engineer, Yixin Tang, for his advice on statistical theory and implementation for the back-door adjustment. I also would like to thank Jessica Lachs, Gunnard Johnson, Lokesh Bisht, and Ezra Berger for their feedback on drafts of this post. Thanks go out, too, to Akshad Viswanathan, Fahad Sheikh, Matt Heitz, Tian Wang, Sonic Wang, and Bin Li for collaborating on this project.