Why Good Forecasts Treat Human Input as Part of the Model

February 24, 2021 10 Minute Read Machine Learning 19

Brian Seo

Brian Seo is a Data scientist at DoorDash, Since February of 2019, where he primarily focuses on Demand Forecasting. Brian has a BA from the University of California Berkeley

Preprocessing the data

Preprocessing involves smoothing out and removing all the irregularities from the data so that the ML model can infer the right patterns. For example, let’s say we have a sustained marketing campaign which results in weekly order numbers growing 10% for the next few weeks. If we were expecting seasonal growth to be 5% then we should attribute this extra growth to the marketing campaign and adjust future weeks’ demand to be lower as the marketing campaign loses steam.

Smoothing can be done by using algorithms to automatically detect and transform outliers into more normal values or by manually replacing the bumps in our training data with values derived with some simple method. Using an algorithm to smooth out the data can work really well, but it is not the only way to preprocess the data. There are often some elements of the series that cannot be smoothed well that need to be handled with manual adjustments. These can include one-time spikes due to abnormal weather conditions, one-off promotions, or a sustained marketing campaign that is indistinguishable from organic growth.

How replacing outliers manually in training data can lead to better accuracy

At DoorDash we’ve learned that holidays and other events highly influence the eating habits of our consumers. Making sure that we’re building a forecast off of the correct underlying trends is critical to creating an accurate demand forecast. An important part of building a predictive model is creating a training set that reduces the variance and noise of everyday life, thereby making it easier for the model to detect trends in the data. Figure 1, below, shows a typical time series (a sequence of data points ordered by time) at DoorDash.

Timeseries of deliveries by day in a random market — Figure 1: The time series shows a sinusoidal pattern but with a huge dip in demand for period 10, where we would expect a peak.

Our data looks somewhat sinusoidal, a wave-like shape defined by periodic intervals that alternate between a peak and a trough, with the peaks and the troughs of the wave series being somewhat irregular. There is a huge dip in the middle of the series that doesn’t match the pattern at all. That dip is typical of a holiday where demand dips, but is an outlier compared to normal ordering behavior.

Here are a few different preprocessing methods we might use before training our model:

Training the model as-is with no pre-processing:
- We will be training the model on the raw data using the ETS algorithm, which has the exponential smoothing algorithm built into the forecast.
- These results will be our baseline so we can see how additional preprocessing adds accuracy.
Use another smoothing algorithm on top of what’s built into the ETS model:
- In this instance we are using the Kalman filter from tsmoothie, an algorithm from the package that helps smooth time series data. We chose this algorithm specifically because it takes in parameters that help adjust for seasonality and trends.
Make some manual adjustments to the training set:
- This method uses human intuition to recognize that for every dip, there was a broad change to consumer behavior. These changes are caused by a combination of holidays and promotions that are not random variance. For the current example, the best solution is to replace the outlier week with the most recent week prior to the dip as a proxy for what should have happened.
Use both manual adjustments and a smoothing algorithm:
- This final method uses the manual adjustment approach and smooths the result afterwards.

Effects of different smoothing methods on a timeseries — Figure 2: All three approaches yielded slightly different results relative to the raw data. The Kalman filter on it’s own, however, did the poorest job of smoothing out the outlier period.

All three of the methods made noticeable changes compared to the shape of the raw data, as shown in Figure 2, above. The Kalman filter couldn’t quite approximate the sinusoidal shape in the extreme dip but it did smooth the curve out a bit. The manual adjustments did a great job of approximating the shape as well, but didn’t flatten out that spike on day 32. The manual adjustments combined with the Kalman filter exhibit slightly different behavior in peaks and troughs and exhibit a more normal sine wave pattern overall.

The results

After preprocessing the training data and then training a model, we can see that each method yielded a noticeably different prediction.

Forecasted deliveries by pre processing method — Figure 3: While the Kalman filter on its own approximates the shape of the actual data best, it doesn’t get the magnitude correct on the test series.

Method	Error Reduction Relative to No Preprocessing (Higher is Better)
Manually Adjusted	19%
Kalman Filter	15%
Kalman Filter + Manually Adjusted	35%

Figure 4: The combination of both the Kalman filter and the manual adjustments yielded a significant decrease in the error relative to either method on its own.

We see that both the Kalman filter and the manually adjusted method yielded improvements over the untreated data. But in this case the combination of the manual adjustments and the Kalman filter ended up yielding an even better result than either method individually. This example shows that by employing a couple of simple pre-processing methods, we can get increased accuracy in our models.

Adjusting the predictions of the forecast to improve accuracy

Making adjustments to our predictions involves the integration of smaller individual modules that take into account a variety of factors that swing DoorDash’s order volume, with some of the biggest factors being weather and holidays. While we can try and create models that address things like weather or holidays, there will always be trend-breaking events that can’t be modeled. Turning to human input can improve the model’s accuracy at this stage.

Why not just build a better model with more inputs?

Building a bigger and better model ends up not being practical because it’s not really possible to account for every use case. For example, in September of 2019 we gave away one million Big Macs. DoorDash at that point had been giving promotions to customers regularly, but never on that scale. There wasn’t a good way to handle this new scenario with a model that predicts the effect of promotions since the scale and scope were just completely different from our usual promotions, which are what we would have used for training data.

Growth during One Million Big Macs promo — Figure 5: The week of the One Million Big Macs promotion resulted in unprecedented growth on top of what we were already experiencing.

We initially chalked this up as a one-time event, thinking that it wasn’t necessary to design for such an infrequent promotion, but events of this nature kept happening. Just a few months later, we partnered with Chase to offer DashPasses to many of their credit card holders, and got a massive influx of traffic as a result.

In the face of a major event like this, we could only make one of three choices:

Expect that the impact of large promotional events is not so large that we cannot deliver all our orders.
Build a generic promotional model and hope the results generalize to future outliers.
Get input from the team running the relevant promotions to boost our accuracy with manual intervention.

Option one is extremely unlikely, and is not in the spirit of trying to improve a model’s accuracy, since it's to the detriment of our marketing team’s performance. Option two would give marginal results at best, and given that every major promotion can be different, finding a feature set that can generalize well to outlier events like these is unlikely. Option three often ends up being the only choice, because the business teams that set up these promotions have spent a lot of time sizing up their impact and already have a good estimate on the impact.

In practice, we begin option three by building out the infrastructure that enables a forecast to ingest adjustments that manually alter the forecast. Although the adjustments can end up being something simple, like adding an extra 100,000 deliveries to Saturday’s forecast, there can still be cascading problems to solve. For example, if we adopted the simple example above there would need to be methods or models in place to figure out how to distribute that extra 100,000 deliveries to all our geographies. But once the code is in place to rapidly propagate manual adjustments, each manual adjustment should just feel like adding another input to the model. In this way we can ask business teams their opinions on upcoming forecasts and easily incorporate those into our forecasting model.

Conclusion

Like most machine learning problems, forecasting requires a real understanding of a problem’s context before throwing computing power and algorithms at it. A simple solution, like getting input from business partners, can end up improving forecasting accuracy just as much as using a complex algorithmic solution. That is why enabling the input of experts or stakeholders at the company can help improve the forecast’s accuracy even more.

Comments

Leave a Reply Cancel reply

Related Positions

Director/Senior Director, Marketing Analytics San Francisco, CA; Seattle, WA; New York, NY; Los Angeles, CA; Chicago, IL; Austin, TX; Washington, D.C. Data Scientist / Senior Data Scientist, Analytics San Francisco, CA; Seattle, WA; New York, NY; Los Angeles, CA; Chicago, IL; Austin, TX See All Jobs

Using ML and Optimization to Solve DoorDash’s Dispatch Problem

DoorDash delivers millions of orders every day with our last-mile logistics platform. Look under the hood and learn how the platform works

Alex Weinstein

Jianzhe Luo 18 Minute Read

Machine Learning

Improving Experimental Power through Control Using Predictions as Covariate (CUPAC)

Too much varience can reducing experimental power. Learn how we solved this problem with our new CUPAC method

Jeff Li

Yixin Tang

Jared Bauman 9 Minute Read

Machine Learning

Optimizing DoorDash’s Marketing Spend with Machine Learning

Learn how we utilized cost curves to automate our marketing campaigns at scale

Aman Dhesi 12 Minute Read

Machine Learning

Predicting Marketing Performance from Early Attribution Indicators

Learn how DoorDash managed to respond faster to changes in the market by building a model to utilize early attribution data from experiments

Zhe Mai 10 Minute Read

Culture Machine Learning

Wanted: Data Scientists with Technical Brilliance AND Business Sense

DoorDash seeks data scientists who prioritize the business impacts of their work.

AlokGupta 13 Minute Read

Machine Learning

Switchback Tests and Randomized Experimentation Under Network Effects at DoorDash

To A/B or not to A/B, that is the question Overview On the Dispatch team at DoorDash, we use simulation, empirical observation, and experimentation to make progress towards our goals; however, given the systemic nature of many of our products, simple A/B tests are often ineffective due to network effects. To be able to experiment ...