Scenarios and Requirements
As we dug into ML usage at DoorDash, the following key scenarios for ML emerged:
- Online models - This is the scenario where we make predictions live in production, in the critical path of the user experience. Here the models and frameworks need to be performant and have a low memory footprint. This is also where we need the deepest understanding of both the modeling frameworks and the service frameworks. Consequently, this is where the restrictions on which ML frameworks we support, and on how complex models can be, are most stringent. Examples at DoorDash include food preparation time predictions, quoted delivery time predictions, search ranking, etc.
- Offline models - These predictions are used in production, but they are not made in the request/response path. In this scenario runtime performance is secondary. Since these predictions are still used in production, we need the results to be persisted in the warehouse. Examples at DoorDash include demand predictions, supply predictions, etc.
- Exploratory models - This is where people explore hypotheses, but neither the model nor its output is used in production. Use cases include exploring potential production models, analysis for identifying business opportunities, etc. We are explicitly not placing any restrictions on frameworks here.
These scenarios led to the following requirements for the ML Platform:
- Standardizing ML frameworks: Given the number of ML frameworks available (for example LightGBM, XGBoost, PyTorch, and TensorFlow), it is hard to develop expertise within a company for many of them. So there is a need to standardize on a minimal set of frameworks that covers the breadth of use cases typically encountered at DoorDash.
- Model lifecycle: Support for the end-to-end model lifecycle, consisting of hypothesizing improvements, training the model, preserving the training scripts, offline evaluation, online shadow testing (making predictions online for the sole purpose of evaluation), A/B testing, and finally shipping the model.
- Features: There are two kinds of features we use. One kind is request-level features, which capture request-specific information, for example the number of items in an order, the request time, etc. The second kind is environmental features, which capture the environment under which DoorDash is operating, for example average wait times in a store, the number of orders in the last 30 minutes in a store, the number of orders from a customer in the last 3 months, etc. Environmental features are common across all requests. We need a good way to compute and store environmental features.
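To make the distinction between the two kinds of features concrete, here is a minimal sketch of how they might be combined into one model input. The class name `RequestFeatures`, the function `build_feature_vector`, the feature names, and the in-memory `ENVIRONMENTAL_FEATURES` table are all hypothetical and used only for illustration; they are not part of the platform described here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RequestFeatures:
    """Request-level features: known only from the incoming request."""
    num_items: int
    request_hour: int  # hour of day, 0-23


# Hypothetical stand-in for stored environmental features, keyed by store id.
ENVIRONMENTAL_FEATURES: Dict[str, Dict[str, float]] = {
    "store_42": {"avg_wait_minutes": 11.5, "orders_last_30_min": 18.0},
}


def build_feature_vector(request: RequestFeatures, store_id: str) -> Dict[str, float]:
    """Merge request-level features with the environmental features for a store."""
    features = {
        "num_items": float(request.num_items),
        "request_hour": float(request.request_hour),
    }
    # Environmental features are looked up rather than computed per request.
    features.update(ENVIRONMENTAL_FEATURES.get(store_id, {}))
    return features
```

Request-level features travel with the request itself, while environmental features are precomputed and only looked up at prediction time, which is why they need their own computation and storage path.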
Standardizing on Supported ML Frameworks
The first step towards an ML Platform was to standardize the ML frameworks that will be supported. Supporting any framework requires a deep understanding of it, both in terms of the API it provides and in terms of its quality and performance tuning. As an organization we are better off knowing a few frameworks deeply than many in a shallow fashion. This helps us run better services for ML and leverage organizational know-how. The goal was to arrive at the sweet spot where we make appropriate tradeoffs in selecting frameworks. For example, if a pre-trained model exists only in a framework we do not currently support, and building an equivalent would take considerable effort, it makes sense to support a different framework.
After completing an internal survey of currently used model types and how they might evolve over time, we concluded that we need to support one tree-based modeling framework and one neural-network-based modeling framework. Also, given the standardization of DoorDash's tech stack on Kotlin, we needed something with a simple C/C++ API at prediction time that we could hook into the Kotlin-based prediction service using JNI.
For tree-based models we evaluated XGBoost, LightGBM, and CatBoost, measuring the quality of the model (using PR AUC) and the training and prediction times on production models we already have. The accuracy of the models was almost the same for the use cases we had. For training, we found that LightGBM was fastest. For predictions, XGBoost was slightly faster than LightGBM, but not by a huge margin. Given that the current models were already in LightGBM, we ended up selecting LightGBM as the framework for tree-based models.
For neural network models, we looked at TensorFlow and PyTorch. Here again, for our use cases we did not find a significant difference in the quality of the models produced by the two.
PyTorch was slower to train on CPUs than TensorFlow; on GPUs, however, both had similar training speeds. For predictions, both had similar predictions-per-minute numbers. We then looked at the API sets of TensorFlow and PyTorch for both training and prediction time and concluded that PyTorch gave a more coherent API set. With the launch of TorchScript C++ support in PyTorch, we had the right API set needed to build the prediction service using PyTorch.
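The framework comparison above rests on two measurements: model quality via PR AUC, and training or prediction time. The sketch below shows one way such measurements could be taken in plain Python. It uses average precision, a common step-wise estimate of the area under the precision-recall curve; the post does not say which estimator was actually used, so this is an assumption, and the helper names `average_precision` and `timed` are hypothetical.

```python
import time
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a step-wise estimate of the area under the PR curve.

    Assumes binary labels (0 or 1) and distinct scores; tied scores are
    not grouped into a single threshold here.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision at the rank where this positive is recalled.
            precision_sum += hits / rank
    return precision_sum / total_positives


def timed(fn: Callable[..., T], *args: object) -> Tuple[T, float]:
    """Run fn(*args) and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Wrapping each candidate framework's train and predict calls in `timed`, and scoring their outputs with the same metric on the same held-out data, keeps the comparison like-for-like.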
Pillars of the ML Platform
After the ML framework decision, and based on the prediction scenarios and requirements, the following four pillars emerged:
- Modeling library - A Python library for training and evaluating models, creating model artifacts that can be loaded by the Prediction Service, and making offline predictions.
- Model Training Pipeline - A build pipeline where models will be trained for production usage. Once a model training script is submitted to the git repo, this pipeline takes care of training the model and uploading the artifacts to the Model Store. The analogy here is that if the modeling library is the compiler that produces the model, then the model training pipeline is the build system.
- Features Service - To capture the environment state needed for making predictions, we need feature computation, feature storage, and feature serving. Feature computations are either historical or real time.
- Prediction Service - This service is responsible for loading models from the Model Store, evaluating the model upon getting a request, fetching features from the Feature Store, generating the prediction logs, and supporting shadowing and A/B testing.
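As an illustration of the real-time side of feature computation, the sketch below counts events per key over a trailing window, which is the shape of a feature such as the number of orders in the last 30 minutes in a store. The class name `SlidingWindowCounter` and its methods are hypothetical, and a production aggregator would read from an event stream and write to a feature store rather than keep state in process memory.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowCounter:
    """Counts events per key within a trailing time window."""

    def __init__(self, window_seconds: int = 30 * 60) -> None:
        self.window_seconds = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, key: str, timestamp: float) -> None:
        """Record one event; assumes timestamps arrive in order for each key."""
        self._events[key].append(timestamp)

    def count(self, key: str, now: float) -> int:
        """Return the number of events for key in the interval (now - window, now]."""
        events = self._events[key]
        cutoff = now - self.window_seconds
        # Drop events that have aged out of the window.
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events)
```

Historical features such as 3-month order counts follow the same idea but are cheap enough to recompute offline in batch, so they do not need this kind of incremental state.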
Architecture of the DoorDash ML Platform
Based on the above, the architecture for the online predictions flow is made up of the following components, each with a brief description:
- Feature Store - Low-latency store from which the Prediction Service reads the common features needed for evaluating the model. Supports numerical, categorical, and embedding features.
- Realtime Feature Aggregator - Listens to a stream of events, aggregates them into features in real time, and stores them in the Feature Store. These are for features such as historic store wait time in the past 30 mins, recent driving speeds, etc.
- Historical Aggregator - Runs offline to compute features that are longer-term aggregations like 1W, 3M, etc. Results are stored in the Feature Warehouse and also uploaded to the Feature Store.
- Prediction Logs - Stores the predictions made by the Prediction Service, including the features used when the prediction was made and the id of the model used to make it. This is useful for debugging as well as for training data for the next model refresh.
- Model Training Pipeline - All the production models will be built with this pipeline. The training script must be in the repository. Only this training pipeline will have write access to the Model Store, which gives us a trace of the changes going into the Model Store for security and audit. The training pipeline will eventually support periodic auto-retraining of models and auto-deploy/monitoring. This is equivalent to the CI/CD system for ML models.
- Model Store - Stores the model files and metadata. The metadata identifies which model is currently active for certain predictions and defines which models are getting shadow traffic.
- Prediction Service - Serves predictions in production for various use cases. Given a request with request features, context (store id, consumer id, etc.), and a prediction name (optionally including an override model id to support A/B testing), it generates the prediction.
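The online flow described above can be sketched in a few lines of Python. This is a simplified, in-memory illustration and not the actual service, which is Kotlin-based; the names `ModelStore`, `PredictionService`, `predict`, and the log fields are all hypothetical. It shows the active model being chosen from Model Store metadata (or from an override model id), environmental features being fetched for the ids in the context, shadow models being evaluated without affecting the response, and every prediction being logged with its features and model id.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Features = Dict[str, float]
Model = Callable[[Features], float]


@dataclass
class ModelStore:
    """In-memory stand-in for the Model Store: models plus metadata."""
    models: Dict[str, Model] = field(default_factory=dict)
    active: Dict[str, str] = field(default_factory=dict)        # prediction name -> model id
    shadow: Dict[str, List[str]] = field(default_factory=dict)  # prediction name -> model ids


@dataclass
class PredictionService:
    model_store: ModelStore
    feature_store: Dict[str, Features]  # entity id -> environmental features
    logs: List[dict] = field(default_factory=list)

    def predict(
        self,
        prediction_name: str,
        request_features: Features,
        context: Dict[str, str],
        override_model_id: Optional[str] = None,
    ) -> float:
        # Combine request features with stored features for each id in the context.
        features = dict(request_features)
        for entity_id in context.values():
            features.update(self.feature_store.get(entity_id, {}))
        # An override model id (e.g. from an A/B test) wins over the active model.
        served_id = override_model_id or self.model_store.active[prediction_name]
        result = self._run(prediction_name, served_id, features, shadow=False)
        # Shadow models are evaluated and logged, but their output is not returned.
        for shadow_id in self.model_store.shadow.get(prediction_name, []):
            self._run(prediction_name, shadow_id, features, shadow=True)
        return result

    def _run(self, prediction_name: str, model_id: str, features: Features, shadow: bool) -> float:
        value = self.model_store.models[model_id](features)
        self.logs.append({
            "prediction_name": prediction_name,
            "model_id": model_id,
            "features": dict(features),
            "prediction": value,
            "shadow": shadow,
        })
        return value
```

Logging the exact features alongside the model id is what makes the logs usable both for debugging a single prediction and as training data for the next model refresh.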
We are just starting to execute on this plan, and there is still a lot of work to do in building, scaling, and operating the platform. If you are passionate about building the ML Platform that powers DoorDash, do not hesitate to reach out to us.
Acknowledgments: Cody Zeng, Cem Boyaci, Yixin Tang, Raghav Ramesh, Rohan Chopra, Eric Gu, Alok Gupta, Sudhir Tonse, Ying Chi, and Gary Ren