Customers come to DoorDash to discover and order from a vast selection of their favorite stores, so it is important to be able to surface what is most relevant to them. In a previous article, Powering Search & Recommendations at DoorDash, we discussed how we built our initial personalized search and discovery experience to surface the best stores for consumers based on their personal preferences.
There we showed how recommendations increased click-through rate by 25% versus a baseline of showing the most popular restaurants. By incorporating latent information, as well as preparing a training pipeline and the gradient-boosted machine setup we use in other systems at DoorDash, we have seen click-through rate increase by another 5% in initial email tests, and we are in the process of testing and rolling out these changes more broadly in email and in-app.
At DoorDash, our recommendations problem differs from the typical e-commerce recommendations problem in that consumers only see stores that are close to them geographically (see “How we Designed Road Distances in DoorDash Search”). Because this makes the matrix from consumers to stores sparse, we started with the knowledge-based recommender system described in the previous article instead of using an approach like collaborative filtering.
However, we still want to include the kind of latent information that comes from consumer and store similarity. To do this, we use a technique similar to word2vec from natural language processing, which in our case we call store2vec. With word2vec, the idea is that words can be encoded in a vector space to represent semantic properties. For example, using word2vec, if we take the vector for “king”, subtract “man”, and add “woman”, we get a vector close to “queen”.
Encoding stores on DoorDash in a vector space holds the promise of semantically representing properties of stores that we don’t otherwise have information about, such as whether the store focuses on sweet items, whether it is a trendy restaurant, or whether it is a vegetarian restaurant.
For example, here are the most similar stores based on store2vec distance for Smitten Ice Cream in Los Altos and Darbar in Palo Alto:
Smitten Ice Cream: Baskin Robbins, Jamba Juice, Tin Pot Creamery, Krispy Kreme Doughnuts
Darbar: Amber Dhara, Amber India, Curry Up Now, Janta Indian Cuisine, Rangoon Ruby, Shiva’s
For store2vec, we embed stores as vectors using the word2vec (CBOW) algorithm from the gensim package, with the following modifications:
- each store is a word in our vocabulary and
- each sentence is a list of stores viewed together in a user session.
For word context, we found a context window size of 5 to work the best. As quality constraints, we enforce minimum thresholds on number of stores in a session and number of sessions a store appears in.
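The setup above can be sketched roughly as follows. This is a minimal illustration, not our production code: the threshold values, the `vector_size` of 64, and the helper names `build_sentences` and `train_store2vec` are assumptions made for the example, and the training call assumes the gensim 4.x `Word2Vec` API (imported lazily, so the filtering part runs without gensim installed).

```python
from collections import Counter

# Hypothetical thresholds; the actual values are not stated in this article.
MIN_STORES_PER_SESSION = 2
MIN_SESSIONS_PER_STORE = 2


def build_sentences(sessions, min_stores=MIN_STORES_PER_SESSION,
                    min_sessions=MIN_SESSIONS_PER_STORE):
    """Turn raw sessions (lists of store IDs) into word2vec 'sentences'."""
    # Count how many sessions each store appears in (once per session).
    session_counts = Counter(store for s in sessions for store in set(s))
    sentences = []
    for session in sessions:
        # Drop rarely seen stores, then drop sessions that become too short.
        kept = [store for store in session
                if session_counts[store] >= min_sessions]
        if len(kept) >= min_stores:
            sentences.append(kept)
    return sentences


def train_store2vec(sentences, vector_size=64):
    """Train CBOW embeddings; assumes gensim 4.x is installed."""
    from gensim.models import Word2Vec
    # sg=0 selects CBOW; window=5 is the context size reported above.
    # min_count=1 because rare stores were already filtered out.
    return Word2Vec(sentences=sentences, vector_size=vector_size,
                    window=5, min_count=1, sg=0)


# Toy sessions, invented for illustration only.
sessions = [
    ["store_a", "store_b", "store_c"],
    ["store_a", "store_b"],
    ["store_d"],
    ["store_c", "store_e"],
]
sentences = build_sentences(sessions)
```

With gensim available, `train_store2vec(sentences).wv["store_a"]` would return the learned vector for that store.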
This gives us vectors for every store. Then to generate vectors for a consumer, we sum the vectors for each store they ordered from in the past 6 months or 100 orders. To then determine the distance between a store and a consumer, we take the cosine distance between the store’s vector and the consumer’s vector.
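As a rough sketch of this step, the snippet below sums store vectors into a consumer vector and computes cosine distance to each store. The two-dimensional vectors and store names are invented for illustration; real embeddings have many more dimensions.

```python
import math


def sum_vectors(vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(components) for components in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity: 0 means same direction, 2 means opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Toy 2-D embeddings, invented for illustration only.
store_vectors = {
    "burger_place": [1.0, 0.0],
    "sushi_place": [0.0, 1.0],
    "dessert_place": [-1.0, 0.0],
}
order_history = ["burger_place", "sushi_place"]

# The consumer vector is the sum of the vectors of the stores they ordered from.
consumer_vector = sum_vectors([store_vectors[s] for s in order_history])
distances = {name: cosine_distance(consumer_vector, vec)
             for name, vec in store_vectors.items()}
```

Because cosine distance ignores vector length, summing the store vectors and averaging them give the same distances.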
To illustrate this, here we construct an example consumer with order history consisting of 4505 Burgers & BBQ and New Nagano Sushi (marked firstname.lastname@example.org in the figure). We can see that burger and sushi restaurants are some of the closest points but, interestingly, so are some Korean restaurants. The points are plotted using t-SNE and the TensorFlow embedding projector. The distances listed on the right are the cosine distances between the consumer vector and each store vector.
This store2vec distance feature is one feature we added to our training pipeline for recommendations. The training pipeline consists of the following stages.
Positive and negative example generation: We sample past orders as positive examples. We compute features only from data that predates each order, so that they match what they would have been before the order occurred; this maintains the integrity of training and testing. To generate negative examples, we use the noise contrastive approach: we randomly choose another store that the consumer could have ordered from.
Feature generation: Based on data for consumers and stores, we extract many features from the annotated data on consumers and stores, such as categories, rating, popularity, and browse / click / order information.
Train/test split: We split the data 70% training and 30% test by time, so that we never test on data that occurred before the data we trained on.
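The negative-example step could look something like the sketch below. It is plain random sampling over a consumer's available stores, which is a simplification of the approach described above; the function name `sample_negatives`, the one-negative-per-positive ratio, and the toy data are assumptions for illustration.

```python
import random


def sample_negatives(positives, available_stores, rng):
    """For each (consumer, ordered_store) pair, add one random other store.

    Returns (consumer, store, label) tuples, with label 1 for the real
    order and label 0 for the sampled store.
    """
    examples = []
    for consumer, ordered_store in positives:
        examples.append((consumer, ordered_store, 1))
        # Only stores this consumer could have ordered from are candidates.
        candidates = [s for s in available_stores[consumer]
                      if s != ordered_store]
        if candidates:
            examples.append((consumer, rng.choice(candidates), 0))
    return examples


# Toy data, invented for illustration only.
positives = [("alice", "burgers"), ("bob", "sushi")]
available_stores = {
    "alice": ["burgers", "sushi", "pizza"],
    "bob": ["sushi", "tacos"],
}
examples = sample_negatives(positives, available_stores, random.Random(0))
```

Passing in a seeded `random.Random` keeps the sampled training set reproducible between runs.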
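A time-based split like this can be sketched in a few lines. The `timestamp` field name and the `time_split` helper are hypothetical; the point is only that the examples are ordered by time before being cut, rather than shuffled.

```python
def time_split(examples, train_percent=70):
    """Split examples so every training example predates every test example."""
    ordered = sorted(examples, key=lambda e: e["timestamp"])
    # Integer arithmetic avoids floating-point surprises at the boundary.
    cutoff = len(ordered) * train_percent // 100
    return ordered[:cutoff], ordered[cutoff:]


# Toy examples with out-of-order timestamps, invented for illustration only.
examples = [{"order_id": i, "timestamp": t}
            for i, t in enumerate([5, 1, 9, 3, 7, 2, 8, 4, 10, 6])]
train, test = time_split(examples)
```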
Model training: We train logistic regression and gradient-boosted machine (GBM) models. For GBM models, we use LightGBM. These are the same frameworks we use for many other machine learning systems at DoorDash such as prep time prediction and batching prediction.
Model evaluation: The model predicts P(order | consumer, store) and is a binary classifier. To evaluate it for this ranking problem, we use the area under the curve (AUC) of the precision/recall curve. This gives us an evaluation metric that does not change if the score values are inflated or deflated but the ranking remains the same. We also output business metrics for the models, such as average delivery fee and average rating, and we inspect example users whose order history conforms to certain patterns in order to sanity-check the models’ output.
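To make the ranking-invariance point concrete, here is a small sketch using average precision, a common step-wise estimate of the area under the precision/recall curve. It is not necessarily the exact estimator used in our pipeline, and the labels and scores are invented for illustration.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy labels and scores, invented for illustration only.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
inflated = [s * 10 + 5 for s in scores]  # same ranking, different scale

ap = average_precision(labels, scores)
ap_inflated = average_precision(labels, inflated)
```

Here the positives sit at ranks 1 and 3, so `ap` is (1/1 + 2/3) / 2 = 5/6, and `ap_inflated` is identical because only the order of the scores matters.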
Developing a recommendations model including latent features is only the second major step here. Here are some areas we intend to explore in the future:
- Generating recommendations with context: Generating a list of recommendations is helpful, but being able to show sections with descriptions can give people more confidence in the recommendations and allow us to personalize more of the DoorDash app.
- Store2vec optimizations: We can make these recommendations more powerful by enhancing store2vec. For example, we could include consumers in the same optimization process, meaning we would generate vectors for stores and consumers together instead of having a separate averaging step.
- Freshness in recommendations: Based on impression data, we could adjust recommendations for a consumer as they use the product.
- New models: We have experimented with alternative models like the seq2seq deep learning models shown below, and expect to see gains in performance with integrating similar models.
Personalization holds promise for helping consumers using DoorDash find what they want quickly and for surfacing the restaurants most relevant to them. By applying latent features, we were able to improve our predictions. By applying our existing machine learning systems for GBMs to this problem, we were able to get a large boost. Overall, we see an approximately 20% increase in offline AUC and are currently testing these models in email and in-app, where we see an approximately 5% increase in click-through rate.
If you are passionate about solving challenging problems in this space, we are hiring for the data science and machine learning team as well as the search & relevance team. If you are interested in working on other areas at DoorDash, check out our careers page.
¹ Although we try to optimize for increasing orders, click-through rate is the primary metric for recommendations and search as it is the direct metric.