Building DoorDash’s Product Knowledge Graph with Large Language Models

April 23, 2024 | 9 Minute Read | Machine Learning

Steven Xu

Steven Xu is a Machine Learning Engineer on the New Verticals Machine Learning team where he has been working on Catalog Quality and Personalized Recommendations. In his free time, he enjoys snowboarding and playing with his three cats.

Sree Chaitanya Vadrevu

Sree Chaitanya Vadrevu is a Machine Learning Engineer on the New Verticals Machine Learning team where he has been working on Foundational ML Technologies and Recommender Systems. In his free time, he enjoys cooking and meditation.

Building an attribute extraction model

Building an in-house attribute extraction/tagging model from scratch requires a significant amount of labeled training data to reach the desired accuracy. This is often known as the cold-start problem of natural language processing, or NLP. Data collection slows model development, delays adding new items to the active catalog, and creates high operator costs.

Using LLMs to circumvent the cold-start problem

Large language models, or LLMs, are deep-learning models trained on vast amounts of data. Examples include OpenAI’s GPT-4, Google’s Bard, and Meta’s Llama. Because of their broad knowledge, LLMs can perform NLP with reasonable accuracy without requiring many, if any, labeled examples. A variety of prompts can be used to instruct LLMs to solve different NLP problems.

Here we highlight how we use LLMs to extract product attributes from unstructured SKU data, allowing us to build a high-quality retail catalog that delivers the best possible experience for users in all new verticals. In the following sections, we describe three projects in which we used LLMs to build ML products for attribute extraction.

Brand extraction

Brand is a critical product attribute used to distinguish one company’s products from all others. At DoorDash, a hierarchical knowledge graph defines a brand, including entities such as manufacturer, parent brand, and sub-brand, as shown in Figure 2.

*Figure 2: Brand taxonomy breaks brands into entities such as manufacturer, parent brand, and sub-brand*

Accurate brand tagging offers a number of downstream benefits, including increasing the reach of sponsored ads and the granularity of product affinity. Because the number of real-world brands is effectively unbounded, DoorDash’s brand taxonomy is never complete. As the product spectrum expands, new brands must be ingested to close coverage gaps. Previously, brand ingestion was a reactive, purely manual process driven by immediate business needs. This limited the volume of new brands that could be added, often left much of the coverage gap unaddressed, and led to duplicate brands, making the taxonomy difficult to manage.

To this end, we built an LLM-powered brand extraction pipeline that can proactively identify new brands at scale, improving both efficiency and accuracy during brand ingestion. Figure 3 shows our end-to-end brand ingestion pipeline, which follows these steps:

1. The unstructured product description is passed to our in-house brand classifier.
2. SKUs that cannot be tagged confidently to one of the existing brands are passed to an LLM for brand extraction.
3. The extraction output is passed to a second LLM, which retrieves similar brands and example item names from an internal knowledge graph to decide whether the extracted brand is a duplicate entity.
4. The new brand enters our knowledge graph, and the in-house classifier is retrained with the new annotations.

*Figure 3: LLM-powered brand ingestion pipeline*
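The routing logic of the four steps can be sketched roughly as follows. This is a minimal illustration, not our production code: the class name, the `CONFIDENCE_THRESHOLD` value, and the three callables standing in for the classifier and the two LLMs are all hypothetical, and fuzzy string matching from Python's `difflib` is swapped in for knowledge-graph retrieval.

```python
import difflib
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical cutoff; the article does not say how "confidently" is measured.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class BrandIngestionPipeline:
    # Step 1 stand-in: in-house classifier, description -> (brand, confidence).
    classify: Callable[[str], Tuple[str, float]]
    # Step 2 stand-in: first LLM, description -> extracted brand name or None.
    extract_brand: Callable[[str], Optional[str]]
    # Step 3 stand-in: second LLM, (candidate, similar brands) -> existing brand or None.
    find_duplicate: Callable[[str, List[str]], Optional[str]]
    known_brands: Set[str] = field(default_factory=set)
    # (description, brand) pairs queued for retraining the classifier (step 4).
    new_annotations: List[Tuple[str, str]] = field(default_factory=list)

    def retrieve_similar(self, candidate: str) -> List[str]:
        # Assumption: fuzzy string matching stands in for knowledge-graph retrieval.
        return difflib.get_close_matches(
            candidate, sorted(self.known_brands), n=3, cutoff=0.6
        )

    def tag(self, description: str) -> Optional[str]:
        brand, confidence = self.classify(description)
        if brand in self.known_brands and confidence >= CONFIDENCE_THRESHOLD:
            return brand  # confident match to an existing brand

        candidate = self.extract_brand(description)
        if candidate is None:
            return None  # nothing extracted; leave the SKU untagged

        existing = self.find_duplicate(candidate, self.retrieve_similar(candidate))
        resolved = existing if existing is not None else candidate
        self.known_brands.add(resolved)  # no-op when the brand already exists
        self.new_annotations.append((description, resolved))
        return resolved
```

Keeping the classifier and both LLM calls behind plain callables means each stage can be replaced or mocked independently. Recording an annotation even when the brand turns out to be a duplicate is a design choice of this sketch.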

Organic product labeling

Consumers care about dietary attributes when building their carts and are more likely to engage with a product if it is tailored to their personal preferences. Last year, we stood up a model to label all organic grocery products. The end goal was to enable personalized discovery experiences, such as showing a Fresh & Organic carousel to a consumer whose past orders showed a strong affinity for organic products.

The end-to-end pipeline takes a waterfall approach, leveraging existing data where applicable to boost speed, accuracy, and coverage. This process can be broken down roughly into three buckets:

String matching: We look for an exact mention of the keyword “organic” in the product title. This approach offered the highest precision and decent coverage, but it missed cases where “organic” is misspelled, dropped, or presented slightly differently in the data.
LLM reasoning: We use LLMs to determine whether a product is organic based on available product information, which may come directly from merchants or from optical character recognition (OCR) of packaging photos. This approach improved coverage by addressing the main gaps left by string matching, and it achieved better-than-human precision.
LLM agent: LLMs conduct online searches for product information and pipe the search results to another LLM for reasoning. This approach further boosted our coverage.

Figure 4 shows the LLM-powered pipeline for tagging our catalog SKUs with organic labels.
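As a rough illustration of the waterfall, the sketch below tries each bucket in order and stops at the first one that yields a label. The function name `label_organic`, its parameters, and the returned stage names are hypothetical; `llm_reason` and `agent_search` are placeholders for the LLM and search calls, and reusing one reasoning callable for both LLM stages is a simplification.

```python
import re
from typing import Callable, Optional, Tuple

# Whole-word, case-insensitive match on the keyword.
# A production rule would also need to handle negations such as "non-organic".
ORGANIC_PATTERN = re.compile(r"\borganic\b", re.IGNORECASE)


def label_organic(
    title: str,
    details: str,
    llm_reason: Callable[[str], Optional[bool]],
    agent_search: Callable[[str], str],
) -> Tuple[Optional[bool], str]:
    """Return (label, stage), where stage names the bucket that produced the label."""
    # Bucket 1: string matching on the product title.
    if ORGANIC_PATTERN.search(title):
        return True, "string_match"

    # Bucket 2: LLM reasoning over merchant-provided or OCR-extracted text.
    if details:
        verdict = llm_reason(details)
        if verdict is not None:
            return verdict, "llm_reasoning"

    # Bucket 3: an agent searches online, then an LLM reasons over the results.
    evidence = agent_search(title)
    if evidence:
        verdict = llm_reason(evidence)
        if verdict is not None:
            return verdict, "llm_agent"

    return None, "unlabeled"
```

Ordering the stages from cheapest to most expensive means the LLM and search calls run only for SKUs that the earlier stages could not label.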

By leveraging LLMs and agents, we overcame the challenge of insufficient data and answered inferential questions by searching and reasoning over external data. Enhancing coverage of organic labels enabled us to launch item carousels that target customers with strong organic affinity, which improved our top-line engagement metrics.

Generalized attribute extraction

Entity resolution is the process of determining whether two SKUs refer to the same underlying product. For example, “Corona Extra Mexican Lager (12 oz x 12 ct)” sold by Safeway is the same product as “Corona Extra Mexican Lager Beer Bottles, 12 pk, 12 fl oz” sold by BevMo!. We need accurate entity resolution to build a global catalog that can reshape the way customers shop while unlocking sponsored ads.

*Figure 5: Entity resolution is the backbone of sponsored ads*

Determining whether two SKUs refer to the same underlying product is a challenging problem. It requires validating that both SKUs match all attributes exactly, which means there must be accurate extraction of all applicable attributes in the first place. Products from different categories are characterized by different sets of uniquely defining attributes. For example, an alcohol product is uniquely defined by attributes such as vintage, aging, and flavor. Starting with limited human-generated annotations, we used LLMs to build a generalized attribute extraction model.
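To make the exact-match requirement concrete, here is a minimal sketch that compares two SKUs on the defining attributes of their category. The `DEFINING_ATTRIBUTES` mapping is hypothetical (only vintage, aging, and flavor come from the example above), the attribute values are assumed to have been extracted already, and treating an attribute that is missing on both sides as a match is a simplifying assumption.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from category to its uniquely defining attributes.
DEFINING_ATTRIBUTES: Dict[str, List[str]] = {
    "alcohol": ["brand", "flavor", "vintage", "aging", "size", "count"],
}


def normalize(value: Optional[str]) -> Optional[str]:
    """Lowercase and collapse whitespace so formatting noise does not block a match."""
    if value is None:
        return None
    return " ".join(value.lower().split())


def is_same_product(
    category: str,
    attrs_a: Dict[str, str],
    attrs_b: Dict[str, str],
) -> bool:
    """True only if every defining attribute of the category matches exactly."""
    names = DEFINING_ATTRIBUTES[category]
    return all(
        normalize(attrs_a.get(name)) == normalize(attrs_b.get(name))
        for name in names
    )
```

Because a single mismatched attribute makes the comparison fail, the quality of entity resolution is bounded by the quality of attribute extraction.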

We used LLMs and retrieval-augmented generation, or RAG, to accelerate label annotation. For each unannotated SKU, we first use OpenAI embeddings and approximate nearest neighbor search to retrieve the most similar SKUs from our golden annotation set. We then pass these golden annotations to GPT-4 as in-context examples to generate labels for the unannotated SKU. Choosing examples by embedding similarity has an advantage over random selection: the selected examples are more likely to be relevant to the task, which reduces hallucination. Ultimately, the generated annotations are used to fine-tune an LLM for more scalable inference.
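The example-selection step can be sketched as below. To keep the snippet self-contained, a toy bag-of-words vector stands in for OpenAI embeddings and an exhaustive cosine-similarity scan stands in for approximate nearest neighbor search; the function names, the prompt wording, and the shape of the golden-set records are all assumptions made for illustration.

```python
import json
import math
import re
from collections import Counter
from typing import Dict, List


def toy_embed(text: str) -> Counter:
    # Stand-in for a real embedding model: token counts over lowercase alphanumerics.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k_examples(query: str, golden: List[Dict], k: int = 2) -> List[Dict]:
    # Exhaustive scan; a real system would use an approximate nearest neighbor index.
    query_vec = toy_embed(query)
    ranked = sorted(
        golden,
        key=lambda ex: cosine(query_vec, toy_embed(ex["sku"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, examples: List[Dict]) -> str:
    # Retrieved golden annotations become in-context examples for the LLM.
    lines = ["Extract the product attributes of the final SKU as JSON."]
    for ex in examples:
        lines.append(f"SKU: {ex['sku']}")
        lines.append(f"Attributes: {json.dumps(ex['attributes'], sort_keys=True)}")
    lines.append(f"SKU: {query}")
    lines.append("Attributes:")
    return "\n".join(lines)
```

The resulting prompt string would then be sent to the LLM; that call is omitted here because it depends on the client and model in use.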

This approach enabled us to generate annotations within a week that would otherwise have taken months to collect, allowing us to focus on actual model development and de-risk our goal.

Downstream impacts

Attribute extraction not only allows us to better represent each product in the catalog but also empowers downstream ML models that improve a customer's shopping experience. Attributes such as brand and organic tag are important features in our personalized ranking models, which recommend items that reflect a consumer’s unique needs and preferences. And attributes such as product category and size enable recommending more relevant substitutions when the original item is out of stock, giving customers a smooth fulfillment experience.

Looking into the future

So far, most of our attribute extraction models are built on text-based inputs. A challenge with this approach, however, is the presence of abstraction and abbreviations in written product descriptions. Fortunately, product image quality varies less across merchants. We are actively exploring recent advances in multimodal LLMs that can process text and images together; currently, we are experimenting with multimodal attribute extraction through Visual QA and Chat + OCR. Our Engineering team is also building the foundational technology and infrastructure to allow Dashers to take product photos so that we can perform attribute extraction directly on in-store items.

As we identify more areas where LLMs can be used, we are also working with our ML Platform team to democratize their use across DoorDash through a centralized model platform where anyone can easily prompt-engineer, fine-tune, and deploy LLMs.

Acknowledgments

Special thanks to Aparimeya Taneja, JJ Loh, Lexi Bernstein, Hemanth Chittanuru, Josey Hu, Carolyn Tang, Sudeep Das, Steven Gani, and Andrey Parfenov, who all worked together to make this exciting work happen!


