DoorDash’s principles and processes for democratizing Machine Learning
Six months ago I joined DoorDash as their first Head of Data Science and Machine Learning. One of my first tasks was to help decide how we should organize machine learning (ML) teams so that we reap the maximum benefit from this wonderful technology. You can learn more about some of the current use cases of ML at DoorDash on our blog here. Having spent some time at previous technology companies and spoken to many more, I was acutely aware of many of the challenges that come up.
Challenges
- ML is poorly defined: Is a linear regression in Excel ML? What about a toy random forest in a local Jupyter notebook? Where is the line between analytics and ML?
- ML needs Engineering and Science: ML at technology companies requires decision-making that is both performant (an engineering problem) and optimal (a science problem).
- ML advances rapidly: Even over just the last five years, we have seen modeling approaches, platforms, and languages change almost every 18 months.
- ML is trendy: Many people view ML as magic, so everyone wants to work on it.
Vision
Build data-driven software for advanced measurement and optimization
Principles
- Democracy: everyone can build and run an ML model given sufficient tooling and guidance.
- Talent: we want to attract and grow the best business-impact focused ML practitioners.
- Speed: if a cost-effective third party ML solution already exists then we should use it.
- Sufficiency: if a function (typically Engineering) can implement a good-enough ML solution unaided then they should do so.
- Incrementality: if a function (typically Data Science) can add enough incremental value to an ML solution then they should do so.
- Accountability: each ML solution has a single technical lead acting as the technical decision-maker.
Organization
- Reporting lines: ML Engineers report to Engineering managers and ML Data Scientists report to DS managers. ML Infrastructure reports into the central Data Platform team.
- Hiring: Job descriptions and hiring processes for ML Engineers and ML Data Scientists are reviewed and approved by the ML Council.
- Technology: Strong investment in a centralized ML platform by Data Platform (workflow, provisioning, orchestration, feature stores, common data preparation, validation, quality checks, monitoring, etc.); a sketch of the kind of standardized checks such a platform provides follows this list. Potential ML infrastructure technology (build/buy) decisions are reviewed and approved by the ML Council.
- Execution:
- Any person(s) at the company can identify a use case for ML and draft a proposal (business problem, estimated impact versus build / maintenance cost, solution, team composition, single technical lead).
- The proposal is reviewed, amended, and approved by the pod’s / vertical’s cross-functional leads (PM, EM, DS Manager, Analytics Manager, etc.). The leads should approve the business problem, prioritization, and impact / cost.
- The proposal is reviewed, amended, and approved by the ML Council.
- All steps of the review will be transparent: ML Council and ML practitioners will meet weekly at ‘ML Review’ to review items and debate next steps. Decisions will be made at this ML Review and notes will be taken and emailed to all interested folks.
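To give a flavor of the Technology bullet above, here is a minimal sketch of the kind of standardized steps a centralized ML platform can provide out of the box: feature quality checks, training, validation, and metric logging for monitoring. All names, data, and thresholds below are illustrative assumptions, not DoorDash's actual platform APIs.

```python
# Minimal sketch of standardized ML platform steps: data quality checks,
# training, validation, and metrics suitable for monitoring.
# All names, data, and thresholds are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def check_feature_quality(features: pd.DataFrame, max_null_rate: float = 0.05) -> None:
    """Fail fast if any feature column has too many missing values."""
    null_rates = features.isna().mean()
    bad = null_rates[null_rates > max_null_rate]
    if not bad.empty:
        raise ValueError(f"Features failed quality check: {bad.to_dict()}")


def train_and_validate(df: pd.DataFrame, label: str = "converted") -> dict:
    """Train a baseline model and return metrics that a platform could log and monitor."""
    features = df.drop(columns=[label])
    check_feature_quality(features)
    X_train, X_test, y_train, y_test = train_test_split(
        features, df[label], test_size=0.2, random_state=0
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return {"test_auc": auc, "n_train": len(X_train), "n_test": len(X_test)}


if __name__ == "__main__":
    # Synthetic stand-in for features served from a shared feature store.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "past_orders": rng.poisson(3, 1000),
        "avg_basket": rng.gamma(2.0, 10.0, 1000),
    })
    df["converted"] = (df["past_orders"] + rng.normal(0, 1, 1000) > 3).astype(int)
    print(train_and_validate(df))
```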
ML Council
- Composition: the ML Council is composed of a group of experienced ML practitioners from across the company, typically senior Engineering ML, Data Science ML, and Infrastructure ML folks. It is led by the ML Council Chair, who serves as the decision-maker for escalations. Membership rotates on a regular cadence, e.g. every 12 months.
- Role: the role of the ML Council is to:
- balance project-specific variability against company-wide uniformity, so that we are efficient as a company
- review and give feedback on all new ML applications
- facilitate the cross-pollination of ideas and solutions
- create better visibility into common pieces (to feed into infra)
- encourage more proactive communication of data sources and solutions.
- Responsibility: Typically, the ML Council should ensure that if production performance is the biggest blocker to success, then the tech lead is an ML Engineer; if statistical performance is the biggest blocker to success, then the tech lead is a Data Scientist. The ML Council should also check that solutions have enough support and, where possible, are part of the long-term ML platform investment.
- Autonomy: If the ML Council disagrees on the solution / team / lead, then the ML Council Chair tie-breaks and makes a decision.
Have you hit any stumbling blocks yet?
Hi Rohan, thank you for the question. I think two areas that have been challenging are:
1. Incrementality: this year it has often been difficult to identify a priori whether an ML solution will add value over our MVP (non-ML) solution. In these cases we have to work carefully to validate quickly whether an ML implementation will have incremental impact; a sketch of one such quick check follows this list.
2. Documentation: as much as we have strived to have ML product reviews and collect feedback, it has not been uniform or consistent. This is probably to be expected from a young company like ours that is still moving very quickly.
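As a concrete, purely hypothetical illustration of the quick validation mentioned in point 1, the sketch below compares a candidate ML model against a simple non-ML heuristic on the same holdout data. The delivery-time data, heuristic coefficients, and metric are all made-up assumptions for illustration, not our actual models.

```python
# Illustrative incrementality check: score a candidate ML model and the
# existing non-ML heuristic on the same holdout set and compare.
# Data, heuristic, and metric below are hypothetical.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 5000
distance_km = rng.uniform(0.5, 10.0, n)
prep_minutes = rng.uniform(5.0, 30.0, n)
# Hypothetical target: delivery time in minutes, with noise.
delivery_minutes = 5 + 2.5 * distance_km + 0.8 * prep_minutes + rng.normal(0, 4, n)

X = np.column_stack([distance_km, prep_minutes])
X_train, X_test, y_train, y_test = train_test_split(X, delivery_minutes, random_state=0)

# MVP (non-ML) baseline: a simple hand-tuned linear heuristic.
baseline_pred = 5 + 3.0 * X_test[:, 0] + 1.0 * X_test[:, 1]

# Candidate ML solution.
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
ml_pred = model.predict(X_test)

baseline_mae = mean_absolute_error(y_test, baseline_pred)
ml_mae = mean_absolute_error(y_test, ml_pred)
print(f"baseline MAE: {baseline_mae:.2f} min, ML MAE: {ml_mae:.2f} min")
print(f"estimated incremental improvement: {baseline_mae - ml_mae:.2f} min")
```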
On the positive side, we have found our ML Council to be an efficient mechanism for disseminating information, creating alignment across all the different parts of the business that use ML, and co-designing the centralized ML Platform. Also, our hiring has gone well this year and we have brought in a strong diversity of talent.
Thank you again for your question.