DoorDash prides itself on offering an internship experience where interns fully integrate with Engineering teams and get the kind of real industry experience that is not taught in a classroom. To showcase some of our summer 2021 interns’ experiences, we have put together this collection of intern projects.
- Optimizing Two Factor Authentication To Improve The User Experience – By Nir Levin
- Gradually Releasing a New Onboarding Feature With Email Bucketing – By Austin Leung
- Building an efficient lookup system to speed up DashMart receiving times – By Anna Sun
- Building for data dependency discoverability at scale – By Michael Yu
- Reducing database outages by persisting order data from PostgresDB to Amazon S3 – By Austin Kim
Optimizing Two Factor Authentication to Improve the User Experience
By Nir Levin
DoorDash uses two-factor authentication (2FA) to prevent bad actors from logging into innocent users’ accounts and making purchases on their behalf. This shouldn’t be a surprise: the vast majority of companies with a consumer login employ some form of 2FA to prevent these unauthorized account takeovers (ATOs). The main challenge we set out to tackle was determining the level of risk at which a login should trigger a 2FA request. We needed a balance that blocks fraudsters while minimizing the number of good users who experience the friction of 2FA.
How 2FA is utilized
The purpose of 2FA is to allow good users to prove they are the rightful account holder and to stop ATOs by fraudsters who don’t own the account they’re trying to log in to. Most commonly, ATOs occur when a bad actor gets hold of a good user’s credentials through methods such as phishing, reusing leaked credentials, or taking over the user’s email account. To prevent unauthorized access, a 2FA request can be sent when the user logs in. The 2FA request is typically a text message sent to the user’s device. Because the fraudster has only the user’s credentials and not their device, they cannot complete the login and take over the account.
The problem with casting too large of a net
Theoretically, DoorDash could send a 2FA request every time a user logs in. While this would maximize security, it would also interrupt the user experience, which can cause frustrated customers to stop using the platform. To minimize the use of 2FA requests, we only issue them when there is a high risk of an ATO. To catch as many fraudsters as possible while sending 2FA to as few good users as possible, we needed to update the algorithm that decides when to issue a 2FA request.
Building the algorithm pipeline
The first step in building the new algorithm was to gather and analyze user metadata from the production database using an extract, transform, load (ETL) job. We created rules that use the user attributes stored in the database to evaluate how likely each login is to be an ATO. These rules are part of DoorDash’s fraud risk engine. If a login violated these rules, for example by coming from a new device whose ID isn’t already present in our database, the risk engine could react in real time and issue a 2FA request. Several login features, like the device ID, are fed into the risk engine so that it can determine whether the user is trustworthy enough to continue without 2FA. The new algorithm introduces more features, which the risk engine accesses through supplementary tables built by the ETL job.
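The post does not publish the actual rules or features, so the following is only a minimal sketch of rule-based 2FA gating. It assumes a hypothetical `UserHistory` row (standing in for a supplementary table built by the ETL job) and three made-up rules; requiring at least two rule violations before issuing 2FA is an arbitrary threshold chosen to show how extra features can let a risk engine avoid challenging every new-device login.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoginEvent:
    user_id: str
    device_id: str
    ip_country: str


@dataclass
class UserHistory:
    # Hypothetical row from a supplementary table built by the ETL job.
    known_device_ids: set[str] = field(default_factory=set)
    usual_countries: set[str] = field(default_factory=set)
    failed_logins_last_day: int = 0


# Each hypothetical rule returns True when the login looks risky on that dimension.
def new_device(event: LoginEvent, history: UserHistory) -> bool:
    return event.device_id not in history.known_device_ids


def unusual_country(event: LoginEvent, history: UserHistory) -> bool:
    return event.ip_country not in history.usual_countries


def many_recent_failures(event: LoginEvent, history: UserHistory) -> bool:
    return history.failed_logins_last_day >= 5


Rule = Callable[[LoginEvent, UserHistory], bool]
RULES: list[Rule] = [new_device, unusual_country, many_recent_failures]


def risk_score(event: LoginEvent, history: UserHistory, rules: list[Rule] = RULES) -> int:
    """Count how many rules the login violates."""
    return sum(1 for rule in rules if rule(event, history))


def requires_2fa(event: LoginEvent, history: UserHistory, threshold: int = 2) -> bool:
    """Issue 2FA only when enough risk signals fire (the threshold is illustrative)."""
    return risk_score(event, history) >= threshold


history = UserHistory(known_device_ids={"dev-1"}, usual_countries={"US"})
print(requires_2fa(LoginEvent("u1", "dev-2", "US"), history))  # False: only the new-device rule fires
print(requires_2fa(LoginEvent("u1", "dev-2", "FR"), history))  # True: new device and unusual country
```

Counting violations is the simplest possible aggregation; a production risk engine could weight individual features or feed them into a model instead.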
According to our experiments, rolling out the new algorithm resulted in a 15% relative reduction in 2FA requests. In addition, there was a notable increase in users with successful logins and successful deliveries. There was also no increase in chargebacks, which would have signaled that the new algorithm was letting fraudsters take over accounts and place orders.
Balancing account protection through 2FA against a positive user experience is a challenge for any company that wants a secure digital login.
Gradually Releasing a New Onboarding Feature With Email Bucketing
By Austin Leung
In order to improve DoorDash’s selection, we need a strong onboarding process. While most merchant onboarding is standard, in order to expand selection we needed to build out a new type of onboarding experience for merchants who do not use Dashers for delivery. This experience would need to be tested before completely going live to all merchants. Here we will talk about how we built a new experience with internal tooling and decided on a bucketing solution to gradually release it to merchants.
Why we had to move off of Formstack for self-delivery merchant onboarding
Previously, self-delivery merchants who wanted to complete a self-serve onboarding had to use Formstack, a third-party service similar to Google Forms. While Formstack has served us well to this point, there are major pain points we’d like to address moving forward to improve the onboarding experience:
- Security standards: Formstack is not SOC 2 Type II compliant, so it no longer meets our requirements.
- Loading speed: users gave negative feedback about how slowly the form loaded.
- No chat support from the sales team: users often get stuck and need help to proceed smoothly.
- Insufficient reliability: DoorDash operators often had to step in and fix issues, which was not ideal.
Overall, these issues put us at risk of losing merchant signups every week and degrading the onboarding experience.
To provide a better experience for merchants, we decided to build this flow ourselves by leveraging the existing Self Serve Merchant Onboarding (SSMO) application. SSMO already had a flow for marketplace merchants, which gave us the opportunity to adapt it into a separate self-delivery flow.
At DoorDash, it is not enough to simply build a new feature: we run experiments to prove that the new experience is better and roll it out incrementally to maintain reliability. To test this feature, we set up bucketing against the legacy solution. If our success metrics, such as the number of successful onboardings, increased and no issues arose, we could safely scale the new form to a larger share of overall traffic.
We used bucketing because it:
- Minimized the impact of any issues, since only the smaller group redirected to SSMO would be affected
- Enabled us to immediately roll back all traffic to Formstack if any issues did occur, limiting the negative impact
- Could help us demonstrate that the new feature is not only a net positive for conversion rate, but also more reliable than the Formstack experience
To implement this gradual release, we needed to figure out how fast we would increase traffic to the new solution and how we would power that transition.
Finding the best bucket key to split traffic
In terms of implementing the bucketing itself, DoorDash uses a library for configuring dynamic values we can pull into our code. There are many capabilities such as specifying the bucket key, the percentage for each bucket, and mapping specific percentages to individual submarkets. One of our main design decisions was identifying the bucket key among the many options.
Here were our main criteria for selecting our bucket key:
- Identifiable on each onboarding record so we could use it to redirect to the correct experience.
- Inputted by the user. In development and testing, we wanted to use the bucket key to force which experience we would be redirected to. Our aim was to have an optional substring in the bucket key that would force the session into a certain bucket. This would give us a stable environment instead of hoping to be bucketed into a specific experience.
- Consistent across multiple onboardings for a merchant. Merchants often do not complete their initial onboarding and come back later to start a new one. We wanted each merchant to always enter the same experience they had become accustomed to.
To solve this we considered three options for our bucket key:
- Splitting by UUID
- Splitting by location
- Splitting by email
The natural option would be to use the UUID of the merchant’s session for bucketing, since we generate a UUID for each onboarding. However, this violated our requirements:
- It was not consistent. Because each session gets its own UUID, a user could come back and get a totally new experience.
- It was not easy to control the traffic. In development and testing, we often intended to enter the new SSMO experience, but would be bucketed into Formstack. Ideally, we would want to ensure UUIDs ending in a certain string of characters would be bucketed into certain experiences. However, because a session’s UUID is automatically generated instead of being inputted by the user, this was not possible.
Next, we considered bucketing by location, since merchants enter their business address on the landing page lead form. If we used the submarket of the merchant’s location as our bucket key, merchants would always have a consistent experience. However, our concern with this bucket key was that, to run a true A/B test, we wanted users to be split without submarket acting as a confounding factor.
Instead, we decided to bucket based on emails. Merchants would fill out the lead form with their email and we could then redirect them to the right experience based on that. Using email as the key satisfies all of our initial criteria as it is specified at the beginning of each onboarding, saved in the onboarding record thereafter, and is consistent for merchants who want to restart their onboarding. We could also use the email to force any user that ends their email in a certain string to be placed into a specific bucket. This way we could override the proportion of traffic that is supposed to enter each experience, and proceed with development and testing smoothly.
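Here is a minimal sketch of email-based bucketing, assuming a hash of the normalized email decides the bucket and that the override string is a plus-address suffix such as `+ssmo`. The post does not name DoorDash’s dynamic-value library or the real override string, so `bucket_for_email`, `FORCE_SUFFIXES`, and the bucket names are hypothetical.

```python
import hashlib

# Hypothetical override markers: an email whose local part ends with one of
# these is always placed in the mapped bucket, regardless of rollout percentage.
FORCE_SUFFIXES = {
    "+ssmo": "ssmo",
    "+formstack": "formstack",
}


def bucket_for_email(email: str, ssmo_percent: int) -> str:
    """Return "ssmo" or "formstack" for a merchant email.

    ssmo_percent is the share of traffic (0 to 100) routed to the new SSMO flow.
    """
    normalized = email.strip().lower()
    local_part = normalized.split("@", 1)[0]

    # 1. Forced buckets for development and testing.
    for suffix, bucket in FORCE_SUFFIXES.items():
        if local_part.endswith(suffix):
            return bucket

    # 2. Deterministic hash mapped to a slot in [0, 100), so the same email
    #    always lands in the same slot across sessions and restarted onboardings.
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % 100
    return "ssmo" if slot < ssmo_percent else "formstack"


print(bucket_for_email("owner+ssmo@example.com", ssmo_percent=10))  # ssmo (forced)
print(bucket_for_email("Owner@Example.com", ssmo_percent=50))
```

Because a merchant lands in SSMO whenever their slot is below the rollout percentage, raising the percentage from 10% to 25% to 50% only moves merchants from Formstack into SSMO, never back, which keeps returning merchants in a consistent experience during the ramp.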
How this was a successful bucketing solution
We’ve been able to develop and test our solution at high velocity, building and rolling out the new self-delivery flow over the course of 12 weeks. With the easy-to-use email bucketing, we were able to test thoroughly, even with non-engineering stakeholders. Rollout began with all traffic directed to Formstack and has since ramped up to 10%, 25%, and 50% of merchants entering SSMO self-delivery. As we run our A/B test, we hope to continue to see increased conversion rate and reliability! We’re currently targeting a 29% relative increase in conversion rate and a 14x reduction in downtime.
During this project, we’ve learned that bucketing works best in a highly controlled environment where you can control which experience each user gets. For DoorDash and other data-driven organizations, gradual rollout is a necessity for measuring the impact of any new feature against success metrics. Features like self-delivery in SSMO can be adopted if they are successful, ensuring the product is constantly improving. An appropriate bucket key is an invaluable tool for achieving this, allowing us to iterate rapidly and deploy reliably without interruptions.
Building an efficient lookup system to speed up DashMart receiving times
By Anna Sun
In August 2020, DoorDash launched DashMart, a DoorDash-exclusive store that stocks everything from convenience and grocery store items to custom products, emergency items, household items, and more.
DashMart associates receive customer orders from the DoorDash marketplace and pick and pack them, and then a Dasher collects each order and delivers it to the customer. To make DashMart more efficient, we needed to update the DashMart associate UI so that restocking orders, which replenish our inventory, could be processed more easily and the DashMart inventory would be updated correctly.
Initially, the warehouse’s process for intaking restocking shipments was entirely manual, which made it slow and prone to human error.
To start intaking a restocking order, associates previously had to manually search for and enter the order ID, which held all the data on what was ordered so that it could be added to the DashMart’s inventory.
To reduce human error we updated the UI tool so that it could search for the necessary order ID in the database automatically, rather than requiring the associate to spend time searching for the ID manually. This feature prevents confusion and human error when accepting restocking orders and adding them to the DashMart inventory.
We implemented this feature by displaying the facility order data and breaking it down by vendors, using GET APIs. Through these integrations and some frontend tweaks, operators could now use this system to input restocking order IDs and ensure items were speedily added to the inventory.
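The post doesn’t describe the GET API responses or data model, so the sketch below assumes a hypothetical `RestockOrder` record and an in-memory `FacilityOrderIndex` built from facility order data that has already been fetched. It shows the two operations the UI needs: looking up a restocking order by ID (tolerating stray whitespace from scanned or typed input) and breaking a facility’s orders down by vendor.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RestockOrder:
    # Hypothetical shape of one facility order returned by the GET APIs.
    order_id: str
    vendor: str
    facility_id: str
    items: tuple[tuple[str, int], ...]  # (SKU, expected quantity) pairs


class FacilityOrderIndex:
    """In-memory lookup over one facility's open restocking orders."""

    def __init__(self, orders: list[RestockOrder]) -> None:
        self._by_id: dict[str, RestockOrder] = {}
        self._by_vendor: dict[str, list[RestockOrder]] = defaultdict(list)
        for order in orders:
            self._by_id[order.order_id] = order
            self._by_vendor[order.vendor].append(order)

    def lookup(self, order_id: str) -> RestockOrder | None:
        # Strip whitespace so scanned or pasted IDs still match.
        return self._by_id.get(order_id.strip())

    def vendors(self) -> list[str]:
        return sorted(self._by_vendor)

    def orders_for_vendor(self, vendor: str) -> list[RestockOrder]:
        return list(self._by_vendor.get(vendor, []))


index = FacilityOrderIndex([
    RestockOrder("PO-1001", "Acme Snacks", "F-01", (("sku-chips", 24),)),
    RestockOrder("PO-1002", "Bev Co", "F-01", (("sku-soda", 48),)),
])
print(index.vendors())  # ['Acme Snacks', 'Bev Co']
print(index.lookup(" PO-1002 ").vendor)  # Bev Co
```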
Considering that 400 to 500 operators use this order-receiving portal daily, this change had a large impact on productivity. By saving operators dozens of minutes every day, we’re making sure that DashMart orders are delivered as soon as possible.
Building data dependency discoverability at scale
By Michael Yu
As DoorDash’s data infrastructure grows to support more than 20 million consumers per month across four countries, maintaining data lineage becomes more challenging. Understanding where specific data comes from, how our systems transform it, and which databases it is written to is critical to keep our logistics platform running smoothly. Addressing this need involved integrating data lineage into a new platform based on the open source Amundsen project.
Problems with discovering data dependencies at scale
Prior to building this solution, discovering upstream data producers and downstream consumers required significant manual investigation. Because understanding the context behind data sources is essential for making data-driven decisions, this made life hard for engineers and analysts. For example, suppose a table has a column that holds the average order volume over the past 90 days. If we see an inconsistency in that metric, discovering the upstream root cause might involve tracking down the ETL job writing to that table, figuring out what SQL queries that ETL job ran, and finding the source tables of those queries. If the root cause is not a direct upstream data source, this process might be repeated several times over, consuming significant engineering resources.
Building a data discovery platform
Our new platform, which we call the Data Catalog, indexes all data processes across DoorDash to increase their discoverability. It enables users to quickly find the data sources they’re interested in and view their upstream and downstream dependencies.
The platform targets two distinct areas: dependencies across ETL jobs and dependencies across tables hosted by Snowflake, a DoorDash data infrastructure vendor. To capture the dependencies between ETL jobs, we read from two data sources: the ETL definition code and the Apache Airflow metadata database. Getting the lineage across Snowflake tables is more complicated because, unlike ETL jobs, tables have no explicitly defined dependencies. Instead, we have to look at the SQL queries running on Snowflake. Using a SQL parser that ingests raw SQL queries, we can extract the source and destination tables.
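To illustrate what extracting source and destination tables from raw SQL means, here is a deliberately naive sketch that matches `INSERT INTO` / `CREATE TABLE` targets and `FROM` / `JOIN` sources with regular expressions. This is not the parser the Data Catalog uses, which the post doesn’t name: keyword matching ignores CTEs and subqueries and misfires on constructs such as `EXTRACT(YEAR FROM col)`, which is exactly why a real SQL parser is needed at scale. `extract_lineage` is a hypothetical name.

```python
from __future__ import annotations

import re

# Identifier such as orders, prod.orders, or db.schema.orders.
_IDENT = r"([A-Za-z_][\w$]*(?:\.[A-Za-z_][\w$]*)*)"

_DESTINATION = re.compile(
    r"\b(?:INSERT\s+(?:OVERWRITE\s+)?INTO"
    r"|CREATE\s+(?:OR\s+REPLACE\s+)?TABLE(?:\s+IF\s+NOT\s+EXISTS)?)\s+" + _IDENT,
    re.IGNORECASE,
)
_SOURCE = re.compile(r"\b(?:FROM|JOIN)\s+" + _IDENT, re.IGNORECASE)


def extract_lineage(sql: str) -> tuple[set[str], set[str]]:
    """Return (source_tables, destination_tables) found in a SQL statement.

    Naive keyword matching for illustration only; not a real SQL parser.
    """
    destinations = {m.group(1).lower() for m in _DESTINATION.finditer(sql)}
    sources = {m.group(1).lower() for m in _SOURCE.finditer(sql)}
    return sources, destinations


query = """
INSERT INTO analytics.order_volume_90d
SELECT s.store_id, COUNT(*) FROM prod.orders o
JOIN prod.stores s ON o.store_id = s.id
GROUP BY 1
"""
print(extract_lineage(query))
```

Each extracted (source, destination) pair becomes one edge in the lineage graph.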
Integrating data lineage in the Data Catalog provides engineers and analysts with a unified means of retrieving all upstream and downstream dependencies of any data source they are interested in. This platform completely removes any need to trace through code, SQL queries, or logs. Ultimately, our Data Catalog paves the way for getting complete end-to-end lineage, allowing anyone to track the flow of data as it moves through dashboards, metrics, services, and ETL jobs.
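Once table-level and job-level dependencies have been extracted, retrieving all upstream or downstream dependencies amounts to a graph traversal. The minimal sketch below assumes a hypothetical `LineageGraph` whose nodes are table and ETL job names (the `etl:` and `dash:` prefixes are made up) and walks it breadth-first, replacing the repeated manual hops described in the 90-day order volume example.

```python
from __future__ import annotations

from collections import defaultdict, deque


class LineageGraph:
    """Directed graph where an edge A -> B means B is derived from A."""

    def __init__(self) -> None:
        self._downstream: dict[str, set[str]] = defaultdict(set)
        self._upstream: dict[str, set[str]] = defaultdict(set)

    def add_edge(self, source: str, target: str) -> None:
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start: str, edges: dict[str, set[str]]) -> set[str]:
        # Breadth-first search; `seen` also guards against cycles.
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in edges.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        seen.discard(start)
        return seen

    def upstream(self, node: str) -> set[str]:
        """Every producer that `node` transitively depends on."""
        return self._walk(node, self._upstream)

    def downstream(self, node: str) -> set[str]:
        """Every consumer that transitively depends on `node`."""
        return self._walk(node, self._downstream)


graph = LineageGraph()
graph.add_edge("prod.orders", "etl:order_volume")
graph.add_edge("prod.stores", "etl:order_volume")
graph.add_edge("etl:order_volume", "analytics.order_volume_90d")
graph.add_edge("analytics.order_volume_90d", "dash:ops_weekly")
print(sorted(graph.upstream("analytics.order_volume_90d")))
# ['etl:order_volume', 'prod.orders', 'prod.stores']
```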
Reducing database outages by persisting order data from PostgresDB to Amazon S3
By Austin Kim
Amazon Web Services (AWS) advises keeping database tables smaller than 500 GB, but the database table that stores all the orders made on DoorDash consistently exceeded this limit. A short-term solution, archiving data older than 90 days, was not scalable, especially as DoorDash grows. Analyzing data usage, we found that over 80% of the table’s data came from a single JSON column. Our long-term solution was to persist that JSON column to Amazon’s Simple Storage Service (S3).
One challenge for this solution was making sure every use case of the JSON now fetched data from S3 rather than depending on the database. Another was fetching the JSON from S3 in a way that did not freeze, or add risk of failure to, the workflows that process orders. Lastly, because this operation touches millions of orders made on DoorDash, we needed to roll it out to production safely, without risking a crash in those order-processing workflows.
The first part of this solution required persisting the JSON to S3. This process begins with one of our microservices receiving a gRPC request that contains the order payload data. The microservice then uses that payload to create an order object in the database. Previously, we stored the entire order payload alongside it, but now we send that data to S3 instead. We then retrieve the S3 address of the file where we stored the JSON and save that link in the database so we can access it later. Finally, we implemented exception handlers and timeouts that terminate and retry the S3 request if it stalls for too long, ensuring that a freeze in S3 will not freeze the entire workflow.
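The write path above can be sketched as follows. This is a minimal illustration, not DoorDash’s implementation: `ObjectStore`, `put_with_retry`, `persist_order_payload`, the key layout, and the dict standing in for the database row are all hypothetical, and a real service would call an S3 client such as boto3’s `put_object`. One detail is swapped in: Python cannot forcibly terminate a stalled thread, so this sketch stops waiting after a timeout and retries rather than truly killing the request; with boto3 you would normally also set connect and read timeouts through `botocore.config.Config`.

```python
from __future__ import annotations

import json
import time
import uuid
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal interface; a real one would wrap an S3 client."""

    def put(self, key: str, body: bytes) -> None: ...


class S3PutFailed(Exception):
    pass


# Worker pool so a stalled upload never blocks the calling order workflow.
_EXECUTOR = ThreadPoolExecutor(max_workers=4)


def put_with_retry(store: ObjectStore, key: str, body: bytes,
                   timeout_s: float = 2.0, attempts: int = 3,
                   backoff_s: float = 0.1) -> None:
    last_error: BaseException | None = None
    for attempt in range(attempts):
        future = _EXECUTOR.submit(store.put, key, body)
        try:
            future.result(timeout=timeout_s)
            return
        except FutureTimeout as exc:
            # Stop waiting; the abandoned call may still finish in the background,
            # but retries reuse the same key, so it would only rewrite identical bytes.
            future.cancel()
            last_error = exc
        except Exception as exc:  # e.g. connection errors raised by the client
            last_error = exc
        if attempt < attempts - 1:
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise S3PutFailed(f"gave up on {key} after {attempts} attempts") from last_error


def persist_order_payload(store: ObjectStore, bucket: str, order_id: str,
                          payload: dict[str, Any], db_row: dict[str, Any],
                          timeout_s: float = 2.0) -> dict[str, Any]:
    # Write the JSON to object storage first, then keep only its address in the row.
    key = f"orders/{order_id}/{uuid.uuid4()}.json"
    put_with_retry(store, key, json.dumps(payload).encode("utf-8"), timeout_s=timeout_s)
    db_row["payload_s3_uri"] = f"s3://{bucket}/{key}"
    return db_row
```

If every attempt fails, the caller receives `S3PutFailed` and can decide how the order workflow should react, instead of hanging on an unresponsive request.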
Our solution reduces the data stored in the order table by more than 80%. This reduction significantly lowers the table’s risk of causing a database outage, and we no longer have to archive orders to keep the table within size limits. We also added a new gRPC endpoint that gives other microservices easy access to the order object and any related JSON now stored in S3, making usage of the order JSON outside of merchant integrations more efficient and modular.