By Richard Hwang and Aamir Manasawala, Software Engineers
One of our goals at DoorDash is to surface to consumers a wide range of stores that are quickly deliverable to their given address. This process involves calculating accurate road distances for each store-consumer pair in our real-time search pipeline. Our earlier blog post about recommendations for search primarily focuses on the ranking component of search at DoorDash. This blog post describes how we architected our search system using open source technologies to help determine consumer selection.
Problem and Motivation
Calculating accurate driving distance in real time is critical to the selection that a DoorDash consumer sees. A mere straight line circle-based distance could be inaccurate and would result in very long Dasher drive times, especially when the topology of the region has unevenness due to barriers like mountains, lakes, bridges, parks, etc.
Figure 1 depicts a circle with a nine mile radius centered around an address in Southern California. If a consumer orders from a store on the edge of this circle, it will take at least half an hour (as shown in Figure 2) just for the Dasher to get from the store to the consumer.
Before we delve into the system architecture, let us define some terms:
- Latitude, Longitude: A unique location point on the planet, abbreviated as (lat, lng.)
- Geohash: A hierarchical encoding system to subdivide space into grid like structure.
- Isochrone: A curve of equal travel time, represented as a GeoJSON. Figure 3 shows an isochrone in San Francisco depicting areas that can be reached in 10 (inner region) and 20 (outer region) minutes by walking. Figure 4 is geojson representation of an isochrone.
The following diagram describes the overall architecture to determine if a store is in the consumer’s delivery address to determine its selection.
The offline component involves an isochrone service responsible for computing isochrones for a given location (lat and lng, which is converted to a level seven geohash) and parameters (eg: travel time).
To compute isochrones, we use our custom fork of Galton, an open source project. Galton is built on top of OSRM, an open source routing engine, and concaveman, a fast implementation of a concave hull algorithm. Galton first generates a grid of coordinates of configurable size and granularity around the input coordinate. OSRM then computes travel times from the input coordinate to each of the grid coordinates. Grid coordinates with travel times greater than the input travel time are filtered out. Finally, the concave hull algorithm generates an outline of the remaining coordinates, producing the appropriate isochrone as shown in Figure 6, which is the nine mile isochrone for the same address in Figure 1.The service caches isochrones in DynamoDB, as simple key-value lookups for speedy retrieval. Further, we key by geohash, precision 7, rather than exact coordinate, to reduce the number of entries we need to store. Precision 7 geohashes have an error of ±0.076 km; isochrones for coordinates within these bounds will not vary drastically. We store isochrones in order of millions and with lookup time under ten milliseconds.
For each request, the service queries for DynamoDB: if the isochrone is present then it is returned. If the isochrone is absent then an asynchronous job is launched to generate and store it, returning a null response. On subsequent requests for that address and parameters, the generated isochrone will be returned. When we launch a new market, we bootstrap it by running a script to pre populate isochrone entries for all geohashes in the market, to get the market up to speed for accurate selection upfront.
- DoorDash clients call the search backend API for the specific (lat, lng)
- Search module calls isochrone service with the (lat, lng) and parameters like travel time to fetch the corresponding isochrone. These parameters are district-specific and configurable, so we can run experiments for testing conversion changes based on selection. If the isochrone is absent (as in the case when the isochrone is absent in dynamodb), we fall back to the naive straight line distance computations (with tighter radius). We persist the selection logic (isochrone or straight line along with parameters) at a session level in the backend search module to provide a consistent notion of selection across browsing sessions for the consumer.
- The search module on fetching the isochrone for that address is encoded as a polygon geoshape to construct a geoshape query to hit Elasticsearch.
- Stores that are indexed into Elasticsearch have the store location encoded in geo-point format. Elasticsearch builds a prefix tree structure at index time to support fast geo queries at runtime. Elasticsearch runs the given ES geoshape query from Step 3 to compute an intersection of the polygon with stores in the index for retrieval.
- Store results are deserialized and returned to the client for that address.
Our current implementation accounts for the topology of the region via driving distance addressing the inaccurate selection problem in Figure 1 by isochrone selection as shown in Figure 6. Furthermore, this architecture allows flexibility to configure and control selection logic based on regionality, to dynamically change selection logic based on supply/demand curves, and to run selection experiments.
Some potential areas that we will be working on in the future include getting more accurate real-time traffic and road condition updates into the system.
If you are passionate about solving challenging problems in this space, we are hiring for our Search & Relevance team and our Data Infrastructure team. If you are interested in working on any other areas at DoorDash check out our careers page. Let us know for any comments or suggestions.