In our previous article, Using Display Modules to Enable Rapid Experimentation on DoorDash’s Homepage, we discussed the concept of Display Modules, and how we built them to speed up development and implement a more flexible experimentation paradigm. Although the Display Module system partially solved some of the challenges involved, we felt we could improve DoorDash’s product development velocity even more. 

The improvements we wanted to make were:

  1. Releasing faster by relying more on backend releases than on mobile releases
  2. Removing inefficiencies in both backend and frontend codebases
  3. Fixing problems caused by our previous server-driven UI attempt

We’ll go over these problems in more detail in the next few sections.

Mobile releases take a lot of time

Mobile releases are often an intensive process for the following reasons:

  • App store reviews take time, as opposed to backend releases, which only need internal approval.
  • More rigor (and hence time) is required for testing and during the release ramp, compared to backend releases, because mobile releases cannot be rolled back instantly. The only option with a faulty release is to send out a new version that solves the previous release’s bugs.
  • Not all users upgrade their applications. A small percentage of users always run older versions of the app. This can be tedious because subsequent backend releases need to support these old versions, in addition to new ones, for as long as users are still on them.

Redundant implementations

We also wanted to eliminate the need for similar logic to be implemented across the stack. One issue was that we had duplicative implementations of the same business logic across multiple clients. Often, this leads to slightly different functional behavior. With additional rigor, it is possible to ensure that parallel client behaviors remain in sync, but this rigor takes time and doesn’t allow for fast execution. Additionally, some business logic was performed on the client side that should have been done on the backend, causing errors because the information the user saw was not based on the source of truth in the database.

The second inefficiency was that client-side components that looked very similar were difficult to reuse, meaning that even incremental changes took a lot of development time. As DoorDash went through a hypergrowth phase, the time it took to coordinate and communicate across different teams to converge on a good design (from both UI and engineering perspectives) also increased. It became harder to continuously engineer for the optimal separation of business logic and view logic, leading to many duplicative implementations of very similar views.

Iterating on our previous server-driven UI attempt 

Our previous attempt achieved a lot of our prior goals, and also brought our attention to some other potential areas of improvement.

Firstly, UI components were strongly coupled with data models. Any deviation from existing data models required changes to multiple microservices, not even counting client-side changes. This was an area where we saw an opportunity to improve our execution velocity.

Secondly, our response format was a heterogeneous array containing different components. Each of the components had certain constant envelope fields and differing content fields based on the type of the components. While in theory this was designed to be flexible, in practice we observed this was prone to deserialization errors.

Thirdly, while we were no longer using a static API response shape, and we were able to find enough flexibility to dynamically rank within component types, we wanted universal capabilities to rank content and unlock more avenues for personalization to better serve what our customers want. 

Designing generic UI components 

Our analysis of the problems listed above helped us come up with the following broad design requirements:

  • Decouple the backend responses from backend data models, and instead couple them to the views as much as possible. Removing business logic in the client reduces the overall complexity, and enables views to be naturally reusable.
  • Define new components whose implementation details would be completely defined and owned by the client teams. Moreover, we wanted new UI components to be easily shareable across different product use cases. This would reduce the incremental effort whenever a previously defined component could be reused.

By investing in this framework, we hoped to unlock faster development speeds, allowing us to launch new features faster. As a platform team, we wanted to provide our product teams the ability to change feed experiences on the fly, which would open the door to exciting projects in the future. Our team wanted to unlock new avenues for personalization and relevance and improve the ease of experimentation.

Reviewing previous literature

We read through existing technical articles and reviewed videos about other teams implementing similar frameworks. We came across John Sundell’s talk on the subject and Spotify’s open source HubFramework which, while deprecated, was a good starting point. We also took a look at similar efforts by Instacart. We did not look at Airbnb’s server-driven UI at the outset, but while doing more research during later iterations, we were pleasantly surprised to come across articles and videos describing an approach similar to our own.

Building the minimal viable product (MVP)

We wanted to build an MVP and test it out in a production environment in a not-so-critical product. This kind of testing lets us get real 360-degree feedback from customers and cross-functional teams, as well as client and backend engineers.

Designing the Facet

We designed a Facet, a building block based on the design principles described above, which is meant to map one-to-one to a view on the screen. In order to communicate view logic rather than business logic, we decided to define the UI components in terms of UI primitives (i.e. as view models, or models that bind one-to-one to various aspects of the view) instead of data models. Therefore, we created the following primitives to describe an individual view:

// The Guest of Honor.
message Facet {
  // ID of this instance of this element.
  google.protobuf.StringValue id = 1;

  // Which component should this information be rendered in the form of?
  Component component = 2;
  
  // All text fields pertaining to this element.
  Text text = 3;
  
  // All image fields pertaining to this element.
  Images images = 4;

  // Any data that doesn't fit other fields in this proto
  google.protobuf.Struct custom = 5;

  // Events related to this element, for example: clicks, selection etc.
  Events events = 6;

  // Nesting
  repeated Facet children = 7;
  
  // Only contains event data. The event name is implicit based on the placement of the "logging" field.
  // This one refers to clicks and views
  google.protobuf.Struct logging = 8;

  // Layout Data
  Layout layout = 9;

  // facet level style
  Style style = 10;

  // Component information
  message Component {
    // Maps to an Id in the component library - https://docs.google.com/document/d/1IWSggUGns5fMTUq6ysVP3ZpRB0jZqIjhpUvH90_ncZI/ 
    google.protobuf.StringValue id = 1;

    // This can be used in the future to define a fallback component group for old(er) app versions
    google.protobuf.StringValue category = 2;
  }

  // Text fields
  message Text {
    // Means different things in the context of different components. Refer to component library for details.
    google.protobuf.StringValue title = 1;

    // Means different things in the context of different components. Refer to component library for details.
    google.protobuf.StringValue subtitle = 2;

    // Means different things in the context of different components. Refer to component library for details.
    google.protobuf.StringValue accessory = 3;

    // Means different things in the context of different components. Refer to component library for details.
    google.protobuf.StringValue description = 4;

    // Any other text
    map<string, google.protobuf.StringValue> custom = 5;
  }

  // Image fields
  message Images {
    // Means different things in the context of different components. Refer to component library for details.
    Image main = 1;

    // Means different things in the context of different components. Refer to component library for details.
    Image icon = 2;

    // Means different things in the context of different components. Refer to component library for details.
    Image background = 3;

    // Means different things in the context of different components. Refer to component library for details.
    Image accessory = 5;

    // Any other images
    map<string, Image> custom = 4;
  }

  // Image type
  message Image {
    // URI for image
    google.protobuf.StringValue uri = 1;

    // Placeholder text / string
    google.protobuf.StringValue placeholder = 2;

    // Placeholder local asset identifier (in case)
    google.protobuf.StringValue local = 4;

    // Display Style
    Style style = 3;

    // Display Style Enum
    enum Style {
      // If unset
      STYLE_UNSPECIFIED = 0;

      // Rect image w/ rounded corners
      STYLE_ROUNDED = 1;

      // Circular image
      STYLE_CIRCLE = 2;
      // Add more here as needed
    }
  }

  // Event
  message Events {
    // Click option definition
    Action click = 1;
  }

  // Action
  message Action {
    // Action name
    google.protobuf.StringValue name = 1;

    // Action - related data
    google.protobuf.Struct data = 2;
  }
}
  1. Text

Most individual views have a text hierarchy. We created a text model that semantically describes the importance of each text field. Depending on the type of view, such as row, page header, carousel, or information view, it would lay out based on the implementation of the view.

"text": {
    "title": "Frosty Bear",
    "subtitle": "Pizza, Vegetarian, DashPass",
    "description": "4.8, 14,400+ ratings, Free delivery over $12",
    "annotation": "$10 off, DashPass only"
}
  2. Images

We had a similar concept for images, where we created an image model that semantically describes importance and placement across a view.

"images": {
    "main": "https://img.cdn4dd.com/media/XXX.png",
    "background": "https://img.cdn4dd.com/media/YYY.png",
    "accessory": "dashpass-logo"
}
  3. Events

We wanted to come up with a model that semantically describes how a user interacts with the view. Currently, we only use “click”, but our design keeps the option open to introduce other types of actions, such as drag or swipe.

  4. Logging/analytics data

Previously, the clients had to do a lot of data-massaging to log attributes for view impressions and clicks. This introduced a decent amount of complexity, because views might need to know a lot of business logic unrelated to the view itself. For example, for item cells we wanted to log the parent store, the consumer submarket ID, or the delivery fee.

Because the clients were going to be much more data-agnostic, this logic couldn’t be gleaned anymore, so we started sending the attribute keys and values in a simple [String: String] dictionary that the clients would log with the views. This approach made client-side views cleaner because we no longer had to put business logic in views purely for data analytics. It also allowed us to define a consistent logging workflow, i.e. always log action and impression events with the attributes defined in this field, plus some standard client-side additions like session ID, user ID, client version, or OS version. One minor downside is that this approach bloated the response: because we added logging attributes to each view, we ended up sending a lot of repeated information in the same API response.
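As a rough illustration of that logging workflow, the sketch below merges the server-provided dictionary with client-side additions before an event is emitted. The names `LogEvent` and `makeLogEvent`, the attribute keys, and the choice to let client-side values win on key collisions are all assumptions made for this example, not a description of the production code.

```swift
// Hypothetical event type; the real analytics pipeline is not shown in this article.
struct LogEvent: Equatable {
    let name: String
    let attributes: [String: String]
}

// Combines the Facet's `logging` dictionary with standard client-side attributes.
// Assumption: client-side values win if both dictionaries contain the same key.
func makeLogEvent(name: String,
                  facetLogging: [String: String],
                  standardAttributes: [String: String]) -> LogEvent {
    let merged = facetLogging.merging(standardAttributes) { _, client in client }
    return LogEvent(name: name, attributes: merged)
}

// Example: a click on a store view (all keys and values are made up).
let event = makeLogEvent(
    name: "card_click",
    facetLogging: ["store_id": "123", "submarket_id": "7"],
    standardAttributes: ["session_id": "abc", "client_version": "4.2.0"]
)
```

The view itself never inspects the keys; it only forwards the dictionary, which is what keeps business logic out of the view layer.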

  5. ID

Each Facet contains an ID field which is intended to be unique within the scope of a response. The ID needs to be unique because it is used to diff the view tree for clients, and it is also used for saving the objects to a database on the Android client for caching.

  6. Custom (optional)

We ran into some issues where the semantically described models weren’t precise enough to describe some aspects of the views, especially aspects that we wanted to be customizable per view, such as using the exact same view with a light background versus a dark one. To add customization between components, we started to use the Google Protobuf Struct model, which is untyped. The goal is that we shouldn’t have to use it in most cases, but the provision exists if we have a case that doesn’t fit any of the other primitives.

  7. Component

This field contains component information which informs the client on how to use the remaining information contained in the Facet.

  8. Children (optional)

This field is an array of Facets, which provides the option of nesting these building blocks recursively.

  9. Style

We didn’t originally specify size, colors, or font, and previously used the Custom dynamically-typed object to specify these attributes when needed. Since then, we started matching styling types with our design language system library, which has helped mitigate our usage of Custom and reduce redundancies between components. 

/// Struct that indicates styling elements, e.g. background color, font, dls style, etc.
struct Style {
    /// Specifies configurable background color. Right now, only
    /// addressPicker utilizes this
    public var backgroundColor: ColorSemantic?
    /// Specifies the size class. Is useful for button components, currently
    public var sizeClass: SizeClass?
    /// Maps to a Prism `type`. Somewhat overlapping in specs with
    /// `backgroundColor` field, since the type will typically
    /// specify the background color, in addition to foregroundColor, etc
    public var dlsType: DLSType?
}

Product use cases: collections, tiles, and landing pages

Now that we had a design, we needed to test it out in a production environment. Conveniently, we were in the process of developing a new type of screen real estate: tile collections and custom landing pages. This was a great opportunity for us to field-test our design, because it had components that we were building from scratch, as well as components where we integrated with our previous solution. This testing would also give us a sense of how easy future migration efforts could be, and it was an ideal testing ground because it was not a critical feature. While there are some risks associated with coupling a launch with new technology, we were confident we could keep them from spiraling out of control.

Figure 1: An instance of our collections carousel, on the left, was folded into the existing homepage design/framework. Individual collection tiles can lead to different variations of landing pages, shown on the right. This illustrates how our Facet concept can be interlaced within a predominantly non-Facet response as well as in a greenfield Facet-only response.

As shown in Figure 1, above, tile collections are actually part of the homepage. Because the homepage followed the Display Module pattern, we didn’t want to completely revamp it for an MVP. By definition, an MVP’s scope should be minimal. To conform to Display Modules on the homepage, we actually defined a new Display Module that was essentially a wrapper around the corresponding Facet for a collection.

Because the landing page was brand new, we had the ability to design it completely from scratch using Facets. When defining components, we employed a pragmatic approach: define the components that are needed immediately rather than making a complete component library a prerequisite for our work.

The business and product teams were excited about the new screen real estate and the promise of flexibility. Mobile developers were excited about how fun and easy it was to design a component, and about the reduction in incremental work. Although there was some initial investment effort, and more effort is required whenever new components are added, the effort was considerably less than with the previous solution.

From V0 to V1: adopting and expanding the MVP

Although DoorDash’s business model began with restaurant food delivery, our future lies in adding other delivery use cases, such as convenience items and groceries. In order to incorporate these new verticals in the customer experiences, we embarked on an ambitious project to re-architect the UI elements on our homepage to allow different configurations for different use cases. 

The previous API response structure that powered our homepage was geared toward displaying restaurants, and served restaurant data models, but the new designs called for a paradigm shift to a feed that split up the homepage into different categories. We knew we were making a drastic change to the backend of our homepage that would still employ mostly the same broader components as before, and we wanted to experiment with different layouts and allow for the backend to completely configure the entire page.

Model updates to support animations and interactions

Our previous proof-of-concept was designed to be a simple, linear layout for a scrollable feed of simple UI components. Our homepage is more data- and interaction-heavy than other parts of our app, so we had to come up with ways to support features that weren’t needed in our MVP.

Paging: Because we don’t want to give the user all the available stores in one response, as that is not scalable, and our users wouldn’t appreciate such a huge dump of data, we needed to support pagination. To accomplish this, we added a new case to our Action enum to represent loading content. The content referred to here would potentially include offset information. In cases where paging is appropriate, we also added a key/value pair of Action to our outer response structure.

enum Action { 
    case navigate
    case loadContent(offset: String)
}
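To make the paging flow concrete, here is a minimal sketch of how a client might consume a `loadContent` action. The `Page` and `Feed` types, the `fakeFetch` function, the offsets, and the facet IDs are all hypothetical stand-ins invented for this example; a real client would issue a network request and hold full Facets rather than ID strings.

```swift
enum Action: Equatable {
    case navigate
    case loadContent(offset: String)
}

// Hypothetical page shape: facet IDs stand in for full Facets to keep the sketch short.
struct Page {
    let facetIDs: [String]
    let next: Action?
}

struct Feed {
    var facetIDs: [String] = []
    var next: Action? = .loadContent(offset: "0")

    // Fetches one more page if the pending action is `loadContent`; returns false otherwise.
    mutating func loadNextPage(fetch: (String) -> Page) -> Bool {
        guard let action = next, case .loadContent(let offset) = action else { return false }
        let page = fetch(offset)
        facetIDs += page.facetIDs
        next = page.next
        return true
    }
}

// Fake backend with two pages, standing in for the real API call.
func fakeFetch(offset: String) -> Page {
    if offset == "0" {
        return Page(facetIDs: ["store:1", "store:2"], next: .loadContent(offset: "2"))
    }
    return Page(facetIDs: ["store:3"], next: nil)
}

var feed = Feed()
while feed.loadNextPage(fetch: fakeFetch) {}
```

Treating the offset as an opaque string means the client never has to interpret it; it simply hands the value back to the backend.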

Reloading specific sections: Our MVP simply served components to be laid out sequentially, but could not group parts of the page. We ultimately decided to update our response to be structured in a way that indicated separate groupings with IDs. Some actions might specify that they were only relevant to a particular section (or set of sections).

message FacetSection {
 google.protobuf.StringValue id = 1;
 repeated Facet header = 2;
 repeated Facet body = 3;
 Layout layout = 4;
}

Bulking up the component library

Our layout engine and FacetFactory, a class that uses the factory pattern to produce UI components, were originally implemented for our MVP. At the start of the first version of the project for powering new verticals, we needed to update our main layout engine to support the required new features, as well as develop many new UI components.
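For readers unfamiliar with the pattern, the following is a minimal sketch of what a factory keyed on component ID could look like. It is not the production FacetFactory: the simplified `Facet` struct, the `RenderedView` stand-in (used so the example runs without a UI framework), and the component IDs `row.store` and `banner.video` are assumptions for illustration only.

```swift
// Simplified stand-in for the Facet model; only the fields this sketch needs.
struct Facet {
    let id: String
    let componentID: String
    let title: String?
}

// Stand-in for a real UIKit/SwiftUI view so the sketch runs without a UI framework.
struct RenderedView: Equatable {
    let kind: String
    let label: String
}

// Hypothetical factory: maps component IDs to builder closures.
final class FacetFactory {
    typealias Builder = (Facet) -> RenderedView
    private var builders: [String: Builder] = [:]

    func register(componentID: String, builder: @escaping Builder) {
        builders[componentID] = builder
    }

    // Returns nil for unknown components so the layout engine can omit them.
    func makeView(for facet: Facet) -> RenderedView? {
        builders[facet.componentID]?(facet)
    }
}

let factory = FacetFactory()
factory.register(componentID: "row.store") { facet in
    RenderedView(kind: "StoreRow", label: facet.title ?? "")
}

let known = factory.makeView(for: Facet(id: "1", componentID: "row.store", title: "Frosty Bear"))
let unknown = factory.makeView(for: Facet(id: "2", componentID: "banner.video", title: nil))
```

With this shape, adding a component is a matter of registering one more builder; no new response model is required.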

Having multiple engineers working on each of the iOS and Android apps in parallel worked better and was ultimately faster with this new paradigm because, with each new component, we did not have to design a new API response structure; we merely defined a new component and mapped its different aspects onto the text, images, and events structures. Each engineer also had greater velocity, as the data coming down was already in view model form, so it was easy and intuitive to create reusable components client-side. We did not need to write a new response model and domain model client-side with each new component, which also had the side effect of letting us worry less about deserialization issues.

Note: The only data that isn’t in view-model form is the data in custom fields, which are data-dependent and force us to write custom deserialization logic. We tried to avoid custom objects; when one was necessary, we made it as lightweight as possible, which helped with velocity, and kept as many fields nullable as possible to help mitigate deserialization issues.

Challenges, pitfalls, and learnings

As with any new system implementation at scale, we faced various challenges and learned lessons along the way. Production testing, with both less critical and more critical use cases, revealed design issues we needed to overcome. We also uncovered edge cases in need of solutions. Addressing these and other issues led to a more reliable and scalable production system.

Versioning of client Facet capabilities

Mobile engineers can’t deploy a new version of the app to every single device. Even with strong new version adoption there will always be a percentage of users who continue to use an older version of the app because they don’t or can’t update it. When we quickly iterate over components and features, users with outdated app versions may have a sub-optimal experience or, worse, run into bugs that prevent successful task completion.

Scenarios:

  1. One reason for outdated experiences could be that apps have older versions of native components. This situation is not specific to server-driven UI, and is a reasonable and unavoidable effect of individual users not updating their apps.
  2. Another reason could be that outdated apps will not understand new components deployed outside of a standard app update. These new components might be designed for a newer version of the app, so the app cannot map them to a view. This issue might not be so bad in itself, as omitting a new component is not too different from having older versions of the app out in the wild, which existed before those components were developed, but with server-driven UI, this leads to interesting edge cases like:
    • Container components, such as carousels or lists, where the app doesn’t understand any of the children, the actual content, which could lead to an empty container being displayed.
    • If we created a new version of an instrumental component, such as the store row in the homepage, and it was different enough that it warranted a new Facet ID, older versions of the app would omit those views as soon as the backend starts sending the newly versioned ones. 
  3. Yet another reason for broken experiences is that clients might understand the component, but a new action type might be unsupported, in which case the component will cause an error on action, but might still be rendered, ultimately resulting in a poor experience.

Our solution to minimize broken experiences for older clients

We ultimately decided on a two-pronged approach to solve the versioning problem. We designed a simple Semantic Versioning-based system that allowed clients to specify the extent of their Facet rendering capabilities (or lack thereof) while making a call to the backend. We may, in the future, map this to versions of our DLS library, but the current version maps to various feature sets, i.e. components and action sets they support.
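A minimal sketch of such a capability check follows. It assumes the client reports a three-part version string and that the backend keeps a table of the minimum version required by each component; the `FacetVersion` type, the `supportedComponents` function, and the component IDs are hypothetical. Swift is used here only to stay consistent with the other snippets; the same logic would live on the backend.

```swift
// Three-part version, compared numerically (so 1.10.0 is newer than 1.9.3).
struct FacetVersion: Comparable {
    let major: Int
    let minor: Int
    let patch: Int

    // Fails for anything that is not exactly "major.minor.patch" with integer parts.
    init?(_ string: String) {
        let parts: [Int?] = string.split(separator: ".").map { Int($0) }
        guard parts.count == 3,
              let major = parts[0], let minor = parts[1], let patch = parts[2] else { return nil }
        self.major = major
        self.minor = minor
        self.patch = patch
    }

    static func < (lhs: FacetVersion, rhs: FacetVersion) -> Bool {
        (lhs.major, lhs.minor, lhs.patch) < (rhs.major, rhs.minor, rhs.patch)
    }
}

// Returns the component IDs whose minimum required version the client satisfies.
func supportedComponents(clientVersion: FacetVersion,
                         requirements: [String: FacetVersion]) -> Set<String> {
    Set(requirements.filter { $0.value <= clientVersion }.keys)
}

// Hypothetical requirements table.
let requirements: [String: FacetVersion] = [
    "row.store": FacetVersion("1.0.0")!,
    "carousel.tiles": FacetVersion("1.3.0")!
]
let supported = supportedComponents(clientVersion: FacetVersion("1.2.0")!,
                                    requirements: requirements)
```

Comparing the parts numerically rather than as strings matters: a plain string comparison would rank "1.9.3" above "1.10.0".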

Additionally, we added some simple validation on the client-side, so that the versioning does not get out of control:

  1. Omit container Facets if we don’t recognize any of the children
  2. Omit Facets with navigation actions that we do not understand

We found that these two simple checks reduced our need to version our libraries by about 50%.
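The two checks can be sketched as a recursive pruning pass over the Facet tree. This is an illustrative reconstruction, not our actual client code: the simplified `Facet` struct, the `ClientCapabilities` type, and the component and action names are assumptions, and the sketch additionally drops Facets whose component the client cannot map to a view.

```swift
// Simplified stand-in for the Facet model.
struct Facet {
    let id: String
    let componentID: String
    var actionName: String? = nil
    var children: [Facet] = []
}

// Hypothetical description of what this client build can render and handle.
struct ClientCapabilities {
    let components: Set<String>   // components the client can map to a view
    let containers: Set<String>   // components that only make sense with children
    let actions: Set<String>      // actions the client knows how to handle
}

// Returns the Facet with unsupported descendants removed, or nil if it should be omitted.
func prune(_ facet: Facet, using caps: ClientCapabilities) -> Facet? {
    guard caps.components.contains(facet.componentID) else { return nil }
    // Check 2: omit Facets whose action we do not understand.
    if let action = facet.actionName, !caps.actions.contains(action) { return nil }
    var result = facet
    result.children = facet.children.compactMap { prune($0, using: caps) }
    // Check 1: omit containers when none of their children survived.
    if caps.containers.contains(facet.componentID) && result.children.isEmpty { return nil }
    return result
}

let caps = ClientCapabilities(components: ["carousel", "row.store"],
                              containers: ["carousel"],
                              actions: ["navigate"])
let emptyCarousel = Facet(id: "c1", componentID: "carousel",
                          children: [Facet(id: "x", componentID: "tile.new")])
let mixedCarousel = Facet(id: "c2", componentID: "carousel",
                          children: [Facet(id: "a", componentID: "row.store", actionName: "navigate"),
                                     Facet(id: "b", componentID: "tile.new")])
let badAction = Facet(id: "r1", componentID: "row.store", actionName: "open_ar_view")
```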

The pitfalls of using SwiftProtobuf

We initially tried using Protobuf, our response specification, to auto-generate client-side models, and used the SwiftProtobuf library to do this. We ran into issues with seemingly safe changes, such as adding an enum case, that broke the application. Because we were only generating a handful of models, including section, Facet, text, images, and events, we ended up removing the SwiftProtobuf library as a dependency and manually implemented them.
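One advantage of hand-written models is that they can decide explicitly what to do with values an older client has never seen. The sketch below shows one such choice, assuming the response reaches the client as JSON and is decoded with `Codable`: an unrecognized enum value falls back to the unspecified case instead of failing the whole decode. The `FacetImage` type name and the `STYLE_HEXAGON` value are made up for illustration, and this is not a description of how SwiftProtobuf itself behaves.

```swift
import Foundation

// Mirrors the Image.Style enum from the Facet proto.
enum ImageStyle: String, Decodable {
    case unspecified = "STYLE_UNSPECIFIED"
    case rounded = "STYLE_ROUNDED"
    case circle = "STYLE_CIRCLE"

    // Unknown raw values fall back to `.unspecified` instead of throwing.
    init(from decoder: Decoder) throws {
        let raw = try decoder.singleValueContainer().decode(String.self)
        self = ImageStyle(rawValue: raw) ?? .unspecified
    }
}

// Hand-written model; every field is optional to tolerate partial payloads.
struct FacetImage: Decodable {
    let uri: String?
    let style: ImageStyle?
}

// "STYLE_HEXAGON" is a made-up value that this client build does not know about.
let json = #"{"uri": "https://img.cdn4dd.com/media/XXX.png", "style": "STYLE_HEXAGON"}"#
let image = try JSONDecoder().decode(FacetImage.self, from: Data(json.utf8))
```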

Heavy user interaction and state maintenance

Several of our components have complex interactions, animations, and functionality that is dependent on state. Had we relied on the backend to supply the state, we might need additional API requests, which would be error-prone. It would also have led to a less responsive UI, as we would have to wait to update the state until the API responses came back.

Figure 2: Relying on the backend to supply the state for our app’s Filters Carousel would have been unnecessarily complex and likely caused slow response times.

Taking the Filters Carousel as an example, we already had a modular filters component in which the backend returns a Filter data model that determines its behavior. For example, we would generate the appropriate query parameters upon selection of a filter, based on the type of filter (such as binary, collection, or range) and the options sent from the backend. We thought about making this query generation backend-driven, but quickly realized that updating the selection state and scroll position, and having the backend send down the query parameters we need in a round trip with the API response, would be unnecessarily complex. We therefore decided on a hybrid approach for this component, in which the backend sends down the data in a custom field so that we can continue to use our local modular filters UI component with minimal changes.

Another example of a Facet which has state maintenance and interactions is the Address Picker view. The Address Picker updates based on the overall user location, which is stored in-memory in app sessions. If we wanted to move toward it being powered by the Facet API, there would be a delay in updating the address. There were minimal returns in having it powered by Facets, so we decided to go with the data-dependent approach.

Convenience fields are a double-edged sword without centralized usage, tracking, and review

One pitfall we kept running into was gating the usage of custom for various components. As mentioned above, in order to support further customization of views that cannot be encapsulated in text, images, and events, we utilized a dynamic object called custom. Although having a dynamic object is great for flexibility, there are downsides caused by the lack of type-safety and the potential for deserialization errors. We currently mitigate these errors by making sure that a custom object we don’t understand does not fail deserialization of the entire response. Additionally, as we mentioned earlier, some of our components are hybrid in the sense that they are partially data-dependent.

We realize that utilizing this custom field is a bit of an anti-pattern. We noticed that when other engineers onboard onto this new system, they usually gravitate toward using it instead of the prototypical Facet fields, such as text, images, and events. We are therefore constantly trying to minimize our reliance on custom, and looking for automated ways of deterring its use except for the most dire needs.

Web support required modifications to our design 

Because we didn’t integrate with all client platforms at the same time, some quirks were overlooked. We started developing this system for iOS slightly before Android, and did not adapt it to our web application for several quarters. Interestingly, this system almost constrained the design of the parallel features on the web, because our goal was to have the same APIs power all the clients. Although the web design was similar to mobile, recent product updates wanted to take advantage of the larger and more varied screen real estate that the web affords, which meant layouts needed to be slightly different on the web to support many different, and larger, screen sizes.

For example, the views in our store feed section of the page were meant for the store to be treated as an edge-to-edge component, and therefore we called it Store Row. However, for the web, it did not make sense for the component to be edge-to-edge, and we needed several of the views to fit in one row, like a grid, and needed to encapsulate the Store Rows in a grid-like container just for the web. We were able to find workarounds within our framework which weren’t too painful, such as updating the web-specific layout engine that lives on the client, which we feel is a testament to the robustness of our overall design. 

We also ran into scenarios where certain components needed to behave differently across platforms while maintaining the majority of their functionality. As a fix, we introduced extra custom parameters for the impacted components and updated the logic within the Facet components to ignore them (on mobile apps) or consider them (on web).

One major challenge when implementing the Facets framework for web was orchestrating Facet data through GraphQL queries. The DoorDash web app uses GraphQL to query data from the backend. GraphQL requires the client application to request data fields explicitly, so the client needs a clear understanding of the data structure, the actual data fields, and their hierarchy in order to query them. The Facet data model was designed to be recursive, so there was no clear way to know how many nodes were present in the data, or whether nodes had any children, prior to requesting the data. This structure makes requesting Facet data via GraphQL close to impossible or, in the best-case scenario, inefficient.

To overcome this challenge, we implemented logic on the gateway layer to flatten the Facet response before sending it to the client app, and incorporated logic on the client side to reconstruct the original data structure from the flattened response.
We used an efficient, fast, and very straightforward algorithm to flatten the data. The snippet below presents a sneak peek of the flattened data. It shows that “carousel:filters” has 4 children, and that the 4 nodes immediately before it should be considered the children of the carousel.

header:
0: {id: "filter:dashpass_eligible", childrenCount: 0,…}
1: {id: "filter:star_rating", childrenCount: 0,…}
2: {id: "filter:eta", childrenCount: 0,…}
3: {id: "filter:price_range", childrenCount: 0,…}
4: {id: "carousel:filters", childrenCount: 4,…}
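One way to produce and consume this shape is a post-order traversal (children emitted before their parent) paired with a stack-based rebuild. The sketch below is an assumption about how such a scheme could work based on the sample above, not the actual gateway or web client code; the `FlatFacet` type and both function names are hypothetical, and the Facet model is reduced to an ID and children.

```swift
// Simplified stand-in for the Facet model.
struct Facet: Equatable {
    let id: String
    var children: [Facet] = []
}

// Flattened node: the children themselves are replaced by a count.
struct FlatFacet: Equatable {
    let id: String
    let childrenCount: Int
}

// Post-order flattening: every child is emitted before its parent.
func flatten(_ facets: [Facet]) -> [FlatFacet] {
    var output: [FlatFacet] = []
    func visit(_ facet: Facet) {
        for child in facet.children { visit(child) }
        output.append(FlatFacet(id: facet.id, childrenCount: facet.children.count))
    }
    for facet in facets { visit(facet) }
    return output
}

// Rebuilds the tree: each node adopts the last `childrenCount` entries on the stack.
func unflatten(_ flat: [FlatFacet]) -> [Facet] {
    var stack: [Facet] = []
    for node in flat {
        let count = min(node.childrenCount, stack.count)  // guard against malformed input
        let children = Array(stack.suffix(count))
        stack.removeLast(count)
        stack.append(Facet(id: node.id, children: children))
    }
    return stack
}

let header = [Facet(id: "carousel:filters", children: [
    Facet(id: "filter:dashpass_eligible"),
    Facet(id: "filter:star_rating"),
    Facet(id: "filter:eta"),
    Facet(id: "filter:price_range")
])]
let flat = flatten(header)
let rebuilt = unflatten(flat)
```

Because the flat list has a fixed, non-recursive shape, a GraphQL query can request it with a single known set of fields, and the client restores the nesting afterwards.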

Fallback components were unnecessary

We were originally very concerned about updates to important views, such as the Store Row or the standard carousel, that would require versioning the component under a different component ID, because these views would be sorely missed if older clients omitted them. Because of this fear, we added a property on Component called Category, and we created a prototypical view for each category, such as a prototypical page-header, standard-carousel, and store-row. However, we have not found this to be useful at all: when we version components, we typically have a very specific new design in mind, so we rely on our Semantic Versioning system to know which components a client can support before sending a response back.

Results

Ultimately, the expansion of this project to create a complex server-driven system was successful in powering different verticals and supporting new features. Because it was such a huge project, it was heavily staffed with three client engineers per platform (iOS and Android). Each platform built 19 net new components, in addition to re-writing a handful of existing components on iOS, and we did not run into any issues with merge conflicts. 

The ability to work independently on the same feature was critical to the execution of a redesigned homepage, as we were rewriting the entire homepage to be modular and more flexible to non-restaurant merchants, such as grocery stores, flower shops, convenience stores, and chocolate shops.

Figure 3: The end result of our Facets concept helped us launch multiple critical product surfaces.

Additionally, choosing a homepage rewrite project to expand the system to support so many new components and new features resulted in a robust component library. This effort has led several teams to adopt our view and layout engine to power brand new features.

Next steps

We are currently in the process of customizing view attributes such as font, color, and padding. There was previously no way to customize these simple attributes per component, so each component’s text styling was constant across all instances of that component. This led to some redundancies in which some components, such as headers, were very similar to one another. We are now making the backend aware of certain style semantics, such as text, spacing, and size, as defined by client DLS libraries. This will allow for even greater flexibility and a reduction of redundancy, while maintaining a cohesive brand and theme.

Conclusion 

Many companies struggle with slow mobile development and adoption hurdles. Our solution allows faster iteration of products. For mobile developers, that means a larger initial investment in creating a layout engine and pragmatically building a large component library, but once components are defined, the page can be laid out with those components through backend deploys, which ultimately results in faster, more parallelizable development and less time building duplicative implementations. We’re constantly experimenting with new layout changes to find the ideal configuration as we onboard more verticals. From a business perspective, this also means a tremendous increase in ship velocity, because most client-side code can now be reused and there is less dependence on mobile release cycles.

Acknowledgement

The authors would like to thank Jeff Cosgriff, Xisheng Yao, Jimmy Zhou, Byran Yang, Kirtan Patel, Daniel Kelley, Liviu Romanascu, Erik Zhang, Bingxin Zhang, Fiona Miao, Salmaan Rizvi, Calvin Chueh, Ephraim Russo, Suke Hozumi, Ezra Berger, Wayne Cunningham, Josh Zhu, Rui Hu, Kathryn Gonzalez, Manolo Sañudo, Mauricio Barrera, Jimmy Liu, Eric Gu for contributing to this project.