At Uber, machine learning plays a central role in improving user experiences across our apps. Given the scale and scope of our business, we often need to think creatively about how we design these systems. For instance, when developing our partner activity matrix, a new tool for personalizing driver experiences based on aggregate usage trends, we found inspiration in a biomedical technique for visualizing genomes (genomic biclustering ).
Using biclustering, we can visualize diversity in driver partner patterns by expressing each partner’s choice of times to drive: for example, one driver might prefer weekday commute hours and take a midday break; another might prefer weekend nights, leaving time for their job during the week; and yet another might prefer weekday afternoons when their children are in school. The universe of possible driving patterns is limitless, and so describing this diversity in a compact but rich way helps us improve the Uber experience with all driver partner preferences in mind, not just an average use case.
In this article, we introduce our partner activity matrix, a new tool that leverages biclustering and machine learning to better understand the diversity of driver experiences on the app to help us tailor our products for their preferences.
One way to visualize these patterns is through what we call a “partner gene expression matrix” or more specifically, a partner activity matrix. Inspired by an analogy of genomics in which gene expression is visualized by a matrix with each gene corresponding to a column and each individual to a row, the partner activity matrix is a matrix in which each column corresponds to a minute of the week and each row to a driver partner, with each entry in the matrix indicating whether the partner was online (1) or offline (0). Figure 1, below, shows a partner activity matrix for one city during one week, where online times are displayed in white and offline times are displayed in black:
Figure 1: In an example Partner Activity Matrix, each row represents one partner, and each column represents one minute of the week ordered by day of week (Monday-Sunday). The top X-axis shows day of week and the bottom hour of day, with white indicating the partner was online and black offline.
While a city’s partner activity matrix represents partner driving patterns, it lacks interpretability and is hard to directly use. To make using this tool easier, we needed a simpler way to describe these patterns and the users they represent.
One way to organize activity matrices and make them more interpretable is to use a machine learning technique called spectral biclustering. In genomics, biclustering clusters both the rows and columns of the gene expression matrix, identifying clusters of genes and individuals that behave similarly. Individuals in each cluster tend to express the same clusters of genes, and genes in each cluster tend to be expressed by the same clusters of individuals. If we shuffle the rows of the gene expression matrix so that individuals in the same cluster are adjacent, and shuffle the columns in this same manner, then the shuffled matrix has a checkerboard pattern in which the squares in the checkerboard are blocks of approximately constant value.
We use this method to simultaneously cluster rows (partners) and columns (times of week) of the partner activity matrix to identify clusters of partners with common preferences about when to drive and clusters of times of the week that are preferred simultaneously by the same partners. We use the clusters of partners to quickly understand how proposed changes to the Uber platform will affect drivers who use the platform in different ways, and to design changes that will be beneficial across the spectrum of usage. Rather than looking at the estimated impact of a change on each individual driver partner, which is a very noisy and time-consuming process, we can look at the impact on each cluster. This is our primary use for biclustering, though clusters of hours are also useful for improving the computational and statistical efficiency of some of our other algorithms by providing meaningful time aggregations that can be used instead of treating each hour separately.
Spectral biclustering tends to create more accurate clusters of driver partners than a traditional one-way clustering method like k-means because grouping times of week and measuring driver partners’ similarity on aggregate behavior over these multi-hour periods rather than individual hours tends to make the clusters more robust to noise and more interpretable. Spectral biclustering projects the high-dimensional vectors describing the hours when each partner drives to succinct low-dimensional driving patterns and then clusters drivers using these mined patterns. In contrast, k-means clusters the high-dimensional vectors directly and can be misled by irrelevant differences.
In our next section, we introduce the algorithm used to implement this powerful technique.
The spectral biclustering algorithm is based on a technique from linear algebra called singular value decomposition (SVD). This technique decomposes a matrix, A , which for us will be our partner activity expression matrix with n partners and d hours of week (we will have n > d ), into a sum of min(n,d)=d special matrices. These special matrices, indexed by k , are each written as where and are column vectors of unit length (i.e., they describe line segments with length one) with d and n components respectively, and is a non-negative number. Also, is orthogonal to (they describe perpendicular line segments) and is orthogonal to when k is different from . Summarizing this with an equation, we define our partner activity matrix as Equation 1, below:
Spectral biclustering considers k as indexing some underlying “driving pattern.” The vector , which has one component for each hour of the week, describes how often a driver following pattern k would drive in each hour. The number scales the number of hours up or down for the whole pattern, which is needed because is constrained to have length one.
Figure 2, below, illustrates driving patterns via matrix , where times with more driving are highlighted in dark grey:
Figure 2: In the matrix above, times with more driver activity are highlighted in dark grey. For example, the first pattern shows drivers driving during weekday morning hours, the second pattern in the middle of the day throughout the week, and the third pattern in evenings Thursday through Saturday.
For instance, the first pattern corresponds to driving during morning rush hour on weekdays; the third pattern corresponds to driving in the afternoon and evening Thursday through Saturday. Figure 2 also shows that we can summarize the driving patterns by combining the vectors into a larger matrix , with pattern IDs ( k ) indexing the rows and hours of week indexing the columns so that each consumes one row.
The vector , which has one component for each driver partner, describes the extent to which each driver follows driving pattern k . If has a large value for the component corresponding to a driver, then that driver drives at all of the hours specified by driving pattern k , and may also drive at hours given by other driving patterns. If has a value close to 0, that driver does not drive according to that driving pattern.
We combine the vectors together into a matrix U , with partners indexing the rows and pattern IDs indexing the columns, and each consuming one column. We call the entries in this matrix “weights” because they indicate how much of a driver’s driving hours are determined by a particular driving pattern. It is helpful to consider a single row in this matrix, which corresponds to a partner: this row vector describes the patterns followed by this driver. Figure 3 illustrates this with two examples. The hours driven by partner j mix patterns 2 and 3 from Figure 2, indicating that this partner drives in the middle of the day during the whole week, and also in the evening on Thursdays, Fridays, and Saturdays.
Figure 3: In matrix U, the hours driven by partner j combine patterns 2 and 3, and so include the middle of the day on all days of week and the evenings on Thursday, Friday, and Saturday. The hours driven by partner i are those from driving pattern 1: weekday mornings.
To summarize, SVD allows us to write the partner activity matrix A as a sum of matrices , each of which describes how a single driving pattern is expressed by driver partners across the week.
When understood in this way, indicates the importance of a driving pattern. Since and are unit vectors, each component of these vectors cannot have absolute value larger than one and the contribution of a driving pattern to the activity matrix cannot be significant unless is large.
To capitalize on this insight, spectral biclustering orders the pattern IDs k in decreasing order of so that more important driving patterns have smaller pattern IDs. We then choose a number of patterns K much smaller than the full number d created by SVD so that we include only the important patterns. This gives us an approximation to the activity matrix (Equation 2):
This is a “low rank” approximation because it approximates A , which is a rank-d matrix, by a matrix that has rank K < d .
Figure 4, below, describes this process of decomposing the activity matrix and then approximating it with a matrix that includes only the most important patterns:
Figure 4: We use SVD to calculate the decomposition and approximation of the partner activity matrix.
In Figure 4, SVD decomposes the partner activity matrix into three matrices, U ,, and . The matrix U describes which driving patterns each driver partner follows, i.e. the pattern weights. The diagonal matrix ∑ indicates the importance of each detected pattern. The patterns are stored in the matrix . By keeping the first K most important patterns (highlighted in dark grey), we obtain a low rank approximation of the original activity matrix that captures its essential signal.
Figure 4 writes Equation 1 in the first line in a more compact form, , using matrix multiplication with the matrices U and introduced above and one additional rectangular diagonal matrix , which has the values along the diagonal and zeros elsewhere. It then writes Equation 2 in a similar form where is the first K columns of , is the first K columns and rows of , and is the first K rows of .
Spectral biclustering incorporates one additional step to simplify this low-rank partner activity matrix . This additional step considers the rows of the matrix : each row corresponds to a driver partner and contains, for each important pattern, a weight indicating how much that partner drives according to that pattern. We group the driver partners together with k-means clustering so that within each group, or cluster, drivers have similar sequences of weights.
We can then shuffle the rows of so that drivers in the same group are in adjacent rows, as depicted in Figure 5, below:
Figure 5: We can visualize the reshuffling the pattern weights to place partners in the same cluster adjacent to each other. In Figure 5, above, there are five clusters, each of which follows a distinct collection of driving patterns.
We may also perform a similar k-means clustering of the columns of to produce groups of hours preferred simultaneously by the same partners. Reshuffling these columns (hours of the week) to make hours in the same cluster adjacent reveals patterns similar to Figure 5. If we wish, we may then take the Cartesian product of the shuffled and matrices along with to reconstruct a version of with shuffled rows and columns. This shuffled matrix exhibits a checkerboard structure with blocks of approximately constant entries.
Spectral biclustering is one of a larger number of SVD-based biclustering methods. For example, methods such as RoBic and Sparse-SVD extract patterns sequentially: they start by finding a checkerboard pattern using the best rank-1 SVD approximation; they then extract subsequent patterns sequentially from the residual matrix obtained by removing previously identified patterns. Thus, while spectral biclustering works well for clustering driver partners as described below, a great deal of additional innovation is possible.
Next, we describe our results from applying spectral biclustering to understand driver partners in one major city, thereby enabling us to customize and improve user experiences based on these patterns.
Examining driving patterns closely using spectral biclustering highlights a difference between weekdays and weekend activity. Figures 6 and 7, below, visualize one prominent cluster of driver partners provided by spectral biclustering. These partners drive regularly between the hours of 6:00 a.m. and 6:00 p.m. during weekdays and less regularly during the same hours on the weekend, with more hours driven on Saturday and fewer on Sunday. We call this cluster of partners “daytime drivers.” Figure 6, below, shows these partners’ rows in the partner activity matrix, where white indicates an hour when a partner was online:
Figure 6: Our daytime drivers’ partner activity matrix, which includes only those rows from the partner activity matrix corresponding to drivers in the daytime driver cluster, shows that online time, colored as white, occurs mainly from 6:00 a.m. to 6:00 p.m. on both weekdays and weekends, with fewer partners going online during weekends.
Figure 7, below, shows peaks on weekdays corresponding to morning and evening rush hour, with morning rush hour being more prominent, and smoother and smaller peaks at noon on weekends:
Figure 7: Partners in the above daytime driver cluster typically drive between 6:00 a.m. and 6:00 p.m., with more drivers online during weekdays, especially morning and evening rush hour, and fewer during weekends.
Figures 8 and 9, below, depict another prominent cluster of driver partners. These partners drive at night, typically starting around 6:00 p.m. and stopping between midnight and 3:00 a.m.:
Figure 8: On our nighttime driver partner activity matrix, online time (in white) occurs mainly from 6:00 p.m. to midnight on weekdays and extends to roughly 3:00 a.m. on Friday and Saturday nights.
Figure 9: The ratio of nighttime drivers online per hour of the day indicate that some partners drive between 6:00 p.m. and 1:00 a.m. on weekdays, extending to 2:00 a.m. on Friday and Saturday nights.
As evidenced by the figures above, our partner activity matrix makes it easy to determine driver preferences among certain cohorts. For instance, between Monday and Friday nighttime drivers start promptly at 6:00 p.m. as they drive riders home from work, exhibited by a first peak in fraction of drivers online at this time; on Saturday and Sunday, this first peak occurs 7:00 p.m., corresponding to riders going out for the evening. On Sunday through Thursday evenings, the fraction of drivers online falls off sharply after midnight as most riders have already gone home for the evening, while on Friday and Saturday nights activity does not decrease until between 2:00 a.m. and 3:00 a.m.
This contrast between weekdays and weekend driver partner activity reveals how Uber’s driver partners support riders and cities by providing convenient and safe transportation from work in the early evening and home from restaurants and bars later at night.
The partner activity matrix use case depicted above is only a snapshot of what our machine learning teams are doing to improve app efficiency, performance, and, most importantly, user experiences.
Using our partner activity matrix, spectral biclustering also reveals other prominent partner clusters:
These clusters give Uber more insight into how driver partners use the platform by offering an easy-to-understand categorization of these different kinds of usage, in turn making it easier for us to improve our products for both drivers and riders alike.
If using data science to engineer more magical Uber experiences for drivers interests you, consider applying for a role on our team!
Qing Feng and Peter Frazier are data scientists on Uber’s Marketplace Optimization Data Science team. Peter is also an Associate Professor of Operations Research and Information Engineering at Cornell University.
 Kluger, Y., Basri, R., Chang, J. T., & Gerstein, M. (2003). Spectral biclustering of microarray data: coclustering genes and conditions . Genome research, 13(4), 703-716.
 Asgarian, N., & Greiner, R. (2006). Using rank-1 biclusters to classify microarray data. Department of Computing Science , University of Alberta, Edmonton, AB, Canada.
 Lee, M., Shen, H., Huang, J. Z., & Marron, J. S. (2010). Biclustering via sparse singular value decomposition . Biometrics, 66(4), 1087-1095.