This blog post delves into the mobility data and trip hop movement patterns for one
The Silicon Valley world of top tech companies and startups often comes under fire for being too male, too white and too wealthy. For several years, the privileged group of tech workers in the Bay Area has continued to gentrify many San Francisco neighborhoods. The resulting rise in rents and the arrival of new, high-end restaurants and shops have priced people out of the city and left many dispossessed.
This is a pressing issue in San Francisco, and I want to use tech to help address it. In this post, I’ll focus on understanding how the tech and non-tech communities interact. In particular, I want to know how tech workers in San Francisco interact with other groups in their neighborhoods.
Using CITYDATA's proprietary datasets, I gathered device-specific geolocation timestamps on ~100K devices. After clustering the data, I determined a likely work location and a likely home city for about 25% of the devices. From this sample, I focused on two broad groups of San Francisco devices: tech workers and non-tech workers.
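To make the home and work inference concrete, here is a minimal sketch of one way it could be done. It is not the clustering used for this analysis: it swaps in a simpler grid-binning heuristic, and the function names, hour windows, grid precision and sample coordinates are all assumptions made up for illustration.

```python
from collections import Counter
from datetime import datetime


def grid_cell(lat, lon, precision=3):
    """Snap a coordinate to a coarse grid cell (roughly 100 m at precision=3)."""
    return (round(lat, precision), round(lon, precision))


def infer_home_and_work(pings):
    """Guess home and work cells from (datetime, lat, lon) pings.

    Assumption: home is the most common cell between 10pm and 6am,
    and work is the most common cell on weekdays between 9am and 5pm.
    Either value is None when no ping falls in the matching window.
    """
    night, workday = Counter(), Counter()
    for ts, lat, lon in pings:
        cell = grid_cell(lat, lon)
        if ts.hour >= 22 or ts.hour < 6:
            night[cell] += 1
        elif ts.weekday() < 5 and 9 <= ts.hour < 17:
            workday[cell] += 1
    home = night.most_common(1)[0][0] if night else None
    work = workday.most_common(1)[0][0] if workday else None
    return home, work


# Hypothetical pings for a single device (made-up coordinates).
pings = [
    (datetime(2024, 1, 1, 23, 0), 37.7599, -122.4148),
    (datetime(2024, 1, 2, 2, 0), 37.7601, -122.4151),
    (datetime(2024, 1, 2, 10, 0), 37.4220, -122.0841),
    (datetime(2024, 1, 2, 14, 0), 37.4221, -122.0839),
    (datetime(2024, 1, 2, 19, 0), 37.7764, -122.4241),
]
home, work = infer_home_and_work(pings)
```

A device with too few night or workday pings returns None for that slot, which is one plausible reason only a fraction of devices end up with both a home and a work location.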
As a proxy for tech workers, I used a subset of devices whose work locations matched the locations of Google, Apple and Facebook offices. To find non-tech workers, I sampled devices whose inferred work locations did not match known tech offices. This left me with ~2K total devices.
Similarity Scores & Network Analysis
From the location trails of each device, CITYDATA pinpointed a list of places visited (e.g. restaurants, movie theaters, train stops, airports, etc.) and events attended (e.g. concerts, basketball games, etc.). This list led to a classification of what types of places and what types of events a particular device tends to visit.
With these vectorized classifications, I created a similarity scoring method and calculated the similarity score between all devices. To give some intuition, two devices that always visit fast food restaurants and go to bars on weekends would receive high similarity scores. On the other hand, a device that attends church regularly and a device that attends strip clubs regularly would receive low similarity scores.
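As a rough illustration of the intuition above, the sketch below uses cosine similarity on visit-count vectors. The actual scoring method is not spelled out in this post, so cosine similarity is an assumption, and the category list and visit counts are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length visit-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a device with no visits is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical categories: [fast_food, bar, church, strip_club]
device_a = [12, 8, 0, 0]
device_b = [10, 9, 0, 0]
device_c = [0, 0, 15, 0]
device_d = [0, 0, 0, 6]

score_ab = cosine_similarity(device_a, device_b)  # close to 1: similar habits
score_cd = cosine_similarity(device_c, device_d)  # 0: no overlapping categories
```

Cosine similarity ignores how many visits a device makes in total and only compares the mix of categories, which seems like a reasonable property when devices are observed for different lengths of time.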
From the scores, I created a series of k-NN (k-nearest neighbors) networks. In each network, the nodes are devices, and every node has an edge to each of its k most similar other nodes. Further, the length of each edge is inversely proportional to the similarity score. This means similar devices are grouped close together and dissimilar devices are farther apart.
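The construction described above can be sketched in a few lines. This is only an illustrative version under stated assumptions: the similarity matrix is a made-up 4-device example, the function name is hypothetical, and edge length is taken as 1 / similarity.

```python
def knn_edges(similarity, k):
    """Build an undirected k-NN graph from a square similarity matrix.

    Returns a dict mapping (i, j) with i < j to an edge length, where the
    length is 1 / similarity (an assumed form of "inversely proportional").
    """
    n = len(similarity)
    edges = {}
    for i in range(n):
        # Rank every other node by similarity to node i, most similar first.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: similarity[i][j],
            reverse=True,
        )
        for j in others[:k]:
            s = similarity[i][j]
            length = 1.0 / s if s > 0 else float("inf")
            edges[(min(i, j), max(i, j))] = length
    return edges


# Hypothetical similarity scores for four devices.
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
edges_k1 = knn_edges(similarity, k=1)
edges_k2 = knn_edges(similarity, k=2)
```

With k = 1 the example yields two disconnected pairs, and with k = 2 the pairs become linked, which mirrors how the figures grow more connected as k increases.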
If we set k = 1, each node contributes only one edge, so we have a simple, sparse graph. If k = 50, we’ll have a dense, highly connected graph. Below are figures for k = 1, 2, 3 and 5.
Figure 1: k-NN graphs of SF tech workers (red) and non-tech workers (cyan)
Taking a quick look at the graphs, a few things jump out. First, the shape of the network changes drastically from simple to complex as we increase k. Second, increasing k increases the number of neighbors shown for each node, and with it the number of yellow lines, which are connections between tech and non-tech workers. In other words, for most nodes, the 1 or 2 most similar connections do not bridge the tech vs. non-tech divide. That signal only emerges once we look a few neighbors further out.
Third, in each network there are a few clearly defined cyan clusters. These clusters represent cliques that rarely extend past their own grouping. They are idiosyncratic features of the graph. Digging further into the clusters, I noticed that those cyan bulbs were very neighborhood-centric. As seen in the detailed graph of Figure 2, the Mission in San Francisco is a major contributor to this feature.
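One way to put a number on the second observation is to measure what share of edges cross the tech vs. non-tech divide at each k. The sketch below assumes edges are given as pairs of node indices; the labels and edge lists are a hypothetical 4-device example, not data from this analysis.

```python
def cross_group_share(edges, labels):
    """Fraction of edges whose two endpoints belong to different groups."""
    if not edges:
        return 0.0
    crossing = sum(1 for i, j in edges if labels[i] != labels[j])
    return crossing / len(edges)


# Hypothetical labels and k-NN edge lists for four devices.
labels = ["tech", "tech", "non-tech", "non-tech"]
edges_at_k1 = [(0, 1), (2, 3)]
edges_at_k2 = [(0, 1), (0, 3), (1, 2), (2, 3)]

share_k1 = cross_group_share(edges_at_k1, labels)  # no edge crosses groups
share_k2 = cross_group_share(edges_at_k2, labels)  # half of the edges cross
```

In this toy case the cross-group edges (the yellow lines in the figures) only show up at k = 2, which is the same pattern described above.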
Figure 2: Nodes and edges of non-tech workers (cyan) in the Mission
Because the Mission is one of the main sites of gentrification in the city, this cluster makes intuitive sense. There is a large influx of tech workers in that area, although there remains a strong Latino and minority presence in the neighborhood. In the fight to preserve the Latino character of the Mission, a strong pocket appears to remain separate from the tech community, with very little assimilation between tech and non-tech workers in the area.
With this idea of assimilation in mind, I took the connectivity of the graph and calculated a weighted assimilation score between tech and non-tech groups. This value represents how much one community has behaviorally assimilated with the other. A high score means the two groups behave similarly, while a low score means there is a wide behavioral gap between groups. I applied the metric to 7 San Francisco neighborhoods and 3 top tech companies (Google, Apple & Facebook) to obtain the scale shown in Figure 3.
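The exact formula for the assimilation score is not given here, so the following is only one possible definition, offered as a sketch: for a chosen group, take each member's k nearest neighbors and compute the similarity-weighted share of those neighbors that fall outside the group. The function name, the weighting scheme and the sample data are all assumptions.

```python
def assimilation_score(similarity, labels, group, k):
    """Similarity-weighted share of a group's k-NN links that leave the group.

    Assumed definition: 0.0 means members only link to their own group,
    values near 1.0 mean most of their closest links cross the divide.
    """
    n = len(similarity)
    cross_weight = 0.0
    total_weight = 0.0
    for i in range(n):
        if labels[i] != group:
            continue
        # The k most similar other nodes for this group member.
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: similarity[i][j],
            reverse=True,
        )[:k]
        for j in neighbors:
            total_weight += similarity[i][j]
            if labels[j] != group:
                cross_weight += similarity[i][j]
    return cross_weight / total_weight if total_weight > 0 else 0.0


# Hypothetical similarity scores and group labels for four devices.
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
labels = ["tech", "tech", "non-tech", "non-tech"]
tech_score_k1 = assimilation_score(similarity, labels, "tech", k=1)
tech_score_k2 = assimilation_score(similarity, labels, "tech", k=2)
```

Weighting by similarity means a strong cross-group link counts for more than a weak one, so a group whose only outside connections are its most distant neighbors still scores low.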
Figure 3: Assimilation scores for various residents of San Francisco
Amongst the 3 major companies, Google employees had the highest assimilation score, followed by Apple and Facebook. The differences could stem from the personalities of the people each company hires, or from age, gender, ethnicity or a whole host of other factors.
For example, a younger and perhaps more specialized cohort of employees at Facebook may stick together with other people of similar backgrounds. This could lead to fewer inter-group interactions and more insular behavior.
Google is an older, larger company, so its workers might come from a more diverse population. There is still some insular behavior, as seen in the bottom right of Figure 4. However, Google employees are widely distributed across the network with a predominance of intergroup connections (yellow lines).
Figure 4: Nodes and edges of Google workers (red)
Amongst neighborhoods, the Mission and the Marina had the lowest assimilation scores, but for very different reasons. As mentioned before, there is a broad gap in the Mission between the minority groups (e.g. Latino, Black and immigrant communities) and the new wave of tech workers. In the predominantly white and wealthy Marina community, this gap is more likely a difference between finance workers (i.e. non-tech for this analysis) and tech workers.
This type of analysis absolutely requires greater fine-tuning and more complete data. I attempted to use a representative sample of devices in the Bay Area, but I will no doubt need to account for sampling bias.
Most importantly, to actually address gentrification, I need to use historical data to make these comparisons between long-term residents and new residents. Using tech workers vs. non-tech workers as a proxy for old vs. new is limited since there are certainly additional, non-tech forces of gentrification as well.
Additionally, assimilation is just one measure of a community’s interactions. Assimilation that leads to the appropriation of a culture or the erasure of a neighborhood’s culture is not desirable either. Appreciation, empathy and positive action are ultimately the necessary goals.
Moving forward, I believe a more rigorous and more comprehensive network analysis of workers in the Bay Area can help identify and address the needs of our communities. We can push specific companies and their employees to act in a more responsible and selfless manner. To help preserve and enrich the neighborhoods in San Francisco, we can implement actual solutions that mitigate the negative consequences of gentrification.