In order to make Facebook as open and connected as possible for everyone, one of our goals is to understand how different populations of users join and use the service. With that objective in mind, the Facebook Data team recently sought to answer the question, “How diverse are the ethnic backgrounds of the people using Facebook?” This is a tough question to answer because, unlike information such as gender or age, Facebook does not ask users to share their ethnicity or race on their profiles. In order to answer it, we focused on a single country with a large and diverse population—the United States. Comparing people’s surnames on Facebook with data collected by the U.S. Census Bureau, we are able to estimate the racial breakdown of Facebook users over the history of the site.
We discovered that Facebook has always been diverse and that the diversity has increased significantly over the past year to the point where U.S. Facebook users nearly mirror the diversity of the overall population of the country. The graph above shows the proportion of the three largest minorities on Facebook over time as predicted by our model, while the dashed lines show the proportion of the Internet population for the same ethnicities.
In this report, we’ll discuss how we are able to measure diversity without user-supplied race or ethnicity. We’ll also explain how race and ethnicity have varied over the course of Facebook’s history and explore future research for understanding friendship diversity on the site.
The U.S. Census Bureau’s Genealogy Project publishes a data set containing the frequency of popular surnames along with a breakdown by race and ethnicity. These data are the key to our analysis, so we will describe them in some detail. An example of the raw data is shown below for the three most frequent surnames in the census: Smith, Johnson and Williams. These data provide the rank in the population, the total count of people with the name, their proportion per 100,000 Americans, and the percent for various races: White, Black, Asian/Pacific Islander, American Indian/Alaskan Native, two or more races and Hispanic, respectively. (While there are many preferences for describing people’s race and ethnicity, we have chosen to use the terms used in the U.S. Census to be consistent with our data.)
This data set allows us to predict a person’s race based solely on his or her surname. While these predictions will often be wrong for any one individual, in aggregate they will be correct. For example, suppose you select 10,000 people with the name Smith from the U.S. population at random. The data above suggest that 7,335 of them will be White, 2,222 will be Black and so on. Certain names will be more predictive of a certain race, while others will predict a wide array of ethnic backgrounds. The table below shows the top three names within the top 1,000 ordered by the percent in a given group. It shows that some ethnicities have distinctive surnames while others do not. For instance, 98.1% of individuals with the name Yoder are White, while the most predictive name for American Indian/Alaskan Native individuals has only 4.4% in that group. For this reason, we will only look at White, Black, Asian/Pacific Islander and Hispanic predictions in our analysis.
[Table: top three surnames within the top 1,000 for each group, with columns Name, Rank, Count and % in group. Groups include Asian / Pacific Islander, American Indian / Alaskan Native, and Two or more races.]
A simple technique for finding the distribution of ethnicities on Facebook is as follows: given the users who are on the site at a given time, sum the total users with each name in the Census Genealogy data. For each of these names, we estimate the total number of each ethnicity by multiplying by the numbers above. As in the previous example, if we have 10,000 Smiths on the site at one time, then we assume we have 7,335 White users, 2,222 Black users, and so on.
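The simple technique above can be sketched in a few lines of Python. This is an illustration, not the code used for the analysis. The White and Black percentages for Smith are the ones implied by the example in this post; the "Other" bucket is just the remainder, lumped together here as an assumption, and the function and variable names are hypothetical.

```python
# Percent of people with each surname in each group, as in the Census table.
# Only the White and Black figures for Smith come from the post; "Other" is
# the remainder, lumped together for this illustration.
CENSUS_PCT = {
    "SMITH": {"White": 73.35, "Black": 22.22, "Other": 4.43},
}


def naive_estimate(name_counts, census_pct=CENSUS_PCT):
    """Expected users per group, assuming users are a random sample of each surname."""
    totals = {}
    for name, count in name_counts.items():
        shares = census_pct.get(name.upper())
        if shares is None:
            continue  # surname not in the table; skipped in this sketch
        for group, pct in shares.items():
            totals[group] = totals.get(group, 0.0) + count * pct / 100.0
    return totals


estimate = naive_estimate({"Smith": 10_000})
print({group: round(n) for group, n in estimate.items()})
# White: 7335, Black: 2222, Other: 443
```

With a full table, the same loop would simply run over every surname found on the site.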
One potential source of error in this estimate comes from our assumption that users are selected at random from the U.S. population. What if Facebook is primarily White? Wouldn’t a majority of the Smiths be White then, breaking our assumption? In order to address this, we refine our estimates using a statistical technique known as mixture-modeling. We imagine that people come from a population with unknown racial/ethnic proportions. Individuals then get assigned names based on their race/ethnicity. Under this assumption, determining the ethnic makeup of Facebook becomes a problem of back-solving each individual’s ethnicity using only their revealed name. By allowing the Facebook population to be different from the Census population, and for each name to inform our interpretation of every other name, this technique allows us to more accurately estimate the expected number of Facebook users of a given race or ethnicity at any given time.
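To make the mixture-model idea concrete, here is a minimal sketch that fits the unknown group proportions with expectation-maximization (EM). The post does not say which fitting procedure was used, so EM is an assumption, as are the function name, the two placeholder groups and the three placeholder surnames. The sketch also assumes we already have the probability of each name given each group, which could be derived from the Census table.

```python
def fit_group_proportions(name_counts, p_name_given_group, iterations=200):
    """Estimate mixture weights (group proportions) from observed name counts via EM."""
    groups = list(p_name_given_group)
    # Start from equal proportions for every group.
    pi = {g: 1.0 / len(groups) for g in groups}
    for _ in range(iterations):
        expected = {g: 0.0 for g in groups}
        for name, count in name_counts.items():
            # E-step: posterior probability of each group given the name.
            weights = {g: pi[g] * p_name_given_group[g].get(name, 0.0) for g in groups}
            norm = sum(weights.values())
            if norm == 0.0:
                continue  # name is impossible under every group; ignore it
            for g in groups:
                expected[g] += count * weights[g] / norm
        # M-step: new proportions are the normalized expected counts.
        assigned = sum(expected.values())
        pi = {g: expected[g] / assigned for g in groups}
    return pi


# Hypothetical toy data: name probabilities within each group.
P_NAME = {
    "group_a": {"name_x": 0.7, "name_y": 0.2, "name_z": 0.1},
    "group_b": {"name_x": 0.1, "name_y": 0.3, "name_z": 0.6},
}
# Counts that an 80% / 20% population of 1,000 people would produce on average.
observed = {"name_x": 580, "name_y": 220, "name_z": 200}

pi = fit_group_proportions(observed, P_NAME)
print({g: round(p, 3) for g, p in pi.items()})
```

Because the proportions are fitted to the site's own name counts rather than fixed to the Census population, the implied breakdown for any single name can drift away from the Census table, which is the effect described above.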
Finally, we adjust the estimates in our analyses with Internet adoption rates based on values from the National Telecommunications and Information Administration report on the Networked Nation. We use the percent of households with Internet access as a proxy for the addressable Internet population of each race or ethnicity.
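One plausible reading of this adjustment, sketched below, is to weight each group's share of the U.S. population by its household Internet access rate and then renormalize. The exact formula is not given in this post, so treat this as an assumption. The group names and all numbers are made-up placeholders, not values from the NTIA report.

```python
def internet_population_shares(population_share, access_rate):
    """Share of the addressable Internet population per group (assumed formula)."""
    # Weight each group's population share by its Internet access rate.
    weighted = {g: population_share[g] * access_rate[g] for g in population_share}
    total = sum(weighted.values())
    # Renormalize so the shares sum to 1.
    return {g: w / total for g, w in weighted.items()}


# Placeholder inputs for two hypothetical groups.
shares = internet_population_shares(
    population_share={"group_a": 0.6, "group_b": 0.4},
    access_rate={"group_a": 0.75, "group_b": 0.5},
)
print({g: round(s, 3) for g, s in shares.items()})
```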
Given the approach outlined in the methodology section, we obtain a picture of the relative makeup of Facebook’s racial subpopulations within the United States. Because the Facebook population is changing over time, as is the ethnic diversity of addressable Internet users, we compare these groups over time. At each time step we recalibrate our model to account for the set of people on Facebook.
To illustrate this, the following plot shows how the model’s estimate of the distribution of the surname Lee has changed over time, tracking the change in Facebook’s population along with the change in our predictions of ethnicity. The dashed lines show the ethnic breakdown of people named Lee given by the Census Bureau tables described above. The disparity between the solid and dashed lines shows the possible bias when estimating race/ethnicity without the adjustment we describe in the previous section. For instance, the Census numbers would underestimate the number of Asian/Pacific Islanders on Facebook and overestimate the number of Black users on Facebook.
Looking at all users who have joined over the history of Facebook, we can examine the total population of each race on Facebook as predicted by our model at every point in time. These predictions are shown in the following chart. The chart conveys little about the diversity of Facebook since the growth of the site has affected all populations, and the U.S. population is predominantly White.
To look at the diversity of non-White users, the graph at the top of this post shows our model prediction as a fraction of the Facebook population as well as the percent of the overall U.S. Internet population for each ethnicity. Here the solid lines show the Facebook percentage while the dashed lines show the U.S. Internet population (in this case, we have chosen the population at the end of the time period). Because White users are a large majority, we have left them out of this plot.
Another approach to visualizing this data is to look at the relative saturation of each race. This is the fraction of users on Facebook compared to the fraction we would expect from the U.S. Internet population at that time. For instance, if Facebook had 100M users, and Asian Americans made up 4.4% of the U.S. Internet population, we would expect to find 4.4M Asian users on Facebook. If instead we observe 5M then the relative saturation would be roughly 114%.
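The arithmetic in this example is small enough to write out directly. The function name and argument names below are hypothetical, and the numbers are the illustrative ones from the paragraph above.

```python
def relative_saturation(observed_users, total_users, internet_share):
    """Observed users in a group divided by the number expected from its Internet share."""
    expected_users = total_users * internet_share
    return observed_users / expected_users


# 100M users in total, a 4.4% share of the Internet population, 5M observed.
saturation = relative_saturation(5_000_000, 100_000_000, 0.044)
print(f"{saturation:.0%}")  # 114%
```

A value above 100% means the group is more represented on the site than its share of the Internet population would predict.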
The plot above shows Facebook saturation by ethnic and racial groups. Since 2005, Asian/Pacific Islanders have consistently been much more likely to be on Facebook than Whites. While Hispanics were once 40 percent as likely as Whites to be on the site, this number has been steadily climbing since early 2007 and currently is at 80 percent. This graph also shows that Black users are now about as likely to be on the site as White users.
In this post we have outlined an approach to determine the racial and ethnic breakdown of a population based solely on people’s surnames and data provided by the U.S. Census Bureau. We have found that while Facebook has always been diverse, this diversity has increased over time leading to a population that today looks very similar to the U.S. population.
Since completing this initial work, we have started using the first names of users to increase the precision of our predictions. While in this post we have only looked at the diversity of the population as a whole, we hope to use predictions of race and ethnicity for individuals, along with their friend connections, to understand how these populations of users are connected to each other. We are working to understand how diversity of interpersonal relationships is changing over time as more users join the site and find their friends.
The work in this post was a collaborative effort between the data scientists Lars Backstrom, Jonathan Chang, Cameron Marlow and Itamar Rosenn. This is a cross-post of the note on the Data Team Facebook Page.
21 thoughts on “How Diverse is Facebook?”
I’m sure your statistical science and reasoning are sound but, nonetheless, your approach leaves me perplexed given the U.S. history around “race” and intermarriage. In my opinion, racial classification makes no biological sense and increasingly no social sense. The US census form is an at will self-classification. You can check off as few or as many as you want. Or, as of the last time I checked, be “some other race alone.”
We’ve all grown up in a race-centered world and have assigned that designation a lot of relevance and power. I would hope that the folks at Facebook would consider their position in shaping and molding how we (humans) see each other online.
Can Facebook try to be progressive and shake that 20th century “race” taint and help take this World Wide Web in another, less divisive, direction? There’s got to be better ways to sell and market to folks.
I can see the potential for this to be as powerful as the Nielsen system in terms of monitoring the satisfaction of a culturally significant sample of the population.