Introducing Facebook Fellowships

Today I’m happy to announce that Facebook will be offering fellowships to support graduate students in the 2010-2011 school year. The program will provide tuition, stipend and other perks to lucky students whose applications are chosen. Lots more details can be found on the Facebook Fellowship page.

The areas are quite broad, and reflect the range of problems we believe are important in shaping the future of social media and web engineering:

  • Internet Economics: auction theory and algorithmic game theory relevant to online advertising auctions.
  • Cloud Computing: storage, databases, and optimization for computing in a massively distributed environment.
  • Social Computing: models, algorithms and systems around social networks, social media, social search and collaborative environments.
  • Data Mining and Machine Learning: learning algorithms, feature generation, and evaluation methods to produce effective online and offline models of behavioral signals.
  • Systems: Hardware, operating system, runtime, and language support for fast, scalable, efficient data centers.
  • Information Retrieval: search algorithms, information extraction, question answering, cross-lingual retrieval and multimedia retrieval.

If you or any Ph.D. students you know are interested in applying for the program, the deadlines are quite tight to make sure we can support students in the upcoming year. I’m really looking forward to seeing the applications. If you have any questions, please feel free to ask me or email the fellowship list at fellowships AT

Venturing to the tail

It’s now second nature to think that the top 1% of media account for an overwhelming percentage of overall sales. But how many people actually consume content from the more obscure parts of Netflix’s catalog? Sharad and Co. at Yahoo! Research just released the results of some research looking at how users fit into long-tail distributions of content.

Corpus Satisfaction

The results? “85% of Netflix users and 95% of Yahoo! Music users have ventured into the tail (i.e., consumed items not available in large, brick-and-mortar retailers), and 40% of Netflix users and 70% of Yahoo! Music users regularly consume tail items.” The distributions above show how many users in a given system will be satisfied when you only include the top items in a given catalog. People’s web browsing may be more obscure than their music tastes, but in both cases a media provider needs to maintain a significant catalog to satisfy the tastes of their audience.

For social media practitioners, this is a great indicator of how much content you need to reach a mainstream audience. For music you’re going to need over 60% of the entire music catalog (or at least Yahoo!’s), and for search, well, I wouldn’t go there.
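To make the idea of a satisfaction curve concrete, here is a minimal Python sketch. It assumes a user counts as "satisfied" at catalog size k only when every item they consume is among the k most popular items. That threshold, the function name, and the toy data are illustrative assumptions, not the definition or the data used in the Yahoo! Research study.

```python
from collections import Counter

def satisfaction_curve(user_items):
    """Fraction of users fully covered by the top-k items, for each k.

    user_items maps a user id to the set of items that user consumed.
    A user is treated as satisfied at k when all of their items rank in
    the k most popular (an assumed definition, chosen for illustration).
    """
    # Rank items by how many users consumed them, most popular first.
    popularity = Counter(item for items in user_items.values() for item in items)
    ranked = [item for item, _ in popularity.most_common()]
    rank = {item: i + 1 for i, item in enumerate(ranked)}

    # The smallest catalog that satisfies a user is set by the rank of
    # their most obscure item.
    needed = [max(rank[i] for i in items) for items in user_items.values() if items]

    n_users = len(needed)
    return [sum(1 for r in needed if r <= k) / n_users
            for k in range(1, len(ranked) + 1)]

# Hypothetical toy data: three users, four items.
toy = {
    "ann": {"hit"},
    "bob": {"hit", "indie"},
    "cat": {"hit", "indie", "rare", "obscure"},
}
curve = satisfaction_curve(toy)
```

In this toy case one user with a single obscure item forces the whole catalog to be carried before everyone is satisfied, which is the shape of the argument above.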

Maintained Relationships on Facebook

This past week the Economist published a piece entitled Primates On Facebook that described some research done by the Facebook Data Team. Since there have been a number of questions throughout the monkeysphere, we thought we would take the opportunity to describe our approach, the data, and our analysis.


We were asked a simple question: is Facebook increasing the size of people’s personal networks? This is a particularly difficult question to answer, so as a first attempt we looked into the types of relationships people do maintain, and the relative size of these groups. The image above presents a high-level overview of our findings: while the average Facebook user communicates with a small subset of their entire friend network, they maintain relationships with a group two times the size of this core. This not only affects each user, but also has systemic effects that may explain why things spread so quickly on Facebook.

Before discussing the data, let us first set the context.

People you know

Many people are asking questions about the number of friends they have on Facebook. Do I have enough? Do I have too many? What may be tripping people up here is the language: while the people you’re connected to on Facebook are called your “friends,” they’re more likely people you have met at some point in your life. Social network researchers have been trying to measure this number for decades, and have come up with a number of clever techniques.

If you’ve read The Tipping Point, you may remember a study Gladwell described where people were asked to identify whether or not they knew people with names from a long list culled from a phone book. Based on the probability of knowing someone with a given name and the number of people with this name that a person knows, we can estimate the number of people a given subject has met. Killworth et al. found, using this technique and others, that the number of people a person will know in their lifetime ranges somewhere between 300 and 3000 ((Killworth, P., Johnsen, E., Bernard, H. R., Shelley, G. A., and McCarty, C. Estimating the size of personal networks. Social Networks 12 (1990), 289–312.)).
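The arithmetic behind a name-based estimate can be sketched in a few lines of Python. This is a simplified illustration of the idea, not the estimator used by Killworth et al.; the function name, the names, the counts, and the population shares are all made up.

```python
def estimate_network_size(known_by_name, name_fraction):
    """Rough estimate of how many people a subject knows, from name counts.

    known_by_name: name -> number of people with that name the subject knows.
    name_fraction: name -> fraction of the whole population with that name.
    If a name covers a fraction p of the population and the subject knows
    m people with it, then m / p is a rough guess at their network size.
    Pooling over all names on the list gives sum(m) / sum(p).
    """
    total_known = sum(known_by_name.values())
    total_fraction = sum(name_fraction[name] for name in known_by_name)
    return total_known / total_fraction

# Hypothetical inputs: both the counts and the population shares are invented.
known = {"Smith": 3, "Nguyen": 1, "Kowalski": 0}
share = {"Smith": 0.008, "Nguyen": 0.001, "Kowalski": 0.001}
estimate = estimate_network_size(known, share)  # 4 / 0.010, about 400
```

Pooling across names rather than averaging per-name estimates keeps rare names, where the subject knows nobody, from dragging the result to zero.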

On Facebook, the average number of friends that a person has is currently 120 ((Facebook Statistics)). Given that Facebook has only been around for 5 years, that not everyone uses it, and that not all acquaintances have found each other, this number seems reasonable for an average user.

Communication network

As a subset of the people you know, there are some individuals with whom you communicate on an ongoing basis. The number of individuals that represent a person’s core support network has been found to be much, much smaller than their entire network. Peter Marsden found the number of people with whom individuals “can discuss important matters” numbers only 3 for Americans ((Marsden, P. Core discussion networks of Americans. American Sociological Review 52, 1 (1987), 122–131.)). In a subsequent survey, researchers found that this number has dropped slightly over the past two decades ((McPherson, M., Smith-Lovin, L., and Brashears, M. E. Social isolation in America: Changes in core discussion networks over two decades. American Sociological Review 71, 3 (June 2006), 353–375.)), causing some alarm in the press, but without sufficient explanation ((While this work is well cited, there is support that the methodology underestimates the core network, e.g. Bearman, P., and Parigi, P. Cloning Headless Frogs and Other Important Matters: Conversation Topics and Network Structure. Social Forces 83 (2004), 535.)).

How many people an individual communicates with probably falls somewhere between their total network size and their support network. Some research by Gueorgi Kossinets and Duncan Watts observing all email communication at a university shows that the number of ongoing contacts hovers somewhere between 10 and 20 over a 30-day period ((Kossinets, G., and Watts, D. J. Empirical analysis of an evolving social network. Science 311, 5757 (January 2006), 88–90.)).

Maintained Relationships

Facebook and other social media allow for a type of communication that is somewhat less taxing than direct communication. Technologies like News Feed and RSS readers allow people to consume content from their friends and stay in touch with the content that is being shared. This consumption is still a form of relationship management as it feeds back into other forms of communication in the future. For instance, a high school friend uploads a photo of her new puppy and this photo appears in your News Feed. You click on the photo, browse through a host of other photos and discover that she has also gotten engaged, which may lead you to reach out to her.

This type of communication is the core of the Facebook experience, and given the question posed by the Economist, we wondered what effect this sort of relationship maintenance had on the breadth of people’s networks.

Measuring Networks on Facebook

To try and answer questions about network size on Facebook, we looked at the communications of a random sample of users over the course of 30 days. We defined networks in 4 different ways:

  • All Friends: the largest representation of a person’s network is the set of all people they have verified as friends.
  • Reciprocal Communication: as a measure of a sort of core network, we counted the number of people with whom a person had had reciprocal communications, or an active exchange of information between two parties.
  • One-way Communication: the total set of people with whom a person has communicated.
  • Maintained Relationships: to measure engagement, we took the set of people for whom a user had clicked on a News Feed story or visited their profile more than twice.
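The four definitions above can be sketched in Python. The event format (sender and recipient pairs for communication, viewer and subject pairs for News Feed clicks and profile visits), the function name, and the sample data are assumptions made for illustration; they are not the schema or code used for the analysis. One-way communication is read here as friends the user sent something to, which is also an assumption.

```python
from collections import Counter

def network_sizes(user, friends, messages, views):
    """Size of each of the four networks for one user.

    friends:  set of the user's confirmed friends.
    messages: list of (sender, recipient) communication events.
    views:    list of (viewer, subject) passive events, such as a
              News Feed story click or a profile visit.
    """
    sent_to = {r for s, r in messages if s == user and r in friends}
    received_from = {s for s, r in messages if r == user and s in friends}

    # Reciprocal: communication went in both directions.
    reciprocal = sent_to & received_from

    # Maintained: friends whose stories or profile were viewed more than twice.
    view_counts = Counter(subj for v, subj in views if v == user and subj in friends)
    maintained = {f for f, n in view_counts.items() if n > 2}

    return {
        "all_friends": len(friends),
        "reciprocal": len(reciprocal),
        "one_way": len(sent_to),
        "maintained": len(maintained),
    }

# Hypothetical 30-day sample for one user.
friends = {"a", "b", "c", "d"}
messages = [("me", "a"), ("a", "me"), ("me", "b")]
views = [("me", "c")] * 3 + [("me", "a")] * 3 + [("me", "d")] * 2
sizes = network_sizes("me", friends, messages, views)
```

In this made-up sample the user has 4 friends, 1 reciprocal contact, 2 one-way contacts, and 2 maintained relationships.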

For each user we calculated the size of their reciprocal network, one-way network and network of maintained relationships, and plotted this as a function of the number of friends a user has. As Andreas mentions in his blog post about the article, the visualization (shown below) did not make it into the article, but presents a pretty clear picture of the relationship between these types of communication.


In the diagram, the red line shows the number of reciprocal relationships, the green line shows the one-way relationships, and the blue line shows the passive relationships as a function of your network size. This graph shows the same data as the first graph, only combined for both genders. What it shows is that, relative to the number of people a Facebook user actively communicates with, that user is passively engaging with between 2 and 2.5 times more people in their network. I’m sure many people have had this feeling, but these data make this effect more transparent.

Systemic Effects

What effect does a 2x increase in connectivity have on a network? The easiest way to observe this is to look at one person’s personal network. The image below shows the personal network for one of my coworkers. The first diagram shows his entire network, namely all of his friends, and all of the relationships between his friends. It is clear that the cluster on the top is the highly connected set of Facebook coworkers, and the cluster on the right is another group of friends.


The cell on the bottom right shows only those relationships that have reciprocal communication. Many of the individuals in his network are completely disconnected or out of touch with each other. Moving to the bottom left cell, we see the slightly more connected network containing one-way communication. This includes every person who wrote a comment, sent a message or wrote a wall post to one of my coworker’s other friends. The cell on the top-right shows the passive network, including all those people who were keeping up with their friends. While some of his friends are still disconnected, a very large percentage are now reachable through some set of observations.

The stark contrast between reciprocal and passive networks shows the effect of technologies such as News Feed. If these people were required to talk on the phone to each other, we might see something like the reciprocal network, where everyone is connected to a small number of individuals. Moving to an environment where everyone is passively engaged with each other, some event, such as a new baby or engagement, can propagate very quickly through this highly connected network.

While these data are not a controlled experiment, and do not directly relate to the theories described above, they do show a directional trend in the way people manage relationships on a social network today. We hope to continue this line of research with the eventual hope of making relationships that much easier to manage.

This post represents the work of data scientists Lee Byron, Tom Lento, Cameron Marlow, and Itamar Rosenn. Special thanks to Alex Smith for letting us use him as an example. For more insights like this, make sure to become a fan of the Facebook Data Team.

Not the norm

Whenever I am selected as part of a survey panel, online or otherwise, I nearly always take the opportunity. I am “one of those people” who creates self-selection bias. I am a perennial student of surveys, and always interested in what researchers and marketers are trying to understand. I received an invitation this morning from a reputable magazine that I read frequently, and decided to take the dive. One of the many questions asked about online activities, specifically which of the following actions I have partaken in over the past 3 months:

  • Sent and/or received an Instant message (IM)
  • Sent and/or received a text message (SMS) on cell phone
  • Accessed the internet from cell phone or PDA
  • Downloaded/listened to or watched music/videos, podcasts or other audio files, webcasts, etc.
  • Watched user-created videos online (e.g.,
  • Read a blog
  • Posted to your own blog
  • Have a MySpace or similar online profile page
  • Created/uploaded art, photography, images, video, music, etc.
  • Participated in chat rooms or forums
  • Visited social networking sites (e.g.,, etc.)
  • Use RSS Feeds
  • None of the above

Suffice it to say, I think that I am not the norm:

Online Activities

Perfect 12!! It’s always refreshing to be reminded that you are not average, especially when the media you consume, the people you interact with and the activities you engage in suggest otherwise. Although it seems like many average internet users could fill up a large chunk of this list.

Commuting and social life

I was pleased when Chad directed me to the New Yorker piece on commuting last year which garnered much attention. I myself have spent quite a bit of time on the highways of 101, 280 and 237, not to mention countless trips down the peninsula on the Caltrain. What Chad directed me to, though, was a quote I completely glossed over the first time I read this article, one by Robert Putnam:

“I was shocked to find how robust a predictor of social isolation commuting is… there’s a simple rule of thumb: Every ten minutes of commuting results in ten per cent fewer social connections. Commuting is connected to social isolation, which causes unhappiness.”

According to this calculation, the following chart represents your life as a commuter. On the x-axis is the number of minutes you spend commuting, and on the y-axis the percentage of your potential social life that you retain with the given commute time.

Commuting impact on social life
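The rule of thumb can be read in two ways, and the difference matters. Here is a minimal Python sketch, assuming the commute is counted as total minutes per day: a linear reading subtracts ten percentage points per ten minutes, while a compounding reading keeps 90% of what remains per ten minutes. The function names are hypothetical, and the text does not say which reading the chart uses.

```python
def retained_linear(minutes):
    """Linear reading: lose 10 percentage points per 10 minutes, floored at 0."""
    return max(0.0, 1.0 - 0.10 * (minutes / 10))

def retained_compounding(minutes):
    """Compounding reading: keep 90% of what remains per 10 minutes."""
    return 0.9 ** (minutes / 10)

# Compare the two readings for a few daily commute totals (in minutes).
for minutes in (10, 30, 60, 120):
    print(minutes,
          round(retained_linear(minutes), 2),
          round(retained_compounding(minutes), 2))
```

Under the compounding reading, 120 minutes a day (one hour each way) leaves about 28% of your connections. Under the linear reading the curve reaches zero at 100 minutes.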

The impact of this chart is striking, almost to the point of absurdity. With a literal interpretation, someone with a one-hour commute each way will only retain 30% of their relationships. What’s more, the quote is really derivative of a result he shows in his most popular book, Bowling Alone, where he actually paints the picture much more bleakly, tying commuting to the downfall of civic engagement (surprise!):

We are commuting farther. From 1960 to 1990 the number of workers who commute across county lines more than tripled. Between 1983 and 1995 the average commuting trip grew 37 percent longer in miles. Ironically, travel time increased by only 14 percent, because the speed of the average commute, by all modes of transportation combined, increased by nearly one quarter. Three factors have made for faster travel, at least in recent past–the switch from carpools and mass transit to single-occupancy vehicles, which are quicker for the individual worker though socially inefficient; the increase in suburb-to-suburb commuting; and greater flexibility in work hours. On the other hand, traffic congestion has metastasized everywhere. In a study of sixty-eight urban areas from Los Angeles to Corpus Christi to Cleveland to Providence, annual congestion-related delay per driver rose steadily from sixteen hours in 1982 to forty-five hours in 1997.

In short, we are spending more and more time alone in the car. And on the whole, many of us see this as time for quiet relaxation, especially those of us who came of age in the midst of the driving boom. According to one survey in 1997, 45 percent of all drivers–61 percent of those aged eighteen to twenty-four, though only 36 percent of those aged forty-five and over–agreed that “driving is my time to think and enjoy being alone.”

The car and the commute, however, are demonstrably bad for community life. In round numbers the evidence suggests that each additional ten minutes in daily commuting time cuts involvement in community affairs by 10 percent–fewer public meetings attended, fewer committees chaired, fewer petitions signed, fewer church services, less volunteering, and so on. In fact, although commuting time is not quite as powerful as an influence on civic involvement as education, it is more important than any other demographic factor. And time diary studies suggest that there is a similarly strong negative effect of commuting time on informal social interaction.

I would not be the first to question the methodology and results of Bowling Alone, but Putnam’s data are aging quickly, and ignore everything Internet. Perhaps EVDO cards and Blackberrys are helping us stay in touch, but any commuter, technology or not, will identify with the feeling of being out of touch. Putnam’s data are extreme, but at a gut level the intuition seems right: commuting hurts your social life. The more you commute, the less time you have for friends.

Richard Hamming: “You and your research”

In 1986, Richard Hamming gave a talk at the Naval Postgraduate School entitled “You and your research” relating his experience working with some of the best scientists of the last century. It’s a must-read for anyone who does research for a living, and probably applies to just about any line of work. A few of my favorite quotes:

“I believed, in my early days, that you should spend at least as much time in the polish and presentation as you did in the original research. Now at least 50% of the time must go for the presentation. It’s a big, big number.”

“The people who do great work with less ability but who are committed to it, get more done than those who have great skill and dabble in it, who work during the day and go home and do other things and come back and work the next day.”

“Often a scientist becomes angry, and this is no way to handle things. Amusement, yes, anger, no. Anger is misdirected. You should follow and cooperate rather than struggle against the system all the time.”

The peak-end rule

In reading The Paradox of Choice by Barry Schwartz, I came across one of those pieces of research that just keeps coming up in conversation, so I’ll post it here. The theory, known as the “peak-end rule” and formulated by psychologist Daniel Kahneman, describes the way that people remember events by the peak and the end of the experience. For instance, if I go to an amusement park, this heuristic says that I will remember my trip by the height of excitement and the way I felt when I left. The classic experiment showing this phenomenon is described by Mr. Schwartz:

Participants in a laboratory study were asked to listen to a pair of very loud, unpleasant noises played through their headphones. One noise lasted for eight seconds. The other lasted sixteen. The first eight seconds of the second noise were identical to the first noise, whereas the second eight seconds, while still loud and unpleasant, were not as loud. Later, the participants were told that they would have to listen to one of the noises again, but that they could choose which one. Clearly the second noise is worse–the unpleasantness lasted twice as long. Nonetheless, the overwhelming majority of people chose the second to be repeated.
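The experiment can be put into rough numbers with a short Python sketch. It assumes the usual simplified statement of the heuristic, where the remembered score is the average of the worst moment and the final moment. The function names and the per-second loudness values are invented for illustration.

```python
def remembered_discomfort(intensities):
    """Peak-end score: the mean of the worst moment and the final moment."""
    return (max(intensities) + intensities[-1]) / 2

def total_discomfort(intensities):
    """What was actually endured: discomfort summed over every second."""
    return sum(intensities)

# Hypothetical per-second loudness on an arbitrary 0-10 scale.
short_noise = [8] * 8             # eight loud seconds
long_noise = [8] * 8 + [5] * 8    # the same, plus eight less loud seconds

short_memory = remembered_discomfort(short_noise)  # (8 + 8) / 2
long_memory = remembered_discomfort(long_noise)    # (8 + 5) / 2
```

With these made-up values the longer noise carries more total discomfort (104 against 64) but a lower remembered score (6.5 against 8), which matches the choice participants made.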

These results are not limited to abstract, constructed experiences. Schwartz describes another experiment with a little more real-world impact:

In the test, one group of patients had a standard colonoscopy. A second group had a standard colonoscopy plus. The “plus” was that after the actual examination was over, the doctor left the instrument in place for a short time… and it made a difference. It turned out that, over a five-year period after the exam, patients in the second group were more likely to comply with calls for follow-up colonoscopies than patients in the first group.

And of course, this example takes advantage of the colonoscopy rule: any research that deals with colonoscopies makes me uncomfortable, and therefore has more impact.

As I mentioned, since I discovered this rule, it keeps popping up in discussions I have been having. Having recently been on a vacation, it strikes me that this heuristic is of utmost importance in planning long events. It appears that the optimal planning for a vacation (or any event for that matter) would look something like this:

Peak-end rule

In the case of my vacation, the last high-point of my time in Europe was in Florence, followed by one brief day in Copenhagen. Not that there’s anything wrong with Denmark, but that day ends up coming up in more of my conversations than the rest of the trip because that is how memory works (that and blood jello is really, really disgusting). If you’re planning any trips soon, make sure to end on a high note, because you will be the one telling the stories.


I like to bunch up all of my stressful events into short periods of time. In the past few weeks I have moved to a new apartment in Hayes Valley, rented my old place to a subletter, walked into the cloistered halls of academia and am currently sitting in my friend Jussi’s apartment in Berlin:

Chez Jussi

I will be traveling down to Dresden on Sunday to attend the International Communication Association (ICA) conference where I’ll be presenting some of the findings from my thesis. The paper is finished, but I’ll post the slides and the paper when I’m trapped in my hotel room in Dresden.

If you’re in Germany or heading to ICA, please look me up, or SMS me at my temporary number, +49 176 6539 8184.