Weblogs and authority

This week I’ll be presenting a paper at the International Communication Association Conference in New Orleans titled Audience, Structure and Authority in the Weblog Community. The paper is an analysis of two different metrics for measuring authority within weblogs:

  • Blogroll: A link from one weblog to the top level of another (e.g., links to http://overstated.net, http://www.overstated.net or http://overstated.net/index.asp). I assume this is a proxy for popularity.
  • Permalink: Any link from one weblog to deep content on another (e.g., a link to http://overstated.net/04/05/24-weblogs-and-authority.asp). I assume this is a proxy for influence.
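
To make the distinction concrete, here is a minimal sketch of how a link might be sorted into one bucket or the other. It is not the classifier used in the paper; the function name classify_link and the list of front-page filenames are my own assumptions for illustration.

```python
from urllib.parse import urlparse

# Hypothetical list of filenames that count as a site's front page.
INDEX_NAMES = {"", "index.asp", "index.html", "index.htm", "index.php"}


def classify_link(url: str) -> str:
    """Return 'blogroll' for a top-level link and 'permalink' for a deep link."""
    # Strip surrounding slashes so "/" and "" are treated the same way.
    path = urlparse(url).path.strip("/")
    return "blogroll" if path.lower() in INDEX_NAMES else "permalink"
```

A rule this simple would misfile weblogs that live under a subdirectory (dashes.com/anil, for instance), so a real crawl would need a list of known weblog roots rather than a path test alone.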

The following table shows the top 20 for each measure. One observation is that many of the top-ranked sites are community weblogs (e.g., Slashdot or Memepool). These sites play the important role of hubs, maintaining ties to more weblogs than a single person would be able to, and they allow information to diffuse quickly between distant parts of the readership network.

Rank. Blogroll links, Blogroll url, Permalink links, Permalink url
1. 2581 metafilter.com 1322 boingboing.net
2. 2434 slashdot.org 1270 diveintomark.org
3. 2146 boingboing.net 1096 metafilter.com
4. 1825 kottke.org 1073 slashdot.org
5. 1604 instapundit.com 982 kottke.org
6. 1527 scripting.com 976 weblog.siliconvalley.com/column/dangillmor
7. 1307 evhead.com 956 instapundit.com
8. 1220 andrewsullivan.com 828 andrewsullivan.com
9. 1062 memepool.com 827 themorningnews.org
10. 1007 doc.weblogs.com 826 rathergood.com
11. 977 megnut.com 819 textism.com
12. 961 littlegreenfootballs.com/weblog 683 denbeste.nu
13. 899 diveintomark.org 626 doc.weblogs.com
14. 880 littleyellowdifferent.com 625 asmallvictory.net
15. 848 textism.com 582 rightwingnews.com
16. 846 rebeccablood.net 577 microcontentnews.com
17. 758 plasticbag.org 568 joi.ito.com
18. 737 dashes.com/anil 560 buzzmachine.com
19. 719 ftrain.com 553 waxy.org
20. 714 plastic.com 522 a.wholelottanothing.org

A second observation is that the lists are fairly distinct. While some webloggers hold top positions in both rankings, the lists diverge considerably further down. Blogrolls tend to favor the weblog elders (scripting.com, evhead.com, etc.), while permalinks suggest a different set of authors as influencers (joi.ito.com, buzzmachine.com, etc.). Looking at the differential between the ranks in the figure below, it is apparent that once the rank passes 100, the correlation between Blogroll and Permalink rank weakens.

[Figure: Permalink and Blogroll rank differential]
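
As a rough illustration of what a rank differential looks like, the sketch below compares the first five entries of each list from the table. I am assuming here that the differential is simply permalink rank minus blogroll rank for sites that appear in both lists; the paper may define it differently, and the function name is mine.

```python
def rank_differential(blogroll, permalink):
    """Map each site found in both lists to (permalink rank - blogroll rank)."""
    blogroll_rank = {site: i for i, site in enumerate(blogroll, start=1)}
    permalink_rank = {site: i for i, site in enumerate(permalink, start=1)}
    return {
        site: permalink_rank[site] - blogroll_rank[site]
        for site in blogroll_rank
        if site in permalink_rank
    }


# Top five of each list, copied from the table above.
blogroll_top5 = ["metafilter.com", "slashdot.org", "boingboing.net",
                 "kottke.org", "instapundit.com"]
permalink_top5 = ["boingboing.net", "diveintomark.org", "metafilter.com",
                  "slashdot.org", "kottke.org"]

differential = rank_differential(blogroll_top5, permalink_top5)
```

A positive value means the site ranks lower by permalinks than by blogrolls. Sites missing from either list are dropped, which is why a truncated top-N comparison like this one understates the divergence.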

This sheds new light on the age-old weblog power law debate. While the blogroll rankings (reflected in Shirky’s original analysis) suggest a model of preferential attachment, many of the weblogs in the top permalink ranks are much younger. If the weblog social structure were governed by a law of the “rich getting richer,” we would expect older weblogs to have more influence, and hence more links to their entries.

There are obviously many caveats and details, all of which are listed in the full paper below. Since I’m presenting it this coming Friday, I’d appreciate any feedback you may have.

Full paper: Audience, Structure and Authority in the Weblog Community (pdf 228k)

Popular press and weblogs

While working on a paper for a conference at the end of the month, I did some research on the coverage of weblogs in the popular press. I queried the LexisNexis database for references to "weblog," "web log," and "blog," which returned 4051 magazine and newspaper articles from 1998 to the present. The first article, published in the Independent on February 18, 1998, isn’t actually a reference to weblogs as we know them, but rather another invention of the term:

Just how tricky the whole thing is is shown by the many drafts through which that note has already gone. Some of these drafts are available on the Internet, and for those of you unfortunate enough to be without a weblog* , I bring you today some of the first versions of that note to Saddam Hussein.

* Weblog. This is a new Internet word I have made up, which I hope will catch on. If it does, I will work out a meaning for it later.

The second reference is an article published in the Guardian, November 11, 1998, citing Jorn Barger’s Robot Wisdom:

Can computers model the human predicament? John Barger’s page sets out to tackle the idea of ‘robot wisdom’, taking in James Joyce, artificial intelligence and Internet issues along the way. The real gem is the weblog, a daily account of John’s travels around the web. Watch a highly observant and thoughtful surfer at work.

The story behind weblogs becomes more complete when they start receiving attention in mid-1999, with weblog exclusives by Jim McClellan of the Guardian (June 3, 1999) and Dan Gillmor (June 14, 1999). Both of these articles followed shortly after a piece by Scott Rosenberg in Salon (May 28, 1999), which unfortunately is not indexed by LexisNexis.

[Figure: Weblog citations over time]

The chart above shows the number of articles citing weblogs over time, along with the average number of times the term was used per article in each month. The data have been normalized so that both series fit on the same plot; the average number of occurrences of the term per article peaked at 31 in October 1999, and the number of articles published peaked at 296 in April 2004.
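
For anyone reproducing the chart from the data, one simple way to put two series on a shared axis is to divide each by its own peak. This is a sketch of that approach and an assumption on my part about what "normalized" means here; the monthly values below are made up for illustration, and only the two peaks (31 and 296) come from the text.

```python
def normalize_to_peak(series):
    """Scale a series so that its largest value becomes 1.0."""
    peak = max(series)
    return [value / peak for value in series]


# Illustrative values only; the peaks 31 and 296 are the figures quoted above.
uses_per_article = [31, 12, 6, 4]
articles_per_month = [2, 40, 150, 296]

scaled_uses = normalize_to_peak(uses_per_article)
scaled_articles = normalize_to_peak(articles_per_month)
```

After scaling, both series run from 0 to 1, so their shapes can be compared directly even though the raw magnitudes differ by an order of magnitude.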

The exponential growth of attention to the topic is striking, although it appears to have tapered off in the last month. Comparing this trend with the average number of uses of the term per article, it appears that the more frequently the concept is cited, the fewer times the word is used per article. The obvious interpretation is that the term is slowly becoming part of our vernacular, and when journalists write about weblogs today, much less context is necessary than in 1999. Also, the number of articles exclusively about weblogs is probably on the decline, while stories only tangentially related to weblogs are on the rise.

Another surprising characteristic of the media presentation of weblogs is the oversight of the most popular tools:

Weblog tool # of articles
Blogger 1913
MovableType 919
LiveJournal 181
DiaryLand 114
Xanga 31

While an extremely large contingent of weblog users rely on the last three tools in this list, nearly all of the attention has gone to Blogger and MovableType. Given that the last three tools are private communities, it could simply be that the press is not aware of how explosive their growth is.

If you’re interested in working with the data, I’m offering it up in zip (21 MB) and gzip (19 MB) formats. I’ve stripped the HTML of unnecessary cruft, but it could still stand to be converted to XML.

Spam finger

I feel like my Bayesian spam filter is winning the arms race against spammers, or at least making the filtering process manageable. One of the side effects of having my mail presorted is that I can evaluate which of my email addresses are attracting the most attention. Over the past few months I’ve been watching this statistic very closely, and found that two addresses produce an overwhelming majority of my garbage: mit.edu and uchicago.edu. The irony there is that I never use either address. Where are they harvesting my email from? My best guess is finger.
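
The tally itself is easy to reproduce once the filter has set the spam aside. Here is a minimal sketch, assuming you can pull the recipient address out of each filtered message; the function name and the sample addresses are hypothetical.

```python
from collections import Counter


def spam_by_domain(recipients):
    """Count spam messages per recipient domain, ignoring case."""
    return Counter(addr.rsplit("@", 1)[-1].lower() for addr in recipients)


# Hypothetical recipient addresses taken from a spam folder.
sample = ["someone@mit.edu", "someone@MIT.EDU", "someone@uchicago.edu"]
counts = spam_by_domain(sample)
```

Calling counts.most_common() then lists the domains from most to least spammed.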

While companies tend to use more sophisticated directory systems, most universities use finger as an open white pages for students, faculty and administration. In the stone age of the internet, it was ostensibly the only way to find a person’s email address, and it remains the most effective means of tracking down a user of an academic network. In most cases, all one needs is a first or last name and the university they work for. On most unix systems, simply typing finger name@host.edu will return a list of entries in the host.edu database matching "name."
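
Part of what makes finger so open is how little there is to the protocol: per RFC 1288, the client connects to TCP port 79, sends the name followed by a line break, and reads whatever the server sends back. The sketch below shows that exchange; the function names are mine, and it assumes the target is written in the name@host form.

```python
import socket


def split_finger_target(target):
    """Split 'name@host' into (name, host)."""
    name, _, host = target.rpartition("@")
    return name, host


def finger(target, timeout=10.0):
    """Send a single finger query and return the server's reply as text."""
    name, host = split_finger_target(target)
    with socket.create_connection((host, 79), timeout=timeout) as conn:
        # The whole request is the query string terminated by CRLF.
        conn.sendall(name.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("latin-1")
```

There is no authentication step anywhere in that exchange, which is the point: anyone who can reach port 79 can ask.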

This is a veritable gold mine of data for spammers: current students who will be graduating at some point, starting families, and needing loads of xanax, valium and viagra to cope. All the spammer has to do to tap into the finger database is know a first or last name, query the server, and take the email address. Alternatively, the spammer can just finger every name in descending order of popularity, thanks to the 1990 census statistics. Since Cameron is the 336th most common name, it’s no surprise that I’ve been getting a flood of email at my fingerable addresses.

MIT does provide one level of indirection by giving each user an alias, mine being C-marlow. But if you turn around and finger C-marlow at mit.edu, MIT responds with all of my contact information. I am in no way a privacy pundit; I just don’t appreciate getting unsolicited email. At this stage in the game, it seems to me that finger must die. Schools that still want to provide a directory service should do it through a web email interface, obscuring the addresses of students and employees. Otherwise they threaten to render their email addresses useless by serving them up wholesale to spammers.

MIT power outage

At about 1pm this afternoon, the entire MIT campus suffered a complete power outage, the first time such an event has occurred in my 5-year tenure here. I was in the lab when the power cut out, and I was lucky enough to catch the sound of some 50 computers just outside my office spinning down simultaneously. The power was off for about three hours, at which point we began picking up the pieces of our poor, beaten network.

MIT maintains its own power cogeneration plant, which supplies the campus and a good part of Cambridge through an exchange with NStar Electric. Talking with friends in the area, it appears that much of the city went offline for some time, but was restored long before MIT powered up. This outage doesn’t seem to appear on the New England power consumption stats, but I assume that is because MIT maintains its own grid. Thankfully, MIT maintains its own statistics on the cogeneration site, which I’ve cached below:

[Figure: MIT Power Consumption, 5.3.04]

There hasn’t been any news yet as to the cause, but I would expect it to at least make the local news (it seems like a pretty major event, considering that it took a full 3 hours to restore power to the campus). It wasn’t anywhere near the magnitude of the recent blackout in New York, but it did have a similar socializing effect, as people crawled out from under their desks and scurried outside into the jarring sunlight. Frankly I’d be happier if they threw the switch on a regular basis, just to test people’s ability to communicate face-to-face, a sort of fire drill for social interaction.

May 4: The MIT newspaper has covered the story, noting that an outage of this magnitude has happened only once in recent history.

The Globe reports that many Verizon customers lost voicemail when their data center in Cambridge went offline yesterday.