Survey 2, Electric Boogaloo

For some reason I expected my survey to spread much further and wider than it actually did. At the moment, the individuals who were emailed about taking the survey (the random sample) outnumber respondents from the weblogging world at large 3 to 1. I really expected things to go in the other direction.

To rectify this situation, I thought I would provide a little incentive: if you complete the survey, you can see how you compare to the rest of the survey respondents. You can get a taste of the results here:

http://blogsurvey.media.mit.edu/results

Of course, anyone who has already taken the survey can see their results as well; just log in with your login key. Don’t worry if you threw it away; you can request it again.

The survey is up for another week, until Monday June 27, so if you get a chance… I’d appreciate it.


Cognitive dissonance

I haven’t been dumped in quite a while. Usually my relationships just fade away until a decision is made. Besides putting Gloria Gaynor and malt liquor into higher rotation, I’ve been doing a little bit of introspection about the topic.

Getting dumped is a classic case of cognitive dissonance, a theory first proposed by Leon Festinger in the 1950s. He observed that people make decisions and take actions that minimize the number of contradictory beliefs they hold. When a person is forced to believe two things that don’t match up, they experience extreme emotional discomfort until they can fix their belief system.

So basically I have this thought in my head that’s tied to all kinds of memories and beliefs: she is my girlfriend. Then I introduce a new idea, that she is not my girlfriend, and the sum of these two obviously contradictory beliefs turns me into a raving lunatic. The more embedded the first belief is, the harder it is to accept the latter, and the longer you pour Old English on your corn flakes instead of milk. F. Scott Fitzgerald put it nicely:

The test of a first-rate intelligence is the ability to hold two opposed ideas in the mind at the same time, and still retain the ability to function. One should, for example, be able to see that things are hopeless and yet be determined to make them otherwise.

Obviously I’m not operating at first-rate levels currently. But writing dry, bland weblog posts about something that is obviously extremely emotional certainly helps to bring it back.


Hedonic treadmill

I’m just about to return a book to the library, something I read a while back and have been meaning to post about for centuries. In their article “Hedonic relativism and planning the good society,”** Philip Brickman and Donald Campbell give a name to the ongoing state of happiness that we all experience. Although external forces are constantly changing our life goals, happiness for most people is a relatively constant state. Regardless of how good things get, we’ll always be at about the same level of happy; this they call the hedonic treadmill.

Psychology researchers have observed this phenomenon in a myriad of situations: lottery winners, people who have just achieved tenure, the recently disabled, etc. In all of these situations, despite a massive shift in standard of living or the achievement of a major life goal, life-satisfaction levels return to their previous baseline after a short period of time.

If this is what we can expect from our own psychology, how does hedonic relativism affect the way we choose to live our lives? Brickman and Campbell look at this question at the societal level, and suggest that there is an optimal setup for making every member of our culture as happy as possible. You have to give them credit; it was the ’70s, and socialism was still a form of utopia. But as far as I can tell, the only way to keep yourself on an increasing scale of happiness is to achieve some small goals on a daily basis, without putting too much emphasis on achieving one over another.

So why am I writing this damned Ph.D.?!

** Brickman, Philip, & Campbell, Donald. (1977). “Hedonic relativism and planning the good society.” In M. H. Appley (Ed.), Social comparison processes: Theoretical and empirical perspectives. New York: Wiley/Halsted.


Academic conference spam

About two years ago I started getting peculiar messages from unknown academics about conferences I’d never heard of. They all follow a standard form, with a subject like “inviting you to participate in BLAH-05.” Some address me as “potential speaker,” some as “Dr. Cameron A. Marlow,” and some simply as “Dr. Marlow.” This isn’t all that surprising, given that lots of legitimate emails I get from academic institutions refer to me as a Dr. (it’s much more offensive not to refer to a Ph.D. as Dr. than it is to inflate the ego of a mere student).

An increase in conference spam

The surprising thing about these emails is that they’ve been increasing in frequency fairly steadily. They have moved from the space of “oversized conference list” to legitimate spam. In some cases I’ve been emailed multiple times about the same conference, on subjects about as close to my research as I am to finishing my course in Scientology.

So who are these people? Given the regular structure of the emails, I assume that they’re being sent out from one master list. Some arrive from iiisci.org, which appears to be a collection of loosely related conferences, and others from vreme.yubc.net, an ISP in Serbia.

How big is this network? Did I get randomly added to some master list, or are they spidering for academics’ email addresses? Has anyone actually gone to one of their conferences? As with most spam, lots of questions, few answers.


Presidential Debate Redux

I’ve rerun my presidential debate analysis (see the analyses of the first presidential debate and the vice presidential debate) on the transcript of the second presidential debate. I’ve also updated the Debate Spotter to include the new text. But this time I’ve taken a slightly different approach to the analysis. Instead of some complicated weighting scheme, I’ve decided to use a very simple technique to sort the phrases for each candidate:

  • Count the number of times each candidate used each phrase
  • Score each phrase as the difference between the two candidates’ counts
  • Favor longer phrases in sorting
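For anyone who wants to try this at home, here is a minimal Python sketch of those three steps. It assumes each candidate’s phrases have already been extracted and counted into a dictionary, and it reads “favor longer phrases” as a tie-breaker on word count; that reading, the function name `score_phrases`, and the sample phrases are hypothetical, not the exact code behind the lists.

```python
from collections import Counter


def score_phrases(counts_a, counts_b):
    """Score each phrase by how much more often speaker A used it than speaker B.

    counts_a and counts_b map phrase -> number of times that speaker said it.
    Returns (phrase, score) pairs, highest score first. Ties go to longer
    phrases (an assumed reading of "favor longer phrases"), then alphabetical.
    """
    phrases = set(counts_a) | set(counts_b)
    scored = [(p, counts_a.get(p, 0) - counts_b.get(p, 0)) for p in phrases]
    scored.sort(key=lambda ps: (-ps[1], -len(ps[0].split()), ps[0]))
    return scored


# Illustrative counts only, not taken from the debate transcripts.
speaker_a = Counter({"red apple pie": 3, "green tea": 3, "tea": 2})
speaker_b = Counter({"green tea": 1, "blue sky": 4})
print(score_phrases(speaker_a, speaker_b))
# [('red apple pie', 3), ('green tea', 2), ('tea', 2), ('blue sky', -4)]
```

Scoring the other candidate is the same call with the arguments swapped.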

The results follow, and I think you’ll find them much more revealing than the previous lists. I also fed both candidates’ transcripts into Microsoft Word’s AutoSummarize feature to produce sub-100-word summaries. The results are… umm… compelling. From my perspective, it seems as though Kerry is on the offensive and Bush is backpedaling. But of course that’s just Microsoft’s take on the debate. Click on the following links to download the source Word documents. I’ll leave running the grammar checker as an exercise for the reader.

kerry041008.doc bush041008.doc


Vice Presidential Debate Analysis

As in my last entry, I’ve run the transcript of the Vice Presidential Debate through a part-of-speech tagger and identified the most popular noun phrases for each speaker (listed below). I’ve also updated the Debate Spotter to handle both transcripts. Simply change the debate field, and the transcript and speakers will change accordingly.
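The Debate Spotter’s internals aren’t described here, so the following is only a sketch of the idea under stated assumptions: transcripts live in a hypothetical `TRANSCRIPTS` dictionary keyed by debate, each a list of (speaker, utterance) pairs, and a hypothetical `spot` function counts whole-phrase, case-insensitive matches per speaker. Changing the debate is just a different dictionary key.

```python
import re

# Hypothetical in-memory transcripts: debate id -> list of (speaker, utterance).
TRANSCRIPTS = {
    "example": [
        ("A", "We will keep working hard."),
        ("B", "Hard work alone is not a plan."),
        ("A", "It is hard work, but we will do it."),
    ],
}


def spot(debate, phrase):
    """Count whole-phrase, case-insensitive uses of phrase per speaker."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    hits = {}
    for speaker, text in TRANSCRIPTS[debate]:
        hits[speaker] = hits.get(speaker, 0) + len(pattern.findall(text))
    return hits


print(spot("example", "hard work"))  # {'A': 1, 'B': 1}
```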

Have fun, and of course let us know if you identify any interesting phrases.



Presidential Debate Analysis

Whenever I watch a televised debate, I always wonder what percentage of the speaker’s message is actually thinking on their feet and how much is canned material. With the advent of available transcripts, these sorts of questions can be addressed with various computational methods.

A simple way to identify repeated statements is to count the number of times a particular noun phrase is mentioned. Noun phrases act as a proxy both for the subject matter of a given piece of text and for the way in which things are worded.
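The post doesn’t say which tagger or chunking rule was used, so here is a minimal sketch under assumptions: tokens have already been tagged with Penn Treebank tags by some part-of-speech tagger, and a noun phrase is crudely taken to be a maximal run of adjectives and nouns that ends in a noun. The function names and the hand-tagged sentence are hypothetical.

```python
from collections import Counter


def noun_phrases(tagged):
    """Yield crude noun phrases from a list of (word, Penn Treebank tag) pairs.

    A phrase is a maximal run of adjectives (JJ*) and nouns (NN*), trimmed
    so that it ends in a noun.
    """
    run = []
    for word, tag in list(tagged) + [("", "END")]:  # sentinel flushes the last run
        if tag.startswith(("JJ", "NN")):
            run.append((word.lower(), tag))
            continue
        while run and not run[-1][1].startswith("NN"):
            run.pop()  # drop trailing adjectives
        if run:
            yield " ".join(w for w, _ in run)
        run = []


def count_noun_phrases(tagged):
    """Count how many times each noun phrase occurs."""
    return Counter(noun_phrases(tagged))


# Hand-tagged example; in practice the tags would come from a POS tagger.
sentence = [("It", "PRP"), ("is", "VBZ"), ("hard", "JJ"), ("work", "NN"),
            (".", "."), ("We", "PRP"), ("need", "VBP"), ("hard", "JJ"),
            ("work", "NN")]
print(count_noun_phrases(sentence))  # Counter({'hard work': 2})
```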

For this simple experiment, we’ll need four tools:

The results are quite interesting. Looking only at noun phrases of at least 2 words occurring at least twice for a given speaker, we arrive at some spectacular catch phrases. For Bush my favorite is “hard work,” which he said repeatedly. Apparently Bush thinks that the world is a difficult place to be. For Kerry, a salient phrase was “war as a last resort.”

The top 25 phrases for Bush and Kerry follow. The number following each phrase is a rank computed from the length of the phrase and the number of times it appeared.
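The filtering step (at least 2 words, at least 2 occurrences) is stated in the post, but the exact ranking formula is not. This minimal sketch assumes the rank is simply words × occurrences; that formula, the function name `top_phrases`, and the sample counts are hypothetical stand-ins.

```python
def top_phrases(counts, n=25, min_words=2, min_count=2):
    """Filter and rank phrase counts.

    Keeps phrases with at least min_words words that occur at least
    min_count times, then ranks them by words * occurrences. That rank is
    an assumed stand-in for the one used in the post.
    """
    ranked = []
    for phrase, count in counts.items():
        words = len(phrase.split())
        if words >= min_words and count >= min_count:
            ranked.append((phrase, words * count))
    # Highest rank first; alphabetical order breaks ties.
    ranked.sort(key=lambda pr: (-pr[1], pr[0]))
    return ranked[:n]


# Illustrative counts only.
counts = {"green tea": 5, "a cup of green tea": 2, "tea": 9, "blue sky day": 1}
print(top_phrases(counts))  # [('a cup of green tea', 10), ('green tea', 10)]
```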

There are so many other types of analysis that could be run on these data. If you find anything interesting, please let me know. Also, the Debate Spotter allows for any query, so post any interesting phrases that you find.

Update: I have also analyzed the Vice Presidential and the Second Presidential debates.


Popular press and weblogs

While researching a paper for an upcoming conference at the end of the month, I looked into the coverage of weblogs in the popular press. I queried the LexisNexis database for references to "weblog," "web log," and "blog," resulting in 4051 magazine and newspaper articles from 1998 to the present. The first article, published in the Independent on February 18, 1998, isn’t actually a reference to weblogs as we know them, but rather another invention of the term:

Just how tricky the whole thing is is shown by the many drafts through which that note has already gone. Some of these drafts are available on the Internet, and for those of you unfortunate enough to be without a weblog*, I bring you today some of the first versions of that note to Saddam Hussein.

* Weblog. This is a new Internet word I have made up, which I hope will catch on. If it does, I will work out a meaning for it later.
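LexisNexis has its own query syntax, which isn’t reproduced here. As a rough sketch of the per-article counting behind the usage averages, assuming each article is available as plain text, a regular expression can count the three search terms. The pattern, `TERM`, and `count_term` are hypothetical, and the choice not to count compounds like “Blogger” is an assumption.

```python
import re

# "weblog", "web log" or "blog", optionally plural, matched as whole words.
# Compounds such as "Blogger" are deliberately not counted.
TERM = re.compile(r"\b(?:web\s?log|blog)s?\b", re.IGNORECASE)


def count_term(text):
    """Return how many times the weblog terms appear in an article's text."""
    return len(TERM.findall(text))


print(count_term("A weblog, two Web logs and a blog. Blogger is not counted."))  # 3
```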

The second reference is an article published in the Guardian, November 11, 1998, citing Jorn Barger’s Robot Wisdom:

Can computers model the human predicament? John Barger’s page sets out to tackle the idea of ‘robot wisdom’, taking in James Joyce, artificial intelligence and Internet issues along the way. The real gem is the weblog, a daily account of John’s travels around the web. Watch a highly observant and thoughtful surfer at work.

The story behind weblogs becomes more complete when they start receiving attention in mid-1999, with weblog exclusives by Jim McClellan of the Guardian (June 3, 1999) and Dan Gillmor (June 14, 1999). Both of these articles followed shortly after a piece by Scott Rosenberg in Salon (May 28, 1999), which unfortunately is not indexed by LexisNexis.

Weblog citations over time

The chart above shows the citation of weblogs over time along with the average number of times the term was used per article in each month. The data have been normalized so that both series can be seen on the same plot; the maximum value for occurrences of the term, 31, occurred in October 1999, and the maximum number of articles, 296, came in April 2004.
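The normalization isn’t spelled out, but since the maximum of each series is quoted, one plausible reading is that each series was divided by its own peak. A minimal sketch under that assumption, with hypothetical function names and made-up article data:

```python
from collections import defaultdict
from datetime import date


def monthly_series(articles):
    """Group (publication_date, term_uses) pairs by month.

    Returns {(year, month): (article_count, average_uses_per_article)}.
    """
    counts = defaultdict(int)
    uses = defaultdict(int)
    for published, n_uses in articles:
        key = (published.year, published.month)
        counts[key] += 1
        uses[key] += n_uses
    return {key: (counts[key], uses[key] / counts[key]) for key in sorted(counts)}


def normalize(values):
    """Scale non-negative values so that the largest becomes 1.0."""
    peak = max(values)
    return [v / peak for v in values] if peak else list(values)


# Made-up articles for illustration only.
articles = [(date(1999, 6, 5), 12), (date(1999, 6, 20), 8),
            (date(2004, 4, 1), 2), (date(2004, 4, 2), 1), (date(2004, 4, 3), 3)]
series = monthly_series(articles)
print(series)  # {(1999, 6): (2, 10.0), (2004, 4): (3, 2.0)}
print(normalize([n for n, _ in series.values()]))  # [0.6666666666666666, 1.0]
print(normalize([avg for _, avg in series.values()]))  # [1.0, 0.2]
```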

The exponential growth of attention to the topic is striking, although it appears to have tapered off in the last month. Comparing this trend with the average number of uses of the term per article, it appears that the more frequently the concept is cited, the fewer times the word is used per article. The obvious interpretation is that the term is slowly becoming part of our vernacular, and when journalists write about weblogs today, much less context is necessary than in 1999. Also, the number of articles exclusively about weblogs is probably on the decline, while stories only tangentially related to weblogs are on the rise.

Another surprising characteristic of the media presentation of weblogs is how it overlooks some of the most popular tools:

Weblog tool    Number of articles
Blogger        1913
MovableType     919
LiveJournal     181
DiaryLand       114
Xanga            31

While an extremely large contingent of weblog users relies on the last three tools in this list, most of the attention has gone to MovableType and Blogger. Given that these three tools are private communities, it could simply be that the press is not aware of how explosive their growth is.

If you’re interested in working with the data, I’m offering it up in zip (21 MB) and gzip (19 MB) formats. I’ve stripped the HTML of unnecessary cruft, but it could still benefit from being converted to XML.


Weblogs and churn rate

The first question that every journalist asks about weblogs (how many are there?) has been a source of constant debate over the past year. I was cited in the Economist with the number 500,000, which prompted a response, as well as a number of new efforts to estimate this number:

Blogcensus is a funded project that crawls the web and classifies content as weblog or not weblog. Pages identified as weblogs are then categorized by language using simple heuristics. This project is the sole work of Maciej Ceglowski.

Blogcount is a self-proclaimed aggregator of other data sources. The site issues press releases based on the management reports of centrally hosted weblog and journal services (e.g., Blogger, LiveJournal). Using the data collected by Blogcensus, it adjusts the original numbers to account for international and non-hosted weblogs.
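Blogcensus’s actual language heuristics aren’t described here. One simple heuristic of this kind is counting common function words (stopwords) per language and picking the language with the most hits; this sketch assumes that approach, and its tiny `STOPWORDS` lists and `guess_language` function are hypothetical.

```python
import re

# Tiny, hypothetical stopword lists; a real classifier would use far larger ones.
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "french": {"le", "la", "les", "et", "est", "de", "un", "une"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
}


def guess_language(text):
    """Return the language whose stopwords appear most often, or None."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    scores = {lang: sum(w in stops for w in words) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_language("Le chat est sur la table et le chien dort"))  # french
print(guess_language("The cat is on the mat"))  # english
```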

A word to the wise: online communities can appear much more active than they actually are (and I’ve got some data to show it!).
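The data themselves aren’t reproduced here, but the basic measurement is easy to sketch: compare the total number of weblogs on a service with the number updated recently. The 30-day window, the function name `activity_summary`, and the dates below are hypothetical.

```python
from datetime import date, timedelta


def activity_summary(last_updated, today, window_days=30):
    """Compare total weblogs with those updated in the last window_days.

    last_updated holds one date per weblog: the day it was last updated.
    Returns (total, active, share_active).
    """
    cutoff = today - timedelta(days=window_days)
    active = sum(1 for d in last_updated if d >= cutoff)
    total = len(last_updated)
    return total, active, (active / total if total else 0.0)


# Hypothetical last-update dates for four weblogs.
last_updated = [date(2003, 5, 30), date(2003, 1, 2), date(2002, 7, 4), date(2003, 5, 10)]
print(activity_summary(last_updated, today=date(2003, 6, 1)))  # (4, 2, 0.5)
```

A service that reports only the first number will look considerably more active than one that reports the second.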



The Significance of Diaries

While reading for my generals, I have been taking note of all weblog- and diary-related material with the intention of posting it eventually. Today I ran across an engaging quote in Yi-Fu Tuan’s paper “The Significance of the Artifact”:

Diaries retain a measure of the past in the present. First as a physical object we can see that the binding is fragile and that the pages are yellow with age. Then there is the testimony of penmanship—the way that it has changed over the years. Most important of all, obviously, are the feelings, moods, and incidents as they are captured in the entries. But how etiolated they now seem. The keeping of a diary may indeed reassure an individual that he has lived. On the other hand, the skeletal notes and the blank pages are reminders of how little of time can be salvaged by such a literary device. On April 7, 1824, Eugène Delacroix, after reading through what he had written earlier in his diary, added the following comments:

I feel that I still retain control of the days about which I have made entries, even when they are past. But as for the days which are not mentioned in the diary, it is as if they had never existed. What dark abyss has swallowed them up? Are these flimsy pages the only token I have of my past existence? And so my mind and the life history of my soul are to be destroyed because I am not willing to commit to paper that part of them which might thereby be preserved.

I wish I had more time to unpack this, but I’m a bit under the gun (oral exam in T-minus 2 weeks). Still, I wanted to post it before I forgot and it was lost like the rest of my existence during this busy time.