Superman, RIP

Breaking news.. Christopher Reeve has just passed away. So sad.

Superman, RIP.

The strange thing about this news is that, without really considering why, I immediately pasted that link to my blog and sent it to everyone on my buddy list who was still awake at the time. It’s a flashbulb event, the kind of thing that, for whatever reason, emblazons itself on your brain to the effect of “I know where I was when so-and-so happened.” The amazing thing is, I knew immediately when I saw the BBC story that I needed to flood it.

Presidential Debate Redux

I’ve rerun my presidential debate analysis (see analyses from the first presidential debate and the vice presidential debate) on the transcript of the second presidential debate. I’ve also updated the Debate Spotter to include the new text. But this time I’ve taken a slightly different approach to the analysis. Instead of some complicated weighting scheme, I’ve decided to use a very simple technique to sort the phrases for each candidate:

  • Count the number of times each candidate used each phrase
  • Score each phrase as the difference between the two candidates’ counts for that phrase
  • Favor longer phrases in sorting
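As a rough illustration, the three steps above could be sketched in Python like this. This is only a sketch under assumptions, not the code behind the lists: the function name `rank_phrases` is made up, the phrase counts are assumed to have been collected already, and "favor longer phrases" is interpreted here as a tie-breaker by word count, which the post does not spell out.

```python
from collections import Counter

def rank_phrases(own, other, top=25):
    """Rank one candidate's phrases against the other candidate's.

    own, other: mappings of phrase -> number of times that candidate said it.
    The score is the difference in usage; phrases the other candidate used
    at least as often are dropped. Ties are broken in favor of longer
    phrases (an assumed reading of "favor longer phrases").
    """
    scored = []
    for phrase, count in own.items():
        diff = count - other.get(phrase, 0)
        if diff > 0:
            scored.append((phrase, diff))
    # Highest difference first, then more words first, then alphabetical.
    scored.sort(key=lambda item: (-item[1], -len(item[0].split()), item[0]))
    return scored[:top]

# Counts below are invented purely for illustration.
candidate_a = Counter({"hard work": 5, "free society": 3,
                       "tax relief plan": 3, "last resort": 1})
candidate_b = Counter({"hard work": 2, "last resort": 4})
print(rank_phrases(candidate_a, candidate_b))
```

Running the same function with the arguments swapped gives the other candidate's list.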

The results follow, and I think you’ll find them much more revealing than the previous lists. I also fed both candidates’ transcripts into Microsoft Word’s AutoSummarize feature to produce a sub-100 word summary. The results are… umm… compelling. From my perspective, it seems as though Kerry is on the offensive, and Bush is backpedaling. But of course that’s just Microsoft’s take on the debate. Click on the following links to download the source Word documents. I’ll leave running the grammar checker as an exercise for the reader.

kerry041008.doc bush041008.doc
Continue reading “Presidential Debate Redux”

Migrating to del.icio.us

I’ve been keeping a list of low-threshold, or daily, links here for over a year. My links were largely inspired by Joshua Schachter’s link list (Muxway) which he later turned into the popular del.icio.us social bookmarks engine. I remember trying the system out not long after I started my linkstream, but for some reason the idea didn’t really grab me, and the interface didn’t provide me much more than Movabletype did.

Somewhere along the line though, del.icio.us became a much more streamlined means of posting and categorizing links. More than that, it became an independent subculture that really embodies a zero-threshold mentality of linking. The system really benefits from people having almost no impediment to adding URLs, and it accomplishes this (quite wisely) by taking away any sort of identity or soapbox that could be used to influence other users of the system (more on that in a post coming up later today). Anyway, what I’m trying to say is that I wanted to use del.icio.us. So I stopped posting my oddments and I started posting to my del.icio.us account.

Since then I’ve transferred my old links from Movabletype into del.icio.us and integrated my del.icio.us links into my weblog using some very handy Perl tools. More importantly, I’ve kept all of the functionality of my old link list, including the ability to give credit to my source through a via link. Here’s how:
Continue reading “Migrating to del.icio.us”

Vice Presidential Debate Analysis

As in my last entry, I’ve run the transcript of the Vice Presidential Debate through a part-of-speech tagger and identified the most popular noun phrases for each speaker (listed below). I’ve also updated the Debate Spotter to handle both transcripts. Simply change the debate field and the transcript and speakers will be changed accordingly.

Have fun, and of course let us know if you identify any interesting phrases.

Continue reading “Vice Presidential Debate Analysis”

Presidential Debate Analysis

Whenever I watch a televised debate, I always wonder what percentage of each speaker’s message is actually thinking on their feet and how much is canned material. With the advent of available transcripts, these sorts of questions can be addressed with various computational methods.

A simple way to identify repeated statements is to count the number of times a particular noun phrase is mentioned. Noun phrases act as a proxy both for the subject matter of a given piece of text and for the way in which things are worded.
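To make the idea concrete, here is a minimal Python sketch of counting repeated noun phrases. It is not the toolchain used for this experiment. It assumes the transcript has already been run through a part-of-speech tagger that emits (word, tag) pairs with Penn Treebank-style tags, and it uses a deliberately crude definition of a noun phrase: a run of adjectives and nouns ending in a noun. The function names and the two thresholds are hypothetical.

```python
from collections import Counter

# Penn Treebank-style tags (assumed tagger output format).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
MODIFIER_TAGS = {"JJ", "JJR", "JJS"}

def noun_phrases(tagged):
    """Yield lowercase runs of adjectives/nouns that end in a noun."""
    run = []
    # A sentinel token flushes the final run at the end of the input.
    for word, tag in list(tagged) + [("", "END")]:
        if tag in NOUN_TAGS or tag in MODIFIER_TAGS:
            run.append((word.lower(), tag))
            continue
        # Drop trailing adjectives so the phrase ends with a noun.
        while run and run[-1][1] not in NOUN_TAGS:
            run.pop()
        if run:
            yield " ".join(w for w, _ in run)
        run = []

def repeated_phrases(tagged, min_words=2, min_count=2):
    """Count noun phrases of at least min_words words said at least min_count times."""
    counts = Counter(p for p in noun_phrases(tagged)
                     if len(p.split()) >= min_words)
    return {p: n for p, n in counts.items() if n >= min_count}

# Tiny hand-tagged example, invented for illustration.
sample = [("It", "PRP"), ("is", "VBZ"), ("hard", "JJ"), ("work", "NN"), (".", "."),
          ("Hard", "JJ"), ("work", "NN"), ("matters", "VBZ"), ("to", "TO"),
          ("the", "DT"), ("troops", "NNS"), (".", ".")]
print(repeated_phrases(sample))
```

A chunker this simple stops at prepositions and determiners, so it would miss longer phrases such as “war as a last resort”; a real noun phrase chunker would be needed for those.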

For this simple experiment, we’ll need four tools:

The results are quite interesting. Looking only at noun phrases of at least 2 words occurring at least twice for a given speaker, we arrive at some spectacular catch phrases. For Bush my favorite is “hard work,” which he said repeatedly. Apparently Bush thinks that the world is a difficult place to be. For Kerry, a salient phrase was “war as a last resort.”

The top 25 phrases for Bush and Kerry follow. The number following each phrase is a rank described by the length of the phrase and the number of times it appeared.

There are so many other types of analysis that could be run on these data. If you find anything interesting, please let me know. Also, the Debate Spotter allows for any query, so post any interesting phrases that you find.

Update: I have also analyzed the Vice Presidential and the Second Presidential debates.

Continue reading “Presidential Debate Analysis”

Foocamp Hacks

I returned to Sebastopol this year for the second Foocamp, a nerd-laden affair taking place on the rural campus of O’Reilly Publishing. The event is a sort of self-organizing conference of talks on varying topics of technology and geekdom, all of which take place in an unused portion of the O’Reilly offices. In order to navigate an otherwise empty building, the organizers assign different areas with names of animals, in the true O’Reilly tradition. With the only significant defining feature of every room being a sign with the names ant, appaloosa, armadillo, camel, hermit crab, jaguar, koala, opossum, owl, reindeer, tree frog and wallcreeper, it can be pretty difficult to get your bearings.

Tim O’Reilly speaks in the Badger Appaloosa Room

What happens when you introduce a subversive element into this environment, equipped with some serious ammunition and a few too many beers?

BADGER PANDEMONIUM

Campers woke up bleary-eyed for talks on Sunday, unable to locate their precious knowledge. Hackers watched and rubbed their hands together and sang to themselves, “badger.. badger.. badger.. badger..” No one got hurt, except one poor badger who was ripped to death and a mushroom who was abducted. And the campers got their knowledge.

Stata Flood

There’s a funny joke about buildings designed by Frank Gehry. It goes something like this: first you pay millions of dollars to erect the thing, then you spend millions over many years to stop it from leaking. Having friends in the Ray and Maria Stata Center here at MIT, I’ve heard of a fair number of drips. So the water on the outside will eventually be kept outside. Yay for caulk. What happens to a Gehry building when it rains on the inside?

The Stata gets wet

Last night the Stata Center witnessed a serious test of engineering when a fire alarm triggered the sprinkler system on the 4th floor. The water sprayed long enough that it began to collect and eventually find its way down to the ground floor. I was called in to document the situation:

Stata Flood Photos

The damage seemed pretty extensive, but I have no conception of what it takes to repair slabs of sheetrock that meet at a 124° angle constrained to a circle that intersects a glass roof. Something tells me the materials are cheap but the rocket scientists who install them are expensive.

Smiths reunion!?

When Johnny Marr left The Smiths in 1987 over creative issues with Morrissey, rumor has it that the pair were so angry they’d be lucky to set foot in the same room again. The Smiths were over, and fans would just have to accept that. Well, strangeways here we come. This week’s Popbitch email hinted at a possible change in plans.

Duran Duran have done it. The Pogues, Lloyd Cole and the Commotions, Doobie Brothers and Happy Mondays are all reforming too.

But the big news is that one 80s band no-one ever thought would be getting back together are in “early discussions.”

The Smiths.

One fan had even promised his entire estate if Morrissey and Marr so much as shared an awkward silence for 61 minutes, but it appears that history has resolved things without fan intervention. If the Pixies can get back together and talk about another album, anything is possible. But until anything is formally announced, I’m going to assume that now is actually not very soon to come.

Comment spam arms race

While a lot of people are quick to institute draconian rule over their weblogs and email clients, installing any widget that zaps spam before the spammer has even conceived of it, I tend to take a much more latent approach. My Bayesian email filter and MTBlacklist give me the control I need to make sure my world isn’t taken over by garbage, but at the same time I can pay attention to the tactics and technologies that these infidels are employing. It makes me feel empowered.

While cleaning up a few spam comments today, I noticed the next effort in the spamming arms race: encampment. The purpose of comment spam, as we all know, is to harvest PageRank from weblogs that have it and aren’t paying attention. The problem with this strategy is that there is more than one contending force attempting to take over this blog ghetto. The more links that appear on a given post, the less each individual link is worth in Google’s currency.
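The dilution effect can be shown with a little arithmetic. In the published PageRank formula, a page passes its rank to the pages it links to in equal shares, scaled by a damping factor (0.85 in the original paper). The sketch below is my own illustration with a made-up function name and made-up numbers; what Google actually runs in production is not public.

```python
def link_value(page_rank, outbound_links, damping=0.85):
    """Rank passed along by a single link on a page, per the classic formula.

    page_rank: the linking page's own PageRank.
    outbound_links: total number of links on that page.
    damping: damping factor from the original PageRank paper (assumed value).
    """
    if outbound_links < 1:
        raise ValueError("a page needs at least one outbound link")
    return damping * page_rank / outbound_links

# Illustrative numbers only: the same post with 10 links versus 500 links.
print(link_value(1.0, 10))
print(link_value(1.0, 500))
```

Every additional spam link on the same post shrinks the share that each existing link receives.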

Instead of spreading links across thousands of pages, the new technique I’ve become aware of is to take a single weblog post, obviously deserted, and use comment spam on other sites to give support. Here’s the link I received today:

Hello, I just wanted to say you have a very informative site which really made me think, thanks very much! Have a nice Day!!

best online casinos

Except for the text (online casinos), this link looks pretty innocuous. And clicking through to the site appears to be no big deal as well, since it’s just some other weblog. But looking at the comments on this post shows the true purpose: pushing PageRank to any number of other sites. This is a serious ghetto, kind of like the Robert Taylor homes of blog posts, with hundreds of links to other sites.

It seems to me that the most effective strategy would be to find a little corner of the web that no other spammer has found, place a few links to your sites there, and use that to elevate their PageRank. But that’s just from my understanding of the algorithm, and maybe these spammers have something up their sleeve that I don’t know about.

Time Travel Vacation

After re-rewatching 24 Hour Party People I’ve become obsessed with the idea of being in a place at a particular point in time—the moment that a place, a culture and a people come into sharp focus. So just for exercise, let’s assume that in the next 20 years time travel becomes freely available and cost-effective to the point that we are forced to decide between vacationing to a place now or then. What time and place would you choose to spend your two weeks per year on?

I find that most of my friends choose a moment in history when a particular culture or subculture is on the brink of being recognized. For me it’s Manchester in either the late 70’s or the late 80’s, and for others it’s Soho in the 60’s or Athens at the height of the Greek empire. The thing that strikes me about these time-places is that we all seem to be excited by the prospect of visiting a moment that defines us, but that we were unable to experience.

The irony of this experiment is that if we really did travel to these historic venues, I’m sure we’d be much more excited about being there than the people involved. Of course they don’t know how important their moment is, and how could they? They’re living in it. I’m sure there are thousands of moments right now that people of the future would be willing to pay a year’s salary to be a part of, but we won’t know for years exactly what we should be jealous of.

I guess what I’m trying to say is that I wish time travel were ubiquitous and cost-effective so that I could visit all of my favorite places in time. Or maybe all of the most influential pieces of history were actually footnotes in the textbooks until future time travellers went back and made them popular. Damn the time travellers.