Presidential Debate Analysis

Cameron Marlow | October 1, 2004 | January 30, 2020 | Politics, Projects, Research

Whenever I watch a televised debate, I wonder how much of each speaker’s message is actually thinking on their feet and how much is canned material. Now that transcripts are readily available, questions like these can be addressed with various computational methods.

A simple way to identify repeated statements is to count the number of times a particular noun phrase is mentioned. Noun phrases act both as a proxy for the subject matter of a given piece of text and as a record of the way in which things are worded.
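As a rough illustration of the counting step, here is a minimal Python sketch. It does not reproduce Lingua::EN::Tagger or phrases.pl: it assumes the text has already been tagged with Penn Treebank part-of-speech tags, and it uses a simplified, hypothetical chunking rule (a run of adjectives and nouns ending in a noun, optionally chained to another such run by “of”, possibly followed by a determiner).

```python
import re
from collections import Counter

# Hypothetical chunking rule, not the Tagger's own grammar:
# adjectives/nouns ending in a noun, optionally chained by "of" (+ determiner).
NP_PATTERN = re.compile(r"[JN]*N(?:OD?[JN]*N)*")


def tag_class(word, tag):
    """Map a (word, Penn Treebank tag) pair to a one-letter class."""
    if tag.startswith("NN"):
        return "N"
    if tag.startswith("JJ"):
        return "J"
    if word.lower() == "of":
        return "O"
    if tag == "DT":
        return "D"
    return "x"


def noun_phrases(tagged):
    """Return the lowercased maximal noun phrases in a list of (word, tag) pairs."""
    classes = "".join(tag_class(word, tag) for word, tag in tagged)
    return [
        " ".join(word.lower() for word, _ in tagged[m.start():m.end()])
        for m in NP_PATTERN.finditer(classes)
    ]


# Hand-tagged toy input; the tags are illustrative.
tagged = [
    ("It", "PRP"), ("is", "VBZ"), ("hard", "JJ"), ("work", "NN"), (".", "."),
    ("No", "DT"), ("weapons", "NNS"), ("of", "IN"), ("mass", "NN"),
    ("destruction", "NN"), (".", "."), ("Hard", "JJ"), ("work", "NN"),
    ("pays", "VBZ"), (".", "."),
]
counts = Counter(noun_phrases(tagged))
print(counts.most_common())
# [('hard work', 2), ('weapons of mass destruction', 1)]
```

Encoding each tag as a single letter lets one regular expression do the chunking; a real tagger-based extractor would handle many more constructions.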

For this simple experiment, we’ll need four tools:

The transcript (simplified from the original)
Lingua::EN::Tagger, an English Part-of-speech tagger written in Perl
phrases.pl, a Perl script to parse the document and extract the noun phrases
Debate Spotter, an interactive interface to visualize the results

The results are quite interesting. Looking only at noun phrases of at least 2 words occurring at least twice for a given speaker, we arrive at some spectacular catch phrases. For Bush my favorite is “hard work,” which he said repeatedly. Apparently Bush thinks that the world is a difficult place to be. For Kerry, a salient phrase was “war as a last resort.”
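The two thresholds above (at least 2 words, at least 2 occurrences per speaker) amount to a simple filter. A minimal Python sketch, assuming per-speaker phrase counts are already available; the function name is hypothetical, the counts for “hard work” and “grand diversion” are inferred from the scores listed below, “iraq” comes from a later comment, and “example phrase” is a made-up placeholder.

```python
from collections import Counter


def candidate_phrases(counts, min_words=2, min_count=2):
    """Keep phrases with at least min_words words seen at least min_count times."""
    return {
        phrase: n
        for phrase, n in counts.items()
        if len(phrase.split()) >= min_words and n >= min_count
    }


bush_counts = Counter({
    "hard work": 11,        # inferred: score 13 minus 2 words
    "grand diversion": 3,   # inferred: score 5 minus 2 words
    "iraq": 52,             # single word, so it is filtered out
    "example phrase": 1,    # hypothetical placeholder, seen only once
})
print(candidate_phrases(bush_counts))
# {'hard work': 11, 'grand diversion': 3}
```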

The top 25 phrases for Bush and Kerry follow. The number following each phrase is a score computed from the length of the phrase in words plus the number of times it appeared.

There are so many other types of analysis that could be run on these data. If you find anything interesting, please let me know. Also, the Debate Spotter allows for any query, so post any interesting phrases that you find.

Update: I have also analyzed the Vice Presidential and the Second Presidential debates.

Bush

free iraq (14),
hard work (13),
wrong war at the wrong place at the wrong time (13),
wrong war at the wrong time at the wrong place (12),
north korea (10),
kim jong il (10),
my opponent (9),
american people (8),
same intelligence (8),
prime minister allawi (8),
best way (7),
free afghanistan (7),
world a more peaceful place (7),
mixed messages (7),
iraqi citizens (6),
al qaida (6),
weapons of mass destruction (6),
dynamics on the ground (6),
breach on the agreement (6),
end of this year (6),
grave threat (6),
matter of fact (5),
cannot lead (5),
grand diversion (5),
wrong signals (5)

Kerry

saddam hussein (14),
north korea (14),
nuclear weapons (10),
weapons of mass destruction (9),
osama bin (9),
united nations (9),
war as a last resort (9),
american people (8),
90 percent of the casualties (7),
nuclear proliferation (7),
remedies of the united nations (7),
90 percent of the costs (7),
united states of america (7),
homeland security (7),
mountains of tora bora (6),
10 active duty divisions (6),
different set of convictions (6),
four years (6),
president bush (6),
president of south korea (6),
strong alliances (6),
two years (5),
secretary of state (5),
tax cut (5),
bilateral talks (5)

80 thoughts on “Presidential Debate Analysis”

Pingback: Erik Benson's Morale-O-Meter
Pingback: manalang.com
Alexander says:

October 1, 2004 at 3:56 pm

You forgot poland!

Celly says:

October 1, 2004 at 4:29 pm

You forgot “Smoooooooooooke’n” … woops, wrong Kerry.

cameron says:

October 1, 2004 at 4:41 pm

Actually, he forgot poland. I didn’t forget anything.

Anonymous says:

October 1, 2004 at 4:55 pm

You may want to rework your numbers. I searched the transcripts and counted occurrences of some of those phrases. The numbers you give are higher than the number of occurrences in the transcript (for example, Kerry’s mention of Tora Bora occurred only twice).

A very cool idea, though.

Pingback: ScaleFree.Net
Pingback: Echo's Diary
James says:

October 1, 2004 at 8:41 pm

Cameron, this is brilliant. Could it be used in conjunction with something that scrapes the pair’s campaign sites?

Anonymous says:

October 1, 2004 at 9:36 pm

The numbers he gives are not the number of times the phrase occurred. “The number following each phrase is a rank described by the length of the phrase and the number of times it appeared.”

cameron says:

October 1, 2004 at 11:10 pm

I needed to rank the phrases somehow, and just looking at the total occurrences favors phrases that are short but common. Instead I made up my own ranking algorithm:

score = length (in words) + occurrences

So a phrase with 8 words that appears twice will have a score of 10, the same as a phrase with 2 words that appears 8 times. I played around with this score for a while and this method seemed to pull up the most interesting results.

As for applying this elsewhere, the script I’m using to visualize it is extremely simple. All I need is a plaintext file with the content and it’s trivial to set up.
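The additive score described in this comment can be sketched in a few lines of Python. This is an illustration, not the author’s script: the function names are hypothetical, ties are broken alphabetically (an assumption), and the occurrence counts are back-calculated from the published Bush scores.

```python
def additive_score(phrase, occurrences):
    """Score = length of the phrase in words + number of occurrences."""
    return len(phrase.split()) + occurrences


def top_phrases(counts, k=25):
    """Return the k highest-scoring (phrase, score) pairs; ties sorted alphabetically."""
    scored = [(phrase, additive_score(phrase, n)) for phrase, n in counts.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# An 8-word phrase said twice ties with a 2-word phrase said 8 times.
assert additive_score("one two three four five six seven eight", 2) == 10
assert additive_score("two words", 8) == 10

bush = {
    "free iraq": 12,
    "hard work": 11,
    "wrong war at the wrong place at the wrong time": 3,
}
print(top_phrases(bush))
# [('free iraq', 14), ('hard work', 13), ('wrong war at the wrong place at the wrong time', 13)]
```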

sr says:

October 2, 2004 at 9:34 am

Amy’s Robot did a similar, though less technically sophisticated, analysis that gives the number of occurrences of each phrase. Here’s the post.

ek says:

October 2, 2004 at 9:45 am

What this is really great for is making sure, after the fact, that you took enough drinks during your presidential debate buzzword drinking game.

pdmt says:

October 2, 2004 at 10:21 am

Although it is not only used as a noun phrase, I think it is equally revealing that Bush used the word VOTE 7 times, while Kerry used it only 3 times.
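Single-word counts like this one don’t need the noun phrase parser at all. A minimal Python sketch, assuming a speaker’s turns have been concatenated into one string; the function name and the sample text are invented for illustration.

```python
import re


def word_count(text, word):
    """Count case-insensitive, whole-word occurrences of word in text."""
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text, flags=re.IGNORECASE))


sample = "Vote early. I voted. VOTE!"  # invented example text
print(word_count(sample, "vote"))
# 2
```

Because of the word boundaries, inflected forms such as “voted” or “votes” are not counted; the pattern would need widening to include them.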

Pingback: e-Literate
mike says:

October 2, 2004 at 11:51 am

This is brilliant! I found it after wondering how many times Bush said “it’s hard work” …my favorite also.

It reminds me of Will Ferrell playing Bush on SNL. He’s in the Oval Office, there are flames outside the windows as the whole world goes to hell, and Will (as Bush) pops out from under the desk exclaiming “it’s hard work” and then pops open a beer can! Funny stuff, if people weren’t dying.

mike

meg says:

October 2, 2004 at 11:57 am

Text analysis is pretty cool. With software like textpac and catpac (and a gazillion others) you can count the occurrence of individual words, plus the occurrence of words in relation to certain other words. Knowing that Bush said “hard work” 13 times is interesting, but knowing that he said it 13 times next to words describing his own job – as a potential rhetorical strategy to get us to feel bad for the poor little guy, he must be tuckered out! nation-building is rough stuff! – is pretty interesting too.
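Counting the words that appear near a phrase, as described here, can be approximated with a fixed-size window. This Python sketch is not how textpac or catpac work; the function name, the window size and the sample sentence are assumptions for illustration.

```python
import re
from collections import Counter


def context_words(text, phrase, window=5):
    """Count words within `window` tokens on either side of each occurrence of phrase."""
    tokens = re.findall(r"[a-z']+", text.lower())
    target = phrase.lower().split()
    k = len(target)
    counts = Counter()
    for i in range(len(tokens) - k + 1):
        if tokens[i:i + k] == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + k:i + k + window]
            counts.update(left + right)
    return counts


text = "This job is hard work. Freedom is hard work."  # invented example text
ctx = context_words(text, "hard work", window=2)
print(ctx.most_common())
# [('is', 3), ('freedom', 2), ('job', 1)]
```

In practice you would drop function words such as “is” before interpreting the neighbors.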

Anonymous says:

October 2, 2004 at 11:57 am

I missed your explanation of method before my previous post. But I’m still not sure how much we can learn from this method.

You say “this method seemed to pull up the most interesting results.”

If you chose the algorithm that produces the ‘most interesting results’, then you’re skewing your research. You chose the way that makes the data fit a pre-determined goal, rather than extrapolating from simple evidence. How many words a phrase contains doesn’t have much to do with whether it’s ‘canned material’. The Bush administration has gone far on two- and three-word phrases (“Society of ownership”; “faith-based initiatives”), which are about as simple as an English construction can get. So it doesn’t follow that the longer a phrase, the more likely it is to be campaign boilerplate. It might be more interesting to look at a straight ranking of most common noun phrases rather than filter them in this way.

Pingback: Chris Boese's Weblog
Marie says:

October 2, 2004 at 12:16 pm

I realize this isn’t your department, but the Oct. 3, 2000 Gore-Bush Debate on that CPD transcripts page is actually linked to the transcript of the Kerry-Bush debate on Sept. 30, 2004. So far, I haven’t found the webmaster’s contact info.

cameron says:

October 2, 2004 at 12:40 pm

Two points on the validity of these methods:

1. The data, software and methods used to generate these results are freely available. Anyone disagreeing with the ranking can easily download the source and run the software themselves. I’d be more than happy to help with this process.

2. In saying “this method seemed to pull up the most interesting results,” I meant that I was changing the algorithm based on my knowledge of the types of phrases I was looking for and the parameters I could use (the length of the phrase and the number of occurrences). The noun phrase extractor that I used scores the phrases multiplicatively by default (length * occurrences) which places too much emphasis on long noun phrases. I chose to score them additively (length + occurrences) because I felt that repetitiveness is a more important feature, especially for such small amounts of data.
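To see why a multiplicative score favors long phrases, compare the two scores on a pair of Bush phrases. The occurrence counts are inferred from the published additive scores (14 minus 2 words, 13 minus 10 words), and the helper names are hypothetical.

```python
def multiplicative_score(phrase, occurrences):
    """The extractor's default as described above: length in words times occurrences."""
    return len(phrase.split()) * occurrences


def additive_score(phrase, occurrences):
    """The author's score: length in words plus occurrences."""
    return len(phrase.split()) + occurrences


counts = {
    "free iraq": 12,                                       # 2 words
    "wrong war at the wrong place at the wrong time": 3,   # 10 words
}

for score in (multiplicative_score, additive_score):
    best = max(counts, key=lambda phrase: score(phrase, counts[phrase]))
    print(score.__name__, "->", best)
# multiplicative_score -> wrong war at the wrong place at the wrong time  (30 vs 24)
# additive_score -> free iraq  (14 vs 13)
```

Under the product, three repetitions of a 10-word phrase outweigh twelve repetitions of a 2-word one; under the sum, repetition wins.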

PC says:

October 2, 2004 at 1:36 pm

This is really interesting. Language Log did a related analysis on both candidates’ sentence length, with Kerry’s (as expected) average sentence length higher, as well as his contribution in words to the debate overall (though Bush had slightly more sentences…but they were shorter).
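Language Log’s exact method isn’t described here; as a rough stand-in, a naive Python sketch of the two statistics mentioned (sentence count and average sentence length) might look like this. Splitting on “.”, “!” and “?” is an assumption and will miscount abbreviations such as “Mr.”; the function name and sample text are invented.

```python
import re


def sentence_stats(text):
    """Return (sentence count, mean words per sentence) using a naive split on . ! ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0, 0.0
    lengths = [len(s.split()) for s in sentences]
    return len(sentences), sum(lengths) / len(sentences)


print(sentence_stats("It is hard work. We will prevail!"))
# (2, 3.5)
```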

Pingback: Mathemagenic
Pingback: politics.relevanta.com
Quotient says:

October 3, 2004 at 4:46 am

Why not use a more information-retrieval-type score:

score = TF*IDF = term freq * inverse document freq

term freq = # occurrences of a phrase in the speeches

inverse document freq = # occurrences of the same phrase in some large body of English text.

This way, you don’t have to worry about the length of phrases, just whether they occur commonly or not in normal text.

Mossgreen says:

October 3, 2004 at 4:03 pm

I wondered how many “Hard Work” comments were made during Bush’s parts of the debate. Thanks for posting it.

Sue

Quotient says:

October 3, 2004 at 5:10 pm

Err, rather IDF = 1 / (# occurrences of the same phrase in some large body of English text)
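Quotient’s corrected suggestion, with IDF taken as the reciprocal of a phrase’s count in a large reference corpus, could be sketched as follows. The reference counts are hypothetical placeholders, the debate counts are inferred from the Bush list above, and the +1 in the denominator is an added assumption to avoid dividing by zero for phrases the reference corpus never contains.

```python
def quotient_score(tf, background_count):
    """TF times IDF, with IDF = 1 / (1 + count in a reference corpus)."""
    return tf / (1 + background_count)


def rank_by_rarity(debate_counts, background_counts):
    """Sort phrases from most to least distinctive relative to the reference corpus."""
    scores = {
        phrase: quotient_score(tf, background_counts.get(phrase, 0))
        for phrase, tf in debate_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


debate = {"hard work": 11, "american people": 6}       # inferred from the Bush list
reference = {"hard work": 999, "american people": 199}  # hypothetical placeholder counts
print(rank_by_rarity(debate, reference))
# ['american people', 'hard work']
```

Conventional TF-IDF instead uses idf = log(N / df), where df is the number of documents (out of N) that contain the term.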

cameron says:

October 3, 2004 at 5:30 pm

Quotient — Frequency is one of the two parameters I’m using. I would consider the frequency relative to the documents if the definition of document made more sense. I could look at the words in terms of their use within turns of the dialog, e.g. is the word common in one turn of Bush or the entire talk. Unlike TFIDF though I’d be looking for phrases with low IDF, or rather to maximize the DF instead of the IDF, i.e., the more spread out a phrase is, the more important the phrase is for the talk.

The reason I’m including the phrase length in the calculation of the rank is that it’s more interesting to see a long phrase repeated rather than a short one. For instance, Bush says “free Iraq” 12 times and “Iraq” 52 times. Both are noun phrases, but the term “free Iraq” has more semantic meaning and importance than the single word Iraq. In describing the phrases, I think longer phrases are more meaningful, and thus more interesting, but I want to balance this feature with the frequency. Does that make sense?
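The idea of treating each turn of the dialog as a document, and favoring phrases that are spread across many turns, could look something like this in Python. The turns are invented placeholder text, not quotes from the transcript, and whole-word regex matching is an assumption about how phrases would be located; all function names are hypothetical.

```python
import re


def phrase_regex(phrase):
    """Case-insensitive, whole-word pattern for a phrase."""
    return re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)


def total_count(phrase, turns):
    """Total occurrences of the phrase across all turns."""
    pattern = phrase_regex(phrase)
    return sum(len(pattern.findall(turn)) for turn in turns)


def turn_frequency(phrase, turns):
    """Number of turns (the 'documents' here) that contain the phrase."""
    pattern = phrase_regex(phrase)
    return sum(1 for turn in turns if pattern.search(turn))


turns = [
    "A free Iraq is in our interest.",
    "It is hard work.",
    "A free Iraq. A free Iraq will be an ally.",
]
print(total_count("free iraq", turns), turn_frequency("free iraq", turns))
# 3 2
```

A phrase repeated three times inside one answer and a phrase used once in each of three answers have the same total count, but only the second has a high turn frequency.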

Anonymous says:

October 3, 2004 at 6:31 pm

don’t forget poor poor poland

oblivion says:

October 3, 2004 at 7:33 pm

Isn’t it important for any candidate to state their standpoint and DEBATE, rather than to reiterate their campaign and say what they have been saying over and over and over? By using noun phrases they are just creating “coined terms” that induce an individual to take a side on the basis of how they sound and what the semantic meaning of these terms is, rather than on the “core belief” (another highly used term, by the way) of each candidate.

Pingback: Taegan Goddard's Political Wire
Kathy says:

October 3, 2004 at 9:19 pm

Cool tool. However, tools can miss semantic repeats. For example, I decided that there were at least 15 instances of “hard work” because two times Bush followed the phrase with “it’s hard.” I wrote a letter to the editor at the Seattle Times (which I should post online).

This also misses the very interesting philosophical approach to “control” that was evidenced in the ad-lib exchange about Bush’s daughters (he wants to put them on a “leash” and Kerry advises that it doesn’t work).

For the cognitive scientists: I’m curious about the “wrong war, wrong place, wrong time” phrase … isn’t there a danger of the phrase standing alone (i.e., not as an indictment of Kerry)?

(PS, I’m blogging you but I don’t have trackback technology)

Kathy

Pingback: locussolus
Pingback: officiallyover.bloghorn.com
Glenn says:

October 4, 2004 at 7:16 pm

Are there programs out there like this one we can use for other speeches? Or can you provide yours via the Web to political junkies?

Hours of fun, but on a serious note, such analyses of elections will find their way into serious considerations of the styles and strategies of candidates for office.

Pingback: Baq's Weblog
Pingback: Dakota
kevin says:

October 5, 2004 at 2:04 pm

Just noticed in my search for “world” that there are a couple of sentences repeated in the fourth Bush paragraph – where he’s talking about talking on the phone with world leaders…

I’m looking for the “I know the world we live in” statement, or something to that effect.

Pingback: Collin vs. Blog
katie says:

October 5, 2004 at 9:01 pm

Odd that Kerry’s top 25 list includes “Osama bin” rather than “Osama bin Laden” — the former phrase was never spoken in the debate without “Laden” after it.

cameron says:

October 5, 2004 at 10:30 pm

The “Osama bin” phrase is a byproduct of the noun phrase parser that I use. In 99% of the uses of the word “laden,” it’s a verb, which won’t be part of a noun phrase if it’s at the end. So while it correctly identifies “Osama bin,” it misidentifies his last name. All of these techniques are prone to exceptions, but it still seems to work pretty well.
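A minimal sketch of why a single tag decides where the phrase ends, assuming a deliberately simplified chunker in which consecutive noun tags form a noun phrase. The function name is hypothetical and the tags are illustrative, not actual Lingua::EN::Tagger output.

```python
import re


def noun_runs(tagged):
    """Join consecutive noun-tagged (NN*) words into phrases."""
    classes = "".join("N" if tag.startswith("NN") else "x" for _, tag in tagged)
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in re.finditer(r"N+", classes)
    ]


# "Laden" tagged as a past participle (VBN) truncates the phrase.
print(noun_runs([("Osama", "NNP"), ("bin", "NNP"), ("Laden", "VBN")]))
# ['Osama bin']
print(noun_runs([("Osama", "NNP"), ("bin", "NNP"), ("Laden", "NNP")]))
# ['Osama bin Laden']
```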

marxron says:

October 6, 2004 at 12:43 am

How about we count emotional expressions as buzzwords? After being provoked, a certain candidate gave emotional expression, before taking a moment of recomposure and delivering a substantive answer. The debate in the media was not about capturing where each person stood, and allowing the voters to see which camp backs the way they feel. Rather than focus on the substance, the aftermath focused on a sort of buzzword: emotional facial expressions …

Ian says:

October 6, 2004 at 11:30 pm

This is mildly interesting work in the Perl sense (I use it, but really). It amazes me that so many people are keen on this random kind of analysis. Why don’t you just listen to the bastards? Their meaning will become quite clear; you don’t need Perl. If you missed it (god knows why, you have so few chances to actually hear the candidates), then read the transcript in full. You are humans (in the loosest sense), so get involved, read and listen to them, and decide. The world is waiting. Welcome to the 18th century.

cameron says:

October 6, 2004 at 11:39 pm

I don’t understand why you think this is a “random kind of analysis,” given that it’s using techniques employed by computational linguists for decades. The motivation is simple: computers allow us to see patterns that we wouldn’t see otherwise.

There are really two parts to this post, the first being the linguistic analysis above, and the second being the Debate Spotter tool for visualizing the results. The former provides utility because it aggregates phrases at a level that connects with the semantics of the speaker, namely if there’s a phrase or jargon that they’re trying to repeat, it will typically be in the form of a noun phrase. The latter tool allows people to do their own investigative work without spending hours poring over the text. I’d like to know who was talking about a particular topic more, but I don’t really want to count every word myself.

Is that not motivation enough to invest an hour of coding time? And personally I don’t think it’s interesting in the Perl sense at all, as “perl -MCPAN -e 'install Lingua::EN::Tagger'” is pretty trivial.

Tom Johnson says:

October 7, 2004 at 2:50 pm

It would be interesting to connect the word and phrase scores to transcripts from the spin doctors to see how far the politicians get off point…

Erich Jacoby-Hawkins says:

October 7, 2004 at 5:25 pm

I think your counts would be more interesting if you grouped together phrases that had essentially identical meanings, like the “wrong war etc.” phrase, of which you have two interchangeable versions, or counted “working hard” (twice) together with “hard work” (eleven).
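The first kind of grouping (the same words in a different order) can be approximated by keying each phrase on its sorted list of words. A minimal Python sketch; the function name is hypothetical and the counts are inferred from the published Bush scores.

```python
from collections import defaultdict


def merge_reorderings(counts):
    """Pool phrases built from exactly the same multiset of words.

    Each group is labeled with its most frequent variant.
    """
    groups = defaultdict(list)
    for phrase in counts:
        groups[tuple(sorted(phrase.split()))].append(phrase)
    merged = {}
    for variants in groups.values():
        label = max(variants, key=lambda p: counts[p])
        merged[label] = sum(counts[p] for p in variants)
    return merged


bush = {
    "wrong war at the wrong place at the wrong time": 3,  # 10 words, score 13
    "wrong war at the wrong time at the wrong place": 2,  # 10 words, score 12
    "hard work": 11,
}
print(merge_reorderings(bush))
# {'wrong war at the wrong place at the wrong time': 5, 'hard work': 11}
```

This only catches reorderings; pairing “working hard” with “hard work” would additionally need stemming or lemmatization, which is not shown.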

ChrisA says:

October 8, 2004 at 1:34 am

This is pretty cool. What would be interesting would be to graph the phrases out of the stump speeches over time and see how the campaigns are adjusting the message as time goes by.

Mark Sample says:

October 8, 2004 at 5:28 pm

Interesting, Kerry mentions “Florida” twice, while Bush doesn’t mention the state by name at all, ever. Pretty remarkable, considering the debate was held in Coral Gables, FL. It’s as if Bush is afraid to jinx the upcoming election by drawing people’s attention to the state, i.e. his trojan horse. Or was Florida a gift horse in 2000? Or simply a stolen horse? At any rate, great tool, Cameron.

Donna Prepejchal says:

October 8, 2004 at 10:59 pm

John Kerry,

I hope you wear a bulletproof vest. I am sure they are out gunning for you. I want to go in the direction that you want to steer the country. May you sail to victory.

Mike E. says:

October 8, 2004 at 11:13 pm

Did anyone keep score of how many questions each candidate actually answered directly, I mean actually answered the question that was asked? My favorite was the right to life issue, which Kerry completely skirted by saying that he didn’t want to bring his personal views into it. WELL HELL SON, that’s how we vote for presidents, on their personal views (at least that’s how we hang ‘em). Well I’ll tell you, as far as directly answering the questions Bush wins hands down, so much so that we might even overlook this oil baron’s bullshit answer about spending a billion dollars on hydrogen cell research. As you can tell I have no love for either candidate, but as far as this debate goes, Bush’s advisors kicked the hell out of Kerry’s advisors hands down. Advisors took a huge part in this debate, as evidenced by how intelligent Bush sounded, as well as his complete lack of clichés and the usual catch phrases. P.S. I would love to see proof of the billion dollars spent on hydrogen cell research if anyone has it handy.

Pingback: Sample Reality
joe says:

October 9, 2004 at 12:27 am

only a pussy would vote for kerry

Jared says:

October 9, 2004 at 11:25 am

Click on “Wrong war at the wrong place at the wrong time” under Bush and read that through.

Anonymous says:

October 9, 2004 at 12:55 pm

“A simple way to identify repeated statements is to count the number of times a particular noun phrase is metioned.”

Just a note that you misspelled “metioned” … it should be “mentioned”. Thought I would point this out.

Anonymous says:

October 9, 2004 at 4:44 pm

Another phrase of interest would be “no child left behind”

Anonymous says:

October 9, 2004 at 10:33 pm

Looks like one thing the automatic method didn’t catch was phrases that were plural versus singular. It already gave a decent score to Kerry’s use of “90 percent of the casualties”, but the score could have been much higher if it had caught “90 percent of the casualties and 90 percent of the cost(s)”. There are two phrases with that, and still another in which the words “in Iraq” are interposed.

It seems the computational method could perhaps take this type of thing into account for slightly longer phrases to really highlight those items that a debate participant is trying to emphasize.
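One crude way to fold singular and plural variants together is to normalize each word before counting. This Python sketch uses a naive suffix-stripping rule (an assumption; a proper lemmatizer would be more reliable), and the function names are hypothetical. The plural count is inferred from Kerry’s published score; the singular count is a made-up placeholder.

```python
from collections import Counter


def singularize(word):
    """Very crude plural stripping; a real lemmatizer would do better."""
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(phrase):
    """Lowercase a phrase and singularize each word."""
    return " ".join(singularize(word) for word in phrase.lower().split())


def merge_plurals(counts):
    """Add up counts for phrases that normalize to the same form."""
    merged = Counter()
    for phrase, n in counts.items():
        merged[normalize(phrase)] += n
    return dict(merged)


kerry = {
    "90 percent of the costs": 2,  # inferred: score 7 minus 5 words
    "90 percent of the cost": 1,   # hypothetical count for the singular form
}
print(merge_plurals(kerry))
# {'90 percent of the cost': 3}
```

The rule is only meant to produce a consistent matching key; it mangles words like “this” (into “thi”), which is harmless for grouping but not for display.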

Scott says:

October 9, 2004 at 11:46 pm

tax cut – Kerry 17, Bush 4

Stuart Dambrot says:

October 10, 2004 at 9:29 pm

Nice work, even with all the caveats.

I’m attempting to apply predicate calculus to the intersection of political speak with related fact. My goal is to drive a truth table that can devolve into a simple measure – perhaps a percentage – giving the overall truth value of a given statement, position paper or speech.

My first thought was to use an autoconstructing database (e.g., askSam) to create entries, but that seems a bit clunky. An algorithmic approach would be much better, since ideally it could be run in real-time.

Any thoughts?

bryan says:

October 12, 2004 at 8:59 pm

so where can i sign up to vote for arnold?

Anonymous says:

October 13, 2004 at 11:48 am

is John Kerry republican

Dauthi says:

October 13, 2004 at 11:28 pm

Yes, yes, of course he is. XP

Werner Soll says:

October 14, 2004 at 1:49 am

Why will President Bush all of a sudden accept flu shots made in Canada? In the last debate he argued that drugs from Canada (made in the USA) were unsafe for Americans. Has he suddenly discovered that Canada is a pretty safe neighbor?

Faisal Attar says:

October 14, 2004 at 2:15 am

I’m in Dubai, UAE. I have special interest in working with texts.
Could you tutor me on this subject. I need to learn how to do the analysis of texts and what tools to use.

Jozef Imrich says:

October 14, 2004 at 6:07 am

With the greatest respect, all I could think of after absorbing the data was the good old (pre-divorce) Czechoslovak saying:

He who thinks by the inch and talks by the yard deserves to be kicked by the foot.

Leslie says:

October 14, 2004 at 4:50 pm

I really hate the fact that both candidates say the same thing over and over. I was going towards Kerry but all he’s focused on is Iraq. The main thing. Bush spent 42% of his time on vacation during the 6 months that he was in office before 9/11. Was he ready? No way. They are both idiots. But Kerry might be able to save the economy.
Leslie

Anonymous says:

October 18, 2004 at 7:10 pm

Go Bush

Anonymous says:

October 20, 2004 at 6:23 pm

Everyone should vote for John Kerry; he makes so much more sense than Bush. Bush just blabs away about anything that comes to mind. He has the IQ of a rock

ass says:

October 21, 2004 at 10:05 pm

hi

Anonymous says:

October 21, 2004 at 10:06 pm

fuck

martin says:

October 21, 2004 at 10:10 pm

GO BUSH!!! KERRY SUCKS JUST LIKE THE YANKEES!!!

Adela Carrasco says:

October 22, 2004 at 1:09 pm

People should get a little more educated about voting. Instead of going out there and just doing it, they need to know the real issues, and I think that people who are lazy and like free stuff should vote for Kerry, because that’s all he is about.

BRANDY says:

November 2, 2004 at 1:10 pm

…not in our nation’s best interest

Sebastian says:

September 24, 2005 at 7:21 pm

hi.
Do you have any programs or ideas that you might want to try out with music? I’ve always wanted to mix the two.
/ Sebastian, Sweden.

Funny Videos says:

October 14, 2005 at 12:25 am

Presidential Debate or Presidential Parody? Ahh. You lose either way.

Pingback: Sarah Schmidt
Pingback: Krittina Kiato