Newspapers and search advertising

When searching, I am always interested to see who is paying for the sponsored ads for my query. A while back I searched for some information on the Cory Lidle plane crash and was completely surprised to see iVillage and the New York Times paying for my attention:

Cory Lidle search ads

My initial assumption was that most people today use search to obtain information, regardless of type. In the case of news or other recent communications, though, Google and Yahoo will not have ranked fresh stories within the first day. For late-breaking news, a large newspaper can effectively bridge this information gap by paying a few cents per click. After talking this over with a few people, I came up with a number of reasons newspapers could be turning to search advertising:

  • Search gap: People tend to use search for most of their information, and a few cents can grab a lot of attention when you are a news source people recognize.
  • Higher monetization: a click on Google or Yahoo! costs less than what a single page view earns on the news site.
  • Reader acquisition: In the world of online news, it is tough to differentiate, so paying for readers could pay off when acquired readers convert to regulars.
  • SEO: Someone on-staff has a budget to use on attracting traffic, and search advertising seems like a good use of funds. A few clicks turn into a few links, and there you go.
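The monetization argument above comes down to simple arithmetic: buying a reader is worth it whenever the revenue from the page views that click generates exceeds the cost of the click. A minimal sketch, with all numbers invented for illustration (these are not actual newspaper or ad-network figures):

```python
# Back-of-the-envelope check of the "higher monetization" argument.
# All dollar amounts are hypothetical placeholders.

def click_is_profitable(cost_per_click, revenue_per_pageview, pageviews_per_visit):
    """Return True if an acquired visit earns more than the click cost."""
    return revenue_per_pageview * pageviews_per_visit > cost_per_click

# A $0.05 click that yields 3 page views at $0.03 each nets $0.04:
print(click_is_profitable(0.05, 0.03, 3))  # True: 0.09 > 0.05

# A $0.10 click against the same visit loses money:
print(click_is_profitable(0.10, 0.03, 3))  # False: 0.09 < 0.10
```

The reader-acquisition bullet only strengthens the case: if some clicks convert into regular readers, the lifetime value per click is even higher than this one-visit estimate.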

What happens when our news outlets start paying for readers? This may be an example of the right hand not talking to the left, but the fact that the New York Times, of all newspapers, is the first I saw using search marketing makes me think a little differently about the master of mass media. “All the news that’s fit to print” is now a few degrees closer to “Viagra, Levitra and Cialis” in my head, but maybe this is just a temporary phenomenon. It feels like a major shift in the way news is disseminated, but I might be jumping to conclusions.

Watts music closes

It is a sad day for DJs and electronic music producers. The website of Watts Music, America’s largest distributor of dance vinyl, has announced it is officially closed for business. Most people, even DJs, have never heard of Watts, but the company has been directly responsible for moving tons and tons of vinyl every year between Europe and America.

Just to give you perspective, I run a small label with some of my friends, and when we release a record, we look to distributors to buy up some of our stock and move it to stores overseas and domestically. In a couple of cases Watts has been there for us, and probably for thousands of other little labels. Without them, we have even fewer options: Forced Exposure and Syntax most likely. Under the pressure of the closure of Watts, competition for these smaller distributors will get even more intense, and labels like ours will have no option but to turn to fully digital distribution. This means that our days of making records are over, unless we’re prepared to pay for the production, marketing and shipping costs of every copy.

In the next few months the breadth and depth of vinyl at your local record store will start to dwindle. Labels that were being distributed by Watts will have to seek other means, and in some cases they may be forced to stop shipping internationally. Within a few months I would guess the effect will be fully visible, with DJs finding it hard to get their favorite labels without ordering on the internet. It’s hard to say how this will impact the electronic music scene, but it is bound to have a large and immediate effect.

For such a big distributor to close is a powerful omen: vinyl is dead. Well, in the US anyway. Rest in peace.

Explanatory algorithms

There is a trend in recommender systems that I think is extremely interesting: systems are starting to explain themselves. The first place I noticed this was at Amazon in their personal recommendations section, at the bottom of a given suggestion:

Amazon recommendation

In this case, Amazon recommended Moon Palace because I had rated another book by Paul Auster. This makes perfect sense: I rated something by an author, so the system recommended other books by the same author. The second place this popped up was at the new social music service iLike. Every time a user views another user’s profile, the system calculates a compatibility score based on how similar the two users’ favorite artists are, as shown here:

iLike recommendation

In this case, I share interest in the bands ESG, TV on the Radio, et al. with this user, so our compatibility is high. When I share only more popular artists like Miles Davis or Bob Dylan, my compatibility score is lower. This makes sense, since rarer bands suggest a closer connection. Last.fm has added a similar feature called Taste-o-meter.
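The intuition that shared rare artists count for more than shared popular ones can be sketched as an inverse-popularity weighting. This is an assumption about how such a score might work, not iLike's actual formula, and the popularity counts below are invented:

```python
# Minimal sketch of a rarity-weighted compatibility score: sum
# inverse-popularity weights over the artists two users share.
# Popularity numbers are made up for illustration.

def compatibility(mine, theirs, popularity):
    """Score shared artists, weighting rare ones more heavily."""
    shared = set(mine) & set(theirs)
    return sum(1.0 / popularity[artist] for artist in shared)

popularity = {"ESG": 50, "TV on the Radio": 400,
              "Miles Davis": 5000, "Bob Dylan": 8000}

# Sharing two rarer bands scores higher than sharing two canonical ones:
niche = compatibility(["ESG", "TV on the Radio"],
                      ["ESG", "TV on the Radio"], popularity)
canon = compatibility(["Miles Davis", "Bob Dylan"],
                      ["Miles Davis", "Bob Dylan"], popularity)
print(niche > canon)  # True
```

The same weighting idea shows up elsewhere as inverse document frequency: the less common a shared item, the more informative the overlap.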

What’s interesting about these examples is not the algorithm, some augmented form of collaborative filtering, but rather the way the algorithm explains itself to the user. Many years ago, with the likes of Firefly and CDNow showing off the power of recommender systems, this sort of behavior would have been considered crazy. Showing users elements of how your algorithm works? What if they reverse engineer it, copy your system, and steal all your users?!

Not likely. For all intents and purposes, recommender systems are within wiggling distance of each other. Netflix is holding a contest to see if theirs can be improved, offering a cool $1M to anyone who can show a 10% gain over their current algorithm. While the current leaderboard shows the best contenders at a 4% gain over the original algorithm, Netflix does not expect anyone to reach the necessary 10% anytime soon, suggesting the contest could run until 2011. Meanwhile, companies like Amazon and iLike are making improvements through the way these algorithms are explained.

Explanation creates understanding, and understanding leads to trust.
What if all systems started to take this approach? We mostly assume that search providers keep their ranking algorithms in a 6-foot safe behind a wall of lasers, but at the same time Google is starting to release more information about PageRank through various systems. Someday we might have search results that explain themselves, while keeping the special sauce away from SEO geeks and spammers. Imagine if a top search result said “This result is first because: your search term was in the title, the author is a well known writer, and the host is a reputable newspaper.” I would probably say “that makes sense,” and in turn I would trust that system even more.

Flickr spam email

I received a strange email this morning, addressed to my blogdex email address, which has nothing to do with Flickr but has an exceptionally high SpamRank:

From: Dee (Barry@nishikoi.com)
To: blogdex@media.mit.edu
Subject: question about your photo

I’ve accidently found your photo at a flickr and i’m very
interested in it.

Can you tell me what place i can see in the background of
it?

wbr, Danny

Where “your photo” is a link to http://www.fri91.net/flickr,html. At first glance this appears to be a Flickr phishing scam; while on the train without a connection, I was convinced I’d find a Flickr login screen when I followed the link to “my photo.” And you know that when your service is getting phishing scams, you have arrived.

The truth is much stranger. Go ahead, click the link. It’s not going to hurt you. In a sort of janky way, Barry has copied some of Flickr’s code and design along with some of his own “edits.” The page is hosted on a Norwegian soccer club’s website. The links on the page lead to tjhallett1’s Flickr data. The email domain is a fish food company. This piece of spam is a stumper.

The full email is here.

Update: Andy explained to me that this is, indeed, a scam. DO NOT visit the link in IE; it is some sort of ActiveX control hack. More details are here, and a virus definition on AusCERT describes the functionality.

It appears that this email is using the credibility of a site like Flickr and its community to get people’s attention and clicks. It’s no different than preying on people with the possibility of Anna Kournikova pictures.

The peak-end rule

In reading The Paradox of Choice by Barry Schwartz, I came across one of those pieces of research that just keeps coming up in conversation, so I’ll post it here. The theory, known as the “peak-end rule” and described by psychologist Daniel Kahneman, holds that people remember events by the peak and the end of the experience. For instance, if I go to an amusement park, this heuristic says that I will remember my trip by the height of excitement and the way I felt when I left. The classic experiment showing this phenomenon is described by Mr. Schwartz:

Participants in a laboratory study were asked to listen to a pair of very loud, unpleasant noises played through their headphones. One noise lasted for eight seconds. The other lasted sixteen. The first eight seconds of the second noise were identical to the first noise, whereas the second eight seconds, while still loud and unpleasant, were not as loud. Later, the participants were told that they would have to listen to one of the noises again, but that they could choose which one. Clearly the second noise is worse–the unpleasantness lasted twice as long. Nonetheless, the overwhelming majority of people chose the second to be repeated.
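The counterintuitive result falls out directly if remembered unpleasantness tracks the average of the worst moment and the final moment rather than the total. A toy model of the two noises, with loudness values invented for illustration (higher means worse):

```python
# Toy model of the peak-end rule applied to the noise experiment.
# Each list is loudness per second; the numbers are arbitrary.

def remembered(experience):
    """Peak-end heuristic: average of the peak moment and the last moment."""
    return (max(experience) + experience[-1]) / 2

short_noise = [9] * 8            # eight seconds at full volume
long_noise = [9] * 8 + [6] * 8   # same eight seconds, then a quieter eight

print(sum(long_noise) > sum(short_noise))                 # True: objectively worse
print(remembered(long_noise) < remembered(short_noise))   # True: remembered as better
```

The longer noise delivers strictly more total unpleasantness, yet its quieter ending drags the peak-end average down, which is exactly why most participants chose to repeat it.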

These results are not limited to abstract, constructed experiences. Schwartz describes another experiment with a little more real-world impact:

In the test, one group of patients had a standard colonoscopy. A second group had a standard colonoscopy plus. The “plus” was that after the actual examination was over, the doctor left the instrument in place for a short time… and it made a difference. It turned out that, over a five-year period after the exam, patients in the second group were more likely to comply with calls for follow-up colonoscopies than patients in the first group.

And of course, this example takes advantage of the colonoscopy rule: any research that deals with colonoscopies makes me uncomfortable, and therefore has more impact.

As I mentioned, since I discovered this rule, it keeps popping up in discussions I have been having. Having recently been on a vacation, it strikes me that this heuristic is of utmost importance in planning long events. It appears that the optimal planning for a vacation (or any event for that matter) would look something like this:

Peak-end rule

In the case of my vacation, the last high-point of my time in Europe was in Florence, followed by one brief day in Copenhagen. Not that there’s anything wrong with Denmark, but that day ends up coming up in more of my conversations than the rest of the trip because that is how memory works (that and blood jello is really, really disgusting). If you’re planning any trips soon, make sure to end on a high note, because you will be the one telling the stories.

Google news, meet spam

I’ve been a long-time user of Google News and news alerts. For certain topics, it’s the only way for me to stay informed, and the quality of their index has generally kept these updates to high-quality, on-topic news that matched some keywords. Over the past six months, though, I have noticed diminishing returns on the value of their search, especially in the case of alerts. While the amount of information has increased, the average quality has dropped. This decrease in relevance can be attributed to certain publications in their corpus:

Small publications: as more college newspapers, trade publications, and otherwise non-authoritative sources become primarily web-distributed, they have also started to overwhelm the news index. It’s rare these days to come across a story from a mass media publication.

PR announcements: some readers may remember a few months back when a 15-year-old boy wrote a press release about how Google had hired him, and the entire affair turned out to be a hoax. Press releases seem to be a medium that is not well policed, probably because they mainly come from

Blogs: The boundary between mass media and blogs has certainly blurred over the past few years, but the selection criteria for news indexes does not seem to follow any rules. Presumably the site maintainers take submissions to the site and decide based on internal editorial guidelines what to let in. Some of the blogs I have seen do not seem to make the cut, but maybe their inclusion of blog search into the interface suggests they are working on a better solution.

Syndication sites: a few news sources indexed by Google are actually sites that aggregate news from other sources. Try a search for any of your favorite spam keywords, such as “viagra,” and you will find some surprising results. Spam?! It seemed absurd to me that spam could get into the news index, where every source was hand evaluated, but lo and behold, there are more than a few pages trying to sell viagra:

Google News vs. Viagra

What each of these examples points to is the need for a ranking mechanism that takes into account the reputation of the source. At last count, the US version of Google News was indexing over 10k sources, and as this bar gets lower, our collective trust in the site becomes more and more important. Unlike web search, which can be indexed and updated over the course of months, the news index has to be extremely fresh; for this reason, algorithms like PageRank cannot function properly. Attention indicators like del.icio.us, Digg or Newsvine might help, but each of these sources comes with an inherent bias that might not reflect the audience of Google News.

It seems much more likely that the sources of news will become the harbingers of trust. I am not advocating a return to old media, but the index could be built to reflect the current opinion of the web at large. If most sites trust the New York Times or the Washington Post as an authoritative host, so could a news search index. Andy Baio did an experiment around host ranking using Metafilter as a source, and the results from 1999 to 2006 are quite interesting: many sites appear out of nowhere (Youtube, Wikipedia) while others maintain rank over the years (New York Times, BBC). My guess is that standard news results run through this filter would provide a substantially better experience, especially for ranking results within a given news cluster. I guess we’ll see what the big G ends up doing to rectify the situation.
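The host-ranking idea behind Andy Baio's Metafilter experiment reduces to counting how often each host is linked from a trusted corpus. A minimal sketch, with made-up URLs standing in for real posts:

```python
# Sketch of host reputation via citation counting: tally links from a
# trusted corpus by hostname and rank hosts by how often they appear.
# The URLs below are invented stand-ins for posts in such a corpus.

from collections import Counter
from urllib.parse import urlparse

def rank_hosts(links):
    """Tally links by hostname, most-cited hosts first."""
    return Counter(urlparse(link).netloc for link in links).most_common()

links = [
    "http://www.nytimes.com/2006/a-story",
    "http://www.nytimes.com/2006/another-story",
    "http://news.bbc.co.uk/some-story",
    "http://www.viagra-news.example/buy-now",
]
print(rank_hosts(links)[0][0])  # www.nytimes.com
```

A spam syndication site would need to be cited repeatedly by the trusted community to climb this ranking, which is precisely the bar that keeps it out of a reputation-weighted news index.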

Amazon launches answers site

Askville LogoToday I received an invite to join a new community at Amazon called Askville:

You’re Invited!

As a valued Amazon customer, you’ve been specially picked to get an early look at a new website called Askville where you can ask any question on any topic and get real answers from real people. It’s new, and best of all, it’s free!

This site will compete with Yahoo! Answers and Microsoft Q&A in the free question-answering space, except that it might be able to leverage the Amazon community of experts. For those that have not been following this area, these systems enable knowledge creation by allowing users to ask questions that are then answered by other users in exchange for reputation within the system. The first success in this space was a Korean startup named Naver, which took control of the Korean search market in a very short period of time.

Amazon’s system is similar to all of its American counterparts, with its large fonts and friendly messaging (“ask.. answer.. meet.. play”), except for a few subtle distinctions:

  • Users are rewarded for asking questions as well as answering them
  • Questions are limited to 5 answers total
  • Best answers are chosen jointly by the asker and the answerers, with the asker getting one more vote than each answerer

Probably the most significant change is the flow of the question/answering exchange. In Yahoo! Answers, and elsewhere, answers are shown publicly as they are received; in Askville, answers are hidden to the public until 5 answers have been received. Any discussion or clarification can happen in a public message board attached to the question. After 5 answers have been collected, the group of asker and answerers vote and the whole thing is made public.
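The voting step described above can be sketched as a weighted tally: the asker and the answerers each pick a best answer, with the asker's vote counting one more than anyone else's. The names and votes below are invented, and the exact weighting is my reading of the rules, not Amazon's published implementation:

```python
# Sketch of Askville-style best-answer selection: a simple vote tally
# in which the asker's single vote carries extra weight (2 vs. 1).

from collections import Counter

def best_answer(asker_vote, answerer_votes):
    """Tally votes for each answer; the asker's vote counts double."""
    tally = Counter(answerer_votes)
    tally[asker_vote] += 2
    return tally.most_common(1)[0][0]

# Answerers split B, B, A; the asker backs A, so A wins 3 votes to 2:
print(best_answer("A", ["B", "B", "A"]))  # A
```

Hiding answers until five are collected, then voting, keeps early answers from anchoring the discussion, which is the main departure from the show-as-received flow of Yahoo! Answers.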

Askville rewards users with “coins,” a virtual currency that will be redeemable in another community named Questville slated for release in early 2007.

The system has given me 25 invitations for other accounts. If you’re interested in trying out the system, shoot me an email.

Update: I apologize, but all of my invitations have been distributed! It seems like the invitations are spreading though, so look for one on a weblog near you…

Firefox 2 inline spell checker

I usually avoid the initial release candidates of open source software, but Firefox just released their beta 2 candidate about a month ago. I finally got around to installing it this week and I have to say it’s not that mind-blowing. They’ve added cleaner RSS support, more intelligent tabs, and a number of features that mimic former plugins.

While I was test-driving this new toy I went to some new ajax-spiffy application and was completely blown away by their inline spell checker, until I realized that it’s a standard feature on the new Firefox:

Inline spell checking: A new built-in spell checker enables users to quickly check the spelling of text entered into Web forms (like this one) without having to use a seperate application.

It’s essentially the same spell checker that has existed in more serious writing applications (word processors, email clients, etc.) for years, with red dotted lines under misspelled words and right-click action to suggest correct spellings. It appears that the authors of the above quote were using a previous version of Firefox as the word “separate” is spelled incorrectly. Of course I only noticed this because it has red dots below it in my browser.

Most good social software currently supports spell checking, but inline checking isn’t the sort of task that can be done in real time by a web app. It makes perfect sense that this sort of functionality would migrate to the browser, given the amount of general text editing that is happening now on the web. We’ve moved from a world of web browsing to one of web editing, and our tools for manipulating this environment will reflect this shift.

I can say that I’m pretty helpless without a spell checker, and I am usually too lazy to use online tools like spellcheck.net, so this will generally raise the bar of my online participation. Oh, and I won’t look like an idiot. As much.

Digging into Satisfaction

Quite a while back I read a book called Satisfaction: The Science of Finding True Fulfillment. The book is about the scientific escapades of its author, Gregory Berns, as he seeks the answer to a number of questions about happiness. The book varies from extremely technical descriptions of Berns’ research in neuroeconomics to extremely accessible stories and anecdotes that most people can relate to. It’s a highly enjoyable read, and I recommend it to anyone who likes reading pop science.

I give a lot of love to authors that use footnotes. When someone can write a completely accessible book but still maintain depth by referencing all of the relevant literature in endnotes, they are a master communicator. Satisfaction has a number of interesting footnotes that I have intended to follow up on; as a service to myself I’m going to place them here as well.

Zen and the diffusion of links

Yesterday I had a moment I can only classify as Zen. Amidst the flurry of hundreds of RSS chunks, emails and IMs spreading between thousands of people, some signal seemed to appear out of the noise. Unfortunately, I am not omniscient, and cannot put the puzzle together completely. Perhaps someone can enlighten me.

1. Sometime in the AM, I follow a link from Nelson’s Linkblog to a story on a blog called Collision Detection about the limitations of multitasking. I find it interesting, so I post it to del.icio.us.

2. I follow a link from Jason to a nerd comic about sandwiches. I laugh, and this makes me a nerd. I do not want people to know this, so I refrain from posting to del.icio.us.

3. Kathryn sends me back to Collision Detection, this time for a story about Matmos sampling an Enigma machine for an upcoming song. This is too much of a coincidence, two links to Collision Detection in one day, so I do some research. Of course it turns out that this is the weblog of Clive Thompson, author for Wired, New York Times Magazine, et al. I add him to my RSS reader, of course.

4. Later in the evening, Clive posts about a funny t-shirt produced by Randall Munroe, the author of the aforementioned nerd comic. Ok, something is definitely amiss here.

Nelson → Clive, Kottke → Randall, Kathryn → Clive, Clive → Randall. This is too much for coincidence. Will someone please tell me what is going on?