I am obviously always on the lookout for weblog statistics, as they have become a core part of my thesis. Today a marketing company called comScore released a report making a number of claims about the weblog community. I’d like to take a moment to remind people that this is a marketing survey, and as such it should be carefully scrutinized before anyone draws conclusions from it.
First, comScore’s methodology states that they have 2 million active subjects, recruited through Random Digit Dial and an “online recruitment program” for which they provide no details. They do, however, list the incentives offered to those individuals:
- Server-based virus protection
- Attractive sweepstakes prizes
- Opportunity to impact and improve the Internet
Sans the third incentive, which is the blanket “feel-good” incentive of every survey, I challenge you to think of someone who is attracted to the first two. Let’s just say they’re not your average person or internet user. They also note:
All demographic segments of the online population are represented in the comScore Global Network, with large samples of participants in each segment. For example, our network includes hundreds of thousands of high-income Internet users – one of the most desirable and influential groups to measure, yet also one of the most difficult to recruit.
Without diving into what “high-income Internet users” are, having hundreds of thousands of subjects from a presumably small portion of the population leads me to believe that they’re not really interested in representativeness but rather, umm, marketing. Given that they neither justify their sample nor provide margins of error, the initial sampling frame should be considered bunk.
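For a sense of scale, here is a minimal Python sketch of the nominal margin of error a survey would typically report for a proportion, assuming simple random sampling, 95% confidence (z = 1.96) and the worst case p = 0.5. The function name is hypothetical and nothing here comes from comScore’s report. The formula only measures random sampling error; it cannot account for bias in how an opt-in panel was recruited.

```python
import math


def nominal_margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumes simple random sampling; it says nothing about bias introduced
    by how panel members were recruited or what incentives drew them in.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


# 2 million panelists, worst-case p = 0.5, 95% confidence
moe = nominal_margin_of_error(2_000_000)
print(f"{moe:.4%}")
```

With n = 2,000,000 this prints 0.0693%, well under a tenth of a percentage point. That apparent precision means little if the people who sign up for virus protection and sweepstakes differ systematically from the average internet user.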
Second, if their sampling of weblogs seems strange at first glance, that’s because it is. They were interested in how the aforementioned sample visited weblogs, so they decided to look at visits to 400 blog-related domains, which they culled from “top blog lists.” These domains include hosting services (e.g. “*.livejournal.com”) alongside other top blogs. Keep in mind that this sample of 400 domains incorporates community sites (freerepublic.com, fark.com, slashdot.org, metafilter.com, etc.), professionally written sites (gawker.com, drudgereport.com, fleshbot.com, etc.) and potential spam (crazyass13.com sets off my spam alarm).
I’m assuming, based on their distribution of unique visitors shown below, that all of these sites are included in one sample. The top sites are blog hosts (although note the missing blogspot, which supposedly saw 19 million unique visitors), and the second group consists of community sites and professional blogs. As far as many people are concerned, the “real blogs” start around #30, and they provide no description of these. How this is a sample of weblogs at all, I can’t say. But building categories around such a strange set of sites seems a little unsound.
In sum, what this report seems to tell me is that a large number of people visited either a professional weblog or some weblog on one of the hosting services in the past year. This should not be surprising: roughly one in every five of my Google queries returns a blog among the results. Since they give no indication of how many of these visitors saw only one blog during the entire period, an overwhelming majority could have arrived via search engines (a possibility they acknowledge).
Given their sampling frame and blog selection methodology, it is hard to extract any meaningful statistics about true blog readership. Until they release the data, I would quote these numbers with extreme caution.