The first question that every journalist asks about weblogs (how many are there?) has been a source of constant debate over the past year. I was cited in the Economist with the number 500,000, which prompted a response, as well as several new efforts to estimate this number:
Blogcensus is a funded project that crawls the web and classifies content as weblog or not weblog. Pages identified as weblogs are then categorized by their native language using simple heuristics. This project is the sole work of Maciej Ceglowski.
Blogcount is a self-proclaimed aggregator of other data sources. The site publishes press releases based on the management reports of centrally hosted weblog/journal services (e.g. Blogger, LiveJournal). Using the data collected by Blogcensus, it adjusts the original numbers to account for international and non-hosted weblogs.
A word to the wise: online communities can appear much more active than they actually are (and I’ve got some data to show it!).
In all user communities, not every individual who signs up for a service will use it indefinitely. When trying to gauge the effect a service is having on society at large, the important number is not how many people have tried it, but how many use it regularly. The churn rate is the percentage of users who cancel a service, i.e. webloggers who will never post again (even though their weblog might remain online).
Any user community that is growing exponentially sees a large percentage of its population added in a short time frame. Let’s assume, for example, that the number of weblogs on Blogger doubles every year, and that an “active” weblog is one that has been used in the past 6 months. Given the numbers reported by Blogcount, namely 705k active and 1500k total weblogs, and a standard model of exponential growth, we can estimate that the population of Blogger 6 months ago was around 1060k (1500k / √2), and that 440k weblogs have been added in the past 6 months. This means that 62% of the active weblogs (440k / 705k) could have only one test post on them.
Above is a chart of this phenomenon over time. The active user population is assumed to be half of the total, while the core users are those that have been active for more than 6 months. Note that the core user population does grow exponentially, albeit much more slowly than the total population.
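The arithmetic in the two paragraphs above can be reproduced with a short Python sketch. It uses the post’s stated assumptions (yearly doubling, a 6-month activity window, Blogcount’s 1500k total and 705k active). The `core` function is one hypothetical reading of the missing chart, in which active weblogs are half the total and core weblogs are active ones created before the window; it is not necessarily how the original chart was drawn.

```python
# Figures reported by Blogcount (thousands of Blogger weblogs).
TOTAL_NOW = 1500
ACTIVE_NOW = 705

# Assumptions from the post: the population doubles every 12 months and an
# "active" weblog is one used in the past 6 months.
DOUBLING_MONTHS = 12
WINDOW_MONTHS = 6


def total(t_months):
    """Total weblogs (thousands) t months from now under steady doubling."""
    return TOTAL_NOW * 2 ** (t_months / DOUBLING_MONTHS)


def added_in_window(t_months):
    """Weblogs created during the WINDOW_MONTHS before time t."""
    return total(t_months) - total(t_months - WINDOW_MONTHS)


# Snapshot: population 6 months ago, weblogs added since, and the share of
# active weblogs that are newer than the activity window.
total_then = total(-WINDOW_MONTHS)
added = added_in_window(0)
share_new = added / ACTIVE_NOW


def core(t_months):
    """Hypothetical chart model: active = half the total; core = active
    weblogs minus those created inside the activity window."""
    return total(t_months) / 2 - added_in_window(t_months)


print(f"6 months ago: {total_then:.0f}k, added since: {added:.0f}k, "
      f"new share of active: {share_new:.0%}")

# Time series at 6-month steps over the past two years.
for t in range(-24, 1, 6):
    print(f"{t:>4} months: total {total(t):6.0f}k  "
          f"active {total(t) / 2:6.0f}k  core {core(t):6.0f}k")
```

In this reading the core stays at roughly 21% of the total (1/√2 − 1/2), so although it doubles yearly like everything else, it adds far fewer weblogs in absolute terms.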
This is just a simple example to show that much better statistics are needed to calculate the true active population of the weblog community. Above all, we need more information about the users of centrally maintained weblog services. What is the distribution of use over the active period? My assumption, based on my previous research on other communities, is that most of these users are experimenting with the system and are not among the core group.
In other words, I’m rescinding my original estimate of 500k. I now think that the number is much, much smaller.
Spot on. I have 500+ members on the Web forum I run, and the “core” is probably 25 people, and the “actives” probably number no more than 100.
GFM
Wait, what number are you rescinding? That there are 500k weblogs? Or that there are 500k active weblogs? It would seem to me that tools like Blog Census are counting mostly “active” weblogs, since non-active ones a) don’t appear in weblog update systems like weblogs.com, b) don’t appear in people’s blogrolls (you don’t link to someone who’s not updating), and c) don’t get linked to by other bloggers. So if you’re finding and counting weblogs by following link trails from a starting pool of blogs, wouldn’t you tend to find the active ones? (Assuming that you’ll get a small number of new blogs in your count that will die out.)
I was pushing for an upper bound when I said 500k, but mostly because I didn’t think that the quote would be taken out of context. Now I’m rescinding it and replacing it with a lower bound 😉
Blogcensus is crawling the web at large, but it is not paying explicit attention to when these sites were last updated. To get an accurate measurement, you’d need to sample on a regular basis and see how many were being added, going fallow, and disappearing.
If we assume that weblogs do follow a power-law distribution, then many of the weblogs we see on a blogroll will appear active simply because they’re the same weblogs on every blogroll. But how do we account for the majority of sites that have only one link to them in the entire community? It would seem to me that these are the ones that have long since gone extinct.
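The regular-sampling approach described above can be sketched by comparing two crawls. Each crawl is assumed here to be a mapping from URL to the date of the latest post found; the `compare_crawls` helper, the 90-day fallow threshold, and the example URLs are all hypothetical.

```python
from datetime import date


def compare_crawls(earlier, later, later_crawl_date, fallow_days=90):
    """Classify weblogs seen across two crawls (hypothetical helper).

    earlier, later: dicts mapping URL -> date of the most recent post seen.
    Returns sorted lists of URLs that were added, that disappeared, and
    that are still online but have gone fallow (no post within fallow_days).
    """
    added = sorted(later.keys() - earlier.keys())
    disappeared = sorted(earlier.keys() - later.keys())
    fallow = sorted(
        url for url, last_post in later.items()
        if (later_crawl_date - last_post).days > fallow_days
    )
    return added, disappeared, fallow


# Hypothetical crawl results.
may_crawl = {
    "a.example.com": date(2003, 5, 1),
    "b.example.com": date(2003, 5, 20),
    "c.example.com": date(2003, 1, 5),
}
august_crawl = {
    "a.example.com": date(2003, 8, 10),
    "c.example.com": date(2003, 1, 5),
    "d.example.com": date(2003, 8, 1),
}

added, disappeared, fallow = compare_crawls(may_crawl, august_crawl, date(2003, 8, 15))
print("added:", added)
print("disappeared:", disappeared)
print("fallow:", fallow)
```

Here `c.example.com` is still online but has not been updated since January, which is exactly the kind of weblog a one-off crawl would count as live.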
Based on a quick and dirty random sample from the Blog Census data, about 64% (84/131) of blogs in that collection have been updated in the past three months or so. I suspect the real number of active blogs in the census database is lower because of duplicates.
That’s kind of a shockingly low figure, given that so much of the URL collection comes from update sites like weblogs.com, which by definition are populated by active blogs only.
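To see how much a sample of 131 blogs can tell us, here is a rough 95% interval around the 84/131 figure. This sketch uses the normal approximation and assumes the sample was random with no duplicates; the suspected duplicates would bias the estimate in ways this interval does not capture.

```python
import math


def proportion_interval(successes, n, z=1.96):
    """Point estimate and normal-approximation interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


p, lo, hi = proportion_interval(84, 131)
print(f"{p:.0%} updated recently (95% interval roughly {lo:.0%} to {hi:.0%})")
```

With only 131 blogs the estimate is uncertain by about 8 percentage points either way, so a larger sample would be needed to pin the active share down.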
Cameron on weblog churn rates:
Ye olde dynamic model. I think of this as an inventory problem: items come in, they have some lifecycle, then they leave. That’s why the count of everyone alive is not the same as the count of everyone who has ever lived. In June’s The Blogcount Estimate, I showed both of those numbers, but the bottom-line numbers were active weblogs.
What is an active blog? Define “active” as doing something with your blog. Then define an “abandonment window”: how many days since the last login? The last post? The last template or setting change? It’s convenient to say 30 days without activity is abandonment, but some folks say 45 or 90 days. There’s also the question of “what’s a blog?” Cam, do you include LiveJournal and DiaryLand in your estimate? How about moblogs?
There are three sources of churn: exodus (leaving the blogosphere altogether), migration (leaving one blog for another), and long holidays (breaks in blogging that exceed an “actively blogging” window).
Exodus: a no-show for 30 days, or deletion of the blog. This is easy to measure if you run the server. Since people can log in and post without publishing (saving posts for the future), some activity that counts as active on the server may be invisible to crawling.
Migration is harder. Some blogs are off the map, neither operated by a known host nor linked to by a known blog or crawler, so a migration may be taken for an exodus. You see this when people move their blogs off Google’s Blogspot to their own domains.
Holidays: in the U.S., vacations rarely last a full two weeks; four to six weeks are more common in Europe and Japan. Such breaks can be mistaken for abandonment, and the same may be true of sustained technical outages.
Cam, if you have an updated estimate, especially with your assumptions and calculations spelled out, it would be my pleasure to post it. Lots of ways to peel an orange.
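The effect of the abandonment window on the count can be sketched like this. The weblog names and dates are hypothetical, and “last activity” is assumed to be whichever of post, login, or setting change happened most recently.

```python
from datetime import date


def count_active(last_activity, as_of, window_days):
    """Count weblogs whose most recent activity falls within the window."""
    return sum(1 for d in last_activity.values() if (as_of - d).days <= window_days)


# Hypothetical data: weblog -> date of last post, login, or template change.
last_activity = {
    "alpha": date(2003, 7, 28),
    "beta": date(2003, 7, 1),
    "gamma": date(2003, 6, 10),
    "delta": date(2003, 3, 2),
}
as_of = date(2003, 8, 1)

counts = {window: count_active(last_activity, as_of, window) for window in (30, 45, 90)}
for window, n in counts.items():
    print(f"{window}-day window: {n} active")
```

Here “beta”, silent for 31 days, is abandoned under a 30-day window but active under a 45-day one, which is the holiday ambiguity described above.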
Rebecca Blood posits that we’ll start seeing bloggers who post monthly. Sort of an end of month wrap-up.
I think the questions Phil is asking are the right ones. Namely, before one can come up with an accurate estimate of how many active weblogs there are, we first need to reach consensus on what weblog and active mean.
Both are fuzzy, subjective categories. I’ll take a stab at active. A weblog updated once every year probably shouldn’t be included in our count. What about six months? Three? It’s a sorites paradox if we try to choose a value arbitrarily. But I think if we look at the distribution of update frequencies across all weblogs, the cutoff point will be clear. If there isn’t a marked dropoff point, we can just use two standard deviations from the mean.
So who can operationalize weblog?
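The fallback cutoff proposed above (two standard deviations from the mean) can be sketched in a few lines. The gap values are hypothetical: the average number of days between updates for ten weblogs.

```python
import statistics


def inactivity_cutoff(gaps_days):
    """Return the mean plus two sample standard deviations of the gaps."""
    return statistics.mean(gaps_days) + 2 * statistics.stdev(gaps_days)


gaps = [1, 2, 3, 4, 5, 7, 10, 14, 30, 200]  # hypothetical sample
cutoff = inactivity_cutoff(gaps)
inactive = [g for g in gaps if g > cutoff]
print(f"cutoff ~ {cutoff:.0f} days; inactive gaps: {inactive}")
```

Because update gaps are heavily skewed, the long tail inflates the standard deviation itself, so a cutoff read off a clear dropoff in the distribution, as suggested first, is likely to be more robust.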
Interesting. I wonder if it will ever be possible to come up with anything approaching an accurate blog census. This morning I commented on this article (http://cyberatlas.internet.com/big_picture/applications/article/0,,1301_2238831,00.html), which states that there are “roughly 2.4 million to 2.9 million active Weblogs as of June 2003.”
On defining weblogs: I’ve been told that mine is “not really a weblog” because it’s hand-coded. Does that matter? It looks like a weblog; it has posts in reverse chronological order; I update it almost every day; I’m listed on Technorati, Weblogs.com and a few others. I started out on Blogger, switched to Movable Type, then got disgusted with all the bugs and decided the only solution was to give up on all prefab weblog tools and figure out HTML. I’m basically doing the same thing I did with Blogger and MT, so why is it not a weblog? On the other hand, I’ve seen a couple of bloggers who insist that their sites are NOT blogs even though they clearly ARE. So what is a blog? How the heck can anyone ever figure that one out?
Inspired by this post, I did a little more sampling and posted some of the results on the Blog Census site. In particular, I took a look at the distribution of update times for the blogs in that collection; it might help us pick a good definition of “active” for weblogs.
That’s great stuff, Maciej. A significant problem is that end users seem to have little motivation to close their unused accounts on free blog hosting services. Phil Wolff’s estimate included data from three large blogging services (LiveJournal, Blogger and DiaryLand) that include huge numbers of inactive blogs. I’ve expanded on this a bit in a post over at Blog Hosting News. I’m not sure we can ever effectively define an active vs. inactive blog, but the data discussed here will help quantify the percentage of idle/expired blogs.
I disagree with most of what you have been saying for the past few months. The following diagram demonstrates my point:
[ASCII chart: DISAGREEMENTS (vertical axis) versus TIME (horizontal axis)]
(that dip was due to network outages at my work)
I will be more vocal about my disagreements in the future.
Crap. When will the world standardize on fixed-width fonts?
That graph is totally unreadable.
I use Diaryland, a community that (I have read more than once) has over 1 million users. I don’t think so, or at least the vast majority aren’t active.
Why? Because my own blog at Diaryland comes up in the top ten Google searches for “Diaryland”. I doubt this would happen if the competition were as stiff as we’re led to believe.
I write more about this on my mediajunk blog…
p.s. I agree with Ali: the graph could do with a good usability review.
That is some amazing info!! I thought blogs were some pointless hype to promote who knows what. It’s not like this is any new technology, but I guess the numbers do speak.
Who knows, I may just have to find a use for blogs on my web site ChurchRecordings.com
Thanks for the info.