Weblogs and churn rate
The first question that every journalist asks about weblogs (how many are there?) has been a source of constant debate over the past year. I was cited in the Economist with the number 500,000, which prompted a response, as well as a number of new efforts to estimate this number:
Blogcensus is a funded project crawling and classifying content as weblog or not weblog. Pages identified as weblogs are then categorized by their native language using simple heuristics. This project is the sole work of Maciej Ceglowski.
Blogcount is a self-proclaimed aggregator of other data sources. The site is making press releases based on the management reports of centrally hosted weblogs/journals (e.g. Blogger, LiveJournal). Using the data collected by Blogcensus, the original numbers are adjusted to account for international and non-hosted entities.
A word to the wise: online communities can appear much more active than they actually are (and I’ve got some data to show it!).
In all user communities, not every individual who signs up for a service will use it indefinitely. When trying to gauge the effect a service is having on society at large, the important number is not how many people have tried it, but how many use it regularly. The churn rate is the percentage of users who cancel a service, i.e. webloggers who will never post again (despite the fact that their weblog might remain online).
Any user community that is growing exponentially sees a large percentage of its population added in a short time frame. Let’s assume for example that the number of weblogs on Blogger doubles every year, and that an “active” weblog is one that has been used in the past 6 months. Given the numbers reported by Blogcount, namely 705k active and 1500k total weblogs, and given a standard model of exponential growth, we can assume that the population of Blogger 6 months ago was around 1060k, and that 440k weblogs have been added in the past 6 months. Every one of those new weblogs counts as active under this definition, simply because it was created within the window. This means that 62% of the active weblogs (440k out of 705k) could have only one test post on them.
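For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes exactly what the paragraph above assumes (a population that doubles every 12 months and a 6-month activity window); the function and variable names are my own, made up for illustration, and the only inputs are the two Blogcount figures.

```python
def population_before(total_now, months_ago, doubling_months=12.0):
    """Back out an earlier population under pure exponential growth.

    Assumes the population doubles every `doubling_months` months.
    """
    return total_now / 2 ** (months_ago / doubling_months)


total_now = 1_500_000   # total weblogs, as reported by Blogcount
active_now = 705_000    # active weblogs, as reported by Blogcount

# Population 6 months ago, and how many weblogs were created since then.
earlier = population_before(total_now, months_ago=6)
added = total_now - earlier

# Upper bound on the share of "active" weblogs that are merely new.
share_new = added / active_now

print(f"6 months ago: {earlier / 1000:.0f}k")
print(f"added since:  {added / 1000:.0f}k")
print(f"share of active that could be brand new: {share_new:.0%}")
```

The 62% is an upper bound, not a measurement: it is the case where none of the older weblogs were abandoned more slowly than the model allows and every new weblog is counted as active.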

Above is a chart of this phenomenon over time. The active user population is assumed to be half of the total, while the core users are those that have been active for more than 6 months. Note that the core user population does grow exponentially, albeit much more slowly than the total population.
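The series behind a chart like this can be rebuilt with a few lines of Python. This is only a sketch under stated assumptions: yearly doubling, an active population fixed at half of the total, and "core" read as active weblogs minus those created in the past 6 months. The starting population of 100,000 and the function name `growth_series` are hypothetical, chosen purely for illustration.

```python
def growth_series(total_start, years, doubling_months=12.0,
                  window_months=6.0, active_share=0.5):
    """Return (year, total, active, core) tuples under exponential growth.

    Assumptions (all illustrative):
      - total doubles every `doubling_months` months
      - active is a fixed `active_share` of total
      - core = active minus weblogs created within the last `window_months`
    """
    # Fraction of the current total that was created inside the window.
    new_fraction = 1 - 2 ** (-window_months / doubling_months)
    rows = []
    for year in range(years + 1):
        total = total_start * 2 ** (12 * year / doubling_months)
        active = active_share * total
        core = active - new_fraction * total
        rows.append((year, total, active, core))
    return rows


# Hypothetical starting point of 100,000 weblogs, followed for 4 years.
for year, total, active, core in growth_series(100_000, years=4):
    print(f"year {year}: total={total:>10,.0f}  "
          f"active={active:>10,.0f}  core={core:>10,.0f}")
```

Under these assumptions the core group is a constant fraction of the total (roughly a fifth), so it doubles on the same schedule but adds far fewer weblogs each year in absolute terms.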
This is just a simple example to show that much better statistics are needed to calculate the true active population of the weblog community. Most of all, we need more information about the users of centrally maintained weblog services. What is the distribution of use over the active period? My assumption, based on my previous research on earlier communities, is that most of these users are experimenting with the system, and are not among the core group.
In other words, I’m rescinding my original estimate of 500k. I now think that the number is much, much smaller.

