This week I’ll be presenting a paper at the International Communication Association Conference in New Orleans titled Audience, Structure and Authority in the Weblog Community. The paper is an analysis of two different metrics for measuring authority within weblogs:
- Blogroll: A link from one weblog to the top level of another (e.g., links to http://overstated.net, http://www.overstated.net or https://overstated.net/index.asp). I assume this is a proxy for popularity.
- Permalink: Any link from one weblog to deep content on another (e.g., a link to https://overstated.net/04/05/24-weblogs-and-authority.asp). I assume this is a proxy for influence.
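The two definitions above can be sketched as a small URL classifier. This is only an illustration, not the code used for the paper: the function name `classify_link`, the list of index-page names, and the choice to treat any URL with a query string as deep content are all assumptions.

```python
import re
from urllib.parse import urlparse

# Assumption: paths like /index.asp or /default.html count as the top level.
INDEX_PAGE = re.compile(r"^/?(index|default)\.[a-z0-9]+$", re.IGNORECASE)


def classify_link(url: str) -> str:
    """Return 'blogroll' for a top-level link, 'permalink' for deep content."""
    parsed = urlparse(url)
    top_level = parsed.path in ("", "/") or INDEX_PAGE.match(parsed.path) is not None
    # Assumption: a query string (e.g. ?p=123) points at a specific entry.
    if top_level and not parsed.query:
        return "blogroll"
    return "permalink"


if __name__ == "__main__":
    for url in (
        "http://overstated.net",
        "http://www.overstated.net",
        "https://overstated.net/index.asp",
        "https://overstated.net/04/05/24-weblogs-and-authority.asp",
    ):
        print(url, "->", classify_link(url))
```

A real crawler would also need to normalize hostnames (with and without `www`) before counting, which this sketch leaves out.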
The following table shows the top 20 for each measure. One observation is that many of the top ranked sites are community weblogs (e.g. Slashdot or Memepool). These sites play the important role of hubs, maintaining ties to more weblogs than a single person would be able to. They allow information to diffuse quickly between distant parts of the network of readership.
|Blogroll Degree Rank||Permalink Degree Rank|
A second observation is that the lists are fairly distinct. While some webloggers hold top positions in both ranks, the lists diverge considerably as the position increases. While Blogrolls tend to support the weblog elders (scripting.com, evhead.com, etc.), permalinks suggest a different set of authors as influencers (joi.ito.com, buzzmachine.com, etc.). Looking at the differential between the ranks in the figure below, it is apparent that once the rank passes 100, the correlation between Blogroll and Permalink rank becomes much less well defined.
Permalink and Blogroll rank differential
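For readers who want to reproduce a rank differential like this one, here is a minimal sketch. It assumes the differential is simply Blogroll rank minus Permalink rank (the sign convention is my guess for illustration), breaks ties alphabetically, and uses made-up sites and counts rather than Blogdex data.

```python
def rank_by_degree(in_degree):
    """Map each site to its rank (1 = highest in-degree); ties broken by name."""
    ordered = sorted(in_degree, key=lambda site: (-in_degree[site], site))
    return {site: position for position, site in enumerate(ordered, start=1)}


def rank_differential(blogroll_degree, permalink_degree):
    """Blogroll rank minus Permalink rank for sites present in both measures."""
    blogroll_rank = rank_by_degree(blogroll_degree)
    permalink_rank = rank_by_degree(permalink_degree)
    shared = blogroll_rank.keys() & permalink_rank.keys()
    return {site: blogroll_rank[site] - permalink_rank[site] for site in shared}


if __name__ == "__main__":
    # Hypothetical in-degree counts, invented for this example only.
    blogroll = {"a.example": 50, "b.example": 30, "c.example": 10}
    permalink = {"a.example": 5, "b.example": 40, "c.example": 20}
    print(rank_differential(blogroll, permalink))
```

Under this convention a negative value means a site ranks higher on blogrolls than on permalinks, and a positive value means the reverse.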
This sheds new light on the age-old weblog power law debate. While the blogroll rankings (reflected by Shirky’s original analysis) suggest a model of preferential attachment, many of the weblogs listed in the top permalink ranks are much younger. If the weblog social structure were governed by a law of the “rich getting richer,” we would expect older weblogs to have more influence, and hence more links to their entries.
There are obviously many caveats and details, all of which are listed in the full paper below. Since I’m presenting it this coming Friday, I’d appreciate any feedback you may have.
Full paper: Audience, Structure and Authority in the Weblog Community (pdf 228k)
42 thoughts on “Weblogs and authority”
I just realized an obvious piece of background that should be included in the background section: prior work on coauthorship within the science citation index. Coauthorship could be thought of as a parallel to blogrolls (a strong form of affiliation) while permalinks are analogous to citations. I’ll add that tomorrow.
Am I being an idiot for not understanding precisely the difference between ‘blogroll link’ and ‘permalink’ here? Is a permalink a link within the body of a weblog post? If so, wouldn’t that be more ephemeral, and less ‘perma-‘?
This is really interesting in a meta-way (that metabloggery people get all twisted up over, myself included, sometimes) and may illuminate some of the fuzzy discomfort I had with Clay’s power law essay, but I’m unclear on the terminology.
But then maybe I ought to read the pdf first, huh?
Having read the pdf, I’m getting that a blogroll link is an incoming link *from* a blogroll to a top-level site URL, and a permalink is an incoming link *to* a (deeper) permanent post URL (or what I’ve always thought of as a permalink).
What about links from posts (ephemeral) to top-level site URLs? Probably not possible to parse from the data, I guess.
Not sure if the confusion was just me being thick or if you’re using the words in a slightly different way than the one I’m accustomed to.
(You’d figure after all this time (21 blog years?) I’d be able to grasp this stuff a little more quickly…)
Blogroll links are any link to a top-level domain while permalinks are links to deeper content. Both originate from the front page of the source weblog, found either in the content of the posts or on the sidebar. Each link has a source and a destination (i.e. from and to).
That said, the table above lists the in-degree for the given weblogs, a measure of the total number of webloggers linking to these sites. Graphs are always hard to explain for some reason, I think because the simple terminology we use is a little bit inadequate.
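To make the in-degree idea concrete, here is a minimal sketch that counts how many distinct source weblogs link to each destination. The `(source, destination)` pair format and the site names are hypothetical, and ignoring self-links is an assumption of this example.

```python
from collections import defaultdict


def in_degree(links):
    """Count distinct source weblogs linking to each destination weblog."""
    sources_by_destination = defaultdict(set)
    for source, destination in links:
        if source != destination:  # assumption: self-links are ignored
            sources_by_destination[destination].add(source)
    return {dest: len(srcs) for dest, srcs in sources_by_destination.items()}


if __name__ == "__main__":
    # Hypothetical links: (source weblog, destination weblog).
    links = [
        ("a.example", "c.example"),
        ("a.example", "c.example"),  # repeated link from the same source
        ("b.example", "c.example"),
        ("c.example", "a.example"),
        ("c.example", "c.example"),  # self-link
    ]
    print(in_degree(links))
```

Using a set per destination means a weblog that links to the same site many times still counts once, which matches "the total number of webloggers linking to these sites."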
Nice paper. I’m not very sure about your definition of blogroll link, but I feel that this is a matter of naming and not of the real intention. Myself, posting about your paper (Yo enlazo, tu enlazas, él enlaza, in Spanish), I’ve put a link to your main site, and you are not in my blogroll (well, you would be, if I had one 😉 ).
I would also like to point you to our papers that deal with similar topics as yours, in Do we live in a Small World? Measuring the Spanish-speaking blogosphere (pdf), and Measuring the Spanish Blogosphere. Finally, The Spanish-speaking Blogosphere: towards the powerlaw?.
Very interesting abstract, Cam. I’m looking forward to reading the full paper.
One analogy you might find useful when presenting to academics and/or librarians is that (if I understand you correctly) Blogroll represents a pointer to a resource, while Permalink represents a pointer to an “article” within that resource. The terms Publication and Citation might also be handy.
Interesting article that touches on a number of themes I’ve explored in the past. Couple of things:
1) You equate Permalinks to Influence. I agree, and would add that Permalink rank is a better measurement of Quality of Ideas. Blogroll rank is more a measurement of popularity, rather than influence/ideas – as your article explores.
2) Not sure if you’ve come across my Fractal Blogosphere post in your travels, but it was my antidote to Shirky’s Power Law. Interestingly Radio Free Blogistan linked to it today and then related it to your post.
Nice paper, thanks. I think Clay’s central point still holds true, though. Although we may not know all of the factors influencing connectivity within the blog network, those factors are algorithmically reducible & not subject to debate or change without affecting other aspects of the network, in ways you may not want. You turn this knob & that other one turns too, in a different direction.
Incidentally, I just ran across a paper about a new method for obtaining powerlaws in a system, Highly Optimized Tolerance. And if anybody’s interested in papers on Network Theory & related subjects, I’m running a periodic series of posts of the latest papers from arXiv.org on my (shameless plug) new blog.
I am curious what relationships then also exist between trackbacks and permalink ratios. Taking the concept you defined regarding permalinks a step further, trackbacks not only indicate a reference to a permanent link, but also a discussion that adds significance to the influence of the posting.
(Amusingly enough, it seems trackbacks actually fall into a logarithmic line of best fit, but I need to finish tracking and compiling the information before I make any assertions.)
Something is wrong with your methodology.
If you read the discussion section of the paper, I discuss the caveats of the current Blogdex data. One of the shortcomings is the fact that it’s an opt-in service, and while I move over to an opt-out system I have stopped adding weblogs. This accounts for both of the cases above (dailykos was added on 4-15-04 for instance).
I’m not sure about the permalinks=influence part. Many of the most popular blogs may not have much original content worth reading, but they may be better filters and aggregators than other blogs. If a site gets a lot of visits because it posts good links, and the sites it links get permalinked a lot as a result, I’d argue it has a lot of influence (even though the site itself wouldn’t get a lot of permalinks).
There is a danger in considering both blogrolls and inline deep links as face value equivalent. In my experience, many blogrolls grow larger over time and very few grow smaller, i.e., they are very rarely pruned and do not necessarily reflect the current association of the target blog to those cited in the blogroll.
To evaluate the relevance of blogroll data, we need to separate those links still used by the blog owner from those which persist in the blogroll because of inertia, because they pay homage to the blogger’s history, or because it may be cumbersome and inconvenient to remove them.
By contrast, inline links are by definition fresh and current, an indication of where the blogger is gaining their material at the present time. Inline links move with the blogger’s growth.
If blog connections are not merely a reflex network but are more like cultural fashion and custom, we might expect a migration to newer fresher material as people become bored by their first-tastes and become more seasoned and experienced in finding and evaluating web data. Were there some way to obtain the blogroll change data (internet archive?) I expect we would find that those links most recently added to blogrolls will tend to be in that same class of newer, less-known blogs, just as with the blogs cited inline in the postings.
Sorry it took a bit for me to get to this–catching up on things… I hope it’s OK this comes here instead of email: a bit bloggier that way. You probably won’t see these until post-ICA anyway, so enjoy the Big Easy.
On page 5, you note that the falloff for blogroll links is steeper than that for permalinks. This is true in this data mainly because it is accumulated over a long period of time. I am reasonably certain that as you reduce the time slice, that fall off will increase exponentially. E.g., today, everyone is linking to whoever has the neatest take on the Ashcroft press conference, and I would expect that you see a greater concentration of linking because of that. Over weeks and months, this is going to thin out because of turn-over. The “instantaneous” links among blogs are likely, I expect, to see a *greater* concentration on the permalink side. In other words, that comparison is highly dependent on the chunk of time you are looking at.
I don’t know (I guess I could look), but I suspect that citations from the news media are in stories that are *about* blogs in general or these specific blogs. That is, I doubt they are citing these blogs as the source of other information. As such, it strikes me that they should correspond more closely to the blogrolls than to the permalinks. A quick glance confirms that this appears to be the case. As such, it seems as though blogrolls fulfill a similar function for bloggers: i.e., “this is what blogging is to me.”
Or maybe it is a bit like a blogging version of the “Twenty Statement Test”–a way of establishing identity through your blogroll. “I am a person who reads Slashdot.”
Overall, as I’m sure you can guess, I like the teasing out of the two types of links. I don’t know to what degree this can be taken further. Clearly, extending to comments and trackback would be worthwhile. Also, I wonder if there is an evolution of the blogroll. I suppose there are two possibilities. One is that you blogroll the person who introduced you to blogging. The other is that you populate your blogroll with the A-list since those are the first you are likely to see, and as time goes on you draw out those that are less widely read. As a result, any instantaneous look at the blogrolls would likely have the A-listers on top (since there are continually more new blogs than old, at this point), but a look at *new* additions to blogrolls might not follow the same pattern, especially among established blogs. In any event, it would be interesting to see if there is a general development of the blogroll, and what that looks like.
Interesting work here.
Argh. I just noticed that I recapped MrG’s comments quite a bit. Teach me not to read the comments first :).
Whenever the blogerati talk about “authority,” I tend to find their definitions of that notion counterintuitive and rather strange. Given your focus on what you call “opinion makers,” it seems it’s meant to serve as a measure of something like “influence” … that intangible capital possessed by those bloggers whose views and choices affect the opinions and reading habits of others.
But that notion involves a very complex relationship between reader and text. To take an extreme example, I may read and link to an Andrew Sullivan post from time to time, thus nominally increasing his permalink “authority,” but in doing so I may invariably strive to point out what an ill-informed blowhard and poseur the man is, completely lacking in “authority” in the sense of a deserved reputation as a writer whose judgements can be relied upon to be well-informed and clearly thought out or whose information is relevant and valuable. Many blogs about the Iraq conflict have much more link-authority than Juan Cole’s blog, for example, but Juan speaks Arabic fluently, has a doctorate in Middle Eastern history, has been invited to testify before Congress, and has reliably predicted a number of developments. He is an “authority” in the sense in which we normally understand that term. Lately he has become more of an authority in the sense you use the term, thanks to help from The Agonist, but one could cite many cases where people who have very little knowledge of a given topic nevertheless dominate conversation about it. That the marketplace of ideas does not invariably select for excellence is an understatement.
From another angle, what I’m saying is that I may blogroll or permalink for many reasons other than to say “I agree with this person” or “this blog is worth reading.” I blogroll Islamist Web sites and the blogs reflecting the views of the extreme European left, for example, because these interest me. But I do not endorse their views or regard them as “authoritative” in the sense that they influence or represent my settled opinions. They authoritatively represent the viewpoints of the people who write them, that’s it.
My basic point is that a structural definition of “authority” in this sense is no better than a Nielsen rating: Yes, my set was tuned to C-SPAN yesterday from four to six, and to the Simpsons from 6:30 to 7:30, after which a porn PPV. But why? And with what effect on my opinions and behavior? This abstract way of framing the question tends to treat the blogosphere as though it were a broadcast network (like a TV network or newspaper distribution network) rather than an interactive networked medium. It is no more sophisticated or useful than the assumptions underlying the economic theory of consumption: approving of a product is equivalent to having purchased it and vice versa … until the better mousetrap comes along and you realize what you’ve been missing. Network theory can delineate possibilities of relationship among multiple audiences and content producers, but has nothing to say about the dynamics of that relationship.
And indeed, if you think about it, to what content do “opinion leaders” in the blogosphere, with their supposed power to set alternative agenda for public discussion, tend to point us? To news and opinion from the mainstream media. Just look at the top links from Daypop, Blogdex, and Technorati, for example. Post, Times, Guardian, ABC, BBC, CNN, CS Monitor, Drudge, Wired, … It seems to me that mainstream media in large part sets the agenda for conversation in the blogosphere. Reading blogs is not dissimilar from watching television through a window … there’s a little refraction but the picture is not much different.
The blogosphere is after all only a tiny parasitic lamprey on the great white shark of our global media ecology. (Generously estimated, it’s 2.5 million blogs vs. 275 million television sets. 4% of Internet users read blogs. You do the math.) I wonder if it could be shown empirically that high-traffic blogs tend to link more to mainstream media than other types do? To what extent do blogs drive traffic, for instance, to the NY Times Web site? To a significant enough extent that they could influence the kind of stories the Times covers, the way that Nielsen ratings determine the fate of sitcoms?
Thanks for posting this paper on your blog, Cameron. I’m enjoying your presentation in real time. 🙂
I think you may need to filter for machines – blog aggregators and the like. A lot of blog services like to consider themselves blogs, and effectively force all their users to link ’em. Look at the top 100 Blogshares – not a blog, properly considered, among ’em.