Archive for the ‘peoplerank’ Category

A naked and helpless post

Monday, March 5th, 2007

We are born into this world naked and helpless.

The same could be said for pages on the web today. Like the one you’re reading right now.

When a page is born, no one is pointing to it. The newborn page has no links. It has a PageRank value of exactly zero.

If it is an insightful and talented page, others may notice it and eventually link to it. Its PageRank will grow. The proud papa will beam in admiration.

But this takes time. Time matters. If you’re looking for breaking news, if you’re looking for insights into what’s happening now…you won’t wait around for all those links to build up, like so many layers of sediment that will one day form the Grand Canyon.

This is why authority matters, this is why trust matters. In my (somehow very popular!) last post, I talked about Subscription Links and Identity Links as different from normal links. It’s about predicting the future, it’s about time. If I link to the Long Tail blog in my blogroll (bottom right), I’m telling the world that I predict that future content from this source (Chris Anderson) will be worth reading. If you trust me, you should trust my recommendation.

So what you need is a system that understands these predictions. Once you can tell the future, you can give newborn pages, and all new content, some value as soon as they are born. Now they’re not so helpless. Suppose lots of people who care about knitting subscribe to Bob’s blog. Now if Bob writes a new post about knitting, we should expect/predict that it will be quality knitting material. Even before the knitters of the world start to link to it.

This is one of the many uses for authority, for credibility. Thus my explorations into using blogrolls for ranking.

How have others tackled this problem?

One way is to ignore the problem and value pages purely on age. Which gives you basically a ton of splogs, which is what Technorati was being reduced to a couple of years ago. (Collective Intellect CTO Tim Wolters told me that he figures a third of all blogs are splogs. Another third are political blogs, and the final third is the rest.)

Technorati stemmed the tide, somewhat, with their authority algorithm. It uses counts of past cross-links as a proxy for trust, a prediction of future worth from the same source. But this method has many problems, which is a subject for another post.

Digg and Reddit and their many, many clones are essentially mechanisms for a newborn page to rapidly acquire new links. A sort of time fast-forwarder. This is not about authority. (Except perhaps in the secret ways that some Digg users’ votes (links) may be worth more than others. More on this in yet another post.)

Regular Google search mitigates this problem a little by treating domains as sources. E.g. lots of links to various pages of xyz.com lend more credibility to future pages created on xyz.com. I suspect this is why the big blogging platforms put each blog on its own subdomain. E.g. “wanderingstan.livejournal.com” instead of “livejournal.com/wanderingstan”. This is not a concept built into PageRank, but something added into the secret Google search recipe. (Which, come to think of it, is also fodder for another post!)
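To make the subdomain point concrete, here is a minimal sketch in Python. It assumes the simplest possible rule, that a page’s “source” is just its hostname; that rule, the function names, and the post paths in the sample URLs are my own illustration, not a description of what Google actually does.

```python
from collections import Counter
from urllib.parse import urlsplit


def source_of(url):
    """Treat the hostname as the 'source' of a page (an assumed rule)."""
    return (urlsplit(url).hostname or "").lower()


def links_per_source(linked_urls):
    """Count how many inbound links each source has collected."""
    return Counter(source_of(u) for u in linked_urls)


# Hypothetical inbound links; the post paths are made up.
inbound = [
    "http://wanderingstan.livejournal.com/post-1",
    "http://wanderingstan.livejournal.com/post-2",
    "http://livejournal.com/wanderingstan/post-1",
    "http://livejournal.com/someoneelse/post-9",
]
counts = links_per_source(inbound)
for source, n in counts.items():
    print(source, n)
```

Under this rule the subdomain blog builds up credit of its own, while the path-style URLs pool everything under livejournal.com, where one blogger’s links are indistinguishable from another’s.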

In summary, not all links are created equal, nor are all pages created equal.

Preliminary results from Blogroll Ranking

Monday, February 26th, 2007

Who are the influential bloggers? Which blogs matter? What metrics would you use to even begin to answer these questions?

I’ve been exploring alternate methods of ranking in the past months. The best results are coming from examining Blogrolls. When you think about it, blogrolls comprise the links in a huge implicit trust network. For now I’m calling the calculated score “PeopleRank”. It’s kinda like PageRank, in that blogroll links from higher PeopleRank-ed blogs count more. E.g. if Om Malik has you on his blogroll, that counts a lot more for your ranking than the blogroll of your niece on Livejournal. (No offense to your niece.)
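For the curious, here is a minimal sketch of the idea in Python. It assumes plain PageRank-style power iteration over the blogroll graph; the damping factor, the function name, and the toy blog names are illustrative assumptions, and the scores are normalized to sum to 1, so they are not on the same scale as the PeopleRank numbers reported in this post.

```python
def people_rank(blogrolls, damping=0.85, iterations=50):
    """PageRank-style score over a blogroll graph.

    blogrolls maps each blog to the list of blogs on its blogroll.
    Returns a dict of scores that sum to 1.
    """
    blogs = set(blogrolls)
    for targets in blogrolls.values():
        blogs.update(targets)
    blogs = sorted(blogs)
    n = len(blogs)

    rank = {b: 1.0 / n for b in blogs}
    for _ in range(iterations):
        # Every blog starts each round with a small baseline score.
        new_rank = {b: (1.0 - damping) / n for b in blogs}
        for blog in blogs:
            targets = blogrolls.get(blog, [])
            if targets:
                # A blog splits its own score among the blogs it lists.
                share = damping * rank[blog] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # No blogroll: spread the score evenly over everyone.
                share = damping * rank[blog] / n
                for b in blogs:
                    new_rank[b] += share
        rank = new_rank
    return rank


# Toy graph: three blogs list "om"; "om" lists "you"; "niece" lists "cousin".
toy_blogrolls = {
    "a": ["om"],
    "b": ["om"],
    "c": ["om"],
    "om": ["you"],
    "niece": ["cousin"],
}
ranks = people_rank(toy_blogrolls)
for blog, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{blog:8s} {score:.4f}")
```

In the toy graph, “you” and “cousin” each appear on exactly one blogroll, but “you” ends up with the higher score because the blog doing the listing is itself widely listed.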

So here are the top 50 blogs as ranked by the preliminary algorithm: (Commentary and caveats follow)

Blog Name URL People Rank Blogroll Count
TechCrunch (Arrington & Friends) http://www.techcrunch.com/ 16.88550 74
Fred Wilson http://www.avc.blogs.com 13.65663 59
Om Malik http://www.gigaom.com/ 10.90295 51
Subscribe to Posts (RSS) http://feeds.feedburner.com/ 10.35721 58
Battelle, John http://www.battellemedia.com/ 9.43316 36
kottke http://www.kottke.org/ 9.30745 23
Micro Persuasion http://www.micropersuasion.com/ 9.05083 35
dooce http://www.dooce.com/ 8.75597 24
CNNMoney.com http://money.cnn.com/ 8.24951 14
Advertise on this blog http://money.cnn.com/services/mediakit/ 8.24951 14
Creating Passionate Users http://headrush.typepad.com/creating_passionate_users/ 8.05627 51
Instapundit http://www.instapundit.com/ 8.01555 30
Brad Feld - Feld Thoughts http://www.feld.com/blog/ 7.76376 57
BuzzMachine http://www.buzzmachine.com/ 7.68799 31
Seth’s Blog http://sethgodin.typepad.com/seths_blog/ 7.64178 44
Full Content http://www.gizmodo.com/index.xml 7.39462 10
Comments http://www.gizmodo.com/xml/comments 7.39462 10
How to Change the World http://blog.guykawasaki.com/ 7.36782 39
Read/WriteWeb http://www.readwriteweb.com/ 7.32572 27
Canuckflack http://www.canuckflack.com/ 7.25962 11
Slashdot http://www.slashdot.org/ 7.22526 32
Gizmodo http://www.gizmodo.com/ 7.22314 19
Movable Type http://www.movabletype.org/ 6.92314 15
David Jones/PR Works http://www.prworks.ca/ 6.67162 11
GestureBank http://blogs.zdnet.com/ 6.61738 20
Hugh Macleod http://www.gapingvoid.com/ 6.58896 19
Michelle Malkin http://www.michellemalkin.com/ 6.53256 28
New World Notes http://secondlife.blogs.com 6.47961 6
Bad Astronomy http://www.badastronomy.com/ 6.34440 9
Talking Points Memo: by Joshua Micah Marshall http://www.talkingpointsmemo.com/ 6.30786 23
James Governor http://www.redmonk.com/jgovernor/ 6.11552 23
Three Kid Circus http://www.threekidcircus.com/threekidcircus/ 6.10842 109
Sweetney http://www.sweetney.com/ 6.08445 107
Rain City Real Estate Guide http://www.raincityguide.com/ 6.06087 11
Fussy http://www.fussy.org/ 6.00416 16
SpiffyJapan http://www.spiffyjapan.com/ 5.97301 5
Jottings By An Employer's Lawyer http://employerslawyer.blogspot.com 5.95257 7
VentureBlog http://www.ventureblog.com/ 5.91916 24
Joho the Blog http://www.hyperorg.com/blogger/ 5.85586 23
Jeneane Sessum - Allied http://allied.blogspot.com 5.73544 91
Her Bad Mother http://www.badladies.blogspot.com 5.73306 108
George’s Emplt http://www.employmentblawg.com/ 5.71551 7
B.L. Ochman's Weblog http://www.whatsnextblog.com/ 5.69226 11
Captain's Quarters http://www.captainsquartersblog.com/mt/ 5.65295 28
Techdirt (Mike Maznick) http://www.techdirt.com/ 5.64693 21
Venture Chronicles http://jeffnolan.com/wp/ 5.63134 33
This Blog Sits at the http://www.cultureby.com/trilogy/ 5.50986 9
Shel Holtz http://blog.holtz.com/ 5.49340 10

Caveats of this calculation:

  • Results with ~5K blogs crawled.
  • Blogroll Count = Number of blogrolls this blog appears on = How many people publicly admit to reading this blog.
  • The interesting datapoints are where the PeopleRank ordering puts a blog higher in the list than one with a higher blogroll count — those fewer subscribers must be “more important”.
  • This crawl took Lijit user blogs as the starting seeds, giving an overall tech bias.
  • However, there was a period when the crawler went unchecked into what can only be called “The Mommy-o-sphere”, so there is an overrepresentation of Mom-blogs in the dataset.
  • Our blogroll detector algorithm still gets false positives, thus the high rank for “Subscribe to Feedburner” and the multiple ColoradoStartups.com listings.
  • Some blogs use a Blogrolling widget for a “Web Ring” functionality, thus erroneously appearing as blogrolls. This explains most of the 100+ blogroll counts.
  • We need better de-duping. Several blogs appeared under multiple URLs, reducing the overall score.
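On the de-duping caveat, a minimal sketch of one way to collapse URL variants before counting, in Python. The normalization rules here (ignore the scheme, lowercase the host, drop a leading “www.”, drop a trailing slash) are assumptions for illustration, not the crawler’s actual logic, and they will not catch every duplicate, such as a feed URL versus the blog’s home page.

```python
from collections import Counter
from urllib.parse import urlsplit


def canonical_blog_url(url):
    """Reduce a blog URL to a comparison key (assumed rules, see above)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path


# Hypothetical blogroll entries pointing at the same blogs in different forms.
entries = [
    "http://www.techcrunch.com/",
    "http://techcrunch.com",
    "HTTP://WWW.TechCrunch.com/",
    "http://www.gigaom.com/",
]
blogroll_counts = Counter(canonical_blog_url(u) for u in entries)
for key, n in blogroll_counts.items():
    print(key, n)
```

Canonicalizing before the graph is built means all of a blog’s blogroll mentions land on one node instead of being split across several lower-scoring ones.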

So how is this different from existing rankings? Until now, the most common methods have fallen into one of two camps:

  1. Number of subscribers. I.e. a pure democracy. Use some combination of Feedburner (for RSS readers) and some web analytics (for web readers) to count the raw number of people reading a blog.
  2. Raw number of incoming links (citations). This is similar, except that links are counted instead of subscribers.

Note that neither method discriminates between the blogs “casting the votes”. It doesn’t matter if that 24th reader of your blog happens to be Scoble. Nor does it matter if those 3 citations to your blog in the last month (Technorati defines this as “very low authority”) came from Seth Godin, Fred Wilson, and Guy Kawasaki.
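One way to see what weighting the voters changes: take a few rows from the table above and look for pairs where a blog has the higher PeopleRank despite appearing on fewer blogrolls. The sketch below does that in Python; the numbers are copied from the table, and the function name is just for illustration.

```python
# (blog name, PeopleRank, blogroll count), copied from the table above.
rows = [
    ("kottke", 9.30745, 23),
    ("Micro Persuasion", 9.05083, 35),
    ("Creating Passionate Users", 8.05627, 51),
    ("New World Notes", 6.47961, 6),
    ("Three Kid Circus", 6.10842, 109),
]


def outranks_despite_fewer_blogrolls(rows):
    """Pairs (a, b) where a has the higher PeopleRank but the lower count."""
    pairs = []
    for name_a, rank_a, count_a in rows:
        for name_b, rank_b, count_b in rows:
            if rank_a > rank_b and count_a < count_b:
                pairs.append((name_a, name_b))
    return pairs


pairs = outranks_despite_fewer_blogrolls(rows)
for higher, lower in pairs:
    print(f"{higher} outranks {lower}")
```

A raw count would put Three Kid Circus first among these five. Keep in mind the caveat above, though: most of the 100+ blogroll counts are probably web-ring artifacts, so that particular comparison says as much about the detector as about the ranking.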

Initial results are encouraging, and I hope to do more analysis this week. What do you think? If you have any suggestions or ideas, please get in touch with me.