A naked and helpless post

We are born into this world naked and helpless.

The same could be said for pages on the web today. Like the one you’re reading right now.

When a page is born, no one is pointing to it. The newborn page has no inbound links. Its PageRank value is effectively zero.
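To make the newborn’s starting point concrete, here is a minimal PageRank sketch using the common damped power-iteration form. The toy graph, the page names, and the 0.85 damping factor are assumptions for illustration, not anything taken from Google. In this form a page with no inbound links keeps only the small baseline share (1 - damping) / N, which is as low as a score can go.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Every page passes an equal share of its rank to each page it links to.
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical four-page web: "newborn" links out, but nothing links in.
web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "newborn": ["a"],
}
scores = pagerank(web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

However the established pages rank among themselves, the newborn stays pinned to the floor until someone links to it.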

If it is an insightful and talented page, others may notice it and eventually link to it. Its PageRank will grow. The proud papa will beam in admiration.

But this takes time. Time matters. If you’re looking for breaking news, if you’re looking for insights into what’s happening now…you won’t wait around for all those links to build up, like so many layers of sediment that will one day form the Grand Canyon.

This is why authority matters, this is why trust matters. In my (somehow very popular!) last post, I talked about Subscription Links and Identity Links as different from normal links. It’s about predicting the future, it’s about time. If I link to the Long Tail blog in my blogroll (bottom right), I’m telling the world that I predict that future content from this source (Chris Anderson) will be worth reading. If you trust me, you should trust my recommendation.

So what you need is a system that understands these predictions. Once you can tell the future, you can give newborn pages, and all new content, some value as soon as they are born. Now they’re not so helpless. Suppose lots of people who care about knitting subscribe to Bob’s blog. Now if Bob writes a new post about knitting, we should expect/predict that it will be quality knitting material. Even before the knitters of the world start to link to it.
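The Bob example can be sketched in a few lines. This is only one possible reading of the idea: the function name, the data layout, and the choice to simply count matching subscribers are all assumptions made for illustration, not a description of any real ranking system.

```python
def predicted_value(author, topic, subscriptions, interests):
    """Count readers who subscribe to `author` and care about `topic`.

    The count serves as a starting score for a brand-new post,
    before anyone has had a chance to link to it.
    """
    return sum(
        1
        for reader, authors in subscriptions.items()
        if author in authors and topic in interests.get(reader, set())
    )


# Hypothetical data: who subscribes to whom, and what each reader cares about.
subscriptions = {
    "alice": {"bob"},
    "carol": {"bob", "dave"},
    "erin": {"bob"},
    "frank": {"dave"},
}
interests = {
    "alice": {"knitting"},
    "carol": {"knitting", "cooking"},
    "erin": {"cars"},
    "frank": {"cooking"},
}

bob_knitting = predicted_value("bob", "knitting", subscriptions, interests)
dave_knitting = predicted_value("dave", "knitting", subscriptions, interests)
print(bob_knitting, dave_knitting)
```

A fuller version would weight each subscriber by how much they themselves are trusted, but even the plain count gives Bob’s newborn knitting post a head start over a stranger’s.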

This is one of the many uses for authority, for credibility. Thus my explorations into using blogrolls for ranking.

How have others tackled this problem?

One way is to ignore the problem and value pages purely on age, newest first. Which gives you basically a ton of splogs, which is what Technorati was being reduced to a couple years ago. (Collective Intellect CTO Tim Wolters told me that he figures a third of all blogs are splogs. Another third are political blogs, and the final third is the rest.)

Technorati stemmed the tide, somewhat, with their authority algorithm. It uses counts of past cross-links as a proxy for trust, a prediction of future worth from the same source. But this method has many problems, which is a subject for another post.
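The idea of past links standing in for trust can be sketched like this. It is not Technorati’s actual algorithm, whose details aren’t given here; it just counts how many distinct blogs have linked to a source, and the blog names are made up.

```python
def authority(blog, past_links):
    """Number of distinct other blogs that have linked to `blog`.

    past_links is an iterable of (linking_blog, linked_blog) pairs.
    """
    return len({src for src, dst in past_links if dst == blog and src != blog})


# Hypothetical link history between blogs.
past_links = [
    ("knitnews", "bobsblog"),
    ("yarnfan", "bobsblog"),
    ("yarnfan", "bobsblog"),   # repeat links from one blog count once
    ("bobsblog", "bobsblog"),  # self-links are ignored
    ("knitnews", "splog123"),
]

print(authority("bobsblog", past_links), authority("splog123", past_links))
```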

Digg and Reddit and their many, many clones are essentially mechanisms for a newborn page to rapidly acquire new links. A sort of time fast-forwarder. This is not about authority. (Except perhaps in the secret ways that some Digg users’ votes (links) may be worth more than others. More on this in yet another post.)

Regular Google search mitigates this problem a little by treating domains as sources. E.g. lots of links to various pages of xyz.com lend more credibility to future pages created on xyz.com. I suspect this is why the big blogging platforms put each blog on its own subdomain. E.g. “wanderingstan.livejournal.com” instead of “livejournal.com/wanderingstan”. This is not a concept built into PageRank, but something added into the secret Google search recipe. (Which, come to think of it, is also fodder for another post!)
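Here is a rough sketch of what treating the domain as the source could look like. Google’s real recipe is secret, so everything in it is an assumption: the helper names are invented, and credibility is reduced to a raw count of inbound links per host. It does show why the subdomain matters: with a subdomain, the blog is its own host and earns its own count.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_of(url):
    """Return the host part of a URL, e.g. 'xyz.com'."""
    return urlsplit(url).hostname


def host_link_counts(linked_urls):
    """Tally inbound links per host, given one target URL per link."""
    return Counter(host_of(url) for url in linked_urls)


def prior_for_new_page(url, counts):
    """A brand-new page inherits the link count of the host it lives on."""
    return counts[host_of(url)]


# Hypothetical link targets collected from around the web.
linked_urls = [
    "http://wanderingstan.livejournal.com/post-1",
    "http://wanderingstan.livejournal.com/post-2",
    "http://livejournal.com/someoneelse/post-9",
]
counts = host_link_counts(linked_urls)

own_subdomain = prior_for_new_page("http://wanderingstan.livejournal.com/new-post", counts)
shared_domain = prior_for_new_page("http://livejournal.com/wanderingstan/new-post", counts)
print(own_subdomain, shared_domain)
```

In the path-style layout, the new post would share credit (and blame) with every other blog on livejournal.com.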

In summary, not all links are created equal, nor are all pages created equal.