Archive for the ‘google’ Category

Google opens Social Graph API

Friday, February 1st, 2008

It’s long overdue.

Almost 4 years ago I started to get excited about what could be done with a social graph (or network, as we called ‘em back then). My head swam with possibilities of a real trust network: get product and company reviews, prevent spyware and check validity of files, control process execution, and of course use your network for trusted searching. That last one led to the development of Lijit, which has been my life for the last 3 years.

I never imagined how hard it would be to actually get a graph. The “big guys” of MySpace and Facebook sealed their users’ graphs in TOS-protected silos, and users grew wary of re-friending on every new web service. Open standards like FOAF and XFN were there, but no one really used them. It was beginning to look like social graph innovation would be limited to whatever the big guys wanted to allow.

So I’m excited about Google’s new Social Graph API. There’s still a long way to go, but maybe with Google’s weight behind it, other services will allow users to publish their graphs and make them available to this API.

Google went big by using the information in the Web’s link-graph. What exciting new tools will be possible when we have real access to the social graph?

Related: Not all links are created equal

Facebook and Google add user feedback on same day

Thursday, November 29th, 2007

In a strange bit of synchronicity, Google and Facebook rolled out similar features within 24 hours of each other. Both now let users give feedback on items: in Google’s search results (via Googlified) and in Facebook’s newsfeed.

Here’s Google:

And here’s Facebook:

In Google’s case, this is especially interesting as Google had been dismissive of “Social Search” in the past.

In August, I attended a talk by Marissa Mayer, Google’s leading executive on search, who said Google has worked on social search. However, she was somewhat dismissive of the opportunity. She said social search hadn’t shown much promise, but that if someone were to prove its worth, Google would be in a good position to incorporate it. (from VentureBeat)

To be fair, though, this isn’t strictly “social search” in the sense used by Swicki or the former Wink.com search application. In those applications everyone’s votes contributed to the ranking of results. Your votes in the Google project apparently affect only your own results…for now. (Lijit’s approach is different, being based on the social graph. See my earlier post about Social Search: Democracy or Network?)
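The three flavors mentioned above (everyone’s votes, only your own votes, and votes from your social graph) can be contrasted in a few lines of Python. This is only an illustrative sketch: the function name, the vote format, and the sample data are made up here, and it does not describe how Swicki, Wink, Google, or Lijit actually rank anything.

```python
from collections import Counter


def rank_results(results, votes, voters=None):
    """Order results by vote count.

    votes is an iterable of (voter, result) pairs. If voters is given,
    only votes cast by someone in that set are counted.
    """
    tally = Counter(
        result for voter, result in votes if voters is None or voter in voters
    )
    # sorted() is stable, so tied results keep their original order.
    return sorted(results, key=lambda result: -tally[result])


# Hypothetical data, for illustration only.
results = ["page1", "page2", "page3"]
votes = [
    ("ann", "page3"), ("bob", "page3"), ("eve", "page3"),
    ("cat", "page2"), ("dan", "page2"),
    ("me", "page1"),
]

everyone = rank_results(results, votes)                       # crowd-wide voting
personal = rank_results(results, votes, {"me"})               # only my own votes
network = rank_results(results, votes, {"me", "cat", "dan"})  # me plus my friends
```

The only thing that changes between the three calls is whose votes are allowed to count, which is really the whole argument about “democracy or network.”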

Costs and Transparency in Ranking Systems

Monday, March 12th, 2007

Can a ranking system be transparent, inclusive, and successful? That was the topic of a long conversation last week with Lijit’s senior developer Derek Greentree. We kept coming back to questions about transparency and the cost of acquiring votes. And in the end we decided that this is some sort of rule:

The maximum success possible for a system is a function of the transparency of the algorithms and the cost of acquiring votes.

Consider this rough chart:

System                        Transparency  Cost/Exclusivity
Online
  Digg                        Low           Low - Pass a CAPTCHA.
  Google                      Low           Lower - Create a web page with links.
  SomethingAwful.com Forums   High          Med - $10 cover charge.
Offline
  Political Democracy         High          High - Become a citizen.
  Academy Awards              High          High - Become a member of the Academy.
  American Idol               High          Med - Cost of a text message.

I hear you asking, “Why do Digg and Google get ‘Low’ marks for transparency?”

Digg ranks news stories by the number of members who vote for (“digg”) each story. It’s pretty much a pure democracy, with an added time component: old articles are worth less. On the other hand, Google ranks pages by a more complicated algorithm known as PageRank, which treats links on web pages as “votes” for other pages, with some pages’ votes worth more than others. It’s a bit like the electoral college, with an added semantic component: pages not related to the search query are worth less.
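For the curious, here is a rough Python sketch of both ideas. The first function is one guess at what “votes with a time component” could look like; the half-life decay and its 12-hour default are assumptions made for illustration, not Digg’s actual formula. The second is the simplified, textbook form of PageRank (power iteration with a damping factor), not Google’s production ranking.

```python
def decayed_score(votes, age_hours, half_life_hours=12.0):
    """Vote count discounted by age: the score halves every half_life_hours.

    The half-life model is an assumption for illustration only.
    """
    return votes * 0.5 ** (age_hours / half_life_hours)


def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to. Each page splits
    its current rank evenly among the pages it links to, so a link from a
    highly ranked page is a "vote" that is worth more.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a and b both link to c, and c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In the tiny example, c collects two “votes” and ends up on top, while a outranks b because its single vote comes from the well-ranked c.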

Do those descriptions sound about right? The thing is, neither is true these days. PageRank is now only one small ingredient of a page’s search ranking. Anyone who pays attention to their page in search listings is familiar with the “Google Dance,” when rankings can change unpredictably and sometimes unfairly. Google has become a black box. Digg’s newfound popularity has it struggling to deal with spammers, and it has also begun to shroud its algorithms in secrecy. The most recent issue of Wired has an article, “Herding the Mob,” that quotes Digg founder Kevin Rose as saying there are antihacking techniques that he can’t talk about.

Jay and Kevin said they couldn’t explicitly detail how Digg’s ranking algorithm works because it would be used by those who want to game the system (the aiding the enemy defense is popular these days), but they gave enough information to understand the basics of how Digg’s version of a democracy works.

So what we see is that these two popular online ranking systems began with public algorithms, but have retreated into secrecy. 

[Image: woman showing finger after Iraq vote]

On the other hand, systems like the US election process remain part of the public record. Of course, in a democracy it costs a lot to get a vote. For one thing, you have to be born. And if you really want to cheat, you have to mess around with getting the IDs of dead people and other very messy activities. In the recent Iraq elections they took the extra measure of dipping each voter’s finger in permanent ink to prevent double voting.

Is this trend necessary? What are the underlying principles?

The trend seems to be that to thwart spammers in popular systems, transparency must go down or cost must go up. And in the online world, costs are dropping so low that transparency is being forced down as well.

The web has seen a lot of systems that begin with low costs and high transparency. That’s the very definition of openness. But as the systems experience success, they have 3 choices:

  • Raise the costs. E.g. SomethingAwful.com added a $10 cover charge to participate in voting. Metafilter added a $5 cover charge.
  • Obscure the algorithms. E.g. Digg added secret “anti-gaming” algorithms.
  • Become irrelevant. E.g. Usenet forums were overrun with spammers.

The most popular choice seems to be obscuring the algorithms.

Should we be alarmed at this? Imagine if the US government took the same approach: they will tell us who won the election, but the exact algorithm used to determine the winner can’t be revealed! One can argue that getting on the front page of a Google search or the front page of Digg is not nearly as important as an election. But the value of such positioning is only increasing, and the bad guys are already trying to rig these elections!

I would argue that low transparency is a form of editing. When Digg or Google says that they must keep their algorithms secret, they are in effect saying “Our algorithms are fair, but we can’t tell them to you. You can trust us.” But do we really trust them? Should we? If some quirk of Google’s algorithms somehow helps a company they have a partnership with, how motivated will they be to fix it? 

Anyways, those are some beginning thoughts on the subject. Any ideas from you would be appreciated, as I feel there is a lot more to explore here.

Google Sync sucks

Wednesday, January 17th, 2007

Just tried the Google Sync Firefox extension and have to say that it’s a piece of crap. It was nice to have all my bookmarks synced to my home computer, but this morning when I fired up the work computer I was aghast to realize I had (1) no bookmarks, (2) no cookies, and (3) no saved passwords. Worst of all, it didn’t make any backups of my original state, so all may be lost.

This is a catastrophe. Some of those sites I may never get logged into again.

Good idea, shitty product.

[Sorry for the rant. But it's gonna take me weeks to undo this damage!]

Our experience with Google Custom Search

Sunday, January 7th, 2007


Google Custom Search is cool. And it’s a natural step for Google to distribute their search technology (dare I say “longtail-ize”?) in the same way that they distributed their ad technology when they expanded AdWords (on their domain) into AdSense (on anyone’s page). So it was a natural fit for us to use it as the backend for our Lijit Personal Network Search, and we’ve been happy with the initial results.

But it’s not perfect.

Ethan Zuckerman wrote about problems with Co-op search back in October, and Google quickly responded with a fix. However, we’re seeing a lot of Ethan’s problems here at Lijit as well. The problem is that if your desired search results would not normally fall in the top 1000 results of a normal Google search, they don’t get included in your results. For example, Brad Feld has written a ton about Microsoft in his blog at feld.com, as can be seen in a typical site: Google search. However, when you use a Co-op search which includes feld.com/*, you don’t get any results from that domain. The problem seems to be that feld.com doesn’t make it into the top 1000 results for a normal search for “microsoft”. In a similar vein, if you search me for “sex” you’ll get stuff from BoingBoing (a high PageRank site) but not my post “Attention is Meme Sex” like you might expect.
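To make the suspected behavior concrete, here is a small Python sketch. It is only a guess based on what we observe from the outside, not Google’s actual implementation; the function names, the bigsite.example domain, and the sample URLs are hypothetical. The first function filters only the top slice of the general ranking by site, while the second filters the whole ranking, which is what you would expect a custom search to do.

```python
from urllib.parse import urlparse


def custom_search_top_n(ranked_urls, allowed_sites, n=1000):
    """Suspected behavior: keep allowed sites, but only from the top n results."""
    return [url for url in ranked_urls[:n] if urlparse(url).netloc in allowed_sites]


def custom_search_full(ranked_urls, allowed_sites):
    """Expected behavior: keep allowed sites from the entire ranking."""
    return [url for url in ranked_urls if urlparse(url).netloc in allowed_sites]


# Hypothetical ranking: 1000 results from a big site, then one feld.com post.
ranked = [f"http://bigsite.example/{i}" for i in range(1000)]
ranked.append("http://feld.com/microsoft-post")

truncated = custom_search_top_n(ranked, {"feld.com"})
complete = custom_search_full(ranked, {"feld.com"})
```

With the cutoff in place the feld.com post sits at position 1001 and never reaches the site filter, so the custom search comes back empty even though a matching page exists.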

So it seems that the fixes implemented for Ethan aren’t working across the board. But I am encouraged by Google’s response to Ethan and hope that they will eventually be able to solve our issues.

Who’s your editor?

Wednesday, July 26th, 2006

John Battelle vents his frustration with Google News. Bottom line is, much like placement in the Google search index, placement in the Google News index is becoming important; people are waking up to the fact that we have no idea who is deciding what we see and don’t see. This represents a broader shift. As Richard Edelman said:

“We have reached an important juncture, where the lack of trust in established institutions and figures of authority has motivated people to trust their peers as the best sources of information about a company.”

Will Google catch this shift? How long will people continue to trust them?

Google & Co. as the new DNS

Monday, July 24th, 2006

The domain name situation is more grim than I ever imagined. If you don’t believe me, go to Instant Domain Search and start typing o’s. You’ll find everything up to and including ooooooooooooooooo is taken. Try any word from English, Spanish, German, Swahili, or Hindi. All taken. I even resorted to trying to buy an expired domain.

On Saturday I saw a TV ad that said simply “Google ‘Denver Ford’” for more information. Part of this is surely that the car dealership couldn’t get www.denverford.com (it’s for sale). But the more important point is that the search textbox is replacing the browser URL textbox.

No one types URLs into their browser anymore. Most people don’t know how. This is why so many people enter “amazon” into Google rather than typing “amazon.com” into the browser textbox. (I can’t find the article about Google that gave the statistic. Anyone know the one I’m thinking of?)

The main points:

  1. The search textbox is replacing the browser textbox.
  2. Domain names, especially short names, don’t matter so much and the ones for sale are certainly overvalued.
  3. Search engines are becoming the new DNS.