Archive for the ‘opendata’ Category

Is Comcast selling your clickstream? Audio & Transcript.

Monday, March 19th, 2007

The revelation that ISPs like Comcast are selling your clickstream data has gotten a little attention. My own notes were spotty, so it was great to see that David Henderson posted audio recordings of the entire conference, including the infamous talk with David Cancel of Compete.com.

If people were so outraged (and confused!) over AOL releasing users’ search information, I wonder when the furor will turn towards the ISPs who are actively selling even more private information. (There’s no reason to be angry with Compete, as they are simply buying what is available and doing valuable statistical analysis.)

Here are the juicy bits from that recording, referenced by time.

0:00
Seth: David, how big is the panel?
David: The panel is a little over 2 million people right now.
Seth: You’re watching the behavior of 2 million people?
David: 2 million people, correct.

2:40
Roger ???: I was curious; how much do ISPs think their users’ data is worth?
David: I think it depends on the ISP. So, some folks think it’s worth a lot. If you’re getting a couple million folks, you know, in the millions of dollars…to license that, per year.
Seth: So the average ISP customer … So if I use Comcast … I’m worth 40 cents a month?
David: Uh-huh. Something like that. It depends on the ISP, some will charge a little bit more, closer to a dollar.
Seth: So for 40 cents, Comcast is selling what of mine? They are selling…my entire clickstream?
David: Your entire clickstream. So, and some user [???] that identifies you. So 123 is your user ID per the time that you’re a Comcast customer, and your entire clickstream.
Audience member: So that’s essentially the same [as the] AOL data, that there was a lot of furor over.
David: It’s beyond that.
Audience member: It’s way beyond that.

5:26
Audience member: How many people are they selling it to? [muffled]
David: Lots. 10, 12 folks are buying this kind of data that I know of.  So and a million dollars plus from each of those. Starts to add up.  It’s pure profit.

5:52
Seth: Is there any clearing house of data of how many times my Comcast clickstream is getting sold?
David: No.
Seth: You see, I thought it was bad where in the lead generation world where your mortgage lead gets sold 15 times. This is even worse, because you’re not filling out a form.
Esther: But they don’t bother you as much!

7:30
Audience member: Do you know if any government agencies are looking at this data? [muffled]
David: I don’t know. I mean, we’ve been contacted a long time ago…[crowd noise]
Esther: They are.
David: …similar experiments.
David: I’m sure they’re buying it, or have access to it as well.
Seth: They’re the ISP to the ISPs.

9:20
David: When I see some of the information that our clients have on users, it seems a lot more scary to me than what you can gain from clickstream information. It might just be me; I might be numb to the clickstream information. But some of the credit card information that we know some people are capturing is a lot more scary…than some of the exhaust on the clickstream side.

10:30
David: We get clickstream information, basically like the history looks like in your browser.  We don’t get the underlying information like what kind of videos have been played or rich elements. Although…it is available.

Memories of OpenData 2007

Thursday, March 15th, 2007

Abdur Chowdhury, Summize.com (formerly at AOL, made decision to release the AOL search data.)

  • 3 Questions you must ask yourself before opening your data:
    • Why are you opening the data?
      • Right answer: You firmly believe that you are helping people/consumers.
    • What are you going to do once you open the data?
    • Are you ready for the unexpected consequences?

Gerry Campbell
  • This is sort of a “coming out party” for Reuters.
  • We (Reuters) have content and we have to connect with our consumers. They have trouble finding it.
  • Reuters is watching what’s happening out there, looking for how it can serve vertical markets: finance, technology, you name it.

Esther Dyson
  • You’re 85 years old and on your deathbed.  You have 50 million dollars and you have 10,000 friends on Friendster 8.0 … Which is weirder?! You can’t spend all the money and you can’t enjoy that many friendships.

Chris Law, Aggregate Knowledge
  • Your social network is a poor proxy for what you’re interested in.
  • Your behavior is a good proxy for what you’re interested in.

Sanjiv Das, Morgan Stanley
  • Information is so important to us, and gives us so much proprietary advantage. “Open data” is scary to us.
  • Data is going to be a commodity. Get ready for it. The ORGANIZATION of that data may NOT be a commodity. That’s interesting.

David Cancel, Compete
  • 2 million people being monitored
  • 250-300K have the toolbar installed
  • ISPs are monitoring and licensing data to Compete

Seth: This is granted deep in the EULA of the ISP?
David: Yes, just like it’s deep in the EULA of a credit card company.

Seth: How much do you pay an ISP?
David: For an ISP with millions of users, a million or so a month. [year?]

Seth: If I’m a Comcast user, I’m worth about $.40/month for my entire clickstream
David: Yes. 10-12 folks buying this data, that I know of. (So you’re worth more than $.40!)
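The parenthetical is back-of-envelope multiplication, so here it is spelled out as a small Python sketch. It assumes, loudly, that every buyer pays the ISP the same ~$0.40 per subscriber per month (the talk never said how prices vary between buyers), and the two-million-subscriber figure is illustrative, borrowed from the "ISP with millions of users" answer above. The function and constant names are hypothetical, made up for this sketch.

```python
# Back-of-envelope sketch of the figures quoted at OpenData 2007.
# ASSUMPTION: every buyer pays the ISP the same ~$0.40 per subscriber per month;
# the talk did not say how prices vary between buyers.

PRICE_PER_USER_PER_MONTH = 0.40  # dollars, the "40 cents a month" Comcast estimate
SUBSCRIBERS = 2_000_000          # illustrative: "an ISP with millions of users"


def value_per_user_per_month(buyers: int, price: float = PRICE_PER_USER_PER_MONTH) -> float:
    """What one subscriber's clickstream brings in each month if `buyers` firms each license it."""
    return buyers * price


def revenue_per_buyer_per_month(subscribers: int = SUBSCRIBERS,
                                price: float = PRICE_PER_USER_PER_MONTH) -> float:
    """What a single buyer would pay the ISP each month for the whole subscriber base."""
    return subscribers * price


if __name__ == "__main__":
    for buyers in (10, 12):  # "10-12 folks buying this data, that I know of"
        print(f"{buyers} buyers: ${value_per_user_per_month(buyers):.2f} per user per month")
    print(f"One buyer, {SUBSCRIBERS:,} users: ${revenue_per_buyer_per_month():,.0f} per month")
```

Under those assumptions, 10 to 12 buyers puts one subscriber's clickstream at roughly $4.00 to $4.80 a month, and a single buyer would pay about $800,000 a month for a two-million-user ISP, which is in the same ballpark as "a million or so a month".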

Seth: What percentage of us here are having our clickstream sold, would you guess?
David: 10%

Seth: Is the government buying this too?
David: Yes, I’m pretty sure they are.


Dick Costolo, Feedburner
  • Opening an API can have unplanned good consequences: Overnight we had a ton of new users from Spain. Someone there had used the Feedburner subscription count (obtained via the API) as part of a reputation/ranking algorithm, so now all the blogs were signing up to raise their stats. Now we probably won’t see a competitor come out of Spain.

Scott Rafer - (formerly of) MyBlogLog
  • People got into blogging to make new human connections, and somehow some part of our forebrain mistakes these little pixel collections for human connection.

Seth Goldstein
  • Imagine there is information about “Who is influential” … Who does that info belong to? To the people who are influential? To the people who calculated it?
  • Alignment — if you pay attention enough, you start to align with someone. I hate how much I am influenced by Fred Wilson, but I pay attention to his stuff, so I am.

Random Quotes and Exchanges

“The best guarantee for attention is living your life as openly as possible, expressing yourself as publicly as possible as early as possible.” - Goldhaber

??? - The Genie [of data collection] is out of the bottle; now it’s time to ask for the 3 wishes. We’ve gotta think carefully about what those 3 wishes should be.

Chris Law (Aggregate Knowledge): I wish AttentionTrust compliance was widespread…we don’t want to surprise people.
Steve Gillmor: This is bullshit. Have you signed/endorsed the AttentionTrust principles?
Chris Law: No, we’re looking into it.


Open Data - First Night

Monday, March 12th, 2007

It’s a warmish night in Manhattan after the opening (delicious) dinner of the Open Data 2007 Conference. It’s a spiffy affair. Those Reuters people have such a nice place.

And in the spirit of open data, here is a picture of Seth which I openly grabbed from David Henderson’s Flickr stream. (David and I will work out a royalty payment schedule tomorrow at breakfast…)

All Opendata photos on Flickr

The evening kicked off with a welcome from Seth Goldstein, who then turned it over to Tom Glocer, CEO of Reuters. Not the company I would have expected to be interested in “Open”, but he gave a great overview of how Reuters has a vested interest in this issue and is not de facto against open data.

But the meat of the evening was the following discussion. I can’t remember who said what, so these are just highlights of the topics that were mentioned. (And that I found interesting…)

  • A New York Times article about how young people don’t care anymore about who gets “credit” for content produced.
  • Exposing the infrastructure of systems, the inner data-workings, is a good thing. It’s like postmodern principles in architecture, where you reveal all the pipes, plumbing, and wires.
  • Privacy is (or may be?) a new concept in human history. For most of our existence we’ve been in small communities where everyone knows everyone.
  • We may be experiencing a demographic shift, where the younger generation doesn’t care about privacy so much.
  • Tom Glocer is looking forward to the time in 10 or so years where the first senator or supreme court justice has to answer questions about why their MySpace profile in 2007 listed their interests as “Arctic Monkeys and smoking reefers”.
  • The human side of exposing so much information, and users growing comfortable with that. [See Love in the information age]
  • Time lag is important. Often data is closed at first, but then made open. [This is key. I recall a killer article by David Brin in the ’90s about encryption, and how in the future the only data you will be able to charge for is time-sensitive data.] [David Cancel of Compete.com told me before dinner about a company that does trading on data so real-time that the speed of light is important in the design of their system. Now that's time-sensitive!!]
  • Do we (the consumers, the data publishers) need something like the GPL, where we license our data and specify how it can be used? Often our “open data” is crawled, processed, and then sold for a lot of money.
  • An individual’s data is worth maybe –what?– 20 cents? At what scale will the users demand payment?
  • Will the users form a union to demand payment? [See my post on the subject]
  • Social connection data is very valuable, but it is also some of the most closed-off data. (MySpace, Facebook, LinkedIn, etc. guard this data fiercely.)
  • Club Penguin is basically a kid’s introduction to social networks.

My thoughts which I didn’t [find the right place / get up the nerve] to add at the time:

  • At what scale does data become own-able? Can I demand payment if someone uses a paragraph of mine, a sentence? A 16×16 pixel square of one of my photos?
  • People are some of the most unpredictable entities around. All of this data helps us to understand them. To make them predictable. Almost all the data we’re talking about is people-data…we’re not talking about geologic or weather data!

Conference starts tomorrow at 7:30am … Tim W. Time for bed!

Open Data

Monday, March 12th, 2007

I’m in New York for the OpenData conference. This is going to be great!

Open Call for Participation in March 13 Open Data Conference in NYC

It is so easy to get excited about the latest Web 2.0 online media applications that we often lose sight of the fact that underneath all of these innovations is a fundamentally different kind of operating system, one based on open data as opposed to closed proprietary content. If I had to sum it up in a sentence: Open Data is to media what Open Source is to technology. On Tuesday March 13, more than sixty…