Thursday, November 03, 2005
USV Sessions Feedback: Data vs Data
One of the topics we tried to tackle at USV Sessions was the value of data in 'open' architectures/open access models.
I had a point to make, which I didn't, because Tom Evslin said something kind of mind-blowing, which totally derailed me.
So here is my point. The context was the claim that data can't be valuable, by definition, if it's totally liquid - if everybody has open APIs, or can scrape everyone else, etc.
OK. To people that study value, like economists, "data" is by definition a commodity. What's valuable is information.
Now, bear with me. I am not making the typical cog/comp sci distinction between data and information. And this is not pointless semantics.
What is information, in an economic context? Well, it's stuff that informs production or consumption decisions. That is, we can say it's "data" that affects the risk and reward that characterize different transactions (transaction costs), or "data" about people's propensity to consume: about their expectations, preferences, and utility.
This is important. The distinction here is strategic: it should let you think more clearly about the value of information in a network economy.
Information about people's expectations, preferences, and utilities, then, is valuable "data"; but information about, say, things, is probably not.
Let me make this concrete. What are tags? Well, if we know who tagged with what, tags become expectation and preference info about different people. The value of this is intuitive; it's what lies behind the upcoming tag advertising models. It also implies that you don't wanna share your tags, unless you're getting a lot back with little risk.
On the other hand, things like job postings are just "data" - not information. They may have some marginal information value - the name of the company might give a potential job-seeker info about how high transaction costs are likely to be - but the bulk is just "data".
This is why scraper models for jobs, like Indeed, create economic value: they transform "data" into information - in fact, they do so at the moment you do a search. Conversely, this is why Craigslist is getting devalued - it's more data, and less information (i.e., I can personalize it less, and I receive less stuff that matches my preferences, expectations, etc, than elsewhere).
The fundamental point is that data is the wrong context in which to think about any kind of network business model. If you think about them in the context of information, it's very intuitive where value lies, and to what extent you should open and close gates to this information.
So here's a question for the philosophers: can a machine transform data into information by itself, or is this necessarily something only a human consciousness can do?
Your thought here seems compressed to the point of becoming elliptical.
Specifically, how does Indeed.com "transform 'data' into information... at the moment you do a search" in a way that Craigslist does not? By aggregating and filtering? By giving the user more filtering tools than Craigslist does? Or by deploying context-driven content around the listings? Not clear what your intended point is here.
It's pretty simple.
1) When you search on Indeed, you are getting back fairly specific stuff bounded by distance, keywords, etc, which matches your expectations and preferences somewhat.
2) On Craigslist, while there is a primitive search, you generally do the Craigslist thing: choose a city, a category of jobs, and then browse. This does not match your preferences and expectations nearly as tightly.
Although the amount of "data" may be almost exactly the same, there's relatively more economic information in 1 than in 2.
Conversely, we can say that the marginal value of the extra data in Indeed - the extra search terms, etc, you add - is very large.
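If it helps, here is a toy sketch of the difference between 1 and 2. Everything in it is made up for illustration: the listings, the `browse` and `search` functions, and the `wanted` preference test are all hypothetical, and none of it describes how Indeed or Craigslist actually work. It just shows the same "data" being accessed two ways, and how much of what comes back matches what one job-seeker actually wants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    title: str
    city: str
    category: str
    distance_km: float  # distance from our hypothetical job-seeker


# Made-up listings, purely illustrative.
LISTINGS = [
    Listing("Python developer", "New York", "software", 3.0),
    Listing("Java developer", "New York", "software", 5.0),
    Listing("Python data engineer", "New York", "software", 40.0),
    Listing("QA tester", "New York", "software", 2.0),
    Listing("Line cook", "New York", "food", 1.0),
]


def browse(listings, city, category):
    """Browse-style access: pick a city and a category, get everything in it."""
    return [l for l in listings if l.city == city and l.category == category]


def search(listings, city, keyword, max_km):
    """Search-style access: results bounded by a keyword and a distance."""
    return [
        l for l in listings
        if l.city == city
        and keyword.lower() in l.title.lower()
        and l.distance_km <= max_km
    ]


def wanted(listing):
    """Assumed preferences of one job-seeker: a Python job within 10 km."""
    return "python" in listing.title.lower() and listing.distance_km <= 10.0


def match_rate(results):
    """Share of the returned listings that match the job-seeker's preferences."""
    if not results:
        return 0.0
    return sum(wanted(l) for l in results) / len(results)


browsed = browse(LISTINGS, "New York", "software")
searched = search(LISTINGS, "New York", "python", 10.0)

print(len(browsed), match_rate(browsed))    # 4 results, 0.25 match
print(len(searched), match_rate(searched))  # 1 result, 1.0 match
```

The underlying listings are identical in both cases. The only thing that changes is that the search terms carry the seeker's preferences into the query, so what comes back is bounded by them.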
It's not about machines vs humans, it's just about whether the stuff influences production or consumption decisions.
Guys, this is not a mystical topic. Just think about information as stuff that leads to different economic outcomes, where data doesn't.