Everyone's talked a lot about the post-Google web. Here are the first inklings of my take: Heritrix, the Internet Archive's open-source (LGPL) crawler. How cool is the Internet Archive? Very. Here's why:
As the costs of crawling and crawlers drop, and as the cost of archiving and retrieval drops, Google will become obsolete - the Net search industry's boundaries will begin to blur (with other, more costly kinds of search, like data mining, blah, blah), and some clever contender will revolutionize it.
Personalized crawling offers different search economies than monolithic search engines: economies tailored to individual utility functions. Google (and the like) may still provide better general searches, thanks to massive economies of scale, but for precision searches - those with the highest value to consumers - a personalized crawler/index is a far better answer.
Will Google be able to compete? I don't think so - their competences all revolve around a single core one, PageRank, which is too monolithic to distribute, and organizational inertia at Google seems to be growing every month.
In other Google news, I was really disappointed to hear that they won't be using Hambrecht's OpenIPO system (a Net auction for IPOs). Going with the typical Street players may be nice for the VCs, bankers, and Google's management, but it also sends a very strong signal about their comfort with the currently flawed IPO system (namely, that they'd prefer the game stay opaque and essentially rigged).
Yeah, I know there are lots of other crawlers around. The point is that the industry is segregated into hugely disparate strategic groups, and costs are dropping and performance is increasing way faster in one group than in the other - so this is the kind of technology curve that blindsides incumbents and shifts industry boundaries.