2 Jul 2010 kgb   » (Master)

Real-Time Search

The web is moving rapidly toward real-time. Real-time display of messages, real-time display of posts and comments, real-time status updates from friends, disposable chats, even real-time video.

The future and strength of the Internet lie in providing and spreading information instantly. Some institutions will push to capitalize on delaying information (for example stock quotes, law enforcement, and government), but in the long term this will have limited success. When every person is armed with a pocket computer and instant access to the world, you can’t hide much.

What’s not quite there yet is real-time search, because it depends on real-time notification of content publishing, which is also lacking.

Traditional search engines crawl the web, meaning they start at some web page and record all the links found to other locations, along with a representative sample of the page's content. As the links are followed, more links are encountered, and so on. The end result is a massive database that can be used to return search results. Not all content is easily captured today (video files and the imagery within photographs, for example), but as technology progresses these obstacles will be overcome.
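To make the crawling idea concrete, here is a minimal sketch in Python. It is not how any particular search engine works. The `PAGES` dictionary, the example URLs, and the `crawl` function are all made up for illustration, and the "web" is an in-memory table so the example runs without a network.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/": ("home page about search", ["a.example/news", "b.example/"]),
    "a.example/news": ("real-time news", ["a.example/", "c.example/missing"]),
    "b.example/": ("photo gallery", []),
}

def crawl(start, fetch):
    """Breadth-first crawl from `start`, recording each reachable page once."""
    index = {}              # URL -> representative content
    seen = {start}          # URLs already queued, so loops are not followed twice
    queue = deque([start])
    while queue:
        url = queue.popleft()
        page = fetch(url)
        if page is None:    # dead link: nothing to record
            continue
        text, links = page
        index[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl("a.example/", PAGES.get)
```

The resulting `index` is the "massive database" in miniature. Note that it is only as fresh as the last crawl, which is the gap real-time search tries to close.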

In real-time search, content is cataloged as it is published. Search results would include this information immediately and automatically update your display with the changes. Real-time search would probably be used alongside traditional crawling, but to do it at all, search engines need to know instantly when something has changed. The blogging “ping” system is a working example of a commonly used publish notification system. Services like Google Alerts and RSS feeds also publish data as quickly as their source systems want them to. To do this effectively, however, the notification needs to include details of the data being published or changed, not just a ping or a link. If everyone adopted a standard for real-time notification, the dependency on crawling would go away, but the practice would still continue because of the desire to capture “the deep web” and any data excluded (accidentally or intentionally) from the notification network.
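The difference between a bare ping and a notification that carries the content can be sketched in a few lines of Python. This is an illustration under assumptions, not an existing protocol or API. The `RealTimeIndex` class, its `on_publish` method, and the example URL are all hypothetical.

```python
from collections import defaultdict

class RealTimeIndex:
    """Toy search index updated directly from publish notifications."""

    def __init__(self):
        self.docs = {}                  # URL -> latest text received
        self.terms = defaultdict(set)   # word -> URLs containing that word

    def on_publish(self, url, text):
        """Handle a notification that includes the published content itself."""
        # Drop the words of any earlier version so edits replace, not append.
        for word in self.docs.get(url, "").lower().split():
            self.terms[word].discard(url)
        self.docs[url] = text
        for word in text.lower().split():
            self.terms[word].add(url)

    def search(self, word):
        """Return the URLs whose latest version contains `word`."""
        return sorted(self.terms.get(word.lower(), ()))

rt = RealTimeIndex()
rt.on_publish("blog.example/post-1", "Real-time search is coming")
first = rt.search("search")     # the post is searchable with no crawl
rt.on_publish("blog.example/post-1", "Edited post about crawling")
second = rt.search("search")    # the edit is reflected immediately
```

Because the notification carries the text, the index never has to fetch the page. With only a ping or a link, the engine would still need a crawl-style fetch before the change became searchable.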

Currently, real-time search is specialized. Twitter, Facebook, and FriendFeed all have private search options for their own data. Tweetmeme is an example of an external service providing real-time search and trending results on Twitter posts, but only over a short window of about a week.

The Christian Science Monitor posted an article on upcoming real-time search engines, discussing CrowdEye, Collecta, and Google. I encourage you to read it. Google is expected to come out with its own micro-blogging service in the future, and because it already has the most popular search engine, it could become the leader in this space too if it can sway people away from Twitter. Microsoft has yet to give a clue about how it will respond, but if Bing becomes popular it would be an obvious tool for them to build on.

[Updated 7/6/2009 to include mention of the blogging ping network]

Syndicated 2009-07-06 06:45:51 from Keith Barrett Online » Technology
