The following is excerpted from a December 30, 2009 blog post by Erik Sherman published by BNET:
One limitation of search engines such as Google (GOOG) or Microsoft’s (MSFT) Bing that crawl the web looking for links is that they can only tell you about what they have already come upon. That leaves vast amounts of material yet to be “discovered.” A patent application from Cisco (CSCO) suggests a clever way to help update the engines...
In June 2008, Cisco filed a patent application, which has yet to be granted but was published by the US Patent and Trademark Office on December 17, 2009. The title is “Seeding search engine crawlers using intercepted network traffic.” As the application notes, a web-crawling search engine has a basic limitation: it cannot index sites it doesn’t yet know about. Furthermore, it may never be able to reach pages that have not been introduced to it, either by direct input or by being connected to its existing structure of web pages, known as a web-graph.
And yet, people still use these pages. Cisco’s claimed invention is to have network equipment such as “routers, multilayer switches or any other suitable device” examine data packets for the HTTP requests that appear when a network user tries to reach a resource on the web. The devices would extract the URLs from those requests and pass them to the search engine, which would then know about the page and be able to add the previously unknown site to the web-graph.
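To make the idea concrete, here is a minimal Python sketch of the URL-extraction step described above. It is not taken from the patent application: the function names `extract_url` and `new_seeds`, the `known` set standing in for the crawler's existing web-graph, and the restriction to unencrypted HTTP/1.x GET requests are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Set


def extract_url(payload: bytes) -> Optional[str]:
    """Return the URL named by a plain-HTTP GET request, or None.

    Hypothetical helper: assumes the payload holds a complete,
    unencrypted HTTP/1.x request header.
    """
    # Only the header section matters; drop any body after the blank line.
    head = payload.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")

    # The request line looks like: GET /path HTTP/1.1
    parts = lines[0].split(" ")
    if len(parts) != 3 or parts[0] != "GET" or not parts[2].startswith("HTTP/"):
        return None
    target = parts[1]

    # Absolute-form targets (sent to proxies) already carry the full URL.
    if target.startswith("http://"):
        return target
    if not target.startswith("/"):
        return None

    # Origin-form targets need the Host header to complete the URL.
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host" and value.strip():
            return "http://" + value.strip() + target
    return None


def new_seeds(payloads: Iterable[bytes], known: Set[str]) -> List[str]:
    """Collect URLs seen in traffic that the crawler does not know yet."""
    seeds: List[str] = []
    for payload in payloads:
        url = extract_url(payload)
        if url is not None and url not in known:
            known.add(url)  # also prevents duplicates within this batch
            seeds.append(url)
    return seeds
```

In this sketch `new_seeds` updates the `known` set in place, so a URL is reported only the first time it is observed. A real device would also have to reassemble requests split across packets and decide which URLs are appropriate to share, neither of which is attempted here.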