Web Deep linking
Posted by admin on December 25 2007 19:10:07

Deep linking, on the World Wide Web, is the practice of creating a hyperlink that points to a specific page or image on another website,
instead of that website's main or home page. Such links are called deep links.

A deep link is a hyperlink, either on a Web page or in the results of a search engine query, to a page on a Web site other than the site’s home page.
Typically, a Web site’s home page is the top page in the site’s hierarchy, and any page other than that is considered “deep.”
For example, if a Web site linked to the Webopedia page http://www.webopedia.com/Term/D/deep_link.html, this would be considered a
deep link because the site linked to one of Webopedia’s pages other than its home page, http://www.webopedia.com.
Some in the industry have opposed the proliferation of deep links because they route users past a site’s home page, where
advertisers pay for space based on page views.
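
The definition above can be sketched in a few lines of Python. This is only an illustration, assuming that a site's home page is the root path ("/" or empty) and that anything else counts as deep; the function name is_deep_link is made up for this example, and a site whose home page lives at a path such as /index.html would need a different rule.

```python
from urllib.parse import urlsplit


def is_deep_link(url: str) -> bool:
    """Return True if the URL points somewhere other than the site's root page.

    Assumption: the home page is the root path ("" or "/") with no query string.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/")  # "/" and "" both mean the root page
    return bool(path) or bool(parts.query)


print(is_deep_link("http://www.webopedia.com/Term/D/deep_link.html"))  # True
print(is_deep_link("http://www.webopedia.com"))                        # False
```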

The technology behind the World Wide Web, the Hypertext Transfer Protocol (HTTP), does not actually make any distinction between
"deep" links and any other links—all links are functionally equal. This is intentional; one of the designed purposes of the Web
is to allow authors to link to any published document on another site. The possibility of so-called "deep" linking is therefore
built into the Web technology of HTTP and URLs by default—while a site can attempt to restrict deep links, to do so requires
extra effort. According to the World Wide Web Consortium Technical Architecture Group, "any attempt to forbid the practice of
deep linking is based on a misunderstanding of the technology, and threatens to undermine the functioning of the Web as a whole".
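
To make this concrete, here is a rough Python sketch. The first function shows that a browser asks for the home page and for a deep page with the same kind of GET request; only the path differs. The second function is a hypothetical example of the "extra effort" a site might add to refuse deep links, by checking the Referer header. The names request_line and allow_request are invented for this illustration, and the check is easy to defeat because the Referer header is optional and can be forged.

```python
from typing import Optional
from urllib.parse import urlsplit


def request_line(url: str) -> str:
    """Build the first line of the HTTP/1.1 GET request for a URL."""
    parts = urlsplit(url)
    target = parts.path or "/"  # an empty path means the root resource
    if parts.query:
        target += "?" + parts.query
    return f"GET {target} HTTP/1.1"


def allow_request(path: str, referer: Optional[str], own_host: str) -> bool:
    """Hypothetical server-side policy that tries to refuse deep links from other sites."""
    if path in ("", "/"):
        return True   # the home page is always reachable
    if not referer:
        return False  # no Referer header: origin unknown, so refuse
    # Allow inner pages only when the visitor came from one of our own pages.
    return urlsplit(referer).netloc == own_host


home = request_line("http://www.webopedia.com")
deep = request_line("http://www.webopedia.com/Term/D/deep_link.html")
print(home)  # GET / HTTP/1.1
print(deep)  # GET /Term/D/deep_link.html HTTP/1.1

# A visitor arriving from another site is refused the deep page.
print(allow_request("/Term/D/deep_link.html", "http://example.org/blog", "www.webopedia.com"))  # False
```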


It's great to see growing interest in the deep web. There are several reasons why basic crawling and indexing don't reach
the deep web without special methods. I think the most important one for your question is that spiders don't know how to
fill in a keyword search box (because they don't know what keywords you would like to enter). The deep web is best searched
through real-time federated search, where one search box is "connected" to each deep web database you wish to search.
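
The idea of one search box connected to several databases can be sketched as follows. This is a minimal illustration, not how any particular federated search product works: the two record lists and the connector functions are hypothetical stand-ins for real deep web databases, which would normally be queried through their own search forms or APIs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for two deep web databases.
LIBRARY_RECORDS = ["Deep web crawling notes", "Gardening basics"]
ARCHIVE_RECORDS = ["Federated search overview", "Indexing the deep web"]


def search_library(query):
    """Hypothetical connector: case-insensitive substring match."""
    return [r for r in LIBRARY_RECORDS if query.lower() in r.lower()]


def search_archive(query):
    """Hypothetical connector: case-insensitive substring match."""
    return [r for r in ARCHIVE_RECORDS if query.lower() in r.lower()]


def federated_search(query, sources):
    """Send one query to every source at the same time and merge the hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # Submit the same query to each connector in parallel.
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        merged = []
        for name, future in futures.items():
            for hit in future.result():
                merged.append((name, hit))
    return merged


sources = {"library": search_library, "archive": search_archive}
for source, hit in federated_search("deep web", sources):
    print(source, "->", hit)
```

The results come back grouped by source in the order the sources were listed; a real system would also need to rank and de-duplicate them.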

Have a look at the demos on deepwebtech.com/index.php that are powered by the Explorit federated search engine.

There are a number of strategies used in this document to return a result from a deep web database to a searcher.
Generally, these strategies try to understand the source of the information, the forms used and how they work, the information associated
with each field of each form, the content that may be retrieved by using the forms, how to rank the information that may be found
in response to filling out a form, and a location associated with that information if necessary.
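
One simple way to picture "understanding a form and its fields" is to describe the form as data and then generate the queries a crawler could submit. The Python sketch below is an assumption made for illustration only and is not taken from the document being discussed: the class names, the example.com address, and the field values are all hypothetical, and it assumes the form submits with GET so that each combination of values becomes a URL.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, List
from urllib.parse import urlencode


@dataclass
class FormField:
    name: str                    # the form input's name, e.g. "make"
    candidate_values: List[str]  # values worth trying for this field


@dataclass
class SearchForm:
    action_url: str              # where the form submits (GET assumed)
    fields: List[FormField]


def candidate_queries(form: SearchForm) -> Iterator[str]:
    """Yield one URL per combination of candidate field values."""
    names = [f.name for f in form.fields]
    for combo in product(*(f.candidate_values for f in form.fields)):
        yield form.action_url + "?" + urlencode(dict(zip(names, combo)))


# Hypothetical used-car search form with two fields.
form = SearchForm(
    action_url="http://example.com/search",
    fields=[
        FormField("make", ["ford", "honda"]),
        FormField("zip", ["94043"]),
    ],
)
for url in candidate_queries(form):
    print(url)
```

The number of combinations grows quickly with each added field, which is one reason choosing good candidate values and ranking the retrieved content matter.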

If you are interested in how the deep web might be crawled and indexed, this patent application shows some strategies
for accomplishing those tasks. Co-inventor Dr. Halevy describes some of the issues involved in attempting to index
such a wide variety of information, organized in diverse ways based upon business rules that don’t anticipate indexing
by a search engine, in a paper he wrote called Why Your Data Won’t Mix.

Source: www.webtoolbag.com