Bush distinguishes two modes of selection in his fifth chapter. First, simple selection proceeds by “examining in turn every one of a large set of items, and by picking out those which have certain specified characteristics”.[1] This text-only selection method was used by the first web search engines, and it performs well “on relatively homogenous collections of high-quality papers”.[2] Since the web is a heterogeneous collection of widely varying quality, it needed a better selection process. Bush already named a second selection method, which resembles the mechanism of automatic telephone exchanges: “It pays attention only to a class given by a first digit, then only to a subclass of this given by the second digit, and so on.” This resembles hyperlink analysis for ranking in web search engines.
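Bush’s simple selection can be sketched in a few lines of code: examine every item in turn and keep those with the specified characteristic. The record list and the chosen characteristic (a keyword match) are illustrative assumptions, not examples from the text.

```python
# A minimal sketch of Bush's "simple selection": scan every item in a
# collection and pick out those matching a specified characteristic.
# The records and the keyword test are made-up illustrations.

records = [
    "hypertext and the memex",
    "differential analyzer design",
    "memex trails for scientists",
]

def simple_selection(items, characteristic):
    """Examine every item in turn; pick out those with the characteristic."""
    return [item for item in items if characteristic in item]

print(simple_selection(records, "memex"))
# ['hypertext and the memex', 'memex trails for scientists']
```

Because every item must be examined, the cost grows linearly with the collection, which is why this approach only scales to small, homogeneous collections.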
Hyperlink analysis relies on two assumptions, as Monika Henzinger of Google describes:
Assumption 1. A hyperlink from page A to page B is a recommendation of B by the author of A.
Assumption 2. If page A contains a hyperlink to page B then the two pages are on related topics.
This method is mainly used for ordering search results, a process called ranking, and there are two classes of algorithms. Query-independent algorithms assign scores independently of any specific query, while query-dependent algorithms take the query into account.[3] The first type measures the quality or authority of a page, but this may invite manipulation. PageRank counters this by weighting each recommendation by the rank of the recommending page: “the PageRank R(A) of page A is defined as

R(A) = δ/n + (1 − δ) · Σ_{(B,A) ∈ E} R(B) / outdegree(B)

where δ is a constant usually chosen between 0.1 and 0.2, n is the number of pages in the collection, and outdegree(B) is the number of hyperlinks on page B”; with this weighting, the determination of authoritative pages improves significantly.[4] The second type, the query-dependent algorithms, takes quality as well as relevance to the user query into account. This method builds a neighborhood graph, a subgraph of the whole web graph, and performs hyperlink analysis on it. However, its success depends on the quality of the pages in the neighborhood graph, and topic drift occurs if the majority of those pages are on a different topic. Query-dependent algorithms can also be manipulated by adding only a few edges.[5]
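The PageRank definition quoted above can be sketched as an iterative computation: every page starts with a uniform score, and each iteration redistributes rank along the hyperlinks. The four-page link graph below is a made-up example, not data from the article.

```python
# A minimal sketch of query-independent PageRank scoring, following the
# quoted definition: R(A) = delta/n + (1 - delta) * sum over pages B
# linking to A of R(B)/outdegree(B). The graph is an invented example.

def pagerank(links, delta=0.15, iterations=50):
    """links maps each page to the list of pages it links to."""
    n = len(links)
    rank = {page: 1.0 / n for page in links}           # uniform start
    for _ in range(iterations):
        new_rank = {page: delta / n for page in links}
        for b, targets in links.items():
            share = (1 - delta) * rank[b] / len(targets)
            for target in targets:
                new_rank[target] += share              # B "recommends" target
        rank = new_rank
    return rank

links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
scores = pagerank(links)
# C collects links from three pages, so it ends up with the highest score.
print(max(scores, key=scores.get))  # C
```

The δ/n term gives every page a small baseline score, which is what makes the score a global (query-independent) measure of authority rather than a count of incoming links.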
Without search engines, finding what you are looking for on the exponentially growing World Wide Web would be nearly impossible. However, the memex Bush envisioned was meant only for the personal use of scientists, whereas the World Wide Web is accessible to anyone with a computer or smartphone and an internet connection. One thing has not changed: the problems search engines face today resemble what scientists faced seven decades ago.
There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers – conclusions which he cannot find time to grasp, much less to remember, as they appear. Yet specialization becomes increasingly necessary for progress, and the effort to bridge between disciplines is correspondingly superficial.[6]
Search engines need to separate the high-quality from the low-quality webpages in order for their users to “make real use of the record”. In his article, Bush already described one of the newest technologies that improve the use of search engines. When he writes that “one might, for example, speak to a microphone, in the manner described in connection with the speech controlled typewriter, and thus make his selections”, it resembles Google’s speech recognition add-on used in combination with its search engine.[7] The difference between the trails Bush suggested and the results search engines generate is that search engines present a list from which the user still has to choose, while Bush’s trails are ready-made and do not require repeating the selection process. One possible solution is the WIX (Web Index) system, which generates hyperlinks to join information resources on the web. The system uses WIX files, pairs of keywords and URLs, that are joined to the text content of a web page: an attaching process transforms each keyword into a hyperlink to the matching URL.[8] Some work remains to automate this process further, but it looks promising.
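The attaching process described above can be sketched as a keyword-to-URL substitution over a page’s text. The regex-based replacement and the sample entries (with example.org URLs) are my own simplifications for illustration, not the paper’s actual implementation.

```python
# A rough sketch of the WIX "attaching" step: a WIX file pairs keywords
# with URLs, and attaching turns each keyword occurrence in a page's
# text into a hyperlink. Entries and URLs are hypothetical placeholders.
import re

wix_entries = {
    "memex": "https://example.org/memex",
    "hypertext": "https://example.org/hypertext",
}

def attach(text, entries):
    """Replace each whole-word keyword with an HTML link to its paired URL."""
    for keyword, url in entries.items():
        pattern = r"\b" + re.escape(keyword) + r"\b"
        text = re.sub(pattern, f'<a href="{url}">{keyword}</a>', text)
    return text

html = attach("The memex inspired hypertext.", wix_entries)
print(html)
```

This naive sketch does not guard against one keyword occurring inside another entry’s URL or anchor text; handling such overlaps automatically is part of the remaining work the authors describe.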
[1] Bush. As We May Think. Chapter 5.
[2] Monika Henzinger. 2005. Hyperlink analysis on the World Wide Web. Proceedings of the sixteenth ACM conference on Hypertext and hypermedia. ACM, New York, NY, 1-3. DOI: http://dl.acm.org/citation.cfm?doid=1083356.1083357.
[3] Ibidem, 1.
[4] Ibidem, 2.
[5] Idem.
[6] Bush. As We May Think. Chapter 1.
[7] Bush. As We May Think. Chapter 5.
[8] Yosuke Aoki, Ryosuke Koshijima and Motomichi Toyama. 2015. Automatic Determination of Hyperlink Destination in Web Index. Proceedings of the 19th International Database Engineering & Applications Symposium. ACM, New York, NY, 206. DOI: http://dl.acm.org/citation.cfm?doid=2790755.2790784.