‘As We May Think’ – part 10

Many ideas that Vannevar Bush envisioned have come into existence in some form or other since 1945. First of all, the way records are created has changed massively with the advent of speech technology and Google Glass and other optical head-mounted displays. The second part of this essay discussed the evolution of information storage, from the improved microfilm Bush suggested for his memex to the CDs, DVDs and other memory formats of modern-day computers. Moreover, storage evolved from personal libraries and folders to servers anyone can access through a web browser and an internet connection. The improvements in consulting records started when hypertext and hyperlinks were invented; these formed the basis for the HyperText Transfer Protocol and the client-server architecture of the World Wide Web. Even though every domain has a specific name, search engines are sometimes needed to find websites. Another prediction Bush made quite explicitly concerned new forms of encyclopedias, such as Wikipedia. In the final chapter of “As We May Think” he asks himself “must we always transform to mechanical movements in order to proceed from one electrical phenomenon to another?”[1]

Leading computer scientists and technologists agree that the definition of “computer” is changing. Ultimately, scientists want to build a computer similar to the human brain, “which means focusing on capabilities like pattern recognition and juiced-up processing power – building machines that can perceive their surroundings by using sensors, as well as distill meaning from deep oceans of data”. However, the way the human brain works remains unclear, although scientists know “that talking about the brain as a computer is more than just a useful metaphor”. The founder of IBM’s Cognitive Computing group predicts: “So far, we have learned to adapt to computers … but given the advent of the brain-inspired computing and how it’s going to integrate into modern computing infrastructure, computers will begin to adapt more and more to human beings.”[2]

[1] Bush. As We May Think. Chapter 8.

[2] Adrienne LaFrance. What Is a ‘Computer’ Anymore? The Atlantic. July 20, 2015. Consulted 7 March 2016.


‘As We May Think’ – part 9

In the final chapter of the article Bush concludes that “Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them.”[1] This seems to refer to today’s Wikipedia, an encyclopedia made by and for its users. Unfortunately, an information gap exists between the semantic web and the social web. The semantic web is “built from meta-data extracted from the social web” and it “brings better search and navigability to the web”. However, it is very difficult to include properties from DBpedia into the social web of Wikipedia.[2] The conventions used to express a given DBpedia property often differ depending on both the article and the community.[3] One way to detect missing links and insert them according to convention is the Path Index Algorithm, “which indexes path queries in Wikipedia based on result sets generated by queries on DBpedia”.[4] However, the need for such an algorithm shows that browsing the social web sometimes causes difficulties.
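The gap-detection idea can be illustrated in miniature: compare the links a DBpedia relation implies with the links actually present in the corresponding Wikipedia articles. The sketch below is a hypothetical simplification, not the Path Index Algorithm itself; all data and names are invented:

```python
# Hypothetical illustration: find wiki links implied by DBpedia triples
# but absent from the corresponding Wikipedia articles.

# DBpedia-style triples: (subject, property, object)
dbpedia_triples = [
    ("Paris", "country", "France"),
    ("Lyon", "country", "France"),
]

# Outgoing links actually present in each Wikipedia article (invented data)
wikipedia_links = {
    "Paris": {"France", "Seine"},
    "Lyon": {"Rhone"},  # the link to France is missing here
}

def missing_links(triples, links):
    """Return (article, target) pairs implied by DBpedia but absent on Wikipedia."""
    return [
        (subj, obj)
        for subj, prop, obj in triples
        if obj not in links.get(subj, set())
    ]

print(missing_links(dbpedia_triples, wikipedia_links))  # [('Lyon', 'France')]
```

A real system would additionally have to learn the article's link conventions before inserting the missing link, which is what makes the problem hard.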

Bush already suggested a “new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record”.[5] No such professionals were available to Wikipedia, as it “is written collaboratively by largely anonymous volunteers who write without pay”.[6] This raises many issues concerning the credibility of cooperatively authored information resources.[7] Some features can, however, help both regular users and the Wikipedia Editorial Team assess the trustworthiness of the information. Most users notice textual features, references and pictures, whereas the Wikipedia Editorial Team also judges quality based on “style, structure, images, references, stability, neutrality, length, and comprehensiveness”. The only feature that users noted, but which the Wikipedia Editorial Team did not take into account, was the internal links.[8]

Where Bush seemed to envision an Encyclopedia Britannica made by professionals, with “trails” – or, in modern terms, hyperlinks – running through it, one of the most popular encyclopedias is actually made by and for the public. Although the credibility issues remain, Wikipedia can offer a great starting point for any research, even if only to find keywords for further online searching.

[1] Bush. As We May Think. Chapter 8.

[2] Diego Torres, Pascal Molli, Hala Skaf-Molli and Alicia Diaz. 2012. From DBpedia to Wikipedia: Filling the Gap by Discovering Wikipedia Conventions. Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology. Volume 1. IEEE Computer Society, Washington, DC, 535.

[3] Ibidem, 538.

[4] Ibidem, 539.

[5] Bush. As We May Think. Chapter 8.

[6] Wikipedia. About. https://en.wikipedia.org/wiki/Wikipedia:About. Consulted 01/03/2016.

[7] Andrea Forte and Thomas Park. How people assess cooperatively authored information resources. Proceedings of the Eighth Annual International Symposium on Wikis and Open Collaboration. Article 23. ACM, New York, NY. DOI: http://dl.acm.org/citation.cfm?doid=2462932.2462963.

[8] Teun Lucassen and Jan Maarten Schraagen. 2010. Trust in Wikipedia: how users trust information from an unknown source. Proceedings of the 4th Workshop on Information Credibility. ACM, New York, NY, 23. DOI: http://dl.acm.org/citation.cfm?doid=1772938.1772944.

‘As We May Think’ – part 8

Bush distinguishes two methods of selection in his fifth chapter. First, simple selection proceeds by “examining in turn every one of a large set of items, and by picking out those which have certain specified characteristics”.[1] This text-only selection method was once used by the first web search engines, and it performs well “on relatively homogenous collections of high-quality papers”.[2] Since the web is a heterogeneous collection with varying quality levels, it needed a better selection process. Bush already named a second selection method, which resembles the mechanism telephones used: “It pays attention only to a class given by a first digit, then only to a subclass of this given by the second digit, and so on.” This resembles the hyperlink analysis used for ranking in web search engines.
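Bush’s simple selection is essentially a linear scan with a predicate, which is also how early text-only search worked. A minimal sketch, with invented documents and keyword:

```python
# Simple selection: examine every item in turn and keep those
# with a specified characteristic -- here, containing a keyword.
documents = [
    "The memex stores books, records and communications.",
    "Photography may advance with dry photography.",
    "Associative trails link records in the memex.",
]

def simple_select(docs, keyword):
    """Linear scan: return every document containing the keyword."""
    return [d for d in docs if keyword.lower() in d.lower()]

print(simple_select(documents, "memex"))  # the first and third documents
```

The weakness Bush and the early engines shared is visible here: the scan says nothing about the quality of a match, only its presence.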

Hyperlink analysis relies on two assumptions as Monika Henzinger from Google describes it:

Assumption 1. A hyperlink from page A to page B is a recommendation of B by the author of A.

Assumption 2. If page A contains a hyperlink to page B, then the two pages are on related topics.

This method is mainly used for ordering search results, called ranking, and there are two classes of algorithms. Query-independent algorithms assign scores independent of specific queries, while query-dependent algorithms depend on a specific query.[3] The first type of algorithm measures the quality or authority of a page, but this may lead to manipulation. If “the PageRank R(A) of page A is defined as

R(A) = δ/n + (1 − δ) · Σ_{B : B links to A} R(B) / outdegree(B)
where δ is a constant usually chosen between 0.1 and 0.2, n is the number of pages in the collection, and outdegree(B) is the number of hyperlinks on page B”, the determination of authoritative pages improves significantly.[4] The second, query-dependent type of algorithm takes quality as well as relevance to the user query into account. This method relies on creating neighborhood graphs, subgraphs of the whole web graph, on which to perform hyperlink analyses. However, success depends on the quality of the pages in the neighborhood graph, and topic drift occurs if the majority of pages is on a different topic. Query-dependent algorithms can also be subject to manipulation by adding just a few edges.[5]
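The query-independent PageRank score can be computed by iterating its definition to a fixed point. A toy sketch of that iteration, using an invented three-page link graph and δ = 0.15 (within the quoted 0.1–0.2 range):

```python
# Iterative PageRank on a toy link graph, following the definition
# R(A) = delta/n + (1 - delta) * sum over pages B linking to A
#                                of R(B)/outdegree(B)
links = {              # page -> pages it links to (invented graph)
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

def pagerank(links, delta=0.15, iterations=50):
    n = len(links)
    rank = {page: 1.0 / n for page in links}   # start from a uniform score
    for _ in range(iterations):
        new = {}
        for page in links:
            incoming = sum(
                rank[b] / len(links[b]) for b in links if page in links[b]
            )
            new[page] = delta / n + (1 - delta) * incoming
        rank = new
    return rank

ranks = pagerank(links)
# C is linked from both A and B, so it ends up with the highest score
```

Fifty iterations are more than enough here; each pass shrinks the error by roughly a factor of (1 − δ), so the scores converge quickly.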

Without search engines, finding what you are looking for on the exponentially growing World Wide Web would be nearly impossible. However, the memex Bush invented was only meant for the personal use of scientists, whereas the World Wide Web is accessible to anyone with a computer or smartphone and an internet connection. One thing has not changed: the problems search engines face nowadays resemble those scientists faced seven decades ago.

There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers – conclusions which he cannot find time to grasp, much less to remember, as they appear. Yet specialization becomes increasingly necessary for progress, and the effort to bridge between disciplines is correspondingly superficial.[6]

Search engines need to separate the high-quality from the low-quality webpages in order for their users to “make real use of the record”. In his article Bush already described one of the newest technologies that improve the use of search engines. When he writes that “one might, for example, speak to a microphone, in a manner described with the speech controlled typewriter, and thus make his selections”, it resembles the use of Google’s speech recognition in combination with its search engine.[7] The difference between the trails Bush suggested and the results search engines generate is that search engines present a list from which the user still has to choose, while Bush’s trails are ready-made, without repeating the selection process. One solution would be the WIX system, or Web Index system, which generates hyperlinks to join information resources on the web. This system uses WIX files: pairs of keywords and URLs that are joined to the text content of a web page. It transforms keywords into hyperlinks to the matching URL by an attaching process.[8] Some work remains to automate this process further, but it looks promising.
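The attaching process can be pictured as a keyword-to-hyperlink substitution. A rough sketch, assuming a WIX file boils down to keyword–URL pairs; the entries and function name below are invented, not taken from the WIX system itself:

```python
import re

# A WIX file pairs keywords with URLs (entries here are invented).
wix_file = {
    "memex": "https://en.wikipedia.org/wiki/Memex",
    "hypertext": "https://en.wikipedia.org/wiki/Hypertext",
}

def attach(text, wix):
    """Replace each whole-word keyword occurrence with a hyperlink to its URL."""
    for keyword, url in wix.items():
        pattern = re.compile(r"\b%s\b" % re.escape(keyword), re.IGNORECASE)
        text = pattern.sub(
            lambda m, u=url: '<a href="%s">%s</a>' % (u, m.group(0)), text
        )
    return text

print(attach("The memex inspired hypertext.", wix_file))
```

Word boundaries keep the match from firing inside longer words; a production system would also have to avoid rewriting text that is already inside a link, which is part of the remaining automation work the authors describe.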

[1] Bush. As We May Think. Chapter 5.

[2] Monika Henzinger. 2005. Hyperlink analysis on the World Wide Web. Proceedings of the Sixteenth ACM Conference on Hypertext and Hypermedia. ACM, New York, NY, 1-3. DOI: http://dl.acm.org/citation.cfm?doid=1083356.1083357.

[3] Ibidem, 1.

[4] Ibidem, 2.

[5] Idem.

[6] Bush. As We May Think. Chapter 1.

[7] Bush. As We May Think. Chapter 5.

[8] Yosuke Aoki, Ryosuke Koshijima and Motomichi Toyama. 2015. Automatic Determination of Hyperlink Destination in Web Index. Proceedings of the 19th International Database Engineering & Applications Symposium. ACM, New York, NY, 206. DOI: http://dl.acm.org/citation.cfm?doid=2790755.2790784.