Unraveling Digital History / Humanities

‘As We May Think’ – part 8

Bush distinguishes two ways of selection in his fifth chapter. First of all simple selection proceeds by “examining in turn every one of a large set of items, and by picking out those which have certain specified characteristics”.[1] This text-only selection method was once used by the first web search engines and it performs well “on relatively homogenous collections of high-quality papers”.[2] Since the web is a heterogeneous collection with various quality levels, it needed a better selection process. Bush already named a second selection method which resembles the mechanism telephones used. “It pays attention only to a class given by a first digit, then only to a subclass of this given by the second digit, and so on.” This resembles the hyperlink analysis for ranking in web search engines.

Hyperlink analysis relies on two assumptions as Monika Henzinger from Google describes it:

Assumption 1. A hyperlink from page A to page B is a recommendation of B by the author of A.

Assumption 2. If page A contains a hyperlink to page B then the two pages are on related topics.

This method is mainly used for ordering search results, called ranking and there are two classes of algorithms. Query-independent algorithms assign scores independent of specific queries, while query-dependent algorithms depend on a specific query.[3] The first type of algorithms measures the quality or authority of a page, but this may lead to manipulation. If “the PageRank R(A) of page A is defined as

formule
where δ is a constant usually chosen between 0.1 and 0.2, n is the number of pages in the collection, and outdegree (B) is the number of hyperlinks on page B”, the determination of authoritative pages improves significantly.[4] The second type of algorithms called query-dependent takes quality as well as relevance to the user query into account. This method relies on creating neighborhood graphs or subgraphs of the whole web graph to perform hyperlinks analyses on. However the success is dependent on the quality of pages in the neighborhood graph and topic drift occurs if the majority of pages is on a different topic. Query-dependent algorithms can be subject to manipulation by adding a few edges.[5]

Without search engines, finding what you are looking for on the exponentially growing worldwide web would become nearly impossible. However the memex Bush invented was only meant for personal use of scientist, whereas the worldwide web is accessible to anyone with a computer or smartphone and an internet connection. One thing hasn’t changed, the problems search engines face nowadays resemble what scientists faced seven decades ago.

There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers –conclusions which he cannot find time to grasp, much less to remember, as they appear. Yet specialization becomes increasingly necessary for progress, and the effort to bridge between disciplines is correspondingly superficial.[6]

Search engines need to filter out the high-quality from the low-quality webpages in order for their users to “make real use of the record”. In his article Bush already described one of the newest technologies that improve the use of search engines. When he describes “one might, for example, speak to a microphone, in a manner described with the speech controlled typewriter, and thus make his selections” it resembles the use of the google add-on for speech recognition in combination with their search engine.[7] The difference between the trails Bush suggested and the results search engines generate, is that search engines present a list where the user still has to choose, while Bushes trails are ready-made without repeating the selection process. One solution would be the WIX system or Web Index system that generates hyperlinks to join information resources on the web. This system uses WIX Files, pairs of keywords and URL’s that are joined to the text content of a web page. It transforms keywords into hyperlinks to the matching URL by an attaching process.[8] However there is still some work to be done to automate this process further, but it looks promising.

[1] Bush. As We May Think. Chapter 5.

[2] Monika Henzinger. 2005. Hyperink analysis on the world wide web. Proceedings of the sixteenth ACM conference on Hypertext and hypermedia. ACM, New York, NY, 1-3. DOI: http://dl.acm.org/citation.cfm?doid=1083356.1083357.

[3] Ibidem, 1.

[4] Ibidem, 2.

[5] Idem.

[6] Bush. As We May Think. Chapter 1.

[7] Bush. As We May Think. Chapter 5.

[8] Yosuke Aoki, Ryosuke Koshijima and Motomichi Toyama. 2015. Automatic Determination of Hyperlink Destination in Web Index. Proceedings of the 19^thInternational Database Engineering & Applications Symposium. ACM, New York, NY, 206. DOI: http://dl.acm.org/citation.cfm?doid=2790755.2790784.

Experimenting with the Phonograph

During our last DTU skills training we experimented with the phonograph as a form of media ethnography. Simulating the learning-by-doing teaching technique that Kirsten Haring introduced we didn’t waste much time reflecting and got hands-on immediately during the second day of the training. The one thing we did reflect on the day before was what we wanted to record. Both Marleen and I were keen on combining our hobbies and work, so we decided on recording live music. I was particularly motivated to try out the violin because Dr. Stefan Krebs mentioned that at the time they couldn’t record the violin with the Edison phonograph because it wasn’t loud enough. Ever the historian I wanted to put that finding to the test (perhaps driven to prove it wrong).

When I got home that evening I got my dusty violin out of its case and started looking through my stack of sheet music for a piece that was easy to play and did not exceed the 2-minute wax cylinder limit. Eventually I decided on the Carnival of Venice because it is such a recognizable tune. I practiced for about an hour the night before the second training day and packed my violin and the sheet music before heading to the skills training. Although we were given a copy of the original user manual the night before, I only took a brief look at it. I generally don’t bother with user manuals too long unless I have to put together IKEA furniture, so this manual didn’t help me much either. That morning the first thing we did was take a long hard look at the Edison Phonograph and figure out what each component did. In order to understand the recording process, we watched a YouTube instruction video and after a presentation on video-reflexive ethnography by professor Jessica Mesman attempted our first recording.

phonograph

Instead of sticking to the original schedule and having one group observing the other, we all gathered around for the recording. First I had to tune the violin while Marleen was also warming up the flute she brought. Jessica held the sheet music for me, Kaarel announced the title of the music, Stefan started the recording and measured the decibels with his phone, and the others were either recording the first try with their phone or observing the procedure. At the end of the first recording, I stopped the phonograph on time so that there was a part of the cylinder left for Marleen and then she recorded the song she knew by heart.

While listening to the recording afterwards the violin was indeed hard to hear, whereas the sound of the flute was slightly easier to pick up, so we decided to try out a few different techniques. First, we added a piece of carton board around the recording horn in order to capture the sound better. That didn’t work because the carton board absorbed the sound rather than expanding the reach of the horn. Furthermore, the piece of carton made it harder to stand close enough to the recording horn. Next, we heated the wax cylinder right before our final test and that improved the recording much better than the cardboard addition.

In order to listen to the recordings we had to change the horn and the ‘reproducer’, and both of these elements influenced the quality of playback. As I was visiting the Heinz-Nixdorf museumsForum a month later in Paderborn, the Dictaphone caught my attention. This device that otherwise looks very similar to the Phonograph, seemed to use headphones instead of a reproducer-horn.

The Phonograph was used at home, whereas the Dictaphone was used in the office. While I was looking at the images accompanying the display, two thoughts popped into my head. First, I realised that this constellation of a manager speaking into the Dictaphone and a secretary afterwards transcribing the recording must have inspired Vannevar Bush while he was describing certain features of the ‘memex’ in As We May Think. Second, transcribing speech to text – whether by typewriter or modern day computer – requires absolute concentration and noise-cancelling headphones. The headphones from this Dictaphone must have been inspired by the stethoscope because the metal part that you would expect to go over the head of the secretary hung below her chin. Whether for entertaining or professional purposes, this analogue media has influenced our experience of listening to music profoundly. To the point where modern songs usually only last between 2 and 4 minutes, the maximum length of a recording on a wax cylinder.

‘As We May Think’ – part 7

According to David E. Millard and Martin Ross the World Wide Web has evolved from one of the hypertext systems to a new Web 2.0. They enumerate the aspirations of the hypertext pioneers that came into existence in this new form of the Web. First of all hypertext should facilitate three forms of search: content search, context search by using meta-data and finally structural search looking for particular patterns. Other aspirations of the Web 2.0 related to structure and content include n-ary links, composition that resembles transclusions in Xanadu and extended navigational structures for advanced browsing as well as trails that enable recommendations based on user history. Where previously hypertext dealt with static content or structure, the Web 2.0 strives towards dynamic content and structures as well as computation over the network and personalization. Keeping track of versions has always been important in hypertext and information systems, consisting of two different types: entity versioning and network versioning. One of the final features of hypertext is the “lack of distinction between authors and readers” which facilitates collaboration. The authors sum up five pursuits on open authoring, namely private and public annotation, global and restricted collaboration and the extensibility of the system.[1] Vannevar Bush already described the basics of hypertext and its use on the World Wide Web as follows:

The process of tying two items together is the important thing. When the user is building a trail, he names it, inserts the name in his code book, and taps it out on his keyboard.[2]

“A hypertext document with its corresponding text and hyperlinks is written in HyperText Markup Language (HTML) and is assigned an online address called a Uniform Resource Locator (URL).” This online address consists of three parts, namely the protocol, domain name or IP address and optionally the pathname to the file.[3] Tim Berners-Lee proposed the HyperText Transfer Protocolwhere web browsers send requests to web servers according to the client-server architecturealready discussed in the previous chapter of this essay.[4] The second part of the URL is the domain name, which follows a “three-level “server.organization.type” format”. The organization type usually forms the top level and is indicated by .com for commercial sites or .edu for educational sites and many other abbreviations. The name of the organization itself precedes the type abbreviation and finally the third level such as “www” indicates a specific host server.[5] Since the URL rarely contains the numeric internet address, the Domain name system or DNS converts the World Wide Web name address to a numeric internet address and vice versa. DNS servers house a database containing these translations of addresses that structure domain names hierarchically according to the top level domainor TLD. The Internet Corporation for Assigned Numbers and Names or ICANN register and control these names.[6]

[1] David E. Millard and Martin Ross. 2006. Web 2.0: hypertext by any other name? Proceedings of the seventeenth conference on Hypertext and hypermedia. ACM, New York, NY, 27-30. DOI: http://dl.acm.org.kuleuven.ezproxy.kuleuven.be/citation.cfm?doid=1149941.1149947.

[2] Bush. As We May Think. Chapter 7.

[3] 2016. URL. Encyclopaedia Britannica. Britannica Academic. Consulted 6 march 2016. http://academic.eb.com.kuleuven.ezproxy.kuleuven.be/EBchecked/topic/614591/URL.

[4] 2016. HTTP. Encyclopaedia Britannica. Britannica Academic. Consulted 6 march 2016. http://academic.eb.com.kuleuven.ezproxy.kuleuven.be/EBchecked/topic/279732/HTTP.

[5] 2016. Domain name. Encyclopaedia Britannica. Britannica Academic.Consulted 6 march 2016. http://academic.eb.com.kuleuven.ezproxy.kuleuven.be/EBchecked/topic/923076/domain-name.

[6] 2016. DNS. Encyclopaedia Britannica. Britannica Academic. Consulted 6 march 2016. http://academic.eb.com.kuleuven.ezproxy.kuleuven.be/EBchecked/topic/1473555/DNS.