‘As We May Think’ – part 3

Audio records are one alternative to writing information down, but an image says more than a thousand words. Vannevar Bush already envisioned an entirely different experience for researchers.

One can now picture a future investigator in his laboratory. His hands are free, and he is not anchored. As he moves about and observes, he photographs and comments. Time is automatically recorded to tie the two together.[1]

Although still images already provide a lot of information together with their tags, commenting while photographing might be easier with a video camera. Most devices such as laptops, tablets and mobile phones already contain cameras, but a GoPro or other hands-free camera increases the ease of use. However, video production consists of planning during pre-production, capturing during production and processing in the post-production phase.[2] Bush was already aware of these phases.

As he ponders over his notes in the evening, he again talks his comments into the record. His typed record, as well as his photographs, may both be in miniature, so that he projects them for examination.[3]

The process of recording information is constantly improving, especially since the invention of optical head-mounted displays such as Google Glass. Glass displays information and communicates with the internet using natural-language voice commands. Furthermore, it includes a touchpad and a camera that can take photos or record videos just by voicing the command “record a video”.[4] This technology can improve the research process since it “1) provides workflow guidance to the user, 2) supports hands-free operation, 3) allows the users to focus on their work, and 4) enables an efficient way for collaborating with a remote expert”.[5] Although Glass was already tested in the healthcare and industrial maintenance fields, the Glass Explorer program was discontinued. Ivy Ross and Tony Fadell took over the project, but the release date of their new product remains unknown.[6]

[1] Bush, “As We May Think,” Chapter 3.

[2] Dustin Freeman, Stephanie Santosa, Fanny Chevalier, Ravin Balakrishnan and Karan Singh, “LACES: live authoring through compositing and editing of streaming video,” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2014, 1207-1216, doi: 10.1145/2556288.2557304.

[3] Bush, “As We May Think,” Chapter 3.

[4] “Google Glass,” Wikipedia, accessed March 7, 2016. https://en.wikipedia.org/wiki/Google_Glass.

[5] Xianjun Sam Zheng, Patrik Matos da Silva, Cedric Foucault, Siddharth Dasari, Meng Yuan and Stuart Goose, “Wearable Solution for Industrial Maintenance,” Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, 2015, 311-314, doi: 10.1145/2702613.2725442.

[6] Nick Bilton, “Why Google Glass Broke,” The New York Times, accessed March 7, 2016, http://www.nytimes.com/2015/02/05/style/why-google-glass-broke.html?smid=nytcore-iphone-share&smprod=nytcore-iphone&_r=1.

The ‘neutrality’ of search engines

Over the past two weeks I followed a “presentation skills training” in which we had to choose a topic that was completely new to us. Because I was interested in a tool most of us use every day, I decided to take a closer look at the anatomy and workings of search engines. Since the name of one of those search engines has become a common verb in most languages, let’s start from the Oxford English Dictionary’s definition of “to google”:

Pronunciation: Brit. /ˈɡuːɡl/, U.S. /ˈɡuɡ(ə)l/

1. intr. To use the Google search engine to find information on the Internet.

2. trans. To enter (a search term) into the Google search engine to find information on the Internet; to search for information about (a person or thing) in this way.

Since people use Google and other search engines regularly, they tend to rely on four common assumptions as identified by Bettina Fabos in 2003:

  1. Search engines are impartial information tools.
  2. Search engines search the entire Web, gleaning the most relevant results.
  3. Search engines vary greatly, thus offering choice and a competitive marketplace.
  4. Search engines are the only place to go for relevant information on the web.

Before I can contradict these four assumptions, it is important to understand the structure of the search engine industry.

 

Search Industry Structure

LESSON 1: Search Engines are not impartial, they are part of an industry.
[Figure: Search industry structure, based on the 2003 article by Bettina Fabos, using the Eleanor template from slidescarnival.com]
Directories are quite simply databases that contain information. In this case they contain indexed lists of web pages that feed the search engine providers. Search engine providers may use existing directories, or build their own by crawling the web. To understand the workings of search engine providers, a good starting point is the famous 1998 paper by Sergey Brin and Lawrence Page, then PhD students at Stanford University and the founders of Google. You can picture crawlers as spiders that are sent out over the web to the locations the URL server gave them, in order to gather web pages in a repository. After the pages are collected, the indexer converts each web page (also called a document) into a list of word occurrences, or hits, and adds metadata; the result is stored in barrels. The sorter then needs to invert this index, converting the list of words attached to each document into a list of documents for each word.
So imagine looking for Katy Perry (famous example in this video). Simply put, the search engine provider will compare the list of documents containing the word ‘Katy’ to the list containing the word ‘Perry’ and return only those documents containing both words, preferably close to each other. However, you don’t want just any document that contains Katy Perry; you need relevant web pages. This is where the ranking algorithm comes into play, especially Google’s PageRank algorithm. The algorithm looks at the ‘popularity’ of a web page based on how many other (relevant) web pages refer to it.
Finally, the search engine portal is any website containing a search bar, which means even this page could be understood as a search engine portal. The importance of portals lies in their usability and their user friendliness.
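
For readers who like to see the retrieval and ranking steps in symbols, here is a small sketch. The set notation D(w) for "all documents containing word w" is my own shorthand and not taken from the paper; the ranking formula is the simplified PageRank definition as I read it in the 1998 paper by Brin and Page, and it should not be taken as a description of what Google runs today.

```latex
% Retrieval: keep only the documents that contain both query words.
% D(w) is a hypothetical shorthand for the inverted-index entry of word w.
\[
  \mathrm{Results}(\text{Katy Perry}) = D(\text{Katy}) \cap D(\text{Perry})
\]

% Ranking: simplified PageRank of a page A.
% T_1, ..., T_n are the pages that link to A,
% C(T_i) is the number of links going out of page T_i,
% d is a damping factor between 0 and 1 (the paper usually sets it to 0.85).
\[
  PR(A) = (1 - d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}
\]
```

In words: a page ranks highly when many pages link to it, and a link counts for more when it comes from a page that itself ranks highly and does not spread its vote over too many outgoing links.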

 

Sponsorship

LESSON 2: Search engines do not search the entire web, especially not the deep web. Some directories only include paid-for content.
[Figure: Models of sponsorship, based on the 2003 article by Bettina Fabos, using the Eleanor template from slidescarnival.com]

 

Google’s initial strategy was to sell its technology as a search engine provider, not only to other search engines but also to other websites that use Google’s technology to power search on their own pages. This strategy brings in some money, but not a continuous flow of cash. Therefore, some other search engines chose to focus on selling space to marketing agencies, and thus on including sponsored links. Within the search industry structure, it is clear that the best way to appear in search results across different search engines is to pay to add (commercial) content to the repository. However, in order to determine the efficiency of advertisements, search engines and marketers have agreed on a pay-by-performance strategy, which means advertisers only pay the search engine when someone actually clicks on a link. Other search engines have gone for more aggressive strategies such as paid inclusion, where the advertisement appears in every search, but they lost users because of it.

 

‘Googlearchy’

LESSON 3: There is little competition in the marketplace, with mainly American companies dominating the search industry.

The search engine industry used to be dominated by three main companies: Google, Yahoo!, and Microsoft. Add to that the current trend of voice search, which this infographic beautifully demonstrates. Amazon might not be the richest company, but it certainly has the first-mover advantage and is quickly connecting its Alexa voice assistant to cars and home devices, effectively taking over parts of our lives.

 

[Figure: Infographic showing companies leading the space race in voice search (Search Engine Watch)]

The infographic also shows the challenges related to monetising voice search. Noble as Google’s goals may be in its promises of search engine integrity, the main thing that counts at the end of the day is money. The reason Amazon holds such a large segment of the market could be its connection to its online store and to apps and services such as music streaming.

Conclusion

LESSON 4: There are places to go for information other than search engines (perhaps the library?).

Independent institutions such as universities, but also public libraries, should guarantee access to knowledge that does not necessarily have commercial value. Therefore, these public institutions need to counteract the commercial interests of search engines by focussing on Open Access. In this regard, the Digital Humanities field is one of the first and most important advocates for Open Access. It is this year’s main theme of the Alliance of Digital Humanities conference in Montréal. Another important, yet political and legal, institution working on Open Access is the Open Access Infrastructure for Research in Europe, or OpenAIRE. Preferably, Open Access information is accessible through Google, but we also need to think about back-ups and other ways of disseminating information.

‘As We May Think’ – part 2

In the first few chapters of “As We May Think”, Bush discusses the difficulties of creating a record. He already envisioned solutions to shorten the process of creating and spreading research.

To make the record, we now push a pencil or tap a typewriter. Then comes the process of digestion and correction, followed by an intricate process of typesetting, printing, and distribution. To consider the first stage of the procedure, will the author of the future cease writing by hand or typewriter and talk directly to the record?[1]

After the invention of the Voder, which emitted recognizable speech when typed to, the Vocoder did exactly the opposite: “Speak to it, and the corresponding keys move.” Another technology already in existence at the time Bush wrote his influential article was the stenotype, “which records in a phonetically simplified language”. “Combine these two elements, let the Vocoder run the stenotype, and the result is a machine which types when talked to.”[2] Speech recognition, the technology that enables devices to respond to spoken commands, now exists in mobile phones and tablets as well as laptops. The most common uses include dictation, search and giving commands to computers. Another example of speech recognition is Apple’s Siri, a personal assistant on its smartphones.

In the Humanities, and Digital Humanities especially, researchers have shown an interest in non-text materials because of the “massive increase in the quantity and availability of audiovisual (AV) materials and a rapid development in technology for handling such materials”.[3] Speech recognition is mostly linked to transcribing audio-visual materials, especially speech-to-text transcription.

[Figure: Schematic model of humanities research with AV materials]

Since speech recognition systems are so domain specific, they cannot handle every domain and “even speech from the same domain that differs from the ‘training’ data may be problematic”.[4]

Speech-to-text transcriptions have historically comprised an unpunctuated and unformatted stream of text. There has been considerable recent research into generating ‘richer’ transcriptions annotated with a variety of information that can be extracted from the audio signal and/or an imperfect transcription. […] Investigations have often used speech from only a small set of domains, such as broadcast news and conversational speech. Emotion-related work in particular is very preliminary.[5]

Strategies for expanding the role of audiovisual media in Digital Humanities were discussed during the Digital Humanities Conference 2014 in Lausanne, at a workshop by researchers involved in AXES.[6] Research fields that might benefit from speech recognition include film, television, radio, and oral history.

[1] Vannevar Bush, “As We May Think,” Atlantic Monthly, July 1945, http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/, chapter 3.

[2] Bush, “As We May Think,” chapter 3.

[3] Alan Marsden, Adrian Mackenzie and Adam Lindsay, “Tools for Searching, Annotation and Analysis of Speech, Music, Film and Video – A Survey,” Literary and Linguistic Computing 22, no. 4 (2007): 469-488, accessed January 9, 2017, doi: 10.1093/llc/fqm021.

[4] Marsden et al., “Tools for Searching”.

[5] Marsden et al., “Tools for Searching”.

[6] “AV in DH Special Interest Group,” Access to Audiovisual Archives, February 12, 2015, http://www.axes-project.eu/?p=2419.

‘As We May Think’ – part 1

When Vannevar Bush wrote “As We May Think” in 1945, he could never have imagined the technologies existing today.[1] A few of his ideas became reality in some form or other, but where he “urges that men of science should then turn to the massive task of making more accessible our bewildering store of knowledge”, his ideas actually affected a much larger population.[2] He identified one particular problem, namely that “publication has been extended far beyond our present ability to make real use of the record.”[3] Even in 1945 machines were relatively cheap and dependable, so according to Bush they would provide a solution. In his opinion a record can only be useful to science when it is continuously extended and stored, but “above all it must be consulted”.[4] First he discusses multiple solutions for recording knowledge, such as a combination of a vocoder and a stenotype to create a “machine which types when talked to”. He also imagined advanced arithmetical machines that perform at 100 times present speeds or more.[5] Bush even describes what we would call data mining or machine learning.

In fact, every time one combines and records facts in accordance with established logical processes, the creative aspect of thinking is concerned only with the selection of data and the process to be employed and the manipulation thereafter is repetitive in nature and hence a fit matter to be relegated to the machine.[6]

Furthermore, he explains the difference between simple selection, which examines every item, and the selection mechanism of a telephone exchange, which narrows down its selection by classes and subclasses, each represented by a digit. However, both these selection methods use indexing, while the human mind “operates by association”.[7] He figures his memex will provide the solution, for it creates trails that tie multiple items together.[8]

[Figure: The memex]

Finally, Bush not only concludes that new forms of encyclopedias will appear, with trails running through them, but also asks himself this: “Must we always transform to mechanical movements in order to proceed from one electrical phenomenon to another?” Imagine that instead of my typing this document, a machine would intercept the electrical impulses and type, without the intervening mechanical movement of my hands on the keyboard.[9]

[1] Vannevar Bush, “As We May Think,” Atlantic Monthly, July 1945, http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/.
[2] Bush, “As We May Think”, Introduction.
[3] Bush, “As We May Think”, Chapter 1.
[4] Bush, “As We May Think”, Chapter 2.
[5] Bush, “As We May Think”, Chapter 3.
[6] Bush, “As We May Think”, Chapter 4.
[7] Bush, “As We May Think”, Chapter 5-6.
[8] Bush, “As We May Think”, Chapter 7.
[9] Bush, “As We May Think”, Chapter 8.

Hidden Figures: Black Women in the History of Computing

After the first full week of “being a PhD student”, Eva and I went to the screening of Hidden Figures in Kirchberg. Because this film is so closely related to my research, I decided to write my first official blogpost about these extraordinary women. Now for the first question: what is the film about?

HIDDEN FIGURES is the incredible untold story of Katherine Johnson (Taraji P. Henson), Dorothy Vaughan (Octavia Spencer) and Mary Jackson (Janelle Monáe)—brilliant African-American women working at NASA, who served as the brains behind one of the greatest operations in history: the launch of astronaut John Glenn into orbit, a stunning achievement that restored the nation’s confidence, turned around the Space Race, and galvanized the world. The visionary trio crossed all gender and race lines to inspire generations to dream big.[1]

To understand the underlying, and sometimes overt, tensions in the film, I distinguish four main themes: segregation and race, gender and class, American society, and finally the IT culture. Although these themes seem clearly defined, in reality and in the film they often create complex narratives and scenes with hidden messages. Therefore, it makes more sense to embrace their intersection than to discuss each theme separately.

The exception: education past the eighth grade

In the opening scene we see Katherine’s parents meeting the principal of her school and her teacher, who urge them to accept a scholarship and some money for the trip to send her to the West Virginia Collegiate Institute. In reaction to an insult, Katherine proudly responds, “I was the first Negro female student at West Virginia University Graduate School.” [2] Starting around the turn of the century, “growing numbers of Black women had the opportunity to enter college and the professions,” but “the masses of Black women were still relegated to domestic and menial work.”[3] By 1952, 62.4% of degrees from Black colleges went to women.[4] However, “Black women were caught between the two functions they were expected to fulfill: enhancing the material quality of life for their families, and at the same time behaving like housewives.”[5] Another important remark here is that the three protagonists were educated and belonged to the middle class. The film does not mention or show the poorer classes, so it is worth stressing that the women in this film do not represent American Black women in the 1960s; their position was the exception rather than the rule.

The East Group vs. the West Group at NASA

When the three women arrive at NASA, they go to the West Group for coloured people, where the toilets are not clean, the desks are put closely together, and the building blocks are exposed. This stands in stark contrast to the architectural details and finishes at the East Group for White people, who get a nicely decorated office and even an armchair in the bathroom. Furthermore, Dorothy has to work as a supervisor for the Coloured group, but because no permanent supervisor is being assigned, she does not get the title or the pay. When she finds out about the construction of an IBM mainframe computer which will eventually take over the function of human “Computers”, she decides to take matters into her own hands. At the library the book on FORTRAN does not belong to the Coloured section, but before the guards can turn her out, she manages to put the book in her purse. She then teaches herself and her division all about working with the machine in order to keep their jobs at NASA, since “somewhere down the line a human being is going to have to hit the buttons.”[6]

In the meantime she gets Mary into a permanent position, and she sends Katherine to the Space Task Computer group. In a group of White men, Katherine faces discrimination on several levels. When she first enters, someone hands her the dustbin, assuming she is the cleaning lady. When she puts the bin back down to go to her place, she is stared at as if she were an alien from outer space. Following that awkward entry, Katherine faces another challenge, since the bathroom for coloured women is half a mile away. After her boss confronts her about her constant absence, she bursts out:

Mr. Harrison:          Now where the hell do you go every day?
Katherine:             To the bathroom, sir.
Mr. Harrison:          The bathroom! To the damn bathroom! For 40 minutes a day!? What do you do in there!? We are T-minus zero here. I put a lot of faith in you.
Katherine:             There’s no bathroom for me here.
Mr. Harrison:          What do you mean there’s no bathroom for you here?
Katherine:             There is no bathroom! There are no colored bathrooms in this building, or any building outside the West Campus. Which is half a mile away! Did you know that? I have to walk to Timbuktu just to relieve myself! And I can’t use one of the handy bikes. Picture that, Mr. Harrison? My uniform. Skirt below the knees and my heels. And simple string of pearls. Well, I don’t own pearls. Lord knows you don’t pay the coloreds enough to afford pearls! And I work like a dog day and night, living off coffee from a pot none of you want to touch! So, excuse me, if I have to go to the restroom a few times a day![7]

When her boss understands that racism and discrimination are hindering her work, he personally goes to the West Wing to break down the sign reading “colored bathroom”. In front of surprised Black women he shouts: “There you have it! No more colored restrooms. No more white restrooms. Just plain old toilets. Go wherever you damn well please. Preferably closer to your desk. At NASA…We all pee the same color!”[8]

Hierarchical Structures: “Fast with rocket ships. Slow with advancement.”

At NASA, there are several hierarchies, starting with Women of Colour being addressed by their first name, whereas White women and men were addressed by their last name. One finding I could not have made manually occurred to me after inserting the text in the Voyant tool.[9] Through textual analysis, it became clear that the most frequent words are Yes (71 instances) and sir (69 instances), often occurring together.[10] When talking to supervisors or other staff higher on the hierarchical ladder, others need to address them in the polite, but almost submissive “Yes Sir.”

Even middle-class and educated women were restricted to female fields, which the film demonstrates twice. Firstly, all Computers were female, and all engineers were male, accompanied by a female secretary. Secondly, the restriction shows in the tension between Mr. Stafford and Katherine. When Katherine arrives, one of her first jobs is to double-check Mr. Stafford’s math, which he immediately sees as an insult to his work. As a result, he makes her job difficult by crossing out all classified information, effectively doubling her workload. Later, she has to type his reports and when she adds her name to the list of authors because she contributed, he viciously responds “Computers don’t author reports,” telling her to retype the front page.[11]

[1] “Hidden Figures,” 20th Century Fox, accessed March 16, 2017, http://www.foxmovies.com/movies/hidden-figures.

[2] “Hidden Figures,” 20th Century Fox.

[3] Paula J. Giddings, When and Where I Enter: The Impact of Black Women on Race and Sex in America (Harper Collins, 2009), 73-74.

[4] Giddings, When and Where I Enter, 235.

[5] Giddings, When and Where I Enter, 248.

[6] “Hidden Figures,” 20th Century Fox.

[7] “Hidden Figures,” 20th Century Fox.

[8] “Hidden Figures,” 20th Century Fox.

[9] “Voyant Tools,” Stéfan Sinclair and Geoffrey Rockwell, last modified 2017, voyant-tools.org.

[10] Ibid.

[11] “Hidden Figures,” 20th Century Fox.

Pièce de Résistance: Master Thesis Template

I would now like to present my pièce de résistance, my gift to you: a LaTeX-template, including a BiBTeX-database with all the examples discussed in my previous blogposts. You can find all the code in either (ironically) a .docx-file, or a PDF. You will also need the figures attached at the end of this blogpost.

The bones of the front page come from the Faculty of Science template, which I adapted by replacing some colors and text. The document is well structured and the comments explain what each block of code does. This means you can adapt page settings, additional cover settings, and so on. Before you get started, add the title and subtitle, your name, (co)supervisor, master and academic year.

Before the table of contents, the template starts with acknowledgements and an abstract. You can add the copyright-page by removing the % sign in front of \input{docs/copyright}. Furthermore, the table of contents precedes a list of tables and figures, which are all numbered in roman style (i, ii, iii). The page numbering switches to arabic numbers from the introduction on. The master file then starts including five exemplary chapters, as well as the bibliography in the plainnat style as explained in BiBTeX – your new best friend.

Here is everything you need:

Guide and code of the Master Thesis Template (docx)

Guide and code of the Master Thesis Template (PDF)

Master Thesis Template (PDF)

 

BiBTeX – your new best friend

We’ve all been there. You finished your text, your layout is great, you more or less added footnotes and references. But now you have to start adding your bibliography, preferably adhering to pages and pages of guidelines and rules. Luckily, back in 1985, Oren Patashnik figured you needed help. He created BiBTeX to work alongside LaTeX, using a plain-text file format that can be created and modified in any text editor (learn more here).

A BiBTeX database contains all your entries in a .bib file, which can be used and reused in any LaTeX file. An example of an entry in this database is:

@article{perspectiveshistory,
  title={Has the Battle Been Won? The Feminization of History},
  author={Hunt, L.},
  journal={Perspectives on History},
  volume={36},
  number={5},
  year={1998},
  url={https://www.historians.org/publications-and-directories/perspectives-on-history/may-1998/has-the-battle-been-won-the-feminization-of-history}
}

In this case the reference type is an @article, while all details are listed between {}. The first word after the opening curly brace is the citation key or BiBTeX key, which is unique for each entry and used to cross-reference a citation to your bibliography. This means you don’t even have to manually add the author and year to your in-text references, but can simply \cite{citation key} in your text. In this case I can refer to the article with \cite{perspectiveshistory}, but more on citations later. An article entry needs to contain the author, journal, title, and year of publication. You can add these elements as demonstrated in the example, by element={} or element="". The optional elements for the article type are the month, note, number, pages, and volume. I also included a url field with a link to the article; for publications that have a doi, or digital object identifier, many styles accept a doi field holding that unique, persistent identifier instead. You can find the required and optional fields for each reference type here.

The standard BiBTeX entries include: @article, @book, @booklet, @inbook (for a specific chapter or entry in a book), @inproceedings (a conference paper), @manual, @mastersthesis or @phdthesis, @misc (for other publication types), @proceedings (a collection of conference papers), @techreport, and @unpublished. For more information on standard templates, you can always go to the wikibooks documentation.

Once you have added all your references to your BiBTeX database (no worries, most online bibliographies allow you to export citations in BiBTeX format), you need to insert your bibliography in your LaTeX file. Usually a bibliography comes after your content, but before \end{document}. First, you choose the \bibliographystyle{plain}, which refers to the standard plain.bst style file. You can also find style files for most journals that have defined their own reference style. Next, you simply include your \bibliography{bibfile} without adding the .bib extension. If your BiBTeX database is located in another folder, you need to add the path, as in \bibliography{refs/references}. Once you have included your bibliography, you can create citations easily with \cite{reference}, or with \citep{reference} if you load the natbib package and use a matching style such as plainnat.
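
To see these pieces working together, here is a minimal sketch of a complete document. The file name references.bib is my own choice, and the entry is the example from above without the link field (the plain style does not print it anyway). The filecontents block is only there so the example is self-contained; in a real project you would keep your .bib file separately.

```latex
% Writes references.bib next to this file on the first run.
% (File name chosen for this example only.)
\begin{filecontents}{references.bib}
@article{perspectiveshistory,
  title   = {Has the Battle Been Won? The Feminization of History},
  author  = {Hunt, L.},
  journal = {Perspectives on History},
  volume  = {36},
  number  = {5},
  year    = {1998}
}
\end{filecontents}

\documentclass{article}

\begin{document}

Hunt asks whether the battle has been won \cite{perspectiveshistory}.

% Style first, then the database name without the .bib extension.
\bibliographystyle{plain}
\bibliography{references}

\end{document}
```

If you compile by hand instead of letting your editor do it, the usual sequence is pdflatex, then bibtex, then pdflatex twice more, so that the citation numbers are resolved.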

Finally, I would like to add that in order to adapt the appearance of your bibliography to the document language for words such as editors, and, or in, you can add \usepackage[fixlanguage]{babelbib} and \selectbiblanguage{dutch} to your preamble. You also need to select a bibliography style that supports this package, such as \bibliographystyle{babplain}. If you want LaTeX to also display BiBTeX entries that were not cited in the text, you can add \nocite{*} to show all entries, or \nocite{key} for a specific entry. For my next and final entry on LaTeX, I will try to create a template for the master thesis of the faculty of Arts at the KU Leuven. Wish me luck!
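
For those who want to try the language settings right away, here is a minimal sketch of a Dutch-language bibliography. It assumes a references.bib file containing the perspectiveshistory entry already sits next to the document, and that babel is loaded with the dutch option (my own assumption about a typical setup). It uses \nocite, the standard LaTeX command for listing entries that are not cited in the text.

```latex
\documentclass{article}
\usepackage[dutch]{babel}            % document language (assumed setup)
\usepackage[fixlanguage]{babelbib}   % one fixed language for the whole bibliography
\selectbiblanguage{dutch}            % words such as "and" or "in" appear in Dutch

\begin{document}

Tekst met een verwijzing \cite{perspectiveshistory}.

% Also list every entry of the database that is not cited in the text.
\nocite{*}

\bibliographystyle{babplain}         % a style that supports babelbib
\bibliography{references}            % references.bib, without the extension

\end{document}
```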