‘As We May Think’ – part 2

In the first few chapters of “As We May Think,” Bush discusses the difficulties of creating a record. He already envisioned ways to shorten the process of producing and distributing research.

To make the record, we now push a pencil or tap a typewriter. Then comes the process of digestion and correction, followed by an intricate process of typesetting, printing, and distribution. To consider the first stage of the procedure, will the author of the future cease writing by hand or typewriter and talk directly to the record?[1]

The Voder emitted recognizable speech when its keys were stroked; the Vocoder did exactly the opposite: “Speak to it, and the corresponding keys move.” Another technology already in existence when Bush wrote his influential article was the stenotype, “which records in a phonetically simplified language”. “Combine these two elements, let the Vocoder run the stenotype, and the result is a machine which types when talked to.”[2] Today this technology, called speech recognition, enables devices to respond to spoken input and is available on mobile phones, tablets, and laptops. Its most common uses include dictation, search, and giving commands to computers. A well-known example is Apple’s Siri, the personal assistant on its smartphones.

In the Humanities, and Digital Humanities especially, researchers have shown an interest in non-text materials because of the “massive increase in the quantity and availability of audiovisual (AV) materials and a rapid development in technology for handling such materials”.[3] In this context, speech recognition is used mainly to transcribe audiovisual materials, that is, for speech-to-text transcription.

Figure: Schematic model of humanities research with AV materials

Speech recognition systems are highly domain specific: no single system handles every domain, and “even speech from the same domain that differs from the ‘training’ data may be problematic”.[4]

Speech-to-text transcriptions have historically comprised an unpunctuated and unformatted stream of text. There has been considerable recent research into generating ‘richer’ transcriptions annotated with a variety of information that can be extracted from the audio signal and/or an imperfect transcription. […] Investigations have often used speech from only a small set of domains, such as broadcast news and conversational speech. Emotion-related work in particular is very preliminary.[5]

Strategies for expanding the role of audiovisual media in Digital Humanities were discussed at the Digital Humanities Conference 2014 in Lausanne, in a workshop run by researchers involved in AXES.[6] Research fields that might benefit from speech recognition include film, television, radio, and oral history.

[1] Vannevar Bush, “As We May Think,” Atlantic Monthly, July 1945, http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/, chapter 3.

[2] Bush, “As We May Think,” chapter 3.

[3] Alan Marsden, Adrian Mackenzie and Adam Lindsay, “Tools for Searching, Annotation and Analysis of Speech, Music, Film and Video – A Survey,” Literary and Linguistic Computing 22, no. 4 (2007): 469-488, accessed January 9, 2017, doi: 10.1093/llc/fqm021.

[4] Marsden et al., “Tools for Searching”.

[5] Marsden et al., “Tools for Searching”.

[6] “AV in DH Special Interest Group,” Access to Audiovisual Archives, February 12, 2015, http://www.axes-project.eu/?p=2419.

