The ‘neutrality’ of search engines

Over the past two weeks I took part in a “presentation skills training” for which we had to choose a topic that was completely new to us. Because I was interested in a tool most of us use every day, I decided to take a closer look at the anatomy and workings of search engines. Since one of those search engines has become a common verb in many languages, let’s start from the Oxford English Dictionary’s definition of “to google”:

Pronunciation: Brit. /ˈɡuːɡl/, U.S. /ˈɡuɡ(ə)l/

1. intr. To use the Google search engine to find information on the Internet.

2. trans. To enter (a search term) into the Google search engine to find information on the Internet; to search for information about (a person or thing) in this way.

Since people use Google and other search engines regularly, they tend to rely on four common assumptions as identified by Bettina Fabos in 2003:

  1. Search engines are impartial information tools.
  2. Search engines search the entire Web, gleaning the most relevant results.
  3. Search engines vary greatly, thus offering choice and a competitive marketplace.
  4. Search engines are the only place to go for relevant information on the web.

Before I can contradict these four assumptions, it is important to understand the structure of the search engine industry.

 

Search Industry Structure

LESSON 1: Search engines are not impartial; they are part of an industry.
Search industry structure, based on the article by Bettina Fabos (2003), using the Eleanor template from slidescarnival.com
Directories are quite simply databases that contain information. In this case they contain indexed lists of web pages that feed the search engine providers. Search engine providers may use existing directories, or build their own by crawling the web. To understand how search engine providers work, a good starting point is the famous 1998 paper by Sergey Brin and Lawrence Page, then PhD students at Stanford University and the founders of Google. You can picture crawlers as spiders that are sent out over the web to the locations the URL server gives them, gathering what they find in a web repository. After the pages are collected, the indexer converts each web page (also called a document) into a list of word occurrences, or hits, and adds the necessary metadata; the result is stored in barrels. The sorter then needs to invert this index by converting the list of words attached to each document into a list of documents for each word.
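To make the indexer and sorter steps a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the three tiny documents, the names `repository`, `build_forward_index` and `invert`, and the use of plain dictionaries instead of barrels are all made up for this example and are not taken from the Brin and Page paper.

```python
from collections import defaultdict

# Hypothetical mini "repository": document id -> page text.
repository = {
    "doc1": "katy perry releases a new single",
    "doc2": "perry the platypus is a cartoon character",
    "doc3": "an interview with katy perry",
}


def build_forward_index(pages):
    """Indexer step: turn each document into a list of (word, position) hits."""
    return {
        doc_id: [(word, pos) for pos, word in enumerate(text.lower().split())]
        for doc_id, text in pages.items()
    }


def invert(forward_index):
    """Sorter step: regroup the hits by word instead of by document."""
    inverted = defaultdict(dict)
    for doc_id, hits in forward_index.items():
        for word, pos in hits:
            # For each word, remember in which documents (and where) it occurs.
            inverted[word].setdefault(doc_id, []).append(pos)
    return dict(inverted)


forward_index = build_forward_index(repository)
inverted_index = invert(forward_index)

print(inverted_index["perry"])
# {'doc1': [1], 'doc2': [0], 'doc3': [4]}
```

The forward index answers "which words are in this document?", while the inverted index answers "which documents contain this word?", which is the question a search query actually asks.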
So imagine looking for Katy Perry (the famous example in this video). Simply put, the search engine provider compares the list of documents containing the word ‘Katy’ to the list containing the word ‘Perry’ and returns only those documents containing both words, preferably close to each other. However, you don’t want just any document that contains Katy Perry; you need relevant web pages. This is where the ranking algorithm comes into play, in particular Google’s PageRank algorithm. The algorithm looks at the ‘popularity’ of a web page based on how many other (relevant) web pages link to it.
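The Katy Perry lookup and the ranking step can be sketched in a few lines of Python. This is a simplified textbook version of the PageRank idea, not Google's production algorithm: the page names, the link graph, the damping factor of 0.85 and the 50 iterations are assumptions chosen for illustration, and the "preferably close to each other" proximity check is left out.

```python
# Hypothetical inverted index: word -> {document id: [word positions]}.
inverted_index = {
    "katy": {"fan-site": [0], "interview": [3], "news": [5]},
    "perry": {"fan-site": [1], "interview": [4], "platypus": [0], "news": [0]},
}

# Hypothetical link graph: page -> pages it links to.
links = {
    "fan-site": ["interview"],
    "interview": ["fan-site"],
    "news": ["interview", "fan-site"],
    "platypus": ["interview"],
}


def matching_documents(index, words):
    """Keep only the documents that contain every query word."""
    doc_sets = [set(index.get(word, {})) for word in words]
    return set.intersection(*doc_sets) if doc_sets else set()


def pagerank(graph, damping=0.85, iterations=50):
    """Simplified PageRank: a page is 'popular' if popular pages link to it."""
    pages = list(graph)
    rank = {page: 1 / len(pages) for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {page: (1 - damping) / len(pages) for page in pages}
        for page, outgoing in graph.items():
            # A page without links spreads its score over all pages.
            targets = outgoing if outgoing else pages
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


matches = matching_documents(inverted_index, ["katy", "perry"])
ranks = pagerank(links)
ranked = sorted(matches, key=lambda doc: ranks[doc], reverse=True)

print(ranked)
# ['interview', 'fan-site', 'news']
```

The page about the platypus drops out because it only contains ‘Perry’, and the remaining pages are ordered by how much ‘popularity’ flows to them through incoming links.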
Finally, the search engine portal is any website containing a search bar, which means even this page could be understood as a search engine portal. The importance of portals lies in their usability and their user friendliness.

 

Sponsorship

LESSON 2: Search engines do not search the entire web, especially not the deep web. Some directories only include paid-for content.
Models of sponsorship, based on the article by Bettina Fabos (2003), using the Eleanor template from slidescarnival.com

 

Google’s initial strategy was to sell its technology as a search engine provider, not only to other search engines but also to websites that use Google’s technology to power search on their own pages. This strategy brings in some money, but not a continuous flow of cash. Therefore, some other search engines chose to focus on selling space to marketing agencies, and thus on including sponsored links. Within the search industry structure, it is clear that the best way to appear in the results of different search engines is to pay to add (commercial) content to the repository. However, in order to measure the effectiveness of advertisements, search engines and marketers have agreed on a pay-by-performance strategy, which means advertisers only pay the search engine when someone actually clicks on a link. Other search engines have gone for more aggressive strategies such as paid inclusion, where the advertisement appears in every search, but they lost users because of it.
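As a small worked example of pay-by-performance, the Python sketch below computes what an advertiser would owe. All numbers (10,000 impressions, 150 clicks, 40 cents per click) and the function names are invented for illustration; real prices are typically set through auctions and vary widely.

```python
def pay_per_click_cost_cents(clicks, price_per_click_cents):
    """The advertiser pays per click, however often the ad was shown."""
    return clicks * price_per_click_cents


def click_through_rate(clicks, impressions):
    """Share of ad views that led to a click."""
    return clicks / impressions if impressions else 0.0


# Hypothetical campaign figures.
impressions = 10_000
clicks = 150
price_per_click_cents = 40

cost_cents = pay_per_click_cost_cents(clicks, price_per_click_cents)
ctr = click_through_rate(clicks, impressions)

print(f"Cost: {cost_cents / 100:.2f}")
print(f"Click-through rate: {ctr:.1%}")
```

The 9,850 impressions that nobody clicked on cost the advertiser nothing, which is what makes the model attractive to marketers.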

 

‘Googlearchy’

LESSON 3: There is little competition in the marketplace, with mainly American companies dominating the search industry.

The search engine industry used to be dominated by three main companies: Google, Yahoo!, and Microsoft. Add to that the current trend of voice search, as this infographic beautifully demonstrates. Amazon might not be the richest company, but it certainly has the first-mover advantage and is quickly connecting its Alexa voice assistant to cars and home devices, effectively taking over parts of our lives.

 

Infographic showing companies leading the space race in voice search (Search Engine Watch).

This beautiful infographic also shows the challenges related to monetising voice search. Noble as the goals of Google may be in its promises of search engine integrity, the main thing that counts at the end of the day is money. The reason Amazon holds such a large segment of the market could be the connection with its online store and with apps and services such as music streaming.

Conclusion

LESSON 4: There are places to go for information other than search engines (perhaps the library?).

Independent institutions such as universities, but also public libraries, should guarantee access to knowledge that does not necessarily have commercial value. Therefore, these public institutions need to counteract the commercial interests of search engines by focussing on Open Access. In this regard, the Digital Humanities field is one of the first and most important advocates for Open Access. It is this year’s main theme of the Alliance of Digital Humanities conference in Montréal. Another important, though political and legal, institution working on Open Access is the Open Access Infrastructure for Research in Europe, or OpenAIRE. Preferably, Open Access information is accessible through Google, but we also need to think about back-ups and other ways of disseminating information.


Descending from the Ivory Tower – Digital Humanities Beyond Academia

As I already stated in my very first blogpost, I am a first-generation Digital Humanist. I started this master’s program an academic year ago, in 2015. In a month or so, I hope to graduate for the third and final time. I will be thrown under the bus – I mean into the job market – by February. How nice it would be to stay safely hidden in the Ivory Tower of academia, not having to face whether or not I am truly qualified for the real world. It reminds me of a comment one of my professors once made about the similarity between Italian and Belgian students, who remain under the care of their parents until graduation. I have tested myself already: I am perfectly capable of living abroad and taking care of myself, that is, with financial aid and Skype nearby. Now is the time to let go, to become truly independent. But will I be independent from academia as well?

Stéfan Sinclair puts my internal debate into words in his blogpost on Digital Craft and Humanistic Perspectives Beyond Academia:

Don’t count on an academic job as a reward for your travails (in other words, don’t consider me as a model) and don’t count on your studies to prepare you for easy access to non-academic jobs.
(Sinclair, 2013)

Where do we stand as future masters in Digital Humanities? Do we stick to the tricky search for a job as a humanist, albeit with some extra capabilities, or do we use our newfound digital confidence to demand a job in the promising world of IT? Is there a middle ground? Is there something in between academia and the outside?

Even for those who do get the chance to work on a PhD, the prospects of academic employment keep shrinking, since the number of tenure-track jobs available to humanities scholars is rapidly decreasing. Another option, discussed by Katina Rogers, is that of alternative academics, or AltAc:

People with advanced humanities degrees who find stimulating careers in and around the academy but outside the tenure track.
(Rogers, 2015)

Some of those jobs outside the academy can be found in libraries, museums, archives, humanities centres and labs, presses, and so on (Rogers, 2015). For students and academics alike to be ready for a job outside the ivory tower, existing programs need to prepare their students adequately for an ever-changing job market and society. The Digital Humanities are setting a good example, since many of their implicit skills such as “collaboration, project management, and technological fluency” are gaining importance both within the academy and outside (Rogers, 2015). It is not necessarily about the specific job or career, but

People that identify with the term [alternative academic] tend to see their work through the lens of academic training, and incorporate scholarly methods into the way that work is done.
(Rogers, 2015)

That, together with all the reasons why the humanities matter, will guide me through the maze. Furthermore, I also believe that the digital in Digital Humanities increases my opportunities in today’s society. Hopefully others will see the importance of the humanities, along with the promising but ever-critical digital humanities.

Bibliography

Sinclair, Stéfan. “Digital Craft and Humanistic Perspectives Beyond Academia.” 2013. http://stefansinclair.name/digital-craft-and-humanistic-perspectives-beyond-academia/.

Rogers, Katina. “Humanities Unbound: Supporting Careers and Scholarship Beyond the Tenure Track.” Digital Humanities Quarterly, 9(1), 2015. http://www.digitalhumanities.org/dhq/vol/9/1/000198/000198.html.