The ‘neutrality’ of search engines

Over the past two weeks I took part in a “presentation skills training” in which we had to choose a topic that was completely new to us. Because I wanted to look more closely at a tool most of us use every day, I decided to examine the Anatomy and Workings of Search Engines. Since the name of one of those search engines has become a common verb in many languages, let’s start from the Oxford English Dictionary’s definition of “to google”:

Pronunciation: Brit. /ˈɡuːɡl/, U.S. /ˈɡuɡ(ə)l/

1. intr. To use the Google search engine to find information on the Internet.

2. trans. To enter (a search term) into the Google search engine to find information on the Internet; to search for information about (a person or thing) in this way.

Since people use Google and other search engines regularly, they tend to rely on four common assumptions as identified by Bettina Fabos in 2003:

  1. Search engines are impartial information tools.
  2. Search engines search the entire Web, gleaning the most relevant results.
  3. Search engines vary greatly, thus offering choice and a competitive marketplace.
  4. Search engines are the only place to go for relevant information on the web.

Before I can contradict these four assumptions, it is important to understand the structure of the search engine industry.


Search Industry Structure

LESSON 1: Search Engines are not impartial, they are part of an industry.
Search Industry structure, based on the article by Bettina Fabos in 2003, using the Eleanor template from slides
Directories are quite simply databases that contain information. In this case they contain indexed lists of web pages that feed the search engine providers. Search engine providers may use existing directories, or build their own by crawling the web. To understand how search engine providers work, a good starting point is the famous 1998 paper by Sergey Brin and Lawrence Page, then PhD students at Stanford University and the founders of Google. You can think of crawlers as spiders that are sent out over the web to the locations the URL server gave them, gathering the pages they find in a web repository. After the pages have been collected, the indexer converts each web page (also called a document) into a list of word occurrences, or hits, and adds metadata; the result is stored in barrels. The sorter then needs to invert this index by converting the list of words attached to each document into a list of documents for each word.
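To make the indexer and sorter steps more concrete, here is a minimal sketch in Python. It is a toy illustration, not the implementation described by Brin and Page: the `repository` contents and the function names `build_forward_index` and `invert` are hypothetical, and the real system stores far richer hit metadata in its barrels.

```python
from collections import defaultdict

# Hypothetical toy "web repository": document id -> page text.
repository = {
    "page1": "search engines crawl the web",
    "page2": "spiders crawl web pages",
    "page3": "the web is large",
}

def build_forward_index(pages):
    """Indexer step: map each document to its list of (word, position) hits."""
    return {
        doc_id: [(word, pos) for pos, word in enumerate(text.lower().split())]
        for doc_id, text in pages.items()
    }

def invert(forward_index):
    """Sorter step: map each word to the documents and positions it occurs in."""
    inverted = defaultdict(dict)
    for doc_id, hits in forward_index.items():
        for word, pos in hits:
            inverted[word].setdefault(doc_id, []).append(pos)
    return dict(inverted)

forward_index = build_forward_index(repository)
inverted_index = invert(forward_index)

print(sorted(inverted_index["web"]))    # documents that contain "web"
print(sorted(inverted_index["crawl"]))  # documents that contain "crawl"
```

The forward index answers "which words are in this document?", while the inverted index answers "which documents contain this word?", which is the question a search query actually asks.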
So imagine looking for Katy Perry (famous example in this video). Simply put, the search engine provider will compare the list of documents containing the word ‘Katy’ with the list containing the word ‘Perry’ and return only those documents that contain both words, preferably close to each other. However, you don’t want just any document that contains Katy Perry; you want relevant web pages. This is where the ranking algorithm comes into play, especially Google’s PageRank algorithm. The algorithm estimates the ‘popularity’ of a web page based on how many other (relevant) web pages link to it.
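The two steps above, matching and ranking, can be sketched roughly as follows. This is a simplified illustration and certainly not Google's production algorithm: the page names, the link graph, the function names `match_all` and `pagerank`, and the damping value of 0.85 are all assumptions made for the example, and word proximity is not modelled at all.

```python
def match_all(inverted_index, query):
    """Return the ids of documents that contain every word of the query."""
    postings = [set(inverted_index.get(word, ())) for word in query.lower().split()]
    return set.intersection(*postings) if postings else set()

def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: repeatedly pass each page's rank along its links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page without outgoing links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical inverted index: word -> documents containing it.
inverted_index = {
    "katy": {"fan_site", "news", "wiki"},
    "perry": {"fan_site", "news", "wiki", "cartoon"},
}

# Hypothetical link graph: page -> pages it links to.
links = {
    "fan_site": ["wiki"],
    "news": ["wiki"],
    "wiki": ["news"],
    "cartoon": ["wiki"],
}

matches = match_all(inverted_index, "Katy Perry")
ranks = pagerank(links)
results = sorted(matches, key=lambda page: ranks[page], reverse=True)
print(results)
```

In this toy graph the page called `cartoon` is dropped because it only contains ‘Perry’, and `wiki` ends up first because three other pages link to it. A link from a highly ranked page also counts for more than a link from an obscure one, which is why `news` outranks `fan_site` here.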
Finally, the search engine portal is any website containing a search bar, which means even this page could be understood as a search engine portal. The importance of portals lies in their usability and their user friendliness.



LESSON 2: Search engines do not search the entire web, especially not the deep web. Some directories only include paid-for content.
Models of sponsorship, based on the article by Bettina Fabos in 2003, using the Eleanor template from slides


Google’s initial strategy was to sell its technology as a search engine provider, not only to other search engines but also to other websites that use Google’s technology to power search on their own pages. This strategy brings in some money, but not a continuous flow of cash. Therefore, some other search engines chose to focus on selling space to marketing agencies, and so started to include sponsored links. Within the search industry structure, it is clear that the best way to appear in search results across different search engines is to pay to have (commercial) content added to the repository. However, in order to measure how effective advertisements are, search engines and marketers have agreed on a pay-by-performance strategy, which means advertisers only pay the search engine when someone actually clicks on a link. Other search engines have gone for more aggressive strategies such as paid inclusion, where the advertisement appears in every search, but lost users as a result.



LESSON 3: There is little competition in the marketplace, with mainly American companies dominating the search industry.

The search engine industry used to be dominated by three main companies: Google, Yahoo!, and Microsoft. Add to that the current trend towards voice search, which this infographic beautifully demonstrates. Amazon might not be the richest company, but it certainly has the first-mover advantage and is quickly connecting its Alexa voice assistant to cars and home devices, effectively taking over parts of our lives.


Infographic showing the companies leading the space race in voice search (Search Engine Watch).

This beautiful infographic also shows the challenges of monetising voice search. Noble as Google’s goals may be in its promises of search engine integrity, the main thing that counts at the end of the day is money. Amazon may hold such a large segment of the market because of the link with its online store and its connections to apps and services such as music streaming.


LESSON 4: There are places to go for information other than search engines (perhaps the library?).

Independent institutions such as universities, but also public libraries, should guarantee access to knowledge that does not necessarily have a commercial value. Therefore, these public institutions need to counteract the commercial interests of search engines by focussing on Open Access. In this regard, the Digital Humanities field is one of the first and most important advocates for Open Access. It is this year’s main theme of the Alliance of Digital Humanities Organizations conference in Montréal. Another important institution working on Open Access, albeit a more political and legal one, is the Open Access Infrastructure for Research in Europe, or OpenAIRE. Ideally, Open Access information is accessible through Google, but we also need to think about back-ups and other ways of disseminating information.

