Digital Discrimination: How Systemic Bias Is Built Into the Internet

As tech companies shape our perceptions, bias and discrimination are embedded into their products and services at multiple levels.

Artwork: a search bar framed by various computer windows showing online algorithmic imagery. Illustration by Carmen Deñó.

In 2009, Safiya Noble typed “Black girls” into Google’s search engine after a friend told her that she’d be surprised by what she found. She was — the top results were largely pornographic.

Two years later, as she was completing her doctorate in library and information science, Noble wondered whether her extensive online engagement with Black feminist media would be positively reflected if she searched again for “Black girls.” After all, one of Google’s selling points was personalized search, where results would be served to you depending on what your interests seemed to be. But this wasn’t the case — the top results she received were still pornographic and derogatory.

Noble is now an assistant professor at UCLA, where she co-directs the Center for Critical Internet Inquiry. In her book Algorithms of Oppression: How Search Engines Reinforce Racism, she lays out how these tools, which are often thought to be neutral mechanisms rather than content curators, reflect biases that exist within larger society — and she contends that this kind of discrimination is built into the internet as we know it. 

Internet search engines are based on “information retrieval,” which involves identifying particular pieces of information from a dataset. In the 1960s, well before the internet as we know it began to emerge, Gerard Salton and his colleagues at Cornell developed the SMART Information Retrieval System, introducing concepts like relevance feedback, term weighting, and term dependency. In 1990, a service called Archie created a searchable index of files that were available on public file transfer protocol (FTP) servers, which were used to transfer data between computers. Archie inspired other crude pre-web search tools like Veronica and Jughead, but the indexes were maintained manually and had limited search functions.
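
Concepts like term weighting survive in modern retrieval systems. As a rough illustration of the idea, and not a reconstruction of the SMART system itself, the Python sketch below scores a few invented documents against a query using term frequency weighted by inverse document frequency:

```python
import math
from collections import Counter

# Toy corpus; the documents and query are invented for illustration.
docs = [
    "information retrieval ranks documents by relevance",
    "web crawlers index documents for search engines",
    "term weighting scores how important a word is to a document",
]
tokenized = [d.split() for d in docs]

def idf(term):
    # Inverse document frequency: terms that appear in fewer documents count for more.
    df = sum(1 for doc in tokenized if term in doc)
    return math.log(len(tokenized) / df) if df else 0.0

def score(query, doc_tokens):
    # Sum of (term frequency * inverse document frequency) over the query terms.
    counts = Counter(doc_tokens)
    return sum(counts[t] * idf(t) for t in query.split())

query = "document relevance"
ranked = sorted(range(len(docs)), key=lambda i: score(query, tokenized[i]), reverse=True)
for i in ranked:
    print(f"{score(query, tokenized[i]):.3f}  {docs[i]}")
```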

The advent of the World Wide Web brought a wave of new search engines and “web crawlers” that automatically identified online content to create searchable indexes. Google, which was incorporated in 1998, introduced its PageRank algorithm, which gauged the quality of a web page by assessing how many links pointed to it and how reputable the linking pages themselves were. The speed, depth, and relevance of Google’s search results quickly distinguished it in a crowded field, which was largely decimated after the dot-com bubble of the late 1990s burst.
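
The intuition behind PageRank can be sketched in a few lines: treat the web as a graph and repeatedly pass each page’s score along its outbound links, with a damping factor standing in for a reader who occasionally jumps to a random page. The link graph below is invented for illustration; the production algorithm operated at web scale with many additional refinements.

```python
# Minimal PageRank power iteration over a toy, made-up link graph.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],  # D links out, but no page links to D
}
pages = list(links)
d = 0.85                                  # damping factor
rank = {p: 1 / len(pages) for p in pages}

for _ in range(50):                       # iterate until scores settle
    new = {p: (1 - d) / len(pages) for p in pages}
    for page, outlinks in links.items():
        share = rank[page] / len(outlinks)
        for target in outlinks:
            new[target] += d * share
    rank = new

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))          # C, which collects the most links, ranks highest
```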

Harvard professor Shoshana Zuboff argues in her book The Age of Surveillance Capitalism that Google survived this tumult and grew dominant by aggressively monetizing the voluminous “collateral” search data it collected and triangulating it with advertising. “Operationally, this meant that Google would turn its own growing cache of behavioral data and its computational power and expertise toward the single task of matching ads with queries,” she writes. Capturing and analyzing user data logs generated by queries made Google the biggest name in search; it currently controls more than 86 percent of the global search engine market. This dominance has fueled its expansion into other products and services, ultimately precipitating an antitrust lawsuit, with the US Department of Justice accusing the company of anticompetitive tactics to protect its monopoly over search and search advertising. 

While Google has become ubiquitous, the factors that determine the ranking of its search results remain only partly known outside the company. Known factors include how many websites link to a page, how much traffic and how many unique page views it receives, how often it is updated, and more. But because different users see different results based on interests inferred from their data logs, Google’s personalized search function often amplifies the display of biased information.
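
As a purely hypothetical illustration of how personalization can skew what different users see (this is not Google’s actual scoring), consider a ranker that boosts results matching a user’s inferred interests; the scores, URLs, and interest profile here are invented.

```python
# Hypothetical personalized re-ranking; all data below is made up for illustration.
results = [
    {"url": "example.org/news",  "base_score": 0.80, "topics": {"news"}},
    {"url": "example.org/forum", "base_score": 0.75, "topics": {"conspiracy"}},
    {"url": "example.org/shop",  "base_score": 0.60, "topics": {"shopping"}},
]

def personalized_rank(results, interests, boost=0.3):
    # Add a fixed boost whenever a result's topics overlap the user's inferred interests.
    def score(r):
        return r["base_score"] + (boost if r["topics"] & interests else 0.0)
    return sorted(results, key=score, reverse=True)

# A user whose click history suggests an interest in conspiracy content
# sees the lower-scoring page promoted above the others.
for r in personalized_rank(results, interests={"conspiracy"}):
    print(r["url"])
```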

In Algorithms of Oppression, Noble traces the radicalization of Dylann Roof, the white supremacist who in 2015 murdered church parishioners in Charleston, South Carolina. Roof would Google phrases like “Black on white crime” and be directed to misinformation that was rife with racial stereotypes and incitement to violence against Black communities. In 2016, the journalist Carole Cadwalladr was dismayed to find that entering queries about Jews or Muslims into Google brought up such predictive searches as “are Jews evil?” and “are Muslims bad?” Invariably, the top results to such questions suggested that, yes, they were indeed.

The effect of such results can be cyclical: as search algorithms register that a page or set of pages is popular, those pages are pushed further and further up the rankings. Increased public scrutiny of this dynamic has forced Google to constantly tweak and re-tweak its algorithm in an attempt to minimize such results, though precisely how and to what effect remains unclear.
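
That feedback loop is easy to simulate. In the deliberately crude sketch below, a ranker that favors already-popular pages hands most future clicks to whichever page gets an early lead, regardless of quality; the starting numbers are invented and the model is not meant to describe any real search engine.

```python
import random

# Two pages of equal quality; page 0 starts with a small head start in clicks.
clicks = [10, 1]

def click_probability(clicks):
    # A ranker that favors already-popular pages: future click share
    # simply follows past click share (a crude rich-get-richer model).
    total = sum(clicks)
    return [c / total for c in clicks]

random.seed(0)
for _ in range(10_000):
    p = click_probability(clicks)
    chosen = 0 if random.random() < p[0] else 1
    clicks[chosen] += 1

print(clicks)  # the early leader typically ends up with most of the clicks
```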

This problem isn’t confined to search engines. Last year, a group of LGBTQ YouTube creators filed a lawsuit against YouTube and Google, its parent company, alleging that YouTube’s AI-driven recommendation algorithm suppressed their videos and restricted their ability to earn money from them because it discriminated against their content. A group of Black YouTube creators filed a similar lawsuit against YouTube and Google this past summer, alleging systematic discrimination. (YouTube insists that its systems do not identify users based on race, ethnicity, or sexual orientation, and is fighting the lawsuits.) YouTube has also been repeatedly accused of contributing to right-wing radicalization through its recommendation algorithm, which the company has said is responsible for 70 percent of the platform’s watch time.

The racial bias that Noble found in Google’s search results is also evident in its ad-buying portal, which generated more than $134 billion last year. A recent investigation by The Markup found that the Google Ads Keyword Planner, which helps advertisers pick which search terms to associate with their advertising, suggested mostly pornographic keywords to match the phrases “Black girls,” “Latina girls,” and “Asian girls.” The same held when “boys” was substituted in those phrases, but not for “white girls” or “white boys.”

Additionally, separate analyses by The Markup found that 41 percent of the first page of Google’s search results is dedicated to the company’s own products, and that many emails that referred to racial justice and Black Lives Matter following the death of George Floyd were categorized under the “promotions” tab on Gmail, which generally applies to marketing materials.

Computer vision algorithms can reflect bias as well by disproportionately focusing on white faces and obscuring others. Colin Madland, a white Canadian doctoral student in educational technology, noticed this when meeting with a Black colleague over Zoom, which had the bizarre effect of erasing his colleague’s face. When he posted images to illustrate this on Twitter, he found that the social media platform’s image-cropping algorithm automatically cropped the preview to focus on his face rather than his colleague’s. Over the next few days, countless people posted photos of white people and people of color on Twitter to test this for themselves. Twitter soon apologized and changed the algorithm, despite the fact that it had already tested the feature for bias before launching it.

Google was similarly criticized in 2015, after its Google Photos service labeled photos of Black people as gorillas. When Wired followed up on the promised fixes in 2018, testing a large collection of animal photos on the service, it found that the terms “gorilla,” “chimp,” “chimpanzee,” and “monkey” simply returned no results. Wired also tested thousands of images used in facial-recognition research against terms like “Black man” and “Black woman,” and found that the service returned photographs of people in black and white, sorted by gender but not by race. “Image labeling technology is still early and unfortunately it’s nowhere near perfect,” said a Google spokesperson at the time.

The company’s recent firing of the distinguished AI ethicist Timnit Gebru following an internal dispute over a research paper is also a discouraging sign. According to MIT Technology Review, the paper examined the risks of training massive AI language models, including the prospect that racist, sexist, and otherwise abusive language in the training data would be absorbed and reproduced by the models.

“We are working at a scale where the people building the things can’t actually get their arms around the data,” said Emily M. Bender, one of the paper’s co-authors. “And because the upsides are so obvious, it’s particularly important to step back and ask ourselves, what are the possible downsides? … How do we get the benefits of this while mitigating the risk?”

Problems with algorithmic bias arise from the impulse to categorize and, inevitably, from the hierarchies that emerge from such processes when we live in a deeply unjust society. The overarching issue is that tech companies are shaping our perceptions in ways both big and small, and that bias and discrimination are embedded into their products and services at multiple levels. Every time a public figure or major news entity raises an issue with Google, the company seems to make an adjustment. But these small concessions aren’t enough, and they don’t fix the underlying logic that has led to these systems becoming engines that perpetuate misinformation.

“When I was talking 10 years ago about things like algorithmic discrimination, saying it could happen at the level of code and in terms of deployment of these technologies, very few people believed it,” Noble remarked recently to the Institute of AI. “It was very difficult in those early years to be understood and to have this be part of the mainstream conversation. The mainstream conversation a decade ago was that computer science is just applied math and applied math can’t be racist because math can’t be racist.”

Now we know better.
