Privacy SOS

Did you mean, “male lawyer”? On language translators and gender bias

French philosopher Pierre Bourdieu insisted that naming is an exertion of power. Powerful elites have long used naming and classifying not only to legitimize knowledge, but also to perpetuate and reproduce harmful and oppressive historical practices. Language can liberate, but it can also harm. The way we talk and the words we use are intimately connected to social inequalities and discrimination. And while technology has unquestionably propelled us forward at startling speeds, the epicenter of knowledge in Silicon Valley is not altogether different from the ruling elite that has governed Washington, D.C. and state capitols for centuries: it is largely dominated by wealthy white men. Everything else is a deviation. We are only beginning to understand the dangers we face when historic inequalities are reinforced by invisible technologies that appear neutral.

Algorithms are everywhere. They decide how we dress, what we eat, where we travel, and essentially rule over our daily lives. We are told that privacy is dead, but we are not told how the systems that have reduced our agency actually function. Instead, we hear claims that technology is neutral: that when a computer says no, no should be the answer because, come on, machines are infallible. We even use the word “smart” to describe objects that have integrated digital sensors or internet connectivity, despite the fact that adding the internet to a toaster is actually pretty stupid.

One of the most widely used algorithms today is the one that animates Google Translate. For years, language translators were useless because, despite the fact that they had access to enormous datasets of words in different languages, they could not, for all the hype about artificial intelligence, harness the essence of language: how words join to create sentences and convey meaning. Google Translate has come close to solving this problem. That’s because the algorithm has learned from the sentences and corrections millions of people enter into the system, year after year. Thanks to all that human labor, Google’s algorithm now does a fairly good job translating complex sentences and grammatical constructions.

But the millions of people who helped train Google Translate didn’t just make Google’s algorithm better at its core job; they also saddled Google’s system with all the implicit and explicit biases that human beings exhibit when they talk and write.

These biases are especially tricky and problematic when Google Translate is asked to translate between languages that handle gendered nouns differently. For example, in Spanish and most Romance languages, unlike English, nouns are gendered. In Spanish, because of patriarchal cultural norms and the historic dominance of men, the masculine noun became the generic, supposedly “non-gendered” way of referring to most things. In some cases, the gender of nouns is directly related to societal norms, as with professions and roles that have historically been reserved for men or women. Language evolved accordingly. And because human beings have trained Google’s algorithms, these gendered linguistic norms now appear in Google Translate outputs, with often frustrating results.

Last week, a Twitter user posted a screenshot revealing sexist bias in the system. The person had used Google Translate to translate an English letter into Spanish. In English, the letter was addressed to “Professor.” Google Translate, informed by its users, produced an output addressed to a male professor. That got me thinking about what other biases might lurk in Google Translate, so I looked up some other translations from English to Spanish.

The result? A whole lot of gender bias. “Dear Doctor” translated to “Querido Doctor,” masculine. “Hairdresser” translated to “peluquero” because, of course, only men are hairdressers. “Housekeeper” translated to “ama de casa,” which basically means “housewife.” “Dear kindergarten teacher” and “dear preschool teacher” both translated to “querida maestra de kinder” and “querida maestra de preescolar,” suggesting that only women teach young children. (If you remove youth from the equation, the translator gives you the opposite output: “Dear teacher” translated to “querido maestro,” a man.) More examples: “dear lawyer” translated to “querido abogado.” “Scientist” returned “científico.” All men. No women scientists or lawyers; no Marie Curie and no Ruth Bader Ginsburg.

Then I entered the words “boss” and “supervisor” into the translator. Not only did Google Translate return the words “jefe” and “supervisor,” both masculine nouns, but a deeper function of the translator offered four types of leadership positions (jefe, patrón, cacique, and mayor) for boss and three types of positions (supervisor, inspector, and controlador) for supervisor. Every single one of those words is gendered male.
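Readers who want to run this kind of check themselves can script the same experiment. Below is a minimal sketch using the google-cloud-translate Python package (Google’s Cloud Translation API); it assumes you have a Google Cloud account with credentials configured, and the outputs may differ over time as Google retrains its models.

```python
# A minimal sketch for reproducing this experiment with the Cloud
# Translation API via the google-cloud-translate Python package.
# Assumes credentials are configured via the
# GOOGLE_APPLICATION_CREDENTIALS environment variable; outputs may
# change over time as Google updates its models.
from google.cloud import translate_v2 as translate

client = translate.Client()

phrases = [
    "Dear Doctor",
    "dear lawyer",
    "hairdresser",
    "scientist",
    "boss",
    "supervisor",
]

for phrase in phrases:
    result = client.translate(
        phrase, source_language="en", target_language="es"
    )
    # e.g. 'Dear Doctor' -> 'Querido Doctor' (masculine form)
    print(f"{phrase!r} -> {result['translatedText']!r}")
```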

Politicians and public officials are not safe from the gender bias either. “Dear President” returned “querido presidente,” “dear senator” returned “querido senador,” and “head of state” returned “Jefe de Estado.” All male nouns, despite the fact that Latin America, the biggest Spanish-speaking community in the world, has had more female presidents and heads of state than any other part of the world.

[Photo: Cristina Fernández de Kirchner, the former President of Argentina.]

We’ve been told that technology will liberate us, but as this small experiment shows, technology can also codify historic biases and inequalities, all the while making the bias appear neutral and natural. Technology is not neutral, and language algorithms, like others programmed by human beings and trained with human inputs, are just as likely to reproduce bias as any human being.

But once these biases in the machine are made visible, tech companies have the opportunity, and indeed the responsibility, to stop reproducing age-old inequalities in their own systems.

In this case, one simple thing Google could do to address the problem of sexism in its translator is to keep up to date with the latest developments in language studies. In Spanish, for example, there is a growing trend of “inclusive language” that uses the letter “e” or the character “@” in nouns describing mixed-gender groups. In this way, the masculine form ending in “o” does not override the feminine “a,” and space is created in the language for gender non-conforming and non-binary people. Another option would be to provide users with both the feminine and masculine translations of words and phrases, to alert users who may be unfamiliar with the ways foreign languages deploy gender. This would be a very simple change in the user interface, but it could make an enormous impact.
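To illustrate what that interface change could look like, here is a rough sketch. The GENDERED_FORMS table and translate_with_both_genders function are invented for illustration; they are not part of any real Google Translate API.

```python
# Hypothetical sketch of the proposed interface fix: when a noun is
# gendered in the target language, surface both forms instead of
# silently defaulting to the masculine. The lookup table below is
# illustrative, not a real translation backend.
GENDERED_FORMS = {
    "lawyer": ("abogado", "abogada"),
    "scientist": ("científico", "científica"),
    "hairdresser": ("peluquero", "peluquera"),
}

def translate_with_both_genders(word: str) -> str:
    """Return both Spanish forms of a noun when it is gendered."""
    forms = GENDERED_FORMS.get(word.lower())
    if forms is None:
        # No gendered alternatives known; a real system would fall
        # back to its ordinary single translation here.
        return word
    masculine, feminine = forms
    return f"{masculine} (masculine) / {feminine} (feminine)"

print(translate_with_both_genders("lawyer"))
# -> abogado (masculine) / abogada (feminine)
```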

Tweaks like these may seem like small interventions, but as Bourdieu observed, power and language are inextricably linked. Tech companies may not have historically worried much about the relationship between power and language, but now that they are offering translator tools to the world, and mediating so much of human culture, thought, and expression, they must.

This post was written by Technology for Liberty Policy Counsel Emiliano Falcon-Morano.
