In 2016, The Seattle Times uncovered an issue with a popular networking site’s search feature. When the investigative reporters entered female names into LinkedIn’s search bar, the site asked if they meant to search for similar-sounding male names instead: “Stephen Williams” instead of “Stephanie Williams,” for example. According to the paper’s reporting, however, the reverse did not happen when a user searched for male names.
Within a week of The Seattle Times article’s release, LinkedIn introduced a fix. Spokeswoman Suzi Owens told the paper that the search algorithm had been guided by “relative frequencies of words” from past searches and member profiles, not by gender. Her explanation suggests that LinkedIn’s algorithm was not intentionally biased. Nevertheless, using word frequency, a seemingly objective variable, as a key parameter still generated skewed results. That could be because American men are more likely than American women to have a common name, according to Social Security data. Thus, a search function built on frequency criteria alone would be more likely to increase visibility for Stephens than for Stephanies.
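To see how a frequency rule alone can produce a one-way effect, consider the minimal sketch below. It is not LinkedIn’s algorithm, which has not been published; the name counts, the similarity threshold, and the `suggest` function are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical counts of first names in past searches and member profiles.
NAME_COUNTS = {"stephen": 900, "stephanie": 300, "maria": 500}


def suggest(query, counts=NAME_COUNTS, min_similarity=0.7):
    """Return a similar-looking name that is strictly more frequent, or None."""
    query = query.lower()
    query_count = counts.get(query, 0)
    best = None
    for name, count in counts.items():
        if name == query or count <= query_count:
            continue  # never suggest a name that is not more frequent
        similarity = SequenceMatcher(None, query, name).ratio()
        if similarity >= min_similarity and (best is None or count > counts[best]):
            best = name
    return best


print(suggest("stephanie"))  # the more frequent "stephen" is suggested
print(suggest("stephen"))    # None: no similar name is more frequent
```

Nothing in the rule mentions gender, yet the suggestion only ever runs from the less frequent name to the more frequent one.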
Examples like this demonstrate how algorithms can unintentionally reflect and amplify common social biases. Other recent investigations suggest that such incidents are not uncommon. In a more serious case, the investigative news organization ProPublica uncovered a correlation between race and criminal recidivism predictions in so-called “risk assessments,” predictive algorithms that courts use to inform terms for bail, sentencing, or parole. The algorithmic predictions for recidivism generated a higher rate of false negatives for white offenders and a higher rate of false positives for black offenders, even though overall error rates were roughly the same.
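It may seem odd that two groups can share an overall error rate while the kinds of errors differ. The short calculation below shows how that can happen. The counts are invented for illustration and are not ProPublica’s figures; the group names and the `rates` helper are likewise assumptions.

```python
def rates(tp, fp, tn, fn):
    """Compute overall and per-type error rates from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "error_rate": (fp + fn) / total,
        # Share of people who did not reoffend but were flagged as high risk.
        "false_positive_rate": fp / (fp + tn),
        # Share of people who did reoffend but were flagged as low risk.
        "false_negative_rate": fn / (fn + tp),
    }


# Hypothetical counts for two groups of 100 people each.
GROUPS = {
    "group_a": dict(tp=30, fp=30, tn=30, fn=10),
    "group_b": dict(tp=10, fp=10, tn=50, fn=30),
}

summary = {name: rates(**counts) for name, counts in GROUPS.items()}

for name, r in summary.items():
    print(
        f"{name}: error={r['error_rate']:.2f} "
        f"FPR={r['false_positive_rate']:.2f} FNR={r['false_negative_rate']:.2f}"
    )
```

With these made-up counts, both groups are misclassified 40 percent of the time, but group A’s errors are mostly false positives and group B’s are mostly false negatives. An audit that checks only overall accuracy would miss the difference.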
ProPublica’s investigation exposes how data-driven analytics used to aid decision-making can have serious consequences for people’s lives. Companies, government institutions, and data scientists in all sectors need reliable methods for eliminating unintentional bias from data-driven decision-making. Fortunately, investigative studies by academics and journalists are not the only way to audit automated systems and mitigate these risks.
The people behind big data
Behind every data-driven decision lies a series of human judgments. Decisions about what variables to use, how to define categories or thresholds for sorting information, and which datasets to use to build the algorithm can all introduce bias. Left unexamined, value-laden software can have unintended discriminatory effects that perpetuate structural inequality. This is particularly true when algorithms are tasked with making critical decisions about people’s lives, like who qualifies for parole, who receives favorable credit offers, or who makes a “good” job candidate.
Data-driven decision-making systems may seem to rely entirely on objective data, but most still involve value judgments about how that data should be analyzed. How should success be defined, for example? What characteristics in the data should be included in the analysis? Into what categories should cases be sorted? The answers to these questions may vary based on who is designing the algorithm, their motivations, and their worldview.
Let’s take a basic example. Suppose two people are tasked with developing a system to sort a basket of fruit. They have to determine which pieces are “high quality” and will be sold at the market, and which will instead be used for making jam. Both people are given the exact same data—the fruit—and the same task of determining the fruits’ relative quality. To solve this problem:
- The goal has to be defined. (Success = fruit is correctly sorted; error = fruit is misclassified)
- Possible outcomes have to be defined. (Fruit goes to market or to the jam factory)
- And parameters have to be defined for the key variable: quality. (The fruits’ shape, color, number of bruises, sheen, etc.)
Given the same task and data, the two people are likely to get different results. Perhaps one person believes the primary indicator of a fruit’s quality is brightness of color. That person may sort the fruit based on how vibrant it is, even though not all fruits are brightly colored; that person would send strawberries to the market and melons to the jam factory. Meanwhile, the other person might believe that unblemished fruit is the highest quality, even though fruits with protective rinds might look scruffy on the outside while being perfectly fine on the inside; that person could send unripe strawberries to the market and ripe melons or bananas to the jam factory. These criteria are different, yet each is similarly logical and evenly applied, and they will produce two different outcomes for the same basket of fruit. But both send too many melons to the jam factory, because the people sorting the fruit are using proxies for quality that don’t account for the best characteristics of melons.
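The two sorters can be sketched as two small rules applied to the same basket. Everything here is hypothetical: the `Fruit` fields, the measurement scales, the thresholds, and the basket itself are assumptions chosen to mirror the example, with `ripe` standing in for the quality the sorters are actually trying to capture.

```python
from dataclasses import dataclass


@dataclass
class Fruit:
    name: str
    brightness: float  # 0.0 (dull) to 1.0 (vivid); hypothetical scale
    blemishes: int     # visible marks on the skin or rind
    ripe: bool         # stand-in for the quality we actually care about


def sort_by_color(fruit):
    """First sorter: bright color is treated as the sign of quality."""
    return "market" if fruit.brightness >= 0.6 else "jam"


def sort_by_blemishes(fruit):
    """Second sorter: an unblemished surface is treated as the sign of quality."""
    return "market" if fruit.blemishes <= 1 else "jam"


BASKET = [
    Fruit("ripe strawberry", 0.9, 1, True),
    Fruit("unripe strawberry", 0.4, 0, False),
    Fruit("ripe melon", 0.3, 4, True),
    Fruit("ripe banana", 0.7, 5, True),
]

by_color = {f.name: sort_by_color(f) for f in BASKET}
by_blemishes = {f.name: sort_by_blemishes(f) for f in BASKET}

for f in BASKET:
    print(f"{f.name:18} color rule: {by_color[f.name]:7} blemish rule: {by_blemishes[f.name]}")
```

Each rule is applied consistently, the two disagree about the banana and the unripe strawberry, and both send the ripe melon to the jam factory because neither proxy measures what makes a melon good.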
This example represents a relatively unsophisticated version of algorithmic decision-making, but a similar version has been tested for sorting cucumbers in Japan. Makoto Koike wanted to apply machine learning to help his mother more efficiently sort the cucumbers from her farm. Rather than asking his mother to define the features she used for sorting, Koike tasked her with sorting a bunch of cucumbers. He then optically scanned the cucumbers, used computer software to identify their common traits, and built an algorithm to replicate her work. That algorithm’s understanding of what makes a “good” cucumber was based on Koike’s mother’s interpretation and intuition.
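A toy version of learning from someone’s sorted examples makes the point concrete. The sketch below is not Koike’s system; it swaps in a simple nearest-centroid classifier, and the measurements, labels, and function names are all invented for illustration. The features are left unscaled for brevity, so length dominates the distance.

```python
from statistics import mean

# Hypothetical (length_cm, curvature) measurements, labeled by the human sorter.
LABELED = [
    ((22.0, 0.1), "good"),
    ((21.0, 0.2), "good"),
    ((23.0, 0.1), "good"),
    ((15.0, 0.6), "other"),
    ((16.0, 0.7), "other"),
    ((14.0, 0.5), "other"),
]


def fit_centroids(examples):
    """Average the features of each labeled group into one centroid per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in squared distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


centroids = fit_centroids(LABELED)
print(predict(centroids, (20.0, 0.2)))  # long and fairly straight
print(predict(centroids, (16.5, 0.1)))  # straight but short
```

The model has no notion of “good” beyond the examples it was given. In this made-up data the sorter happened to favor long cucumbers, so a short, straight one lands in the “other” pile.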
It’s one thing to have an algorithm that marginalizes melons or unfairly sorts cucumbers, but what happens when algorithms make important decisions about humans?
Consider the data used to determine consumer credit scores. Ten years ago, the Federal Reserve Board, under direction from Congress, evaluated whether credit scoring methods were discriminatory. In its report to Congress, the Federal Reserve Board revealed a strong correlation between credit scores and race and other demographic indicators, even though “credit characteristics included in credit history scoring models do not serve as substitutes, or proxies, for race, ethnicity, or sex.” Nevertheless, credit scoring models disadvantage some segments of the population more than others. Immigrants tend to have lower credit scores, for example—not because scoring algorithms are trained to assign immigrants lower credit scores, but because length of credit history weighs heavily in scoring models, and recent immigrants will have had less time to develop their credit histories.
When algorithmic designers ignore social nuance or inequality, they risk designing systems that create disparate impacts. In the case of the creditworthiness research, however, the Federal Reserve Board exposed a different problem with scoring models: that existing credit scoring indicators are robust, but potentially insufficient. In the case of recent immigrants, “expanding the information supplied to credit-reporting agencies to include rent, other recurring bill payments, nontraditional uses of credit, and the credit histories of the foreign-born in their countries of origin may provide a broader picture of the credit experiences.” As a result of the Federal Reserve Board’s research, advocates are now working to determine what other data points (like utility payments) should be considered in credit scoring.
Given the role of human subjectivity in designing algorithms, and algorithms’ widespread use, what is the responsibility of the data scientists who build them? The Association for Computing Machinery (ACM) recently published a statement on algorithmic accountability and transparency that places at least some responsibility on the designers and scientists behind the technology. The ACM outlined seven principles, the first of which says that data scientists should be aware of “the possible biases involved in [the] design, implementation, and use” of analytic systems. But once they are aware, how can data scientists increase their chances of detecting and eliminating unintended bias?
The Center for Democracy and Technology is creating a tool to help data scientists, programmers, and product managers interrogate their instincts throughout the design process. The tool includes prompts to ask critical questions about the goal of the product and the methodology and assumptions used to make decisions along the way. By providing questions instead of answers, the tool aims to challenge norms and standard business practices and to foster an inclusive and mindful climate within any entity using algorithmic decision-making. The hope is that this product can be applied broadly, spanning the many contexts of automated decision-making technology.
A.R. Lange and Natasha Duarte are collaborators on the Center for Democracy and Technology’s (CDT) Privacy and Data Project, whose work focuses on the intersection of civil rights and big data. Lange is a former senior policy analyst and Duarte is a policy analyst at the CDT.