As computer calculations are increasingly used to steer potentially life-changing decisions, scientists are confronting complex questions about what it means to make an algorithm fair. Researchers who work with public agencies to build responsible and effective software must grapple with how automated tools might introduce bias or entrench existing inequity, especially if they are being inserted into an already discriminatory social system. These tools promise to make decisions more consistent, accurate and rigorous. But as large data sets and more-complex models become widespread, it is becoming harder to ignore their ethical implications. Computer scientists have no choice but to be engaged now. We can no longer just throw the algorithms over the fence and see what happens.
Researchers studying bias in algorithms say there are many ways of defining fairness, which are sometimes contradictory.
Predictive parity, equal false-positive error rates, and equal false-negative error rates are all ways of being ‘fair’, but they are statistically impossible to reconcile if the underlying rates differ between two groups, such as the rates at which white and black people are rearrested. If one wants to be fair in one way, one might necessarily be unfair under another definition that also sounds reasonable.
For example, let’s imagine that an algorithm for use in the criminal-justice system assigns scores to two groups (blue and purple) for their risk of being rearrested. Historical data indicate that the purple group has a higher rate of arrest, so the model would classify more people in the purple group as high risk. This could occur even if the model’s developers try to avoid bias by not directly telling their model whether a person is blue or purple. That is because other data used as training inputs might correlate with being blue or purple.
The algorithm’s developers try to make the prediction equitable: for both groups, ‘high risk’ corresponds to a two-thirds chance of being rearrested within two years. (This kind of fairness is termed predictive parity.) Rates of future arrests might not follow past patterns. Yet, in this simple example, assume that they do: as predicted, 3 out of 10 in the blue group and 6 out of 10 in the purple group (and two-thirds of those labelled high risk in each group) are indeed rearrested (indicated in grey bars in figure, bottom).
This algorithm has predictive parity. But there’s a problem. In the blue group, 1 person out of the 7 who were not rearrested (14%) was misidentified as high risk; in the purple group, it was 2 people out of 4 (50%). Therefore, purple individuals are more likely to be ‘false positives’: misidentified as high risk.
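The arithmetic in this example can be checked with a short sketch. The four counts per group are not stated directly in the text; they are inferred from the figures it does give (10 people per group, 3 and 6 rearrests, two-thirds of high-risk labels correct, and 1 and 2 false positives), so treat them as an assumption. The class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Outcome counts for one group (hypothetical helper, not from the article)."""
    tp: int  # labelled high risk, rearrested
    fp: int  # labelled high risk, not rearrested
    fn: int  # labelled low risk, rearrested
    tn: int  # labelled low risk, not rearrested

    @property
    def ppv(self) -> float:
        # Share of high-risk labels that were followed by a rearrest
        return self.tp / (self.tp + self.fp)

    @property
    def fpr(self) -> float:
        # Share of people not rearrested who were labelled high risk
        return self.fp / (self.fp + self.tn)

    @property
    def fnr(self) -> float:
        # Share of people rearrested who were labelled low risk
        return self.fn / (self.fn + self.tp)


# Assumed counts, inferred from the numbers quoted in the example
blue = Confusion(tp=2, fp=1, fn=1, tn=6)
purple = Confusion(tp=4, fp=2, fn=2, tn=2)

for name, group in (("blue", blue), ("purple", purple)):
    print(f"{name}: PPV={group.ppv:.2f} FPR={group.fpr:.2f} FNR={group.fnr:.2f}")
```

Under these assumed counts, both groups have the same share of correct high-risk labels (two-thirds), while the false-positive rates come out at 14% and 50%, matching the figures in the text.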
As long as blue and purple group members are rearrested at different rates, it will be difficult to achieve both predictive parity and equal false-positive rates. And it is mathematically impossible to achieve both while also satisfying a third measure of fairness: equal false-negative rates (false negatives are individuals who are identified as low risk but subsequently rearrested).
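One way to see why the three measures collide is an identity that follows directly from their definitions: once a group's rearrest rate, its predictive value and its false-negative rate are fixed, its false-positive rate is fixed too. The sketch below is illustrative; the function name is hypothetical, and the false-negative rate of one-third is an assumption inferred from the example's counts rather than a number stated in the text.

```python
from fractions import Fraction


def implied_fpr(base_rate: Fraction, ppv: Fraction, fnr: Fraction) -> Fraction:
    """False-positive rate forced by a group's base rate, PPV and FNR.

    With P people rearrested and N not rearrested in a group:
      TP = (1 - FNR) * P   and   FP = TP * (1 - PPV) / PPV,
    so FPR = FP / N = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR),
    where p = P / (P + N) is the group's rearrest rate.
    """
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


ppv = Fraction(2, 3)  # predictive parity: same value for both groups
fnr = Fraction(1, 3)  # assumed equal for both groups (inferred, not stated)

blue_fpr = implied_fpr(Fraction(3, 10), ppv, fnr)    # 3 of 10 rearrested
purple_fpr = implied_fpr(Fraction(6, 10), ppv, fnr)  # 6 of 10 rearrested

print(f"blue FPR = {blue_fpr}, purple FPR = {purple_fpr}")
```

With the predictive value and false-negative rate held equal, the only term that differs between the groups is the rearrest rate, so the false-positive rates are forced apart (1/7 against 1/2 here). Outside degenerate cases such as a perfect predictor, equalizing all three requires equal rearrest rates.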
Some would see the higher false-positive rates for the purple group as discrimination. Other researchers argue that this is not necessarily clear evidence of bias in the algorithm. There could be a deeper source for the imbalance: the purple group might have been unfairly targeted for arrest in the first place. In accurately predicting from past data that more people in the purple group will be rearrested, the algorithm could be recapitulating — and perhaps entrenching — a pre-existing societal bias.
In fact, there are even more ways of defining fairness, mathematically speaking. Some researchers note that it is not clear that unequal error rates are indicative of bias. Instead, they might reflect the fact that one group is more difficult to make predictions about than another, which would make the disparity more or less a statistical artefact.
Although statistical imbalances are a problem, a deeper dimension of unfairness lurks within algorithms — that they might reinforce societal injustices. To give a specific example, an algorithm might purport to predict the chance of future criminal activity, but it can only rely on measurable proxies, such as being arrested. Even if we are accurately predicting something, the thing we are accurately predicting might be the imposition of injustice.
We all have a sense of what is right and what is fair, yet we often don’t have the tools or the research to tell us exactly, mechanically, how to get there. There is a large appetite for more transparency. Algorithms generally exacerbate problems when they are closed loops that are not open for algorithmic auditing, for review, or for public debate. It is not clear how best to make algorithms more open. Simply releasing all the parameters of a model won’t provide much insight into how it works. Transparency can also conflict with efforts to protect privacy. In some cases, disclosing too much information about how an algorithm works might allow people to game the system.
One big obstacle to accountability is that agencies often do not collect data on how the tools are used or how they perform. Often there is no transparency because there is nothing to share.
Some provisions, such as a right to meaningful information about the logic involved in cases of automated decision-making, seem to promote algorithmic accountability. The best test of whether an algorithm is biased along certain lines (for example, whether it favours one ethnicity over another) requires knowing the relevant attributes of the people who go into the system.
Meanwhile, researchers are pushing ahead on strategies for detecting bias in algorithms that haven’t been opened up for public scrutiny. Firms might be unwilling to discuss how they are working to address fairness, as doing so would mean admitting that there was a problem in the first place. Even if they do address it, their actions might ameliorate bias but not eliminate it.
A tool might be good at predicting who will fail to appear in court. Yet, it might be better to ask why people don’t appear and, perhaps, to devise interventions, such as text reminders or transportation assistance, that might improve appearance rates. What these tools often do is help us tinker around the edges, but what we need is holistic change. The robust debate around algorithms forces us all to ask and answer these really tough fundamental questions about the systems that we all are involved in and the ways in which they operate.
There is still value in building better algorithms, even if the overarching system they are embedded in is flawed. Algorithms can’t be helicopter-dropped into these complex systems; they must be implemented with the help of people who understand the wider context. Yet even the best efforts will face challenges, so in the absence of straight answers and perfect solutions, transparency is the best policy. If one can’t be right, one should at least be honest.