Risk assessment tools in the criminal justice system: inaccurate, unfair, and unjust?

Earlier this week I heard a young computer scientist named Julia Dressel talk about her work examining the accuracy and fairness of the COMPAS tool, a proprietary risk assessment system that courts across the country are using in bail, sentencing, and parole contexts. The shift toward tools like this is happening despite the fact that we don’t know very much about them or how they work, often because the algorithms that animate them are corporate secrets.

Despite these limitations, researchers and journalists have examined whether the systems produce fair and accurate results. In 2016, for example, ProPublica reporter Julia Angwin and her colleagues famously found that the COMPAS system used by Broward County, Florida, produced racially skewed false positives and false negatives: black people who did not recidivate were more likely to have been assigned higher risk scores, while white people who did recidivate were more likely to have been assigned lower ones. In other words, the algorithm was racist.
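To make the idea of group-specific false positives and false negatives concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the records are invented, the group labels "A" and "B" are placeholders, and the function name is my own. It is not ProPublica's data or method, just the basic arithmetic behind this kind of comparison.

```python
from collections import defaultdict

# Hypothetical toy records, invented purely for illustration.
# Each tuple: (group, labeled_high_risk, reoffended)
records = [
    ("A", True, False),
    ("A", True, False),
    ("A", True, True),
    ("A", False, False),
    ("A", False, True),
    ("B", True, True),
    ("B", False, False),
    ("B", False, False),
    ("B", False, True),
    ("B", False, True),
]


def error_rates_by_group(rows):
    """Return false positive and false negative rates for each group."""
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, high_risk, reoffended in rows:
        if reoffended:
            # Reoffended: a high-risk label was right, a low-risk label was a miss.
            key = "tp" if high_risk else "fn"
        else:
            # Did not reoffend: a high-risk label was a false alarm.
            key = "fp" if high_risk else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        did_not_reoffend = c["fp"] + c["tn"]
        did_reoffend = c["fn"] + c["tp"]
        rates[group] = {
            # Share of people who did NOT reoffend but were labeled high risk.
            "false_positive_rate": c["fp"] / did_not_reoffend if did_not_reoffend else None,
            # Share of people who DID reoffend but were labeled low risk.
            "false_negative_rate": c["fn"] / did_reoffend if did_reoffend else None,
        }
    return rates


rates = error_rates_by_group(records)
for group, r in sorted(rates.items()):
    print(group, r)
```

In this made-up data the tool is right about the same number of people in each group, yet the mistakes fall differently: group A absorbs the false alarms, while group B gets the benefit of the misses. That is the kind of pattern an overall accuracy number can hide.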

Dressel and a colleague took Angwin’s research one step further, and found that non-expert human beings were just as good at predicting recidivism as the COMPAS technology. Hers are important findings, because they show us that substituting machine-generated decision making for human choice in our legal system is not only dubious from an ethical perspective, but also doesn’t necessarily produce more accurate outcomes.

You can read Dressel’s study online, where she has made it available under a Creative Commons Attribution-NonCommercial license. (Note that she did this work while an undergraduate at Dartmouth; impressive, to say the least.)

Two things struck me while I listened to Dressel talk about her work. Both are relevant beyond the confines of her study and, if we are cognizant of them, should help us better integrate risk assessment tools and algorithmic decision making into our lives in a healthy, ethical, and just manner.

First, Dressel emphasized that she probably wouldn’t have done the study at all had she not double-majored in computer science and gender studies. The former taught her the technical skills she needed to interrogate the COMPAS system, and to reproduce its recommendations using human research participants. But importantly, she said, she probably wouldn’t have thought to interrogate the system at all had she not gotten a well-rounded liberal arts education, where she presumably learned to ask not just how we can do something, but also whether we should do it, and if so, why, and what the likely outcomes might be for different types of people.

Dressel said that she was moved to do this research because she kept seeing people talk about algorithmic decision making tools as if they were obviously an improvement over ordinary human decision making, even though there were no studies supporting that conclusion. Her liberal arts education helped her see this fatal flaw in the public discourse around the use of risk assessment tools, while her computer science education helped her fill the void in the technical research.

Contrast Dressel’s self-awareness with the Harvard computer scientist who developed a tool to help law enforcement automatically classify crimes as gang-related, and recently told a skeptic he had no responsibility to think about how his technology might be used because he’s “just an engineer.” Dressel did not need someone to tell her to think about how technology and society interact. Her liberal arts education gave her a solid intellectual foundation, and so she instinctively asks these questions. We need more computer scientists and engineers like her, and to get them, we need academic computer science and engineering programs to take the social sciences seriously.

Second, I couldn’t help but notice that during Dressel’s presentation, she discussed accuracy and fairness, but not justice. That’s an important omission, as courts and legislators nationwide look to adopt risk assessment tools in the criminal legal system. After all, even if the COMPAS tool predicted recidivism risk with 100 percent accuracy, would it be just to use it to keep someone in confinement? Put another way, are we comfortable outsourcing the responsibility for depriving someone of their freedom to a machine? In the bail context, the question is just as important: Do we think justice is served when an algorithm determines whether someone should await trial at home, or be locked in a cage until they have their day in court? This real-life scenario brings us very close to the pre-crime system depicted in the film MINORITY REPORT. That film, like the Philip K. Dick short story on which it is based, is a dystopia, not a road map.

Whether we should use these tools at all is an important threshold question, because judges have an incentive to shrug off responsibility for making critical decisions like whether to hold someone on bail or let them go home. After all, no judge wants to be the person who let someone out of detention only to see that person arrested for murder. In a perverse way, the understandable human fear of a political backlash could lead to the conclusion that risk assessment tools, if made more accurate and less racist, are a better alternative than allowing judges to make decisions from the lizard parts of their brains. But succumbing to a toxic political environment, and building an architecture of techno-legal repression around it, takes us in a dangerous direction, in which no human being can be held responsible for depriving another human being of their freedom. Instead of reacting to a political problem (fear, and the impulse to incarcerate that it produces) by implementing a technological solution, wouldn’t it be better to work to change the political culture that rewards incarceration and punishes leniency? That’s longer-term work, but it’s arguably far more likely to produce better outcomes.

Equally important, the availability of risk assessment tools in the bail, sentencing, and parole contexts may have the very dangerous effect of deferring substantive reforms that would more radically reshape our existing system. For example, why eliminate cash bail, risking political fallout should someone get hurt as a result, when a legislature could instead mandate the use of risk assessment tools, and in so doing make it look like it is responding to advocates and reforming the system? Why decriminalize drug possession or shift funding from police and prisons to public health and housing, when it’s much cheaper and easier to leave the system basically intact, and inject some 21st-century-sounding technology into it? Why risk alienating powerful police and prison unions, when legislators can make it appear as if they are open to systemic reform, while they tinker around the edges, and potentially entrench rather than upend existing reactionary dynamics?

We must be careful what we build. When the subject is the deprivation of human freedom, we must always think not just about accuracy and fairness, but keep justice at the center as our guiding star. Doing so will require big-picture social, political, economic, legal, and institutional change, not the reproduction of existing systems through agency-deferring technologies like risk assessment instruments.