
The Fairness Dilemma: Why "Fair AI" Isn't So Simple

Algorithms are increasingly the invisible arbiters of our lives. They help make high-stakes decisions in critical areas like hiring, healthcare, and criminal justice, often with the promise of being more objective and consistent than their human counterparts. We look to them to remove human fallibility from crucial judgments.

However, while algorithms can be powerful tools, defining and achieving "fairness" is incredibly complex. Experts define algorithmic bias as a "systematic and repeatable harmful tendency...to create 'unfair' outcomes," yet even this definition is not enough to resolve the debate. The central conflict is that there are multiple mathematical definitions of fairness that are often in direct conflict with one another.

This article will explore this dilemma through the real-world case of the COMPAS recidivism algorithm, a tool used in U.S. courts to predict the likelihood that a defendant will reoffend. Crucially, the COMPAS algorithm—like many others—is not trained on who actually commits new crimes, but on who is charged with them, a proxy variable deeply influenced by historical and systemic biases in policing. This case study powerfully illustrates a difficult truth: we can't have it all. Choosing one type of fairness often means sacrificing another, forcing us to make difficult decisions about which values we prioritize.

Let's delve into the competing ideals that make "fair AI" such a formidable challenge.

  1. Defining "Fairness": Two Competing Ideals

Before we can build a "fair" algorithm or fix a "biased" one, we must first agree on what "fair" actually means. The public debate over the COMPAS algorithm revealed that the two main parties in the dispute—the investigative journalists at ProPublica and the software's creator, Northpointe—held fundamentally different, yet equally valid, definitions of fairness.

These two competing views get to the heart of the fairness dilemma:

  • Fairness as Equal Mistakes: This perspective argues that an algorithm is fair if it makes errors at the same rate for different groups. For example, if an algorithm is predicting recidivism, its false positive rate (incorrectly flagging someone as high-risk who will not reoffend) and its false negative rate (incorrectly flagging someone as low-risk who will reoffend) should be the same for both Black and White defendants.
  • Fairness as Equal Predictions (Calibration): This perspective argues that an algorithm is fair if its predictions have the same meaning for all groups. For example, if a defendant—regardless of their race—receives a "high-risk" score of 7, they should have the same probability of actually reoffending. This is known as calibration, and it ensures that a judge can interpret a risk score consistently for everyone.

The case of COMPAS provides a perfect real-world illustration of how these two admirable ideals can clash in practice.

  2. The Case of COMPAS: A Tale of Two Fairnesses

2.1. The Algorithm on Trial

COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a risk-assessment tool used in the U.S. court system. It analyzes a defendant's history and produces a "risk score" indicating their likelihood of committing another crime (recidivism). This score can influence critical decisions, including bail amounts and sentencing. While influential, it's important to note that the tool's overall accuracy is only moderate. Studies show it can correctly rank the recidivism risk for a random pair of defendants about 70% of the time. In 2016, a ProPublica investigation ignited a fierce debate by claiming the algorithm was biased against Black defendants.

2.2. ProPublica's Argument: Fairness as Equal Error Rates

ProPublica's investigation concluded that COMPAS was unfair because its error rates were drastically different for Black and White defendants. Their analysis focused heavily on the False Positive Rate (FPR), which measures how often an individual who will not reoffend is incorrectly classified as high-risk.

Their findings revealed a stark disparity:

Metric                       White Defendants   Black Defendants
False Positive Rate (FPR)    23.5%              44.8%
False Negative Rate (FNR)    47.7%              28.0%

According to this data, a Black defendant who would not go on to commit another crime was nearly twice as likely to be mislabeled as "high risk" compared to a White defendant in the same situation. From this perspective, the algorithm is clearly biased. The consequence of a high false positive rate is significant—it could lead to harsher bail or sentencing for individuals who pose no actual threat of reoffending.
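The error rates ProPublica compared can be sketched in a few lines of code. The confusion-matrix counts below are hypothetical, chosen only to show how FPR and FNR are computed from true outcomes and the model's high-risk flags:

```python
# A minimal sketch (hypothetical counts) of the error rates ProPublica
# compared across groups.

def error_rates(y_true, y_pred):
    """FPR = FP / (FP + TN); FNR = FN / (FN + TP), for 0/1 labels."""
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return fp / (fp + tn), fn / (fn + tp)

# Ten defendants who did not reoffend (label 0) and ten who did (label 1),
# with the model's high-risk flags as predictions:
y_true = [0] * 10 + [1] * 10
y_pred = [1] * 4 + [0] * 6 + [1] * 7 + [0] * 3
fpr, fnr = error_rates(y_true, y_pred)
# fpr = 0.4, fnr = 0.3
```

Running this per group, as ProPublica did, makes the disparity in the table above directly comparable: equal-mistakes fairness demands that the two returned rates match across groups.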

2.3. Northpointe's Defense: Fairness as Calibration

In response, Northpointe, the company behind COMPAS, defended its algorithm by arguing it was fair according to a different metric: calibration. They claimed that the risk scores meant the same thing regardless of a defendant's race.

They presented data showing that their risk scores were well-calibrated across groups. For instance, among defendants who scored a seven on the COMPAS scale, 60% of White defendants reoffended, nearly identical to the 61% of Black defendants who reoffended.

From a judge's perspective, this type of fairness is crucial. It means they can trust that a risk score of '7' represents a consistent probability of recidivism, whether the defendant in front of them is Black or White. According to this definition, the algorithm is treating individuals equally by providing a consistent and reliable prediction.
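The calibration check itself is simple to express in code. The data below are invented; they just mirror the pattern Northpointe reported, with a score of 7 carrying roughly the same reoffense rate in both groups:

```python
# A minimal sketch of the calibration check, on invented data.

def reoffense_rate(scores, reoffended, target_score):
    """Observed reoffense rate among defendants with a given score."""
    matched = [r for s, r in zip(scores, reoffended) if s == target_score]
    return sum(matched) / len(matched)

# Two hypothetical groups of ten defendants each, all scored 7,
# six of whom went on to reoffend:
group_a = reoffense_rate([7] * 10, [1] * 6 + [0] * 4, 7)
group_b = reoffense_rate([7] * 10, [1] * 6 + [0] * 4, 7)
# group_a == group_b == 0.6: the score "means the same thing" for both
```

Calibration fairness asks that this observed rate match across groups at every score level, which is exactly what Northpointe's data showed.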

This brings us to a paradox: ProPublica had data proving the algorithm had biased error rates, while Northpointe had data proving its predictions were unbiased. Both were correct, yet they reached opposite conclusions about the algorithm's fairness. This isn't a simple disagreement; it's the result of a mathematical inevitability.

  3. The Mathematical Impossibility: Why You Can't Have Both

How can an algorithm satisfy one definition of fairness while violating another? The reason these two goals—equal error rates and calibration—cannot be achieved simultaneously is that the underlying prevalence of the outcome (recidivism) is different between the two groups in the historical data used.

The data showed the following base rates for recidivism:

  • White Defendants: 39.4% were charged with another crime within two years.
  • Black Defendants: 51.4% were charged with another crime within two years.

These different base rates are not arbitrary; they reflect a complex reality where the most predictive factors for recidivism, such as a defendant's age and number of prior convictions, are themselves correlated with race due to historical patterns in policing and sentencing.

When the base rates for an outcome differ between two groups, a mathematical trade-off becomes unavoidable. In simple terms, a test that is calibrated to ensure its predictions have the same meaning for all groups will be forced to have different error rates for those groups.
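This trade-off can be made concrete with a short calculation. The identity below, from Jon Kleinberg's and Alexandra Chouldechova's impossibility analyses, expresses the false positive rate implied by a group's base rate (prevalence), the score's positive predictive value (PPV), and its false negative rate. The PPV and FNR values used here are illustrative, not COMPAS's actual figures; only the base rates come from the article:

```python
# Sketch of the impossibility result. FPR is fully determined by
# prevalence p, PPV, and FNR: FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR).
# So if PPV and FNR are held equal across groups (a calibrated score),
# groups with different base rates MUST get different FPRs.

def implied_fpr(p, ppv, fnr):
    """False positive rate implied by prevalence, PPV, and FNR."""
    return p / (1 - p) * (1 - ppv) / ppv * (1 - fnr)

# Base rates from the article; PPV and FNR are illustrative but held
# equal for both groups, i.e. the score is calibrated:
fpr_white = implied_fpr(0.394, ppv=0.6, fnr=0.3)  # about 0.30
fpr_black = implied_fpr(0.514, ppv=0.6, fnr=0.3)  # about 0.49
```

The group with the higher base rate ends up with the higher false positive rate, which is exactly the pattern ProPublica documented.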

This conflict can be summarized simply:

  • If you prioritize Equal Predictions (Calibration), you must accept that the group with the higher base rate (Black defendants) will have a higher False Positive Rate.
  • If you prioritize Equal Error Rates (FPR/FNR), you must accept that the predictive meaning of a score will differ: a high-risk score for the group with the lower base rate (White defendants) will indicate less actual risk than the same score for the group with the higher base rate.

This isn't just a quirk of the COMPAS algorithm. It's a fundamental challenge for any predictive model applied to groups with different base rates, and it extends far beyond the criminal justice system.

  4. Beyond Justice: A Pervasive Problem

The core issue in the COMPAS case—a mismatch between the goal of predicting criminality and the reality of predicting arrests—is a widespread problem known as label choice bias. This bias occurs when an algorithm is trained on a flawed proxy variable instead of the ideal target we truly care about. A powerful example of this comes from healthcare.

  • Ideal Target: An algorithm should identify patients with the greatest health needs to provide them with extra care.
  • Actual (Proxy) Target: Because "health needs" isn't a variable in datasets, the algorithm was instead trained to predict future healthcare costs.
  • The Bias: This seems like a reasonable proxy—sicker people cost more, right? However, due to systemic barriers in accessing care, Black patients with the same level of need often incur lower costs than White patients. The algorithm performed its task perfectly, but its task was flawed. Like a "literal genie," it was asked to predict costs and did so accurately for all races; it was never told that the true goal was to identify health needs. The result was a fair process with a biased outcome: the cost-prediction algorithm systematically deprioritized Black patients for extra-help programs, even though they were just as sick.
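The mechanism in the bullets above can be reduced to a toy example. The patients and numbers below are entirely invented; the point is only that ranking on a proxy target can invert the ranking the ideal target would give:

```python
# A toy illustration (all numbers invented) of label choice bias:
# ranking the same two patients by the cost proxy vs. by actual need.

patients = [
    {"id": "A", "need": 9, "cost": 3000},  # very sick, low recorded cost
    {"id": "B", "need": 5, "cost": 9000},  # less sick, high recorded cost
]

# Rank on the proxy target (cost): patient B is prioritized.
by_cost = sorted(patients, key=lambda p: p["cost"], reverse=True)

# Rank on the ideal target (need): patient A is prioritized.
by_need = sorted(patients, key=lambda p: p["need"], reverse=True)
```

Both sorts are "accurate" for their own target; the harm comes entirely from choosing cost as the label when need was the goal.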

This same pattern appears in other domains. For instance, Amazon developed an AI hiring tool that was trained on the company's historical hiring data. Because that data reflected past biases, the algorithm learned to penalize resumes that contained the word "women's" and downgraded applicants from all-women's colleges. It was simply repeating the patterns it was shown, without any understanding of the underlying goal of finding the best candidate.

Feedback Loops: When Bias Reinforces Itself

Beyond choosing the wrong target, bias can also emerge from a system's own operations. Feedback loops occur when an algorithm's output influences the real world in a way that generates new data, reinforcing the initial bias.

A classic example is predictive policing. If an algorithm is trained on historical arrest data showing a high number of arrests in a particular neighborhood, it may recommend deploying more police officers to that area. The increased police presence will naturally lead to more arrests, which are then fed back into the algorithm as new data. This creates a self-perpetuating cycle where the system becomes increasingly confident that the neighborhood is "high crime," not because of a change in underlying criminal activity, but because of the algorithm's own initial recommendation.
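The cycle described above can be simulated in a few lines. Everything here is invented: two areas with identical underlying crime start with slightly different recorded-arrest counts, patrols go wherever the record is highest, and patrols generate more recorded arrests:

```python
# A toy simulation (all numbers invented) of a predictive-policing
# feedback loop: patrols follow recorded arrests, and patrols produce
# recorded arrests, so an initial gap compounds on its own.

def simulate(recorded_arrests, rounds=5, arrests_per_round=10):
    counts = list(recorded_arrests)
    for _ in range(rounds):
        # Deploy extra patrols to the current "hotspot" ...
        hotspot = counts.index(max(counts))
        # ... where the added presence yields more recorded arrests,
        # which re-enter the training data in the next round.
        counts[hotspot] += arrests_per_round
    return counts

# Areas start nearly level (55 vs. 45 recorded arrests) with identical
# underlying crime; the initial gap alone decides where patrols go.
print(simulate([55, 45]))  # [105, 45]: the gap widens from 10 to 60
```

After five rounds the first area looks more than twice as "high crime" as the second, even though nothing about actual criminal activity differed — the system is measuring its own deployments.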

  5. Conclusion: The Hard Work of Fairness

The most important lesson from the fairness dilemma is that there is no single, perfect, mathematical definition of fairness. Achieving fairness in AI is not a technical problem with a simple fix. It is a deeply human process that requires us to make difficult, value-laden decisions about which trade-offs are acceptable and which are not.

The solution is not to discard algorithms entirely—after all, human decision-makers are also subject to biases. Instead, we must become more intentional and transparent about how we build and deploy them. The Algorithmic Bias Playbook from the University of Chicago Booth School of Business highlights a critical first step:

Step 2A: Articulate the ideal target (what the algorithm should be predicting) vs. the actual target (what it is actually predicting).

This is not merely a theoretical exercise. When researchers applied this principle to the biased healthcare algorithm, they retrained it to predict actual health needs instead of costs. The result was a dramatic improvement in equity: the new model doubled the fraction of Black patients identified for extra-help programs, from 14% to 27%.

For anyone hoping to work with or build AI, understanding these complex trade-offs is the essential first step. It is only by grappling with the messy, human side of fairness that we can begin to build and demand more responsible and equitable systems. Ultimately, transparency is key—not as a tool for "fairwashing" a flawed system, but as a genuine commitment to articulating our values and holding our creations accountable to them.

This educational content was created with the assistance of AI tools including Claude, Gemini, and NotebookLM.