Why Making AI Fair Is Harder—And More Hopeful—Than You Think
Introduction: The Quest for Fair Algorithms
Artificial intelligence is increasingly making critical decisions that shape our lives. Algorithms help determine who gets a loan, who gets hired, and who is considered a high risk in the criminal justice system. As these systems become more powerful, there is an urgent and widespread demand that they be fair and unbiased.
But what does “fair” actually mean? As it turns out, our intuitive understanding of fairness often collides with mathematical and social realities, leading to surprising and complex challenges. This article explores five of the most impactful truths about AI fairness, revealing why the quest for truly equitable algorithms is more complicated—and ultimately, more achievable—than most people realize.
- The Fairness Paradox: Why You Can’t Have It All
One of the most profound challenges in AI fairness is a mathematical paradox: for many real-world problems, it's impossible for an algorithm to satisfy two of our most basic fairness requirements at the same time. This holds true unless the groups being compared have identical underlying outcomes, a condition that rarely exists in society.
The two conflicting goals can be defined in simple terms:
- Equal Meaning: A prediction should mean the same thing regardless of a person's group. For example, if an algorithm assigns a "high-risk" score to two different people, their actual probability of reoffending should be roughly the same. This is also known as predictive value parity.
- Equal Mistakes: The algorithm should make errors at the same rate for different groups. For instance, it shouldn't be more likely to wrongly flag someone as high-risk just because of their group identity. This is also known as error rate parity.
Both of these goals seem essential for a fair system. The problem is, you can't have both if the base rates of the outcome (e.g., the actual rate of reoffending) are different between groups.
Unless two groups commit crimes at precisely the same rate, any classifier that is equally predictive for both groups will necessarily make different kinds of errors for each. And if we instead calibrate it to make the same kinds of errors, the meaning of its predictions will differ between groups.
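This tension can be made concrete with a toy confusion-matrix calculation. The numbers below are illustrative, not COMPAS data: two groups with different base rates are given the same positive predictive value, and the false positive rates diverge on their own.

```python
# Illustrative confusion matrices for two groups with different base rates.
# Both groups get the same PPV ("equal meaning"), yet their false positive
# rates ("equal mistakes") come out very different.

def rates(tp, fp, fn, tn):
    """Return (base_rate, ppv, fpr) for one group's confusion matrix."""
    n = tp + fp + fn + tn
    base_rate = (tp + fn) / n     # actual rate of the outcome in the group
    ppv = tp / (tp + fp)          # P(outcome occurs | flagged high-risk)
    fpr = fp / (fp + tn)          # P(flagged high-risk | outcome does not occur)
    return base_rate, ppv, fpr

# Group A: 1000 people, 50% base rate, 500 flagged as high-risk
a = rates(tp=400, fp=100, fn=100, tn=400)
# Group B: 1000 people, 20% base rate, 125 flagged as high-risk
b = rates(tp=100, fp=25, fn=100, tn=775)

print(f"Group A: base rate {a[0]:.0%}, PPV {a[1]:.0%}, FPR {a[2]:.1%}")
print(f"Group B: base rate {b[0]:.0%}, PPV {b[1]:.0%}, FPR {b[2]:.1%}")
# Both groups have an 80% PPV, but Group A's false positive rate is
# 20.0% versus Group B's 3.1%: equal meaning forces unequal mistakes
# once base rates differ.
```

No amount of tuning removes the gap: with these base rates fixed, moving the numbers to equalize the false positive rates necessarily breaks the equality of the PPVs.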
The COMPAS recidivism algorithm, a tool used in the U.S. criminal justice system, provides a stark real-world example of this paradox:
- When comparing Black and White defendants, the algorithm achieved roughly equal meaning. A high-risk score corresponded to a similar probability of reoffending for both groups. However, it had unequal mistakes: Black defendants were falsely flagged as high-risk almost twice as often as White defendants.
- When comparing male and female defendants, the algorithm achieved roughly equal mistakes. But it had unequal meaning: a high-risk score for men indicated a 64% chance of reoffending, while for women, it indicated only a 52% chance.
This is a critical insight because it proves that "fairness" is not a single technical property to be optimized. Instead, it is a complex negotiation of trade-offs, forcing us to decide which kind of fairness we value most in a given context.
- The Deception of "Accuracy": How Base Rates Fool Us All
The statistical trap that explains the paradox above is known as the base rate fallacy. This cognitive blind spot affects everyone—including experts—and makes it incredibly difficult to intuitively grasp how "accurate" an algorithm truly is.
A classic medical example illustrates this perfectly:
- Imagine a disease with a 0.1% prevalence in the population (1 in 1,000 people). This is the base rate. A test for this disease has a 5% false positive rate and, for simplicity, always detects the disease when it is present.
- Someone tests positive. What is the chance they actually have the disease?
A study found that the most common answer given by a group of doctors was 95%. The correct answer is about 2%.
The reason the probability is so low is that with such a low base rate, false positives from the large healthy population swamp the true positives from the small infected population. In a group of 1,000 people, roughly 1 person actually has the disease, while about 50 of the 999 healthy people will test positive by mistake, so a positive result indicates disease only about 1 time in 51.
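The arithmetic above is just Bayes' theorem, and it fits in a few lines:

```python
# Bayes' theorem applied to the medical-test example above.

def p_disease_given_positive(prevalence, sensitivity, false_positive_rate):
    """Probability of disease given a positive test (positive predictive value)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# 0.1% prevalence, 5% false positive rate; the test is assumed to
# always detect the disease when present (sensitivity = 1.0).
ppv = p_disease_given_positive(0.001, 1.0, 0.05)
print(f"P(disease | positive test) = {ppv:.1%}")  # about 2%, not 95%
```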
This takeaway is profoundly important. It shows that our intuition about probability is often wrong and that a system can appear highly "accurate" under specific conditions but perform poorly in the real world. For example, studies on drugged driving detection that used a very high base rate of impaired subjects reported impressive accuracy rates (e.g., 94%). However, when applied to the general population of drivers where the base rate is much lower, the predictive value of the test would be dramatically worse, with a high chance of wrongly accusing innocent people.
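The collapse in predictive value is easy to quantify. As a sketch, suppose we read the reported 94% accuracy as both the test's sensitivity and its specificity (an illustrative assumption, not a claim about the underlying study) and vary only the base rate:

```python
# How the same test's positive predictive value collapses as the base
# rate drops. Treating the reported 94% accuracy as both sensitivity
# and specificity is an assumption made for illustration.

def ppv(base_rate, sensitivity, specificity):
    """Probability a flagged person is actually impaired."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

for base_rate in (0.50, 0.10, 0.01):
    print(f"base rate {base_rate:.0%}: "
          f"P(impaired | flagged) = {ppv(base_rate, 0.94, 0.94):.1%}")
# At a 50% base rate the test looks excellent (94%); at a 1% base rate,
# most people it flags are innocent.
```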
- The Iceberg of Bias: It’s Not the Algorithm, It’s Us
While statistical paradoxes are a major hurdle, they are only the tip of the iceberg. The most significant sources of bias in AI are not computational, but human and societal.
According to a report from the National Institute of Standards and Technology (NIST), we can visualize AI bias as an iceberg. The visible tip is composed of statistical and computational biases—the issues we often focus on. But the much larger, hidden mass is composed of human biases and systemic biases.
As the report puts it: "Current attempts for addressing the harmful effects of AI bias remain focused on computational factors such as representativeness of datasets and fairness of machine learning algorithms. ... Yet, as illustrated in Fig. 1 [of the NIST report], human and systemic institutional and societal factors are significant sources of AI bias as well, and are currently overlooked."
Let's break down what these hidden biases are:
- Human Bias: These are the cognitive shortcuts, unconscious assumptions, and limited viewpoints of the teams who design, build, and deploy AI systems. Decisions about which data to use, what to measure, and how to define success are all shaped by human perspectives.
- Systemic Bias: These are the historical and institutional patterns of discrimination that are already baked into the data we use to train AI. An algorithm trained on historical hiring data, for example, will learn and perpetuate any existing societal biases present in that data.
This means that simply "cleaning the data" or tweaking an algorithm is not enough. To truly address AI bias, we must examine the human decisions and societal context that surround the technology from its inception to its deployment.
- The Accuracy Trap: Why the "Best" Model Can Be the Most Harmful
In the world of technology, the relentless pursuit of optimizing a single performance metric—usually accuracy—is standard practice. However, when it comes to fairness, this approach can be dangerous. As the NIST report provocatively states, "The most accurate model is not necessarily the one with the least harmful impact."
This is because a model optimized solely for predictive accuracy on biased data will learn and amplify the inequities in that data. It is also a direct consequence of the base rate fallacy we just explored: when base rates differ between groups, the model that maximizes overall accuracy may achieve it at the expense of the smaller or lower-base-rate group, and pushing for maximum accuracy can therefore lead directly to the most discriminatory outcomes.
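A toy calculation shows how accuracy alone can crown a useless model. With a 5% base rate, a classifier that never flags anyone beats a genuinely useful detector on raw accuracy (numbers below are illustrative):

```python
# With a 5% base rate, a model that flags no one at all is more
# "accurate" than a genuinely useful detector -- illustrative numbers.

base_rate = 0.05

# Model 1: always predicts "low risk". Its accuracy is simply the
# fraction of true negatives in the population; it detects no one.
acc_trivial = 1 - base_rate                                 # 95% accurate

# Model 2: a useful detector with 90% sensitivity and 90% specificity.
acc_detector = base_rate * 0.90 + (1 - base_rate) * 0.90    # 90% accurate

print(f"Trivial model accuracy:   {acc_trivial:.0%} (detects 0% of cases)")
print(f"Useful detector accuracy: {acc_detector:.0%} (detects 90% of cases)")
# Ranking models by accuracy alone would pick the one that helps no one.
```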
Research from a causal perspective confirms this tension. One study demonstrated that imposing causal fairness constraints to reduce discrimination almost always reduces the model's predictive power, resulting in what the authors call an "excess loss." This reveals a fundamental trade-off between pure accuracy and fairness.
This is a critical lesson for both developers and the public. It challenges the common assumption that better technical performance automatically leads to better societal outcomes. In AI, the "best" model is not just the most accurate one; it's the one that successfully navigates the complex trade-offs between performance and harm.
- Hope in Imperfection: Why "Good Enough" Fairness Is Possible
After confronting mathematical impossibilities and the deep roots of societal bias, it's easy to feel that achieving AI fairness is a hopeless task. But a final, crucial truth offers a more optimistic path forward.
While achieving perfect fairness across all metrics is mathematically impossible, research shows that achieving practical and approximate fairness is often very possible.
The key insight is this: the impossibility theorem only holds under conditions of perfect mathematical equality. When we allow for a small margin of error (e.g., accepting a 5% difference in error rates between groups) and when the real-world difference in outcomes between groups is moderate (less than a 20% difference in base rates), a large number of models suddenly become available that can satisfy multiple fairness criteria simultaneously.
This finding is transformative. It moves the conversation from a paralyzing theoretical impossibility to a practical, real-world negotiation of acceptable trade-offs. The goal shifts from seeking a single, mathematically perfect solution to finding robustly "good enough" solutions that align with our societal values. It acknowledges that while perfection is out of reach, meaningful progress is not.
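In code, a "good enough" fairness check might look like the sketch below: rather than demanding exact equality, it accepts any model whose per-group metrics fall within a chosen tolerance. The metric names and numbers are illustrative, not from any particular study.

```python
# A sketch of "approximate fairness": instead of demanding exact equality,
# accept any model whose per-group metrics differ by at most a tolerance.
# Metric names and values below are hypothetical.

def approximately_fair(metrics_a, metrics_b, tolerance=0.05):
    """True if every shared metric differs between groups by <= tolerance."""
    return all(abs(metrics_a[k] - metrics_b[k]) <= tolerance
               for k in metrics_a)

group_a = {"ppv": 0.62, "fpr": 0.21, "fnr": 0.30}
group_b = {"ppv": 0.59, "fpr": 0.24, "fnr": 0.27}

print(approximately_fair(group_a, group_b))                  # within 5%: True
print(approximately_fair(group_a, group_b, tolerance=0.01))  # within 1%: False
```

The tolerance is exactly the value-laden knob the article describes: choosing it is a societal negotiation, not a technical optimization.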
Conclusion: Navigating the Trade-Offs
The journey into AI fairness takes us from the stark mathematical reality of inescapable trade-offs, down to the hidden iceberg of human and systemic bias, and finally to the hopeful conclusion that practical, approximate fairness is an achievable goal.
The most important takeaway is that AI fairness is not a technical problem with a single correct answer. It is a socio-technical challenge. It demands that we move beyond a purely computational mindset and engage in deliberate, context-aware conversations about our values and priorities. Every algorithm embodies a choice about which errors we are willing to tolerate and whose well-being we prioritize.
This leaves us with a critical question, not just for engineers but for all of us: how do we, as a society, decide which trade-offs are worth making?