Blog Post · 12 min read

Unlocking AI Fairness: Why You Can't Have It All (And Why That's Okay)

  1. Introduction: The AI Judge

Imagine a prestigious university that uses an AI system to help decide which students to admit. This AI, the "digital judge," analyzes thousands of applications—looking at academic records, test scores, and other features—to predict which applicants will be most "successful" in their program.

But what does it mean for this AI to be "fair"? Should it aim to admit an equal percentage of applicants from every demographic group? Or should it have the same error rates for all groups, so that it doesn't disproportionately reject qualified students from one group while admitting unqualified students from another? Or perhaps its predictions should be equally reliable for everyone, so that a predicted "90% chance of success" means the same thing no matter who you are?

This brings us to a central challenge in AI ethics: there are multiple, compelling ways to define fairness, and these definitions often conflict with each other. This post explains the famous "impossibility theorem" of algorithmic fairness, which shows why it is mathematically impossible to satisfy all of these fairness goals at the same time, except in very rare circumstances.

First, to understand this conflict, we need to peek under the hood and see how these AI systems make decisions and how we grade their performance.

  2. How Does an AI Classifier "See" the World?

2.1. Making Predictions

The kind of AI used in our university admissions example is often a supervised binary classifier. The concept is straightforward:

  1. It is trained on a large set of past data, called Samples. Each sample has a "ground truth" label—for example, past applicants are labeled as either "graduated" (the positive outcome) or "did not graduate" (the negative outcome).
  2. The classifier learns the patterns in this data.
  3. When it sees a new applicant, it makes a Prediction, sorting them into one of two buckets: the "Positive Class" (predicted to succeed) or the "Negative Class" (predicted not to succeed).
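The three steps above can be sketched in a few lines of Python. This toy version stands in for a real trained model: "training" just picks the decision threshold on a single applicant score that best separates past graduates from non-graduates, and all data is invented.

```python
# A toy sketch of the supervised binary-classification loop described above.
# "Training" is deliberately minimal: pick the score threshold that best
# separates past positive samples from negative ones. All data is invented.

def train_threshold(samples):
    """samples: list of (score, graduated) pairs with ground-truth labels."""
    best_t, best_correct = 0.0, -1
    for t in sorted({s for s, _ in samples}):
        correct = sum((s >= t) == y for s, y in samples)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

def predict(threshold, score):
    """Sort a new applicant into the positive or negative class."""
    return "positive" if score >= threshold else "negative"

past = [(0.9, True), (0.8, True), (0.7, True),
        (0.5, False), (0.4, False), (0.3, False)]
t = train_threshold(past)
print(predict(t, 0.85))  # prints "positive"
print(predict(t, 0.35))  # prints "negative"
```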

2.2. Getting It Right vs. Getting It Wrong

No classifier is perfect. Its predictions can be correct or incorrect in four distinct ways:

  • True Positives (TP): The classifier correctly predicts a positive outcome. (e.g., It admits a student who goes on to graduate.)
  • False Positives (FP): The classifier incorrectly predicts a positive outcome. This is a Type I error. (e.g., It admits a student who does not graduate.)
  • True Negatives (TN): The classifier correctly predicts a negative outcome. (e.g., It rejects a student who would not have graduated.)
  • False Negatives (FN): The classifier incorrectly predicts a negative outcome. This is a Type II error. (e.g., It rejects a student who would have graduated.)
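These four counts are easy to tally directly. A minimal sketch with invented labels, where True marks the positive class (graduated / admitted):

```python
def confusion_counts(y_true, y_pred):
    """Tally the four outcomes for a binary classifier (True = positive class)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))           # true positives
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))     # false positives
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))  # true negatives
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))     # false negatives
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

actual    = [True, True, False, False, True]   # did the student graduate?
predicted = [True, False, False, True, True]   # did the AI admit them?
print(confusion_counts(actual, predicted))
# {'TP': 2, 'FP': 1, 'TN': 1, 'FN': 1}
```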

The real-world consequences of these errors are crucial. Consider the COMPAS algorithm, which was used to predict whether a defendant would re-offend if released on bail:

  • A false positive means the AI predicts someone will re-offend who actually would not. This keeps the defendant in custody unnecessarily.
  • A false negative means the AI predicts someone will not re-offend who actually does. This releases someone who goes on to commit another crime.

These two types of errors are qualitatively very different, and their real-world costs differ enormously. A simple metric like "accuracy" (the percentage of all correct decisions) can be misleading when the costs of being wrong are so unequal. Accuracy is appropriate only when false positives and false negatives have similar interpretations and roughly equal costs. In high-stakes scenarios like bail decisions, this is rarely the case, which is why we need more nuanced ways to measure both performance and fairness.
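A quick illustration of how accuracy misleads when base rates are skewed, using invented numbers in the spirit of the bail example:

```python
# Why raw accuracy can mislead: with imbalanced outcomes, a classifier that
# never predicts the positive class looks impressive. Numbers are invented.

y_true = [True] * 5 + [False] * 95   # only 5% actually re-offend
y_pred = [False] * 100               # "release everyone" baseline

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.95, yet it misses every actual re-offender (5 false negatives)
```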

Now that we understand the basic outcomes, let's see how they are used to define different concepts of fairness.

  3. Three Different Ways to Be "Fair"

Fairness in AI isn't one single thing. Researchers have developed many mathematical definitions, often called parity measures or fairness metrics, that try to capture different ethical intuitions about what it means for an algorithm to be equitable.

Three of the most fundamental criteria are Independence, Separation, and Sufficiency. Here’s a simple breakdown of what each one demands.

  • Independence (a.k.a. Demographic Parity, Statistical Parity): The AI's decisions should be completely independent of a person's protected group (like race or gender). The rate of positive outcomes should be the same for all groups.
  • Separation (a.k.a. Equalized Odds, Equality of Opportunity): The AI's error rates should be equal across different groups. For example, the True Positive Rate and False Positive Rate should be the same for men and women, so the model is neither more likely to incorrectly flag non-reoffenders from one group (equal False Positive Rate) nor to fail to flag actual reoffenders from another (equal True Positive Rate).
  • Sufficiency (a.k.a. Predictive Parity, Calibration by Group): The AI's predictions should be equally reliable for all groups. If the AI gives a "risk score" of 70%, it should mean the same thing (a 70% probability of the outcome) regardless of which group a person belongs to.
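All three criteria reduce to per-group statistics computed from the confusion-matrix counts. The sketch below uses invented records with hypothetical group labels; notice that this toy data satisfies Independence and Sufficiency while failing Separation:

```python
# Per-group rates behind the three criteria, computed from invented records.
# Each record is (group, actually_succeeded, predicted_positive).

def group_rates(records):
    """Return each group's selection rate (Independence), TPR/FPR (Separation),
    and PPV (Sufficiency)."""
    out = {}
    for g in {r[0] for r in records}:
        rs = [(y, p) for grp, y, p in records if grp == g]
        tp = sum(1 for y, p in rs if y and p)
        fp = sum(1 for y, p in rs if not y and p)
        fn = sum(1 for y, p in rs if y and not p)
        tn = sum(1 for y, p in rs if not y and not p)
        out[g] = {
            "selection_rate": (tp + fp) / len(rs),
            "tpr": tp / (tp + fn) if tp + fn else None,
            "fpr": fp / (fp + tn) if fp + tn else None,
            "ppv": tp / (tp + fp) if tp + fp else None,
        }
    return out

data = [
    ("A", True, True), ("A", True, False), ("A", False, True), ("A", False, False),
    ("B", True, True), ("B", False, True), ("B", False, False), ("B", False, False),
]
rates = group_rates(data)
# Both groups are selected at the same rate (Independence holds) and share the
# same PPV (Sufficiency holds), yet their TPRs differ (Separation fails).
print(rates["A"]["tpr"], rates["B"]["tpr"])  # 0.5 1.0
```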

Each of these definitions seems reasonable on its own. Who wouldn't want an AI that gives people from all groups an equal shot, makes mistakes at the same rate for everyone, and whose predictions are equally trustworthy for all? But what happens when we try to build an AI that does all three at once?

  4. The Impossibility Theorem: A Mathematical Traffic Jam

4.1. The Core Conflict

In 2016, researchers Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan proved a groundbreaking result that is now known as the impossibility theorem.

In simple terms, the theorem states:

For any well-calibrated classifier, it is mathematically impossible for more than one of the three fundamental fairness criteria (Independence, Separation, and Sufficiency) to be satisfied at the same time.

This means you must choose. You can have an AI that satisfies Demographic Parity, or one that satisfies Equalized Odds, or one that satisfies Predictive Parity, but you cannot have one that satisfies two or more of them at once.

However, the theorem comes with two "escape hatches"—very specific conditions under which this conflict disappears:

  1. Perfect Prediction: The AI is a perfect predictor and never makes a single mistake (no false positives or false negatives).
  2. Equal Base Rates: The groups being compared have the exact same rate of positive outcomes in reality. For example, the proportion of loan applicants who would successfully repay a loan is identical across all demographic groups.

In the real world, these conditions are almost never met. AI models are not perfect, and historical data almost always reflects different outcome rates between different social groups. This is why the impossibility theorem presents such a profound challenge.
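The conflict can be made concrete with a well-known algebraic identity relating a group's base rate p, PPV, TPR, and FPR (the relationship Chouldechova used in her analysis of COMPAS-style classifiers): FPR = p/(1-p) * (1-PPV)/PPV * TPR. If two groups share the same PPV and TPR but have different base rates, their FPRs are forced apart. The numbers below are invented:

```python
def fpr_from(base_rate, ppv, tpr):
    """The identity linking the confusion-matrix rates:
    FPR = p/(1-p) * (1-PPV)/PPV * TPR, where p is the group's base rate."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * tpr

# Same PPV and TPR for both groups (Sufficiency plus half of Separation)...
ppv, tpr = 0.75, 0.75
fpr_a = fpr_from(0.6, ppv, tpr)  # group with a 60% base rate
fpr_b = fpr_from(0.3, ppv, tpr)  # group with a 30% base rate
# ...forces the False Positive Rates apart, so full Separation must fail.
print(round(fpr_a, 3), round(fpr_b, 3))  # 0.375 0.107
```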

4.2. A Causal Explanation

This conflict isn't just a statistical coincidence; it's rooted in the underlying cause-and-effect relationships in the data. To understand why, we can use simple causal diagrams. Think of these diagrams as maps where circles are concepts (like Race or Graduation Status) and arrows show what influences what.

Let’s use an analogy. Imagine two separate talents, 'Athletic Talent' and 'Academic Talent,' both influence whether a student gets a 'University Scholarship.' These two talents are independent in the general population. However, if we only look at the group of students who received scholarships, we might find a strange connection: the best athletes seem to have lower grades, and the best students seem to be less athletic. Why? Because an exceptional score in one area compensated for a lower score in the other to get them into the scholarship group. By focusing only on the outcome, we created a statistical link between two originally independent things. This is called a "collider" effect, and it's key to understanding the impossibility theorem.
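The collider effect is easy to reproduce in simulation. Below, two independent, invented "talent" scores become negatively correlated the moment we restrict attention to scholarship winners:

```python
import random

# Simulating the collider effect: two independent talents become negatively
# correlated once we look only at scholarship winners. Data is synthetic.
random.seed(0)

athletic = [random.gauss(0, 1) for _ in range(20000)]
academic = [random.gauss(0, 1) for _ in range(20000)]

def corr(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

# Scholarship if combined talent is high enough: conditioning on the collider.
winners = [(a, b) for a, b in zip(athletic, academic) if a + b > 1.5]

print(round(corr(athletic, academic), 2))  # near zero: talents are independent
print(round(corr([a for a, _ in winners],
                 [b for _, b in winners]), 2))  # strongly negative among winners
```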

Let’s apply this idea to our fairness metrics:

  • Let A be the protected attribute (e.g., Race), like 'Athletic Talent'.
  • Let Ŷ be the AI's prediction (e.g., Predicted to graduate), like 'Academic Talent'.
  • Let Y be the true outcome (e.g., Actually graduates), which is our "scholarship" outcome.

Why Demographic Parity Conflicts with the Others

  • What It Requires: Demographic Parity demands that A (Race) and Ŷ (Prediction) are independent. The causal diagram for this shows that the path between them is naturally blocked by Y (the True Outcome), which acts as that "scholarship" collider. As long as we don't condition on the true outcome, Race and the AI's Prediction remain separate.
  • The Conflict: The moment we try to check for another kind of fairness, like Equalized Odds, we have to condition on the true outcome (Y). We have to ask, "For people who actually graduate, what was the error rate?" This is exactly like looking only at the scholarship winners in our analogy. It "unblocks" the path and creates a statistical link between A and Ŷ, destroying their independence and breaking Demographic Parity.

Why Equalized Odds and Predictive Parity Conflict with Each Other

  • What They Require: The causal diagrams for Equalized Odds and Predictive Parity are different. In their structures, the path between A (Race) and Ŷ (Prediction) is not naturally blocked by a collider; they are connected from the start.
  • The Conflict: Because A and Ŷ are connected, they are not independent, which immediately means Demographic Parity is impossible. Furthermore, the specific causal structures required for Equalized Odds (where you condition on the true outcome to examine error rates) and Predictive Parity (where you condition on the AI's prediction to examine reliability) are mutually exclusive: satisfying one makes it structurally impossible to satisfy the other.

So if the math says perfect fairness is a myth, are we doomed to build unfair AI? Not at all. The key lies in understanding the difference between a mathematical theorem and a real-world engineering problem. Let's see how practitioners turn the "impossible" into the possible.

  5. From "Impossible" to "Possible" in Practice

5.1. Relaxing the Rules: The Power of "Close Enough"

The impossibility theorem is powerful, but it relies on a very strict assumption: that fairness requires exact mathematical equality between groups. For example, the False Positive Rate for Group A must be exactly the same as for Group B.

But what about in the real world? Practitioners are often comfortable with approximate fairness. A small margin of error—say, a 5% difference in error rates between groups—might be perfectly acceptable depending on the context.

This insight is the core of recent research that re-examines the impossibility theorem. The key finding is powerful:

"if one allows only a small margin-of-error between metrics, there are large sets of models satisfying three fairness constraints simultaneously, even outside of perfect prediction and outcome prevalence parity."

In other words, by relaxing the rules from "perfectly equal" to "close enough," the zone of impossibility shrinks, and a new "fairness region" of possible solutions opens up.
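In code, relaxing exact equality to a margin of error is a one-line change: instead of requiring identical metrics, we check that the gap between groups stays within some tolerance ε. A minimal sketch with invented per-group rates:

```python
def approximately_fair(rates_by_group, metric, epsilon=0.05):
    """Relaxed fairness check: the chosen metric may differ between groups
    by at most `epsilon` (here, 5 percentage points by default)."""
    values = [r[metric] for r in rates_by_group.values()]
    return max(values) - min(values) <= epsilon

rates = {  # invented error rates and PPVs for two hypothetical groups
    "group_a": {"fpr": 0.12, "ppv": 0.70},
    "group_b": {"fpr": 0.15, "ppv": 0.68},
}
print(approximately_fair(rates, "fpr"))                # True: 3-point gap is within the margin
print(approximately_fair(rates, "ppv"))                # True: 2-point gap is within the margin
print(approximately_fair(rates, "fpr", epsilon=0.01))  # False under a stricter margin
```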

5.2. What Does This Mean for Building Fair AI?

This research offers several practical takeaways for anyone trying to build fairer AI systems:

  • Small Differences Matter: The smaller the real-world difference in outcomes between groups (the "prevalence difference"), the larger the fairness region becomes, and the easier it is to achieve multiple fairness goals at once.
  • Better Models Can Be Fairer: Higher-performing models (e.g., those with a high Positive Predictive Value, or PPV) create a larger fairness region. This is a crucial finding because it refutes the simplistic idea that there is always a trade-off between performance and fairness. In many cases, the effort to build a more accurate and predictive model can actually make it easier to satisfy multiple fairness constraints simultaneously.
  • Real-World Constraints Can Help: Practical constraints, like a university only having a fixed number of spots (k) to offer, can sometimes make it easier to find fair models. This is because a smaller k forces the model to be more selective, which often increases its Positive Predictive Value (PPV). As we just learned, a higher PPV expands the fairness region, making multiple fairness goals more achievable.
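The last point can be illustrated with a toy top-k selection. With invented scores that loosely track actual success, shrinking the number of available spots k raises the PPV of the admitted pool:

```python
# A sketch of the fixed-capacity effect: with only k slots, the model admits
# its top-k scored applicants, and a smaller k tends to raise PPV.
# Scores and outcomes are invented; higher scores loosely track success.

applicants = [  # (score, actually_succeeds)
    (0.95, True), (0.90, True), (0.85, True), (0.80, False), (0.75, True),
    (0.70, False), (0.65, True), (0.60, False), (0.55, False), (0.50, False),
]

def ppv_at_k(pool, k):
    """Admit the k highest-scoring applicants and measure PPV among them."""
    admitted = sorted(pool, reverse=True)[:k]
    return sum(ok for _, ok in admitted) / k

print(ppv_at_k(applicants, 8))  # 0.625
print(ppv_at_k(applicants, 4))  # 0.75 -- being more selective raised the PPV
```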

The journey from a stark mathematical impossibility to a world of practical possibility shows that our definition of "fairness" is what truly matters. This brings us to the bigger picture.

  6. Conclusion: More Than Just Math

The impossibility theorem is a critical guidepost. It warns us that we cannot naively pursue every definition of fairness at once and forces us to be deliberate about our choices. However, it is not a rigid barrier that dooms us to failure in the real world.

The challenge of fairness goes deeper than just the math. As one source points out, we need to distinguish between two types of bias:

  • Statistical Bias: This is when a model’s predictions do not match the world as it is in the data. For example, if a model's error rates are different for men and women, it has statistical bias.
  • Societal Bias: This occurs when the world as it is in the data is itself the result of unfair historical or social processes. For example, if arrest data reflects biased policing practices, a model trained on that data may be statistically "unbiased" but still perpetuate a deeply unfair societal reality.

It's important to realize that the entire discussion of Demographic Parity, Equalized Odds, and Predictive Parity has been about finding ways to measure and mitigate statistical bias. The concept of societal bias challenges us to ask a harder question: what if our data is a perfect reflection of an unfair world?

Ultimately, achieving fairness in AI is not just a technical problem to be solved with an algorithm. It requires us to ask fundamental questions about the world we live in and the world we want to create. As a response to a recent government proposal on AI bias wisely puts it:

"...by focusing directly on impacts, we can avoid unproductive technical proxy wars about what is or is not 'bias' and be sure we don’t a priori exclude consideration of non-bias-based harms."

The goal is not just to build models that are mathematically fair, but to build systems that move us toward a more just and equitable society.

This educational content was created with the assistance of AI tools including Claude, Gemini, and NotebookLM.