Blog Post · 8 min read · Tier 3

Measuring Fairness in AI: A Beginner's Guide

Introduction: Why Do We Need to Measure Fairness?

In 2016, a groundbreaking ProPublica investigation revealed that an algorithm used across the United States to predict the likelihood of a criminal reoffending was biased against Black individuals. The system, called COMPAS, was found to be particularly unreliable for Black defendants. A follow-up analysis showed that Black defendants who did not go on to reoffend were almost twice as likely as their white counterparts (45% vs. 23%) to be incorrectly labeled as "high risk." This case highlights a critical challenge at the heart of modern artificial intelligence: the risk of algorithmic bias.

Algorithmic bias refers to systematic errors in a machine learning model that produce unfair outcomes for different social groups. This often happens because a model learns from and amplifies historical inequities present in its training data.

Unfairness can manifest in two primary ways:

  • Disparate Treatment: This is intentional discrimination. It occurs when members of a protected group are deliberately treated differently.
  • Disparate Impact: This is unintentional discrimination. It happens when a policy or practice that appears neutral on the surface disproportionately harms a protected group. A landmark study from MIT provided a stark example: a facial recognition system correctly identified the gender of white males 99% of the time, but its accuracy dropped to just 65% for darker-skinned females. This was a result of measurement distortions and an imbalanced training dataset that underrepresented this demographic, leading the algorithm to "see" incorrect or noisy labels as truth.

This can even happen when protected characteristics like race or gender are removed from the data. A model might use a proxy variable—a seemingly neutral feature like a ZIP code that is highly correlated with a protected characteristic like race or socioeconomic status—to make discriminatory predictions.
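To make the proxy-variable problem concrete, here is a minimal sketch in plain Python. The ZIP codes and group labels are entirely made up for illustration; the point is simply that a "neutral" feature can reveal group membership almost perfectly.

```python
# Hypothetical illustration of a proxy variable: even with the protected
# attribute removed, a feature like ZIP code can carry the same information.
# We check how often ZIP code alone predicts group membership in toy data.
from collections import Counter, defaultdict

zip_codes = ["10001", "10001", "10001", "20002", "20002", "20002"]
group     = ["A", "A", "A", "B", "B", "B"]

# Collect the group labels seen in each ZIP code
by_zip = defaultdict(list)
for z, g in zip(zip_codes, group):
    by_zip[z].append(g)

# Predict the majority group for each ZIP and count correct "guesses"
correct = sum(Counter(gs).most_common(1)[0][1] for gs in by_zip.values())
proxy_accuracy = correct / len(group)
print(proxy_accuracy)  # 1.0 — in this toy data, ZIP code fully reveals the group
```

When a feature predicts a protected attribute this well, dropping the attribute itself does little to prevent discriminatory predictions.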

This raises a difficult question: If we want to build fairer systems, how do we mathematically define and measure what "fair" even means?

1.0 The Three Pillars of Group Fairness

To fix algorithmic bias, we first need to define what "fair" means in mathematical terms. However, there is no single, universally accepted definition. Instead, data scientists have developed several criteria for "group fairness," which evaluate whether a model's outcomes are equitable across different demographic groups. We can organize these definitions around three fundamental questions, which we'll call the "Pillars of Group Fairness." It's important to note that these criteria can sometimes conflict with one another, forcing us to make important choices about what kind of fairness we want to prioritize.

| Fairness Criterion | Core Question | Common Name(s) |
|---|---|---|
| Independence | Are the outcomes distributed equally across groups? | Statistical Parity |
| Separation | Is the model equally accurate for all groups? | Equalized Odds |
| Sufficiency | Do the model's predictions mean the same thing for everyone? | Predictive Parity |

1.1 Independence (Statistical Parity)

Independence means that the algorithm's outcomes are distributed in equal proportions for each group, regardless of their actual outcomes. The protected characteristic (e.g., race) and the model's prediction should be independent.

This is the most straightforward way to check for equal outcomes. For example, in a hiring context, Independence is satisfied if a company's algorithm selects 30% of male applicants to interview and also selects roughly 30% of female applicants to interview. The "pass rate" is the same for both groups.

However, this criterion can be undesirable in situations where a demographic is legitimately associated with risk factors. It focuses only on equalizing outcomes and ignores whether the model's predictions are correct.
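The hiring example above can be sketched as a simple per-group selection-rate check. The group labels and selections below are hypothetical, not real data.

```python
# Minimal sketch: checking Independence (statistical parity) by comparing
# selection rates across groups in hypothetical hiring data.

def selection_rates(groups, selected):
    """Return the fraction of candidates selected within each group."""
    totals, hits = {}, {}
    for g, s in zip(groups, selected):
        totals[g] = totals.get(g, 0) + 1
        hits[g] = hits.get(g, 0) + (1 if s else 0)
    return {g: hits[g] / totals[g] for g in totals}

groups   = ["M"] * 10 + ["F"] * 10
selected = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
rates = selection_rates(groups, selected)
print(rates)  # both groups select 3/10, so statistical parity holds
```

Note that this check looks only at the predictions, never at the true outcomes, which is exactly the limitation discussed above.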

1.2 Separation (Equalized Odds)

Separation means that the model's predictions are independent of the protected group, given the true outcome. In practice, this means the model has equal error rates—specifically, equal false positive and false negative rates—across different groups.

This criterion ensures that a model makes mistakes at the same rate for everyone, which is a more nuanced view of fairness than simply looking at outcomes.

Imagine a spam filter. Separation would mean that the rate at which it mistakenly sends important emails to the spam folder (false positives) is the same for emails coming from Group A and Group B. Likewise, the rate at which it mistakenly leaves actual spam in the inbox (false negatives) would also have to be the same for both groups. This criterion is also known as Equalized Odds.
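The spam-filter scenario can be expressed as a per-group comparison of false positive and false negative rates. All labels and predictions here are invented for illustration.

```python
# Sketch of a Separation (Equalized Odds) check: compare false positive and
# false negative rates between two groups on hypothetical spam data.

def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    return fp / negatives, fn / positives

# True labels (1 = spam) and model predictions for each group
fpr_a, fnr_a = error_rates([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0])
fpr_b, fnr_b = error_rates([0, 0, 0, 0, 1, 1], [1, 0, 0, 0, 0, 1])
print(fpr_a, fnr_a)  # 0.25 0.5
print(fpr_b, fnr_b)  # 0.25 0.5 — equal error rates, so Separation holds
```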

1.3 Sufficiency (Predictive Parity)

Sufficiency means that the true outcome is independent of the protected group, given the model's prediction or risk score. This means that for any given prediction (e.g., "high-risk"), the actual rate of positive outcomes should be the same for all groups.

This criterion ensures that a model's predictions have the same meaning for everyone.

For instance, if a loan-risk algorithm gives a score of "750," sufficiency demands that applicants from all racial groups who receive that score have the same actual loan repayment rate. The score means the same thing, no matter who you are. This concept is also referred to as calibration or Predictive Parity.
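A basic calibration check for the loan-score example might look like the sketch below. The scores and repayment outcomes are invented; a real audit would bin scores and use far more data.

```python
# Sketch of a Sufficiency (calibration / Predictive Parity) check: among
# applicants who received the same score, does the actual repayment rate
# match across groups? Data is hypothetical.

def outcome_rate(scores, outcomes, score):
    """Fraction of positive outcomes among records with the given score."""
    matched = [o for s, o in zip(scores, outcomes) if s == score]
    return sum(matched) / len(matched)

# (score, repaid) data for two hypothetical groups
scores_a, repaid_a = [750, 750, 750, 600], [1, 1, 0, 0]
scores_b, repaid_b = [750, 750, 750, 600], [1, 0, 1, 1]

rate_a = outcome_rate(scores_a, repaid_a, 750)
rate_b = outcome_rate(scores_b, repaid_b, 750)
print(rate_a, rate_b)  # ≈0.67 for both groups: a 750 "means the same thing"
```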

These theoretical definitions are essential, but how do we apply them to test a real-world system?

2.0 Putting Fairness to the Test: The Four-Fifths Rule

One of the most practical guidelines for measuring fairness comes from a simple statistical test established by the U.S. Equal Employment Opportunity Commission (EEOC).

The Four-Fifths (or 80%) Rule states that the selection rate for any protected group should be at least 80% of the selection rate for the group with the highest rate. If it falls below this threshold, it is considered evidence of a potential disparate impact.

Let's walk through a simple hiring example:

  1. Scenario: A company receives applications from 10 men and 6 women.
  2. Action: The company's algorithm selects 3 men (a 30% "pass" rate) and 1 woman (a 16.7% "pass" rate) for an interview.
  3. Calculation: Divide the selection rate of the lower-selected group by the rate of the higher-selected group: 16.7% ÷ 30.0% = 55.6%
  4. Conclusion: Because 55.6% is less than 80%, this hiring practice flags a potential disparate impact against women and requires further investigation.
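The four steps above translate directly into a small function. The numbers mirror the worked example, and the 0.8 threshold follows the EEOC guideline described in the text.

```python
# The four-fifths (80%) rule as a function: compare each group's selection
# rate to the highest group's rate and flag ratios below the threshold.

def four_fifths_check(selected_by_group, applicants_by_group, threshold=0.8):
    rates = {g: selected_by_group[g] / applicants_by_group[g]
             for g in applicants_by_group}
    ratio = min(rates.values()) / max(rates.values())
    return rates, ratio, ratio >= threshold

rates, ratio, passes = four_fifths_check(
    selected_by_group={"men": 3, "women": 1},
    applicants_by_group={"men": 10, "women": 6},
)
print(rates)            # men 0.3, women ≈0.167
print(round(ratio, 3))  # 0.556 — below 0.8
print(passes)           # False: flags a potential disparate impact
```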

This rule provides a clear, actionable starting point, but it also brings up a more complex debate about the relationship between fairness and a model's overall performance.

3.0 The "Fairness vs. Accuracy" Debate: Is There Really a Trade-off?

A common assumption in AI is the existence of a fairness-accuracy trade-off. The idea is that increasing a model's fairness must necessarily decrease its accuracy, and vice versa. It's often compared to a company with a fixed amount of steel that must decide whether to build more cars or more airplanes; it can't maximize both simultaneously.

However, recent research challenges this assumption, suggesting the trade-off is often "negligible in practice" or even a "false notion" when accuracy is measured on biased historical data.

The core problem is label bias. A model's "accuracy" is measured by how well its predictions match the "correct" labels in the training data. But what if those labels are themselves the product of past discrimination? For example, if historical loan data reflects a pattern of denying loans to qualified Black applicants, a model trained on that data will learn to reproduce this discriminatory pattern. Its "accuracy" is simply a measure of how well it mimics past unfairness. This means the fairness-accuracy trade-off is often an illusion. It isn't positioning 'fairness' against 'accuracy'; it's positioning 'fairness' against 'past unfairness,' a comparison that is fundamentally flawed.

The crucial insight is this: when a model's "accuracy" is measured against a biased history, enforcing fairness isn't a compromise—it's a correction. This reframes fairness not as a cost, but as a prerequisite for building a genuinely accurate model. Once we can identify and measure unfairness, we can develop strategies to fix it.

4.0 How Do We Make AI Fairer? An Overview of Strategies

Since bias can enter at different stages—in the data, during training, or in the final output—the strategies for mitigation mirror these stages. Data scientists have developed a toolkit of methods to mitigate bias in machine learning models, which fall into three main categories.

  • Pre-processing: Modifying the training data before the model is built to remove or reduce underlying biases.
  • In-processing: Incorporating fairness constraints directly into the model's training process and mathematical objective (loss) function.
  • Post-processing: Adjusting the model's predictions after it has been trained, without changing the underlying model itself.
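As a taste of the pre-processing category, here is a sketch of reweighting, in which each (group, label) combination receives a weight that makes group and label statistically independent in the training data. This mirrors the reweighing idea of Kamiran and Calders; the data below is hypothetical.

```python
# Pre-processing sketch: compute instance weights so that, after weighting,
# group membership and the training label are independent. Each weight is
# the expected (group, label) frequency under independence divided by the
# observed frequency.
from collections import Counter

def reweighing(groups, labels):
    n = len(groups)
    g_count = Counter(groups)
    y_count = Counter(labels)
    gy_count = Counter(zip(groups, labels))
    return {
        (g, y): (g_count[g] * y_count[y]) / (n * gy_count[(g, y)])
        for (g, y) in gy_count
    }

groups = ["A", "A", "A", "B", "B", "B"]
labels = [1, 1, 0, 1, 0, 0]
weights = reweighing(groups, labels)
print(weights)  # over-represented pairs get weight < 1, under-represented > 1
```

A training algorithm that supports sample weights can then use these values directly, down-weighting the combinations the historical data over-represents.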

5.0 Conclusion: Your Key Takeaways

Understanding and measuring fairness is a foundational step toward building more responsible AI. As you continue your learning journey, keep these critical points in mind.

  1. Fairness Has Many Definitions: There isn't one single way to measure fairness. The three main criteria—Independence, Separation, and Sufficiency—answer different questions about what it means for a model to be equitable.
  2. The Fairness-Accuracy Trade-off is Not a Given: The idea that you must sacrifice accuracy for fairness is often an illusion caused by biased data. A truly fair model may also be a more accurate one.
  3. Mitigation is Possible: AI bias is not an unsolvable problem. Practitioners can use a variety of pre-processing, in-processing, and post-processing techniques to build fairer systems.

Asking critical questions about how fairness is defined, measured, and implemented is essential for creating AI that serves everyone justly and effectively.

This educational content was created with the assistance of AI tools including Claude, Gemini, and NotebookLM.
