Justice in Artificial Intelligence Models: A New Perspective Based on Algorithm Redesign

In recent years, artificial intelligence models have demonstrated incredible potential to transform industries, from healthcare to finance. However, they have also revealed a troubling issue: algorithmic bias. This phenomenon occurs when AI models reflect and amplify prejudices present in the data they were trained on, which can lead to unfair or discriminatory decisions.

Regulators in various countries have sought to impose restrictions that mitigate the discriminatory effects of artificial intelligence models. However, Emily Black, Assistant Professor of Computer Science and Data Science at NYU, points out in her research Algorithmic Fairness and Vertical Equity: Income Fairness with IRS Tax Audit Models [1] that a regulatory approach based on legal constraints can create tensions with previously successful mitigation strategies. In this regard, Emily addresses three important topics:

  1. The legal tensions that arise in the U.S. from anti-discrimination law.
  2. How to navigate that tension in the context of algorithmic systems for tax auditing.
  3. How to strengthen the law against discrimination through model multiplicity [2].

The risk of bias in artificial intelligence models is not a recent phenomenon. For many years, efforts have been made to prevent biases through constraints during the training of machine learning models. Just as models are designed to be accurate, it is also possible to restrict their structure to ensure fairness.

Regulation in the United States that specifically addresses discrimination is outlined in the Fair Housing Act (access to housing), Title VII of the Civil Rights Act (access to employment), and the Equal Credit Opportunity Act (access to credit). This regulation is divided into two doctrines. The first is Disparate Treatment, which prohibits intentional discrimination: using demographic information to make decisions in the three areas mentioned, housing, employment, and credit, is illegal. The second is Disparate Impact, under which a system involved in decision-making may still be illegal, even without evidence of intentional discrimination, if it produces differential impacts across demographic groups.
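To make the notion of disparate impact concrete, here is a minimal sketch (made-up data and hypothetical column names, not from the seminar) of how differential impact can be measured on a model's decisions even when the model never receives demographic information as input:

```python
import pandas as pd

# Hypothetical audit decisions; the model never saw the "group" column,
# but a disparate-impact check compares outcomes across groups anyway.
decisions = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "selected": [1,   0,   0,   1,   1,   1,   1,   0],
})

# Selection rate per demographic group.
rates = decisions.groupby("group")["selected"].mean()

# A common screening heuristic (the "four-fifths rule"): flag the system
# if the lowest selection rate falls below 80% of the highest one.
impact_ratio = rates.min() / rates.max()
print(rates)
print(f"Impact ratio: {impact_ratio:.2f} (flagged if below 0.80)")
```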

This is where the first tension arises: how can disparate impact across demographic groups be mitigated if it is not legal to consider demographic group membership when making decisions?

Figure 1. Tension between regulatory definitions in the U.S. Taken and adapted from Emily Black’s seminar (March 2025).

This tension deepens with the use of artificial intelligence systems, because the prevailing interpretation of the law prohibits the explicit use of demographic information as a model input, as well as its inclusion as a constraint during training. These restrictions also extend to data highly correlated with demographic characteristics, such as an individual’s place of residence. However, academic literature suggests that the most effective strategy for mitigating bias involves imposing constraints on models using demographic information, which is precisely what the law prohibits. So how is it possible to pursue solutions against discrimination?

Emily suggests that the mitigation strategy must go beyond abstracting input and output data; attention must be paid to every design decision made along the way. This includes how the problem is formulated, how the data are collected and preprocessed, and which type of model is used. This is known as the Pipeline-Aware Approach to AI Fairness: addressing fairness through decisions made across the entire design pipeline.

Changing design decisions: selecting tax audits at the IRS

This practical example is based on data from the United States Internal Revenue Service (IRS). The project aimed to improve the selection of individuals for tax audits by using more advanced artificial intelligence models than the linear models the IRS had been using for years. The challenge was not only to improve the model’s accuracy but also to address the unfairness in audit selection, which disproportionately affected low-income individuals.

The IRS provided two main datasets for this study:

  1. National Research Program (NRP) data: a random sample of audited taxpayers, used to understand the true distribution of tax fraud.
  2. Operational Data: audit decisions made by the IRS based on perceived risk.

When analyzing the data, Emily and her team discovered that low-income individuals were audited more frequently, while middle- and high-income individuals were audited less often. However, upon examining the tax fraud data, they found that higher-income individuals had a greater likelihood of committing fraud, indicating that the current system was unfair.

Figure 2. On the left, audit rate by income decile. On the right, the rate of tax underreporting as a percentage of the population. Taken from [1].

Improving the model without demographic constraints

The first attempt at improvement consisted of replacing the linear model with a more complex one, such as a Random Forest, without changing the problem formulation. However, this did not solve the issue of discrimination, as the model continued to primarily select middle- and low-income individuals for audits, while ignoring high-income taxpayers.
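As a rough illustration of this first step (synthetic data and scikit-learn stand-ins, not the IRS pipeline), swapping the model family while keeping the same classification formulation might look like this:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data: features of tax returns and a binary "misreported" label.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: a linear model, standing in for the one the IRS had been using.
linear = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Swap in a more flexible model without touching the problem formulation:
# it still predicts the probability that *any* misreporting occurred.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print("linear accuracy:", linear.score(X_test, y_test))
print("forest accuracy:", forest.score(X_test, y_test))
```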

The next step was to apply traditional AI fairness methods, such as imposing constraints on the model to ensure that all income deciles had the same likelihood of being selected for an audit. Although this approach slightly improved fairness, it also significantly reduced the amount of money recovered by the IRS and increased the false positive rate.
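One simple way to approximate such a constraint, sketched below as post-processing with hypothetical variable names rather than the in-training method the team evaluated, is to give every income decile the same audit rate by selecting the top-scoring returns within each decile:

```python
import numpy as np
import pandas as pd

def equalize_audit_rates(scores: pd.Series, deciles: pd.Series, budget_per_decile: int) -> pd.Index:
    """Select the same number of audits in every income decile,
    taking the highest-scoring returns within each decile."""
    df = pd.DataFrame({"score": scores, "decile": deciles})
    selected = (
        df.sort_values("score", ascending=False)
          .groupby("decile")
          .head(budget_per_decile)
    )
    return selected.index

# Hypothetical usage: risk scores for 1,000 returns and an income decile per taxpayer.
rng = np.random.default_rng(0)
scores = pd.Series(rng.random(1000))
deciles = pd.Series(rng.integers(1, 11, size=1000))
audit_idx = equalize_audit_rates(scores, deciles, budget_per_decile=10)
print(len(audit_idx), "returns selected, spread evenly across income deciles")
```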

Changes in algorithm design

The most effective solution emerged from changing the problem formulation. Instead of predicting whether a person would commit fraud (a classification problem), the team decided to predict how much money would be at stake in the event of fraud (a regression problem). This change in the algorithm's design allowed the model to prioritize higher-income individuals, who were more likely to commit significant fraud, while also reducing discrimination against low-income individuals.
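A minimal sketch of the reformulation follows, with synthetic data and hypothetical variable names: the target changes from a fraud label to a dollar amount, and audits go to the returns with the largest predicted amounts.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-ins: return features and the dollars underreported on each return.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))
dollars_underreported = np.maximum(
    0.0, X @ rng.normal(size=20) * 1_000 + rng.normal(scale=500, size=5000)
)

# Reformulated target: predict *how much money* is at stake, not *whether* fraud occurred.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, dollars_underreported)

# Audit the returns with the largest predicted amounts.
predicted_dollars = model.predict(X)
audit_budget = 100
audit_order = np.argsort(predicted_dollars)[::-1][:audit_budget]
```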

With this approach, Emily incorporated the principle of vertical equity: taxpayers with greater economic capacity should be treated differently, audited with higher priority because of the potentially greater impact of their actions on tax revenue. At the same time, the approach avoids placing a disproportionate burden on low-income taxpayers, contributing to a fairer and more equitable system.

This approach not only improved the fairness of the model but also increased the amount of money recovered by the IRS—from $3 billion to over $10 billion. Although there was an increase in the false positive rate, the IRS considered that the financial benefit justified this trade-off.

Predictive multiplicity and its relationship with fairness

Emily also introduced the concept of “predictive multiplicity,” a phenomenon in which multiple models with the same predictive accuracy can make different predictions for the same individuals. This phenomenon is especially relevant in complex models, such as deep learning models, where variance is higher. While multiplicity might seem problematic, it also offers an opportunity to select models that are not only accurate but also fairer.
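A toy illustration of how multiplicity can be exploited (synthetic data, a synthetic group attribute, and an arbitrary accuracy tolerance, none of it from the paper): train several comparably accurate models and keep the one with the smallest disparity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data with a synthetic group attribute used only for evaluation.
X, y = make_classification(n_samples=4000, n_features=15, random_state=1)
group = (X[:, 0] > 0).astype(int)  # hypothetical proxy for a demographic group
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(X, y, group, random_state=1)

candidates = []
for seed in range(10):
    # Different random seeds yield different, but comparably accurate, models.
    model = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_tr, y_tr)
    accuracy = model.score(X_te, y_te)
    pred = model.predict(X_te)
    # Disparity: gap in positive-prediction rates between the two groups.
    gap = abs(pred[g_te == 1].mean() - pred[g_te == 0].mean())
    candidates.append((accuracy, gap, seed))

# Among models within a small accuracy tolerance of the best, keep the least disparate one.
best_accuracy = max(acc for acc, _, _ in candidates)
equivalent = [c for c in candidates if c[0] >= best_accuracy - 0.005]
chosen = min(equivalent, key=lambda c: c[1])
print(f"chosen seed={chosen[2]}  accuracy={chosen[0]:.3f}  disparity gap={chosen[1]:.3f}")
```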

In the legal context, Emily argued that companies cannot justify discrimination in their models by claiming it is necessary to maintain accuracy. If there are other models with the same predictive performance but less discrimination, companies are obligated to use those models. This approach reinforces the idea that fairness must be a key consideration in the design of artificial intelligence algorithms.

Conclusion

Emily Black’s work highlights the importance of addressing discrimination in artificial intelligence models through changes in algorithm design, rather than relying solely on training constraints. By reformulating the problem and considering predictive multiplicity, it is possible to create models that are not only accurate but also fairer. This approach has not only practical implications—as in the case of the IRS—but also legal ones, reinforcing the need for companies to prioritize fairness in their AI systems.

In summary, fairness in artificial intelligence is not just a matter of tweaking parameters, but of rethinking how models are designed and selected to ensure they are both equitable and effective.

References

[1] Emily Black, Hadi Elzayn, Alexandra Chouldechova, Jacob Goldin, and Daniel Ho. (2022). Algorithmic Fairness and Vertical Equity: Income Fairness with IRS Tax Audit Models. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). Association for Computing Machinery, New York, NY, USA, 1479–1503. https://doi.org/10.1145/3531146.3533204

[2] Emily Black, Manish Raghavan, and Solon Barocas. (2022). Model Multiplicity: Opportunities, Concerns, and Solutions. In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22), June 21–24, 2022, Seoul, Republic of Korea. ACM, New York, NY, USA, 23 pages. https://doi.org/10.1145/3531146.3533149
