Fairness in artificial intelligence models: how to mitigate discrimination in the presence of multiple sensitive attributes?

Suppose we have a machine learning model, 𝑓, that predicts the price of an insurance premium, Y, from data that includes a sensitive attribute, such as gender. Discrimination may arise from statistical bias (past injustices or sample imbalance), from a correlation between the sensitive attribute and some explanatory variable, or from intentional bias.

To avoid this bias, legislation such as the European Union's AI Act (2024) limits or even eliminates the use of certain sensitive attributes in artificial intelligence models. However, simply removing these attributes does not always yield the best level of fairness or the best model performance. Mitigation approaches fall into three families: preprocessing (which modifies the input data), in-processing (which adds a fairness penalty during training), and postprocessing (which modifies the univariate distribution of predictions to create an intermediate distribution, as done in Sequential Fairness).

Several postprocessing approaches exist to mitigate these effects when a model has a single sensitive attribute (SSA). But what can we do when there are multiple sensitive attributes (MSA)? One possible approach is to consider the intersection of the distributions created by each combination of the sensitive attributes. For example, if the sensitive attributes are gender (female and male) and ethnicity (black and white), four cases would be considered with the SSA approach: (female, black), (female, white), (male, black), and (male, white).

This intersectional approach can be computationally expensive as the number of sensitive attributes increases, because the number of combinations grows multiplicatively. Additionally, when a new sensitive attribute is added, the previous work is lost, because new distributions must be found for the new combinations. Another approach, and the focus of this blog post, is Sequential Fairness. In summary, it modifies the model's predictions to be fair with respect to the first sensitive attribute, then modifies these new predictions again to be fair with respect to the second attribute (and consequently also the first), and so on. The benefits are that the process is commutative (the order in which the attributes are handled does not matter), that new sensitive attributes are easy to add, and that the results are easier to interpret.
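The sequential procedure described above can be illustrated with a minimal NumPy sketch. It is not EquiPy's implementation: the function names (`fair_projection`, `sequential_fairness`, `mean_gap`) and the synthetic data are hypothetical, and the single-attribute step assumes the standard one-dimensional construction in which each prediction is sent to the weighted average of the group quantile functions, evaluated at its rank within its own group.

```python
import numpy as np


def fair_projection(y_pred, groups):
    """Make predictions fair for ONE sensitive attribute (illustrative sketch).

    Each prediction is replaced by the weighted average of the group-wise
    quantile functions, evaluated at its empirical rank within its own group.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()  # share of each group in the sample
    sorted_by_group = {g: np.sort(y_pred[groups == g]) for g in labels}

    y_fair = np.empty_like(y_pred)
    for g in labels:
        mask = groups == g
        own = sorted_by_group[g]
        # Mid-rank of each prediction inside its own group, strictly in (0, 1).
        ranks = (np.searchsorted(own, y_pred[mask], side="right") - 0.5) / own.size
        # Average the quantile functions of all groups at those ranks.
        y_fair[mask] = sum(
            w * np.quantile(sorted_by_group[h], ranks)
            for h, w in zip(labels, weights)
        )
    return y_fair


def sequential_fairness(y_pred, sensitive):
    """Apply the single-attribute step to each column of `sensitive` in turn."""
    y_fair = np.asarray(y_pred, dtype=float)
    for column in np.asarray(sensitive).T:
        y_fair = fair_projection(y_fair, column)
    return y_fair


def mean_gap(y, groups):
    """Crude unfairness measure: largest difference between group means."""
    means = [y[groups == g].mean() for g in np.unique(groups)]
    return max(means) - min(means)


# Synthetic premiums that depend on two binary sensitive attributes (made up).
rng = np.random.default_rng(0)
n = 4000
gender = rng.integers(0, 2, size=n)
ethnicity = rng.integers(0, 2, size=n)
y_pred = 100 + 20 * gender + 10 * ethnicity + rng.normal(0, 5, size=n)

sensitive = np.column_stack([gender, ethnicity])
y_fair = sequential_fairness(y_pred, sensitive)
y_fair_reversed = sequential_fairness(y_pred, sensitive[:, ::-1])

print("gap by gender    before/after:", mean_gap(y_pred, gender), mean_gap(y_fair, gender))
print("gap by ethnicity before/after:", mean_gap(y_pred, ethnicity), mean_gap(y_fair, ethnicity))
print("mean |difference| between the two orderings:", np.mean(np.abs(y_fair - y_fair_reversed)))
```

On this toy data the gap in average prediction between groups shrinks for both attributes, and processing the attributes in the opposite order gives nearly the same predictions. Adding a third attribute only requires one more column in `sensitive`, without recomputing anything for the intersectional groups.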

The idea is to find a representative distribution that lies between the conditional distributions of the predictions given each value of the sensitive attribute. This is achieved with the Wasserstein barycenter, the distribution that minimizes the total optimal-transport cost of moving each of those conditional distributions onto it. This approach extends the idea of Strong Demographic Parity, which seeks to reduce inequity between groups by requiring that a model's predictions be independent of the sensitive attributes, to multiple attributes.
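For readers who prefer formulas, the two notions in this paragraph can be written as follows for a single sensitive attribute. The notation is introduced here for illustration and does not come from the presentation: $A$ is the sensitive attribute with values in $\mathcal{A}$, $p_a = \mathbb{P}(A = a)$, $\nu_a$ is the distribution of the predictions within group $a$, and $F_a$ and $Q_a$ are its cumulative distribution function and quantile function. The last line is the usual closed form for univariate predictions with continuous distributions.

```latex
% Strong Demographic Parity: predictions have the same distribution in every group
f(X, A) \perp A
\quad\Longleftrightarrow\quad
\nu_a = \nu_{a'} \quad \text{for all } a, a' \in \mathcal{A}

% Wasserstein barycenter of the group-wise prediction distributions
\nu^{*} \in \arg\min_{\nu} \sum_{a \in \mathcal{A}} p_a \, W_2^2(\nu_a, \nu),
\qquad
W_2^2(\mu, \nu) = \int_0^1 \bigl( Q_\mu(u) - Q_\nu(u) \bigr)^2 \, du

% Fair predictor obtained by transporting each group onto the barycenter
f_B(x, a) = \sum_{a' \in \mathcal{A}} p_{a'} \, Q_{a'}\bigl( F_a( f(x, a) ) \bigr)
```

In words, a prediction keeps its rank within its own group, and its new value is the weighted average of what that rank is worth in every group.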

It is important to note that reducing unfairness in predictive models comes at a cost to performance. By using the Wasserstein barycenter, however, this approach keeps the loss in metrics such as accuracy and MSE as small as possible.

EquiPy is a Python package that implements Sequential Fairness for continuous prediction models with multiple sensitive attributes. It uses the Wasserstein barycenter to minimize the impact on model performance while mitigating the bias and discrimination that sensitive attributes may introduce into predictions.

* This blog is based on the presentation made during the Quantil seminar on August 8, 2024, by Agathe Fernandes Machado titled "EquiPy: A Python package for Sequential Fairness using Optimal Transport with Applications in Insurance", where she shared insights about the research conducted by her and her team at the Université du Québec à Montréal (UQAM) to develop a Python package that implements sequential fairness to mitigate injustices in the presence of multiple sensitive attributes.