Trade-off between fairness and fit: a case study in crime prediction

The study of algorithmic fairness emerged in 2011 with the work of Cynthia Dwork and her coauthors [1], grounded in the principle of equal opportunity: all people, regardless of their characteristics, should be able to access the same opportunities and benefits. From that moment on, algorithmic fairness gained popularity as a way to identify and mitigate discriminatory behavior in machine learning models.

Of particular concern are the biases that may be present in crime prediction models. States that employ these models rely on the resulting predictions to allocate policing resources across different areas of a city. If the model is biased and, for example, predicts more crime in low-resource areas than actually occurs, it can harm the residents of those areas by subjecting them to unnecessary and excessive policing.

There are several causes of bias in predictive models; in crime models, three main ones stand out:

  1. Difficulty in recording all crimes in a city: even if a state deploys a uniformly distributed surveillance system across the city, detecting and recording every crime that occurs is a complex task.
  2. High concentration of events in certain areas: some areas of a city generally concentrate more criminal events. This may be because the area genuinely has a high crime rate, or because other areas lack sufficient surveillance and therefore events there go unreported.
  3. Underlying model assumptions: crime prediction models are built on specific assumptions. If these assumptions do not adequately match reality, or contain inherent biases, the results may be distorted.

Metrics commonly used to evaluate the technical performance of a model do not reveal underlying biases. A model can be 98% accurate, erring in only 2% of its predictions; yet if those errors fall consistently on areas inhabited by people from lower socioeconomic strata, the model is biased in a way that accuracy alone cannot show. For this reason, following the work recently presented by Cristian Pulido, Diego Alejandro Hernández, and Francisco Gomez at the Quantil Applied Mathematics seminar [2], it is necessary to define evaluation metrics different from the traditional ones. To this end, a utility function f is established over a set C that encompasses all protected areas, i.e., areas that could be adversely affected, such as those with low socioeconomic resources. Here P represents the model's probabilistic prediction, while Q represents the simulated crimes. Intuitively, the goal is to minimize the disparity between these two distributions in every protected area, achieving a model that captures the underlying structure of the crime distribution.
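To make this concrete, one plausible form for f, consistent with the definitions above (an illustrative assumption on our part; the exact expression is given in the seminar [2]), assigns each protected area a score that grows as the prediction approaches the simulated crimes:

$$
f(c) = \frac{1}{1 + d\left(P_c,\, Q_c\right)}, \qquad c \in C,
$$

where d is a discrepancy between distributions, such as the earth mover's distance used below to measure technical performance; f(c) = 1 would mean the prediction matches the simulated crimes in area c exactly.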

From these per-area utilities, fairness metrics such as the variance, the min-max difference, and the Gini coefficient are calculated. It is particularly important to analyze the relationship between these fairness metrics and traditional technical performance metrics; this is the objective of Cristian Pulido and Diego Alejandro Hernández, under the direction of Francisco Gomez [2]. For this purpose, 30 scenarios were simulated following the population distribution typical of Latin American cities, which tends to follow a Sector Model.
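A minimal sketch of how these three fairness metrics could be computed from the per-area utilities (the `utilities` vector and the pairwise-difference Gini formula are illustrative assumptions, not taken from [2]):

```python
import numpy as np

def fairness_metrics(utilities: np.ndarray) -> dict:
    """Summarize how unevenly utility is spread across protected areas.

    `utilities` holds one non-negative value f(c) per protected area c.
    """
    u = np.asarray(utilities, dtype=float)

    # Variance: spread of utility across areas (0 means perfectly even).
    variance = u.var()

    # Min-max difference: gap between best- and worst-served areas.
    min_max = u.max() - u.min()

    # Gini coefficient via mean absolute pairwise difference:
    # 0 = perfect equality, values near 1 = utility concentrated in few areas.
    mad = np.abs(u[:, None] - u[None, :]).mean()
    gini = mad / (2 * u.mean())

    return {"variance": variance, "min_max": min_max, "gini": gini}

# Example: hypothetical utilities for five protected areas.
print(fairness_metrics(np.array([0.9, 0.85, 0.4, 0.7, 0.95])))
```

All three summaries read the same way: the closer to zero, the more evenly the model serves the protected areas.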

Three models commonly used in this field were trained on these simulations: NAIVE, KDE, and SEPP. For each of them, technical performance was analyzed by comparing the simulated real distribution with the predicted one using the earth mover's distance, and potential biases were analyzed through the variance, min-max difference, and Gini coefficient of the utility function.
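For the technical-performance side, here is a hedged sketch of the earth mover's distance comparison (the 100-cell grid, the uniform NAIVE stand-in, and the flattening of the city into one dimension are all simplifying assumptions; the study compares two-dimensional spatial distributions):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Stand-in for one simulated scenario: crime intensity over 100 grid cells.
cells = np.arange(100)
simulated = rng.dirichlet(np.ones(100))   # simulated crime distribution Q

# A NAIVE-style prediction P: uniform intensity over the whole city.
naive_pred = np.full(100, 1 / 100)

# Earth mover's distance between P and Q, treating cell indices as
# positions and intensities as probability mass at those positions.
emd = wasserstein_distance(cells, cells, naive_pred, simulated)
print(f"EMD(naive, simulated) = {emd:.4f}")
```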

These experiments show a tendency for the best-fitting models to also be the most unfair, which can become a social problem given that model selection is commonly based solely on traditional metrics of this kind. Ignoring fairness metrics may implicitly bias the chosen model against historically discriminated populations and put them at a disadvantage.

Although the simulations followed a population distribution common in Latin America, artificial data do not always represent reality; in particular, they do not account for the problems of imbalance and missing data. For this reason, the research will continue, aiming to demonstrate these impacts on real crime data and, additionally, to analyze how underreporting can affect the fairness of the models.

References

[1] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness Through Awareness, 2011.

[2] Cristian Pulido, Diego Alejandro Hernández, and Francisco Gomez. Análisis sobre la justicia de los modelos más usuales en Seguridad Predictiva. Quantil Applied Mathematics seminar. https://www.youtube.com/watch?v=uCIanZ8jT-4
