Modelling Under-reported Spatio-temporal Events

In joint work with Jose Sebastian Ñungo, Lucas Gomez and Mateo Dulce, we introduce an under-reporting model of spatio-temporal events motivated by real-world applications such as citizen security. Under-reporting of socially sensitive events can undermine the credibility of official figures, and it can be exploited strategically by official agents or the general public. Models that simultaneously estimate the incidence and under-reporting rates of events can help improve the allocation of public resources.

Under-reporting is a common phenomenon in many data-related problems. For instance, it is widely studied in survey sampling, where it is an important example of a non-sampling error that can bias estimates. The problem is particularly relevant in public policy, where government agents try to monitor geographically distributed incidents that are often under-reported. Examples include restaurant sanitary and food inspections, child services, pest control, compliance with building safety regulations, surveillance of animal poaching in natural parks, and crime incidents in a city, among many others. For example, in 2021 the Bogota Chamber of Commerce victimization and reporting survey reported an average victimization rate of 17%, and only 49% of those victims said they had reported the event to the police.
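To make the survey figures concrete, the short sketch below splits the 17% victimization rate into a reported and an unreported part using the 49% reporting rate. The function name `split_victimization` is only illustrative, and the calculation assumes the reporting rate applies uniformly to all victims, which the survey summary does not guarantee.

```python
def split_victimization(victimization_rate, reporting_rate):
    """Split a victimization rate into reported and unreported shares.

    Assumes (for illustration only) that the reporting rate applies
    uniformly to every victim.
    """
    reported = victimization_rate * reporting_rate
    unreported = victimization_rate * (1.0 - reporting_rate)
    return reported, unreported


# Figures quoted from the 2021 survey: 17% victimization, 49% reporting.
reported, unreported = split_victimization(0.17, 0.49)
print(f"reported: {reported:.4f}, unreported: {unreported:.4f}")
```

Under that assumption, police records alone would reflect roughly 8.3% of the population as victims, while about 8.7% would be victims who never appear in the official figures.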

To solve the model, we modify well-known combinatorial multi-armed bandit algorithms. After validating the model, we apply it to real crime data from a large city, Bogota, Colombia, and show that it is able to estimate the true crime and under-reporting rates.
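For readers unfamiliar with combinatorial bandits, here is a minimal sketch of the textbook CUCB rule with a top-k oracle, in which each arm is a city cell and a fixed number of cells is visited per round. It is not the modified algorithm from the article: it ignores under-reporting entirely, and the Bernoulli rewards, the function name `cucb_top_k`, and all parameters are assumptions made for illustration.

```python
import math
import random


def cucb_top_k(true_rates, k, rounds, seed=0):
    """Generic CUCB with a top-k oracle on simulated Bernoulli cells.

    true_rates: hypothetical per-cell event probabilities (simulation only).
    k: number of cells visited in each round.
    Returns the empirical mean and the visit count of every cell.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    counts = [0] * n
    means = [0.0] * n
    for t in range(1, rounds + 1):
        # Optimistic index: empirical mean plus the CUCB exploration bonus.
        # Cells never visited get an infinite index so they are tried first.
        index = [
            means[i] + math.sqrt(3.0 * math.log(t) / (2.0 * counts[i]))
            if counts[i] > 0 else float("inf")
            for i in range(n)
        ]
        # Top-k oracle: visit the k cells with the largest index.
        chosen = sorted(range(n), key=lambda i: index[i], reverse=True)[:k]
        for i in chosen:
            reward = 1.0 if rng.random() < true_rates[i] else 0.0
            counts[i] += 1
            means[i] += (reward - means[i]) / counts[i]  # running average
    return means, counts


# Toy run: 20 hypothetical cells, 2 visited per round (10% of the cells).
rates = [0.05 + 0.04 * i for i in range(20)]
means, counts = cucb_top_k(rates, k=2, rounds=350)
```

The exploration bonus shrinks as a cell is visited more often, so the rule gradually concentrates visits on cells whose estimated rate is high while still occasionally revisiting the others.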

The next figure shows monthly aggregate violent crimes as reported in the official statistics of the city (red line, SIEDCO) and as reported to the city's emergency and security call center (blue line, NUSE). The Total line is our estimate of crime; it is constructed from SIEDCO and NUSE as explained in the article.

The next two figures show how our proposed algorithm discovers the aggregate number of crimes in the city (first figure) and the estimated number of under-reported crimes (second figure). In both, the algorithm tries to discover the true incidence and under-reporting rates while visiting at most 10% of the area of the city in each period. The results should be compared with our empirical estimates, Total and NUSE, from the previous figure.

Convergence of the estimated total number of crimes to the observed number of crimes in the city, for three different algorithms.

Convergence of the estimated total number of under-reported crimes implied by the model, for three different algorithms.

Note from the previous figure, however, that none of the algorithms converges to the true under-reporting rate after 350 iterations. The next figure explores the nature of this convergence further. It shows a histogram, over cells (i.e., 1 km^2 regions that cover the whole city), of the distance between our empirical estimate of the true under-reporting rate (i.e., NUSE) and our best estimate after 350 iterations of the algorithm. As can be seen, almost all cells have an error of less than 0.2 under the CUCB algorithm.
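The per-cell comparison described here can be sketched as follows: compute the absolute difference between the estimated and the reference under-reporting rate in each cell, bin those errors, and report the share of cells below a threshold such as 0.2. The helper names and the toy values are hypothetical and are not taken from the article's data.

```python
def abs_errors(estimated, reference):
    """Absolute per-cell distance between estimated and reference rates."""
    return [abs(e - r) for e, r in zip(estimated, reference)]


def share_below(errors, threshold):
    """Fraction of cells whose error is strictly below the threshold."""
    return sum(err < threshold for err in errors) / len(errors)


def histogram(errors, bin_width=0.1, upper=1.0):
    """Count errors in equal-width bins covering [0, upper]."""
    n_bins = round(upper / bin_width)
    counts = [0] * n_bins
    for err in errors:
        # Clamp so that an error equal to `upper` falls in the last bin.
        idx = min(int(err / bin_width), n_bins - 1)
        counts[idx] += 1
    return counts


# Toy values for four hypothetical cells (not real data).
estimated = [0.50, 0.62, 0.30, 0.95]
reference = [0.55, 0.60, 0.65, 0.90]
errors = abs_errors(estimated, reference)
print(share_below(errors, 0.2), histogram(errors))
```

Looking at the distribution of per-cell errors, rather than only the city-wide aggregate, shows whether a gap in the aggregate comes from a few badly estimated cells or from a small error spread across all of them.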

Histogram of the absolute error between the under-reporting rate estimated in the last round and the empirical mean of the under-reporting rate for the whole sample.

Just for fun, the next figure illustrates the convergence, using the CUCB algorithm, of the estimated crime and under-reporting rates in the city to the real values. In the first column, the second and third rows show heat maps of the estimated crime incidence rates after 25 and 100 iterations, respectively. In the second column, the first row shows real under-reporting as measured by the NUSE dataset, and the second and third rows show heat maps of the estimated under-reporting after 25 and 100 iterations, respectively.

In a nutshell: the proposed model seems to work well for discovering the true incidence and under-reporting rates of special spatio-temporal events such as crime incidents.
