DATA MINING

Machine learning is a method of data analysis that automates the building of predictive models. Using mathematical algorithms, it allows computers to find subtle, hidden patterns in data without being explicitly programmed to look for them.

Through its outstanding human capital, Quantil offers the design, implementation, and evaluation of these algorithms to exploit a wide variety of data sources, including databases, incremental or online data, images, audio, and various text sources. Our implementations usually come with interactive tools that allow our customers to use the models within their organizations.

Our implementations of classification algorithms include a model for predicting the success of lawsuits against the State (ANDJE), a model for sentiment analysis of brands using social media data (Y&R), and a system that generates warnings of the risk of failing college courses (Universidad de Los Andes), among others.

Calibration curve.

Conditional correlograms for the academic warnings exercise.

Segmentation and clustering are closely linked to multivariate analysis. Segmentation assigns the observations in a data set to groups according to the natural associations among them.

Some applications of segmentation are grouping potential customers according to multiple variables in order to design marketing strategies, classifying users for the design of recommendation systems, and understanding associations among products that are frequently purchased together. At Quantil, we have implemented different segmentation models, such as K-means, Gaussian mixtures, and association rules, applied to tasks such as money laundering detection, the design of marketing campaigns, and language processing. Our models are usually supplemented with software that can be integrated into our customers' day-to-day operations.
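As an illustration of the kind of segmentation described above, here is a minimal sketch of K-means (Lloyd's algorithm) written with the Python standard library only. It is not Quantil's implementation; the function name `kmeans`, the `customers` data, and the two variables per customer are hypothetical and chosen only for the example.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Start from k distinct observations picked at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical customers described by two variables each
# (for example, purchase frequency and average ticket).
customers = [
    (1.0, 2.0), (1.5, 1.8), (1.2, 2.2),
    (8.0, 9.0), (8.5, 9.5), (9.0, 8.8),
]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

In practice the variables would normally be standardized first, and the number of groups `k` would be chosen by comparing several candidate values.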

Dendrogram of a hierarchical clustering.

Plot of the two principal components and the clusters found.

Text is one of the largest and most untapped sources of data in the information age. This is particularly true of unstructured text, such as that found on social media sites, web pages, or free-text reports. Text mining is a process based on sets of algorithms that automatically sort and interpret texts, analyze their contents, and turn the text into input for other models.
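To make the idea of using text as input for other models concrete, the following sketch turns a few short documents into TF-IDF weight vectors, one common numeric representation of text. It is an illustration under our own assumptions, not a description of Quantil's pipeline; the functions `tokenize` and `tfidf` and the sample `reviews` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (including Spanish letters)."""
    return re.findall(r"[a-záéíóúñü]+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dictionary per document."""
    tokenized = [tokenize(d) for d in documents]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Term frequency times inverse document frequency.
        vectors.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


# Hypothetical free-text comments.
reviews = [
    "The service was great",
    "The delivery was late",
    "Great product, great price",
]
vectors = tfidf(reviews)
for vec in vectors:
    print(vec)
```

Terms that appear in many documents, such as "the", receive lower weights than terms specific to one document, such as "service". The resulting vectors can then feed a classifier or a clustering model.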

Quantil has ample expertise in applications of text mining such as sentiment analysis of social media data (Facebook, Twitter, and blogs), automatic classification of documents into categories, topic modeling based on latent space models, email prioritization (contact centers), and profiling of potential voters in social media (Partido de la U, a Colombian political party). With the support of our department of Technology and Information, we supplement our models with software that integrates them into the customers' operations.

At Quantil, we also do research on natural language processing (NLP), the branch of artificial intelligence concerned with the understanding of human language by computers. Our researchers have worked on author profiling in social media, inferring the demographic characteristics of unknown authors from their anonymous texts, and on applications of topic models to various publications for academic and political purposes. Currently, they are researching the processing of medical texts, with the goal of identifying mentions of disorders in clinical documents. In 2013 and 2014, they participated in CLEF and SemEval, respectively, both well-known international conferences in the fields of semantic analysis and information retrieval.

When all variables are considered together, which data are atypical or abnormal? Anomaly detection has a broad range of applications, from clinical diagnosis, through the detection of fraud in standardized testing, to the detection of money laundering and tax evasion. The cases differ widely, but some mathematical principles make it possible to establish methodologies common to all of them.

Quantil has expertise in implementations of anomaly detection that include the development of measures based on relative entropy, which identify anomalous observations that look normal when each variable is examined on its own. We have also implemented anomaly detection using clustering models and by building metrics that identify outliers in any cluster. Additionally, we have expertise in models that analyze the probability distribution of digits in the numerical quantities stored in databases. These techniques, known as Digital Analysis, are based on empirical mathematical regularities, such as Benford's Law or the Beber and Scacco law.
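The digit-based idea can be sketched briefly. Benford's Law states that in many naturally occurring data sets the leading digit d appears with probability log10(1 + 1/d). The example below compares the observed first-digit distribution of a data set against that expectation using relative entropy (Kullback-Leibler divergence). Combining the two in this way is our own illustrative choice, not a description of Quantil's models, and all function names and data are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading significant digit of a nonzero number."""
    # Scientific notation puts the leading significant digit first.
    return int(f"{abs(x):.15e}"[0])


def first_digit_distribution(values):
    """Observed share of each leading digit, ignoring zeros."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def relative_entropy(p, q):
    """D(p || q) in bits; digits with p = 0 contribute nothing."""
    return sum(p[d] * math.log2(p[d] / q[d]) for d in p if p[d] > 0)


expected = benford_expected()

# Powers of two are a classic sequence that follows Benford's Law closely.
natural = [2 ** n for n in range(1, 201)]
# Hypothetical suspicious amounts whose leading digits are uniform.
uniform = list(range(100, 1000))

natural_score = relative_entropy(first_digit_distribution(natural), expected)
uniform_score = relative_entropy(first_digit_distribution(uniform), expected)
print(natural_score, uniform_score)
```

A larger score means a larger departure from the Benford expectation. A high score flags a data set for review; it does not prove manipulation, since many legitimate data sets do not follow Benford's Law.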

The customers for which we have implemented anomaly detection solutions include the Colombian National Health Ministry, Fundación Valle del Lili (a private hospital in Cali, Colombia), and the Unit of Financial Intelligence and Analysis (UIAF, an agency of the Colombian Treasury Ministry that aims to detect and prevent financial operations related to money laundering and terrorism financing).

When the data do not have a clearly defined target variable, the type of analysis that can be done is called unsupervised analysis. In this case, the objective is to find existing patterns in the data, characterize the observations in groups, or find relationships from which to build networks for later analysis.

Quantil has expertise in techniques of unsupervised analysis that include automatic segmentation, association rules, hidden Markov models for extracting elements of interest from free text, and latent-space models for dimensionality reduction, among others. These algorithms are usually supplemented by software that integrates them into the customer's operation and offers relevant, interactive interfaces that improve the representation and analysis of the data.
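As a small illustration of association rules, the sketch below computes support and confidence for rules between pairs of items in a set of transactions. It is a simplified example restricted to pairs, not a full Apriori implementation and not Quantil's software; the function names, thresholds, and `baskets` data are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Share of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Rules x -> y between single items that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        joint = support({a, b}, transactions)
        if joint < min_support:
            continue  # the pair is too rare to be of interest
        for x, y in ((a, b), (b, a)):
            # Confidence: among transactions containing x, the share that also contain y.
            confidence = joint / support({x}, transactions)
            if confidence >= min_confidence:
                rules.append((x, y, joint, confidence))
    return rules


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "bread"},
    {"milk", "coffee"},
]
rules = pair_rules(baskets)
for x, y, s, c in rules:
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

With these baskets, only the bread and butter pair passes both thresholds, which yields one rule in each direction.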

Association rules

Word Embeddings
