Seminars

Diffusion Model for Anura Call Generation with FID Selection

This study, led by José, presents an approach for generating synthetic audio of frog calls recorded in Colombia using diffusion models, an emerging technique in generative artificial intelligence. Such models are particularly valuable where data are scarce, as in the bioacoustics of particular, hard-to-record species. To support biodiversity monitoring and conservation, the team used generative diffusion models to create synthetic frog call samples. A diffusion model pairs a forward process, which adds noise to the original data in small, controlled steps until only noise remains, with a reverse process that removes the noise step by step to produce plausible new data. Starting from 2.5 hours of annotated recordings, the team applied preprocessing and normalization to obtain audio with specific acoustic features that experts could evaluate via spectrograms. Neural networks were trained to enhance the quality of the generated audio. To evaluate the synthetic samples objectively, the team computed the Fréchet Inception Distance (FID) using a trained neural network and complemented it with human assessments. Participants had difficulty distinguishing real from generated frog calls, which supports the realism of the model's output and suggests that it could supply synthetic data for environmental applications, extending acoustic monitoring to hard-to-access species.
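The forward noising process and the Fréchet-style comparison described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the seminar: the function names, the linear noise schedule with 1000 steps, the beta range (a common convention for diffusion models), and the toy sine wave standing in for an audio clip. The distance function uses a simplified, diagonal-covariance form of the Fréchet distance; a full FID uses complete covariance matrices of features extracted by a trained network.

```python
import math
import random
import statistics


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule.

    The schedule values are assumed defaults, not the study's settings.
    Requires num_steps >= 2.
    """
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def add_noise(x0, t, alpha_bar, rng):
    """Sample x_t ~ N(sqrt(alpha_bar[t]) * x0, (1 - alpha_bar[t]) * I)."""
    signal = math.sqrt(alpha_bar[t])
    noise = math.sqrt(1.0 - alpha_bar[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


def frechet_distance_diag(feats_a, feats_b):
    """Squared Fréchet distance between per-dimension Gaussian fits.

    Assumes diagonal covariances, so each feature dimension contributes
    (mean gap)^2 + (standard deviation gap)^2.
    """
    total = 0.0
    for col_a, col_b in zip(zip(*feats_a), zip(*feats_b)):
        mean_gap = statistics.fmean(col_a) - statistics.fmean(col_b)
        std_gap = statistics.pstdev(col_a) - statistics.pstdev(col_b)
        total += mean_gap ** 2 + std_gap ** 2
    return total


# Toy usage: a sine wave stands in for a preprocessed frog call.
rng = random.Random(0)
alpha_bar = linear_alpha_bar(1000)
clean = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
slightly_noisy = add_noise(clean, 10, alpha_bar, rng)   # still close to the signal
almost_noise = add_noise(clean, 999, alpha_bar, rng)    # almost pure Gaussian noise
```

In a selection setting like the one in the talk's title, a lower distance between features of real and generated clips would indicate closer agreement; how the team actually extracted features and chose samples is not described here.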

Details:

Speaker:

José Sebastián Ñungo

Date:

August 22, 2024
