Research

Our research deals with different aspects of acoustic scene analysis, including classification of scenes and sound events, sound event detection and localization, continual learning for audio classification, data collection and annotation processes, and evaluation procedures.

Projects

Teaching Machines to Listen

Funded by the Academy of Finland, 2020-2025

This project will investigate methods for sound event detection and classification based on noisy data produced by crowdsourced annotation, simulating the way humans learn about their acoustic environment. We work with methods from natural language processing, computational linguistics, and deep learning, and aim to advance the state of the art with novel approaches for learning efficiently from noisy data.
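
As an illustration only (not the project's actual pipeline), the sketch below shows one common way to learn from noisy crowdsourced annotations: binary labels from several annotators are averaged into soft targets for a sound event classifier. All names, dimensions, and the PyTorch model are hypothetical placeholders.

    import torch
    import torch.nn as nn

    num_events, num_annotators = 10, 5            # hypothetical sizes
    # Binary crowdsourced annotations per clip: (batch, annotators, events)
    annotations = torch.randint(0, 2, (8, num_annotators, num_events)).float()
    # Average over annotators to obtain soft targets that reflect label noise
    soft_targets = annotations.mean(dim=1)

    # Placeholder clip-level features (e.g., pooled log-mel embeddings)
    features = torch.randn(8, 128)
    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, num_events))
    criterion = nn.BCEWithLogitsLoss()            # accepts soft targets in [0, 1]
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    loss = criterion(model(features), soft_targets)
    loss.backward()
    optimizer.step()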

Continual learning of sounds with deep neural networks

Funded by the Jane and Aatos Erkko Foundation, 5.2024-4.2027

This project investigates continual learning in acoustic environments by incrementally introducing new information to be learned. Continual learning here refers to learning in successive bursts, where new input data is used to extend an existing model's knowledge by further training it. The key challenge in incremental learning is adapting the model to new data without forgetting what it has already learned. When learning about acoustic environments, such incremental tasks can be formulated in multiple ways, depending on the information available for learning. This work aims to open several research directions for incremental learning of audio.
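
As a sketch of the general idea only (not the project's method), the example below adapts a classifier to successive tasks while replaying a small buffer of exemplars from earlier tasks to reduce forgetting; the model, sizes, and data are hypothetical placeholders.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 20))  # 20 classes overall
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()
    replay_x, replay_y = [], []                   # exemplars kept from earlier tasks

    def train_task(x, y, exemplars=16):
        # Mix the new task's data with replayed exemplars from previous tasks
        if replay_x:
            x = torch.cat([x] + replay_x)
            y = torch.cat([y] + replay_y)
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
        # Store a few exemplars of the new task for future rehearsal
        replay_x.append(x[:exemplars].detach())
        replay_y.append(y[:exemplars].detach())

    # Two successive learning "bursts": first classes 0-9, then classes 10-19
    train_task(torch.randn(64, 128), torch.randint(0, 10, (64,)))
    train_task(torch.randn(64, 128), torch.randint(10, 20, (64,)))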

Guided audio captioning for complex acoustic environments (GUIDE)

Funded by the Jane and Aatos Erkko Foundation, 2022-2023

This project will investigate audio captioning as a textual description of the most important sounds in a scene, guided towards the most relevant content while ignoring other sounds, much as humans do. We want to understand to what extent automatically produced audio captions are suitable for subtitling for deaf and hard-of-hearing audiences, and whether the manual work of describing the non-speech soundtrack can be replaced by automatic methods, in the same way that dialogue is turned into subtitles using automatic speech recognition.

This work is a collaboration with Assoc. Prof. Maija Hirvonen (Translation Studies, Linguistics, Tampere University).

Audio-visual scene analysis

Postgraduate research, funded by Tampere University 

Shanshan is working on the use of multimodal information for scene analysis.

The way humans understand the world is based not only on what we hear, but on all our senses. Inspired by this, multimodal analysis, especially audio-visual analysis, has gained increasing popularity in machine learning. Our first study on this topic showed that joint modeling of the audio and visual modalities brings a significant improvement in scene classification over single-modality models.
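
The sketch below is only an illustrative assumption of such joint modeling (not the published model): audio and visual embeddings are projected separately and concatenated before a shared scene classifier. All class names and dimensions are hypothetical.

    import torch
    import torch.nn as nn

    class AudioVisualClassifier(nn.Module):
        def __init__(self, audio_dim=128, visual_dim=512, num_scenes=10):
            super().__init__()
            self.audio_branch = nn.Sequential(nn.Linear(audio_dim, 64), nn.ReLU())
            self.visual_branch = nn.Sequential(nn.Linear(visual_dim, 64), nn.ReLU())
            # Joint classifier over the concatenated modality embeddings
            self.classifier = nn.Linear(128, num_scenes)

        def forward(self, audio_emb, visual_emb):
            a = self.audio_branch(audio_emb)
            v = self.visual_branch(visual_emb)
            return self.classifier(torch.cat([a, v], dim=-1))

    model = AudioVisualClassifier()
    scene_scores = model(torch.randn(4, 128), torch.randn(4, 512))   # (batch, num_scenes)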

Spatial analysis of acoustic scenes

Postgraduate research

Daniel is investigating methods for utilizing spatial audio in computational acoustic scene analysis.

The project aims to exploit spatial cues derived from binaural recordings and microphone arrays to provide rich descriptions of acoustic environments. The planned work comprises the joint use of several audio tasks, including sound event detection, sound source localization, and acoustic scene classification.
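
As an example of what such spatial cues look like (an illustration, not the project's signal processing chain), the sketch below computes two simple cues from a synthetic two-channel signal: the interaural level difference and a GCC-PHAT estimate of the time difference of arrival. The signal and delay are placeholders.

    import numpy as np

    fs = 16000
    left = np.random.randn(fs)                    # synthetic left-channel signal, 1 s
    right = 0.7 * np.roll(left, 8)                # right channel: attenuated, delayed by 8 samples

    # Interaural level difference in dB
    ild_db = 10 * np.log10(np.sum(left**2) / np.sum(right**2))

    # GCC-PHAT: whitened cross-power spectrum, inverse FFT, peak gives the delay
    n = 2 * len(left)
    cross = np.fft.rfft(right, n) * np.conj(np.fft.rfft(left, n))
    cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n)
    cc = np.concatenate([cc[-(len(left) - 1):], cc[:len(left)]])   # reorder to lags -(N-1)..(N-1)
    tdoa = (np.argmax(np.abs(cc)) - (len(left) - 1)) / fs          # right-channel delay in seconds

    print(f"ILD: {ild_db:.1f} dB, estimated TDOA: {tdoa * 1e3:.2f} ms")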