Datasets

Here is a list of useful resources, including open datasets and toolboxes.

 

MAESTRO Synthetic – Multiple Annotator Estimated STROng labels

MAESTRO Synthetic contains 20 synthetic audio files created using Scaper, each 3 minutes long. The dataset was created for studying annotation procedures for strong labels using crowdsourcing. The files were divided into 10-s segments with a hop of 1 s, and each 10-s segment was tagged by 5 different annotators using Amazon Mechanical Turk. The strong labels were estimated from the tags according to the method described in the corresponding paper.
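The segmentation described above can be sketched as follows. This is only an illustration, not the authors' code: it assumes each file is exactly 180 s long and that only full-length segments are kept, and the function name `segment_bounds` is hypothetical.

```python
def segment_bounds(total_s=180, seg_s=10, hop_s=1):
    """Return (start, end) times in seconds for every full-length segment.

    Assumption: segments that would run past the end of the file are dropped.
    """
    return [(start, start + seg_s)
            for start in range(0, total_s - seg_s + 1, hop_s)]


segments = segment_bounds()
print(len(segments), segments[0], segments[-1])
```

Under these assumptions a 3-minute file yields 171 overlapping segments, the first covering 0 to 10 s and the last covering 170 to 180 s.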

Cite as: Irene Martin-Morato, Manu Harju, and Annamaria Mesaros. Crowdsourcing strong labels for sound event detection. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2021). New Paltz, NY, Oct 2021.

Download: MAESTRO Synthetic dataset

 

MACS – Multi-Annotator Captioned Soundscapes

This dataset contains audio captions and corresponding audio tags for 3930 audio files of the TAU Urban Acoustic Scenes 2019 development dataset (airport, public square, and park). The files were annotated using a web-based tool. Each file was annotated by multiple annotators, who provided tags and a one-sentence description of the audio content. The data also includes annotator competence estimated using MACE (Multi-Annotator Competence Estimation).

Cite as: Irene Martin-Morato, Annamaria Mesaros. Diversity and bias in audio captioning datasets, submitted to DCASE 2021 Workshop (link to be updated)

Download: MACS dataset

 

TAU-SEBin Binaural Sound Events 2021

This is a dataset of synthetic binaural audio recordings, which consist of sound events placed in simulated shoebox rooms. The data is suitable for experiments with several acoustic scene analysis tasks, such as sound source localization, sound distance estimation, or sound event detection.

The data was created using isolated sound events derived from several datasets and contains 18 sound classes: alarm, baby, blender, cat, crash, dishes, dog, engine, fire, footsteps, glassbreak, gunshot, knock, phone, piano, scream, speech, water. The data is split into two subsets: one (bin_prox_dir) contains up to two overlapping sound events, whereas the other (bin_prox_dir_one) consists of single sources only. Each subset contains 400 audio files, divided into 4 equal splits for fold-wise cross-validation.
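As a rough illustration of fold-wise cross-validation over 4 splits, the sketch below holds out each split once for testing and trains on the remaining three. The split identifiers 1 to 4 and the function name `cross_validation_folds` are assumptions for the example; the actual assignment of files to splits is defined by the dataset itself.

```python
def cross_validation_folds(split_ids=(1, 2, 3, 4)):
    """Yield (train_splits, test_split) pairs, holding out each split once.

    Assumption: splits are identified by the hypothetical ids 1..4.
    """
    for test_split in split_ids:
        train_splits = [s for s in split_ids if s != test_split]
        yield train_splits, test_split


folds = list(cross_validation_folds())
for train_splits, test_split in folds:
    print(f"train on splits {train_splits}, test on split {test_split}")
```

With 400 files per subset and 4 equal splits, each fold would train on 300 files and test on 100.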

Cite as: Daniel Aleksander Krause, Archontis Politis, and Annamaria Mesaros. Joint direction and proximity classification of overlapping sound events from binaural audio. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2021). New Paltz, NY, Oct 2021.

Download: TAU-SEBin dataset

 

MATS – Multi-Annotator Tagged Soundscapes

This dataset contains audio tags for 3930 audio files of the TAU Urban Acoustic Scenes 2019 development dataset (airport, public square, and park). The files were annotated using a web-based tool, with multiple annotators providing labels for each file.

Tags: announcement jingle, announcement speech, adults talking, birds singing, children voices, dog barking, footsteps, music, siren, traffic noise.

Cite as: Irene Martin-Morato, Annamaria Mesaros. What is the ground truth? Reliability of multi-annotator data for audio tagging, 29th European Signal Processing Conference, EUSIPCO 2021

Download: MATS dataset

 

TAU Audio-Visual Urban Scenes 2021

This dataset contains synchronized audio and video recordings from 10 European cities in 10 different scene categories, totaling 34 hours (development set). Each file is 10 seconds long. The evaluation set contains recordings from 12 cities, but its reference annotation is not public.

Categories: airport, shopping mall, metro station, pedestrian street, public square, street (medium level of traffic), tram (inside), bus (inside), metro (inside, underground), urban park.

Recorded in: Amsterdam, Barcelona, Helsinki, Lisbon, London, Lyon, Madrid, Milan, Prague, Paris, Stockholm and Vienna.

Cite as: Shanshan Wang, Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. A curated dataset of urban scenes for audio-visual scene analysis. In 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.

Download: TAU Audio-Visual Urban Scenes 2021