We have four papers in the DCASE Workshop this year!
Sound Event Classification with Object-Based Labels, by James Afolaranmi, Irene Martín-Morató and Annamaria Mesaros, which is based on James’s BSc thesis. We attempt to label audio using an object detector applied to the corresponding video. Funnily enough, it works (to some extent).
Evaluating Classification Systems Against Soft Labels with Fuzzy Precision and Recall, by Manu Harju and Annamaria Mesaros. Here, we try to work with soft labels produced by multiple annotators, to avoid binarization into event presence/absence. Since system outputs are always soft, it makes more sense to evaluate them against reference soft labels, otherwise we have to choose (arbitrary) thresholds on both reference and predictions, and lose a lot of information in the process.
Aggregate or Separate: Learning From Multi-Annotator Noisy Labels for Best Classification Performance, by Irene Martín-Morató, Paul Ahokas and Annamaria Mesaros, in which we train SED systems in different ways to take advantage of all information that was produced in the annotation process. When multiple opinions are available for the data, it is possible to train a classifier without aggregating opinions first. Pretty neat!
Incremental Learning of Acoustic Scenes and Sound Events, by Manjunath Mulimani and Annamaria Mesaros, where we learn information from audio clips in multiple steps: first we learn the acoustic scene where the clip was recorded, then we learn sound events that are active in the clips. Through independent learning of the tasks at different incremental steps, we avoid catastrophic forgetting of the information learned at the previous step.