We have published a dataset containing captions for a subset of the TAU Urban Acoustic Scenes 2018 dataset. The captioned soundscapes include airport, street and park scenes, and each file has five captions. A paper analyzing its content has been submitted to the DCASE 2021 Workshop.
The dataset is called MACS (Multi-Annotator Captioned Soundscapes) and is available on zenodo.org. See the “Datasets” section for the link.