Manara - Qatar Research Repository

Image and audio caps: automated captioning of background sounds and images using deep learning

journal contribution
posted on 2022-11-22, 21:12 authored by M. Poongodi, Mounir Hamdi, Huihui Wang

Computer-based image recognition has been studied for many years. It remains one of the most difficult tasks in computer science, and improvements are still being made. In this paper, we propose a methodology that automatically generates an appropriate caption for an image and adds a matching sound to it. Two models were extensively trained and then combined to achieve this. Sounds are recommended based on the image scene, and captions are generated using a combination of natural language processing and state-of-the-art computer vision models. The approach achieves a Top-5 accuracy of 67% and a Top-1 accuracy of 53%. It is also worth noting that this is the first model of its kind to make this prediction.
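The Top-1 and Top-5 figures are presumably standard top-k accuracy: a prediction counts as correct if the true label is among the k highest-scoring candidates. The abstract does not state which model output these numbers measure or how many candidate classes there are, so the following is only a minimal sketch of the metric itself, assuming each prediction is a vector of per-class scores. The function name `top_k_accuracy` and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    Hypothetical helper: each row of `scores` holds one score per class,
    and `labels` holds the index of the correct class for that row.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; Python's sort is stable,
        # so ties keep the lower index first. Keep only the first k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    # Toy example with 4 samples and 4 classes (made-up numbers).
    scores = [
        [0.10, 0.60, 0.20, 0.10],  # true class 1 is ranked 1st
        [0.50, 0.30, 0.15, 0.05],  # true class 1 is ranked 2nd
        [0.20, 0.30, 0.40, 0.10],  # true class 3 is ranked 4th
        [0.05, 0.10, 0.25, 0.60],  # true class 3 is ranked 1st
    ]
    labels = [1, 1, 3, 3]
    print(top_k_accuracy(scores, labels, k=1))  # 0.5
    print(top_k_accuracy(scores, labels, k=2))  # 0.75
```

Top-5 accuracy is always at least as high as Top-1 accuracy, because widening k can only add hits. This is consistent with the reported 67% versus 53%.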

Other Information

Published in: Multimedia Systems
License: https://creativecommons.org/licenses/by/4.0
See article on publisher's website: http://dx.doi.org/10.1007/s00530-022-00902-0

Language

  • English

Publisher

Springer Science and Business Media LLC

Publication Year

  • 2022

Institution affiliated with

  • Hamad Bin Khalifa University
