Event News

Talk on "Automatic Video Dubbing Challenges and Solutions" by Prof. Rajiv Ratn Shah from IIIT Delhi

We are pleased to announce an upcoming seminar by Prof. Rajiv Ratn Shah from IIIT Delhi, titled "Automatic Video Dubbing Challenges and Solutions". Everyone interested is cordially invited to attend!


Automatic Video Dubbing Challenges and Solutions


A traditional Automatic Video Dubbing (AVD) pipeline consists of three key modules: Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), and Text-to-Speech (TTS). Keeping the dubbed audio synchronized with the video is one of the most challenging research problems. Within AVD pipelines, isometric-NMT algorithms are employed to regulate the length of the generated output text. Previous approaches have focused on aligning the number of characters and words in the source and target language texts of Machine Translation models. Our approach instead aims to align the number of phonemes, as they are closely associated with speech duration. This work has been published at NAACL 2024. We also propose a novel method, Dub-Wise: Multi-modal Large Language Model (LLM)-based TTS, which can control the duration of synthesized speech so that it aligns well with the speaker's lip movements in the reference video, even when the spoken text is different or in a different language. This work has been accepted at Interspeech 2024. In addition, TTS for dubbing requires extracting voice identity and emotional style from reference speech in a source language and transferring them to a target language using cross-lingual TTS techniques. To this end, we introduce an end-to-end Voice Identity and Emotional Style Controllable Cross-Lingual (VECL) TTS system using multilingual speakers and an emotion embedding network. This work has also been accepted at Interspeech 2024.

Speaker Bio:

Rajiv Ratn Shah currently works as an Associate Professor in the Department of Computer Science and Engineering (jointly with the Department of Human-centered Design) at IIIT-Delhi. He is also the head of the MIDAS Research Lab and the TCS Center of Design and New Media at IIIT-Delhi. He received his Ph.D. in Computer Science from the National University of Singapore. Dr. Shah is the recipient of several awards, including the prestigious Heidelberg Laureate Forum Young Researcher fellowship and best paper awards at many conferences. His research interests include multimedia content processing, speech processing, natural language processing, and multimodal computing.


14:00, Monday, July 8, 2024


Room 1902/1903, NII and Online




If you would like to join, please contact us by email.
Email: planas[at]nii.ac.jp
