Invention Grant
- Patent Title: Method and system for visual context aware automatic speech recognition
- Application No.: US18333983
- Application Date: 2023-06-13
- Publication No.: US12334057B2
- Publication Date: 2025-06-17
- Inventor: Chayan Sarkar, Pradip Pramanick, Ruchira Singh
- Applicant: Tata Consultancy Services Limited
- Applicant Address: IN Mumbai
- Assignee: Tata Consultancy Services Limited
- Current Assignee: Tata Consultancy Services Limited
- Current Assignee Address: IN Mumbai
- Agency: Finnegan, Henderson, Farabow, Garrett & Dunner, LLP
- Priority: IN202221043394 20220728
- Main IPC: G10L15/16
- IPC: G10L15/16; G10L15/06

Abstract:
Transcript accuracy is of foremost importance in Automatic Speech Recognition (ASR). State-of-the-art systems mostly rely on spelling-correction-based contextual improvement in ASR, which is generally a static-vocabulary-based biasing approach. Embodiments of the present disclosure provide a method and system for visual context aware ASR. The method provides biasing using a shallow fusion biasing approach with a modified beam search decoding technique, which introduces a non-greedy pruning strategy to allow biasing at the sub-word level. The biasing algorithm brings the visual context of the robot into the speech recognizer through a dynamic biasing vocabulary, improving transcription accuracy. The dynamic biasing vocabulary, comprising objects in the current environment accompanied by their self and relational attributes, is generated using a bias prediction network that explicitly adds labels to objects, which are detected and captioned via a state-of-the-art dense image captioning network.
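
The general idea of shallow fusion biasing at the sub-word level can be illustrated with a toy sketch. This is not the patented algorithm: the function names (`build_bias_vocabulary`, `bias_score`, `biased_beam_search`), the character-level tokens, the caption input format, and the fixed `bonus` weight are all assumptions made for illustration. The sketch adds a bonus to any hypothesis whose current partial word is still a prefix of a word in the biasing vocabulary, so that such hypotheses are less likely to be pruned before the word completes.

```python
import math

def build_bias_vocabulary(captions):
    """Collect lower-cased words from region captions (hypothetical input format)."""
    vocab = set()
    for caption in captions:
        vocab.update(caption.lower().split())
    return vocab

def bias_score(text, vocab):
    """Count completed words found in the vocabulary, plus one if the
    trailing partial word is a prefix of some vocabulary word."""
    words = text.split(" ")
    completed, partial = words[:-1], words[-1]
    score = sum(1 for w in completed if w in vocab)
    if partial and any(v.startswith(partial) for v in vocab):
        score += 1
    return score

def biased_beam_search(step_log_probs, vocab, beam_width=2, bonus=1.0):
    """Toy beam search over per-step {token: log_prob} dictionaries.
    Hypotheses are ranked by recognizer score + bonus * bias_score,
    which is the shallow fusion step in this sketch."""
    beams = [("", 0.0)]  # (text so far, accumulated recognizer log-probability)
    for dist in step_log_probs:
        candidates = [
            (text + tok, score + lp)
            for text, score in beams
            for tok, lp in dist.items()
        ]
        # The bias score is recomputed at every step, so a prefix bonus
        # disappears as soon as the partial word stops matching.
        candidates.sort(
            key=lambda c: c[1] + bonus * bias_score(c[0], vocab),
            reverse=True,
        )
        beams = candidates[:beam_width]
    return beams[0][0]

# Toy recognizer output: the acoustics slightly prefer "cap" over "cup".
steps = [
    {"c": math.log(1.0)},
    {"a": math.log(0.6), "u": math.log(0.4)},
    {"p": math.log(1.0)},
]
vocab = build_bias_vocabulary(["red cup"])
unbiased = biased_beam_search(steps, vocab, bonus=0.0)
biased = biased_beam_search(steps, vocab, bonus=1.0)
```

In this toy input, the unbiased search follows the recognizer's preference, while the biased search favors the word that appears in the caption-derived vocabulary. A real system would operate on sub-word units from the recognizer's tokenizer and would tune the fusion weight rather than fix it.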
Public/Granted literature
- US20240038224A1 METHOD AND SYSTEM FOR VISUAL CONTEXT AWARE AUTOMATIC SPEECH RECOGNITION Public/Granted day: 2024-02-01