Invention Grant
- Patent Title: Systems and methods for speech transcription
- Application No.: US14735002
- Application Date: 2015-06-09
- Publication No.: US10540957B2
- Publication Date: 2020-01-21
- Inventor: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Adam Coates, Andrew Y. Ng
- Applicant: BAIDU USA LLC
- Applicant Address: US CA Sunnyvale
- Assignee: BAIDU USA LLC
- Current Assignee: BAIDU USA LLC
- Current Assignee Address: US CA Sunnyvale
- Agency: North Weber & Baugh LLP
- Main IPC: G06N3/04
- IPC: G06N3/04; G10L15/16; G06N3/08; G10L15/06; G10L15/26

Abstract:
Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. Neither a phoneme dictionary, nor even the concept of a “phoneme,” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allows for a large amount of varied data for training to be efficiently obtained. Embodiments of the system can also handle challenging noisy environments better than widely used, state-of-the-art commercial speech systems.
Public/Granted literature
- US20160171974A1 SYSTEMS AND METHODS FOR SPEECH TRANSCRIPTION Public/Granted day: 2016-06-16