Speech recognition using connectionist temporal classification

    Publication No.: US10580432B2

    Publication Date: 2020-03-03

    Application No.: US15908115

    Application Date: 2018-02-28

    Abstract: Generally discussed herein are devices, systems, and methods for speech recognition. Processing circuitry can implement a connectionist temporal classification (CTC) neural network (NN) including an encode NN to receive an audio frame and generate a current encoded hidden feature vector, an attend NN to generate, based on the current encoded hidden feature vector and a first context vector from a previous time slice, a weight vector indicating how much the current encoded hidden feature vector, a previous encoded hidden feature vector, and a future encoded hidden feature vector from a future time slice contribute to a current, second context vector, an annotate NN to generate the current, second context vector based on the weight vector, the current encoded hidden feature vector, the previous encoded hidden feature vector, and the future encoded hidden feature vector, and a normal NN to generate a normalized output vector based on the second context vector.
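    The abstract above walks through a four-stage pipeline (encode, attend, annotate, normalize). Below is a minimal sketch of how such a pipeline could be wired together, assuming PyTorch; the class name, layer sizes, and the three-frame attention window over the previous, current, and future encodings are illustrative assumptions, not the patented implementation.

```python
# Illustrative sketch (not the patented implementation): a CTC-style model whose
# attention weights over past / current / future encoded frames are produced from
# the current encoding and the previous context vector. Layer sizes are arbitrary.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionCTC(nn.Module):
    def __init__(self, feat_dim=80, hidden_dim=320, num_tokens=29):
        super().__init__()
        self.encode = nn.Linear(feat_dim, hidden_dim)    # "encode NN"
        self.attend = nn.Linear(2 * hidden_dim, 3)       # "attend NN": weights for prev/cur/future
        self.normal = nn.Linear(hidden_dim, num_tokens)  # "normal NN": normalized output

    def forward(self, frames):
        # frames: (T, feat_dim) audio features for one utterance
        h = torch.tanh(self.encode(frames))              # encoded hidden feature vectors
        T, d = h.shape
        context = torch.zeros(d)                         # first context vector
        outputs = []
        for t in range(T):
            h_prev = h[t - 1] if t > 0 else torch.zeros(d)
            h_fut = h[t + 1] if t < T - 1 else torch.zeros(d)
            # attend NN: weight vector from current encoding + previous context
            w = F.softmax(self.attend(torch.cat([h[t], context])), dim=-1)
            # annotate step: weighted combination of previous / current / future encodings
            context = w[0] * h_prev + w[1] * h[t] + w[2] * h_fut
            outputs.append(F.log_softmax(self.normal(context), dim=-1))
        return torch.stack(outputs)                      # (T, num_tokens) for a CTC loss
```

    The (T, num_tokens) log-probabilities produced this way could then be trained with a standard CTC objective such as torch.nn.CTCLoss.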

    Speech recognition using connectionist temporal classification

    Publication No.: US20190267023A1

    Publication Date: 2019-08-29

    Application No.: US15908115

    Application Date: 2018-02-28

    Abstract: Generally discussed herein are devices, systems, and methods for speech recognition. Processing circuitry can implement a connectionist temporal classification (CTC) neural network (NN) including an encode NN to receive an audio frame and generate a current encoded hidden feature vector, an attend NN to generate, based on the current encoded hidden feature vector and a first context vector from a previous time slice, a weight vector indicating how much the current encoded hidden feature vector, a previous encoded hidden feature vector, and a future encoded hidden feature vector from a future time slice contribute to a current, second context vector, an annotate NN to generate the current, second context vector based on the weight vector, the current encoded hidden feature vector, the previous encoded hidden feature vector, and the future encoded hidden feature vector, and a normal NN to generate a normalized output vector based on the second context vector.

    Domain adaptation in speech recognition via teacher-student learning

    Publication No.: US20190051290A1

    Publication Date: 2019-02-14

    Application No.: US15675249

    Application Date: 2017-08-11

    Abstract: Improvements in speech recognition in a new domain are provided via student/teacher training of models for different speech domains. A student model for a new domain is created based on a teacher model trained in an existing domain. The student model is trained in parallel with the operation of the teacher model, with inputs in the new and existing domains respectively, to develop a neural network that is adapted to recognize speech in the new domain. The data in the new domain may lack transcription labels; instead, they are parallelized with the existing-domain data analyzed by the teacher model. The outputs from the teacher model are compared with the outputs of the student model, and the differences are used to adjust the parameters of the student model to better recognize speech in the new domain.
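    As a rough illustration of the training loop sketched in the abstract, the following assumes PyTorch; `teacher`, `student`, and `parallel_loader` are hypothetical placeholders, with the loader yielding frame-parallel feature pairs from the existing and new domains, and KL divergence standing in for the output-difference measure.

```python
# Sketch of the teacher-student adaptation loop described above, assuming PyTorch.
# `teacher`, `student`, and `parallel_loader` are placeholders: the loader yields
# parallel pairs (existing-domain features, new-domain features); no transcripts needed.
import torch
import torch.nn.functional as F

def adapt_student(teacher, student, parallel_loader, epochs=1, lr=1e-4):
    teacher.eval()                                   # teacher stays fixed
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for src_feats, tgt_feats in parallel_loader:
            with torch.no_grad():
                t_logp = F.log_softmax(teacher(src_feats), dim=-1)   # teacher posteriors (existing domain)
            s_logp = F.log_softmax(student(tgt_feats), dim=-1)       # student posteriors (new domain)
            # Match the student's outputs to the teacher's (KL divergence as the difference measure)
            loss = F.kl_div(s_logp, t_logp, log_target=True, reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```

    Because only the teacher's posteriors serve as targets, no transcription labels are required for the new-domain data.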

    Variable-component deep neural network for robust speech recognition

    Publication No.: US10019990B2

    Publication Date: 2018-07-10

    Application No.: US14414621

    Application Date: 2014-09-09

    CPC classification number: G10L15/20 G10L15/16 G10L19/24 G10L25/84

    Abstract: Systems and methods for speech recognition incorporating environmental variables are provided. The systems and methods capture speech to be recognized. The speech is then recognized utilizing a variable component deep neural network (DNN). The variable component DNN processes the captured speech by incorporating an environment variable. The environment variable may be any variable that is dependent on environmental conditions or the relation of the user, the client device, and the environment. For example, the environment variable may be based on noise of the environment and represented as a signal-to-noise ratio. The variable component DNN may incorporate the environment variable in different ways. For instance, the environment variable may be incorporated into weighting matrices and biases of the DNN, the outputs of the hidden layers of the DNN, or the activation functions of the nodes of the DNN.
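    One way to realize a variable-component layer of the kind described above is to make each weight matrix and bias a polynomial function of the environment variable (e.g., the utterance signal-to-noise ratio). The sketch below assumes PyTorch; the layer sizes and polynomial order are arbitrary assumptions, not the claimed design.

```python
# Sketch of a "variable-component" layer whose weight matrix and bias are polynomial
# functions of an environment variable v (e.g., SNR in dB): W(v) = sum_k W_k * v**k.
import torch
import torch.nn as nn

class VariableComponentLinear(nn.Module):
    def __init__(self, in_dim, out_dim, order=2):
        super().__init__()
        # One weight/bias component per polynomial order
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.randn(out_dim, in_dim) * 0.01) for _ in range(order + 1)]
        )
        self.biases = nn.ParameterList(
            [nn.Parameter(torch.zeros(out_dim)) for _ in range(order + 1)]
        )

    def forward(self, x, v):
        # x: (batch, in_dim) hidden activations; v: scalar environment variable
        W = sum(w * (v ** k) for k, w in enumerate(self.weights))
        b = sum(bk * (v ** k) for k, bk in enumerate(self.biases))
        return torch.sigmoid(x @ W.t() + b)
```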

    Pre-training with alignments for recurrent neural network transducer based end-to-end speech recognition

    Publication No.: US11657799B2

    Publication Date: 2023-05-23

    Application No.: US16840311

    Application Date: 2020-04-03

    CPC classification number: G10L15/063 G06N3/0445 G06N3/08

    Abstract: Techniques performed by a data processing system for training a Recurrent Neural Network Transducer (RNN-T) herein include encoder pretraining by training a neural network-based token classification model using first token-aligned training data representing a plurality of utterances, where each utterance is associated with a plurality of frames of audio data and tokens representing each utterance are aligned with frame boundaries of the plurality of audio frames; obtaining a first cross-entropy (CE) criterion from the token classification model, wherein the CE criterion represents a divergence between expected outputs and reference outputs of the model; pretraining an encoder of an RNN-T based on the first CE criterion; and training the RNN-T with second training data after pretraining the encoder of the RNN-T. These techniques also include whole-network pre-training of the RNN-T. An RNN-T pretrained using these techniques may be used to process audio data that includes spoken content to obtain a textual representation.
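    A compact sketch of the encoder-pretraining step described above, assuming PyTorch: the RNN-T encoder is first trained as a frame-level token classifier with cross-entropy against token-aligned targets, then reused inside the RNN-T. The encoder interface, hidden size, and data loader are assumptions for illustration.

```python
# Sketch of encoder pretraining with a cross-entropy criterion on token-aligned frames.
# `encoder` and `aligned_loader` are placeholders; the loader yields
# feats (batch, T, feat_dim) and frame_tokens (batch, T) with one token id per frame.
import torch
import torch.nn as nn

def pretrain_encoder(encoder, num_tokens, aligned_loader, enc_dim=512, epochs=1, lr=1e-3):
    head = nn.Linear(enc_dim, num_tokens)            # temporary classification head for CE pretraining
    params = list(encoder.parameters()) + list(head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for feats, frame_tokens in aligned_loader:
            logits = head(encoder(feats))            # (batch, T, num_tokens)
            loss = ce(logits.reshape(-1, num_tokens), frame_tokens.reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()
    # The pretrained encoder is then placed inside the RNN-T and trained end to end.
    return encoder
```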

    Internal language model for E2E models

    Publication No.: US11527238B2

    Publication Date: 2022-12-13

    Application No.: US17154956

    Application Date: 2021-01-21

    Abstract: A computer device is provided that includes one or more processors configured to receive an end-to-end (E2E) model that has been trained for automatic speech recognition with training data from a source-domain, and receive an external language model that has been trained with training data from a target-domain. The one or more processors are configured to perform an inference of the probability of an output token sequence given a sequence of input speech features. Performing the inference includes computing an E2E model score, computing an external language model score, and computing an estimated internal language model score for the E2E model. The estimated internal language model score is computed by removing a contribution of an intrinsic acoustic model. The one or more processors are further configured to compute an integrated score based at least on the E2E model score, the external language model score, and the estimated internal language model score.
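    During decoding, the three scores can be combined per hypothesis roughly as shown below; the interpolation weights, the decoder interface, and the zeroed acoustic context used to estimate the internal language model score are illustrative assumptions, not the claimed method.

```python
# Sketch of internal-LM estimation and score integration for an E2E ASR hypothesis.
import torch

def estimate_internal_lm_logp(e2e_decoder, token_ids, enc_dim=512):
    # One common estimation trick: run the E2E decoder with the acoustic (encoder)
    # contribution zeroed out, so only the label-history (internal LM) part remains.
    # `e2e_decoder` is a placeholder taking (tokens, acoustic_context) and returning
    # per-token log-probabilities for the hypothesis.
    zero_context = torch.zeros(1, len(token_ids), enc_dim)   # encoder dimension assumed
    return e2e_decoder(token_ids, zero_context).sum()

def integrated_score(e2e_logp, ext_lm_logp, internal_lm_logp,
                     ext_weight=0.6, ilm_weight=0.3):
    # log P(y|x) ~ log P_E2E(y|x) + w_ext * log P_ELM(y) - w_ilm * log P_ILM(y)
    return e2e_logp + ext_weight * ext_lm_logp - ilm_weight * internal_lm_logp
```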
