-
Publication No.: US10580432B2
Publication Date: 2020-03-03
Application No.: US15908115
Application Date: 2018-02-28
Applicant: Microsoft Technology Licensing, LLC
Inventor: Amit Das, Jinyu Li, Rui Zhao, Yifan Gong
Abstract: Generally discussed herein are devices, systems, and methods for speech recognition. Processing circuitry can implement a connectionist temporal classification (CTC) neural network (NN) including an encode NN to receive an audio frame and generate a current encoded hidden feature vector, an attend NN to generate, based on a current encoded hidden feature vector and a first context vector from a previous time slice, a weight vector indicating an amount the current encoded hidden feature vector, a previous encoded hidden feature vector, and a future encoded hidden feature vector from a future time slice contribute to a current, second context vector, an annotate NN to generate the current, second context vector based on the weight vector, the current encoded hidden feature vector, the previous encoded hidden feature vector, and the future encoded hidden feature vector, and a normal NN to generate a normalized output vector based on the context vector.
-
Publication No.: US20190267023A1
Publication Date: 2019-08-29
Application No.: US15908115
Application Date: 2018-02-28
Applicant: Microsoft Technology Licensing, LLC
Inventor: Amit Das, Jinyu Li, Rui Zhao, Yifan Gong
IPC: G10L25/30, G10L15/16, G10L15/183, G10L15/06, G06F17/30
Abstract: Generally discussed herein are devices, systems, and methods for speech recognition. Processing circuitry can implement a connectionist temporal classification (CTC) neural network (NN) including: an encode NN to receive an audio frame and generate a current encoded hidden feature vector; an attend NN to generate, based on the current encoded hidden feature vector and a first context vector from a previous time slice, a weight vector indicating how much the current encoded hidden feature vector, a previous encoded hidden feature vector, and a future encoded hidden feature vector from a future time slice each contribute to a current, second context vector; an annotate NN to generate the current, second context vector based on the weight vector, the current encoded hidden feature vector, the previous encoded hidden feature vector, and the future encoded hidden feature vector; and a normal NN to generate a normalized output vector based on the context vector.
-
Publication No.: US20190051290A1
Publication Date: 2019-02-14
Application No.: US15675249
Application Date: 2017-08-11
Applicant: Microsoft Technology Licensing, LLC
Inventor: Jinyu Li, Michael Lewis Seltzer, Xi Wang, Rui Zhao, Yifan Gong
IPC: G10L15/16, G06N3/08, G10L15/06, G10L15/183
Abstract: Improvements in speech recognition in a new domain are provided via student/teacher training of models for different speech domains. A student model for a new domain is created based on a teacher model trained in an existing domain. The student model is trained in parallel with the operation of the teacher model, with inputs in the new and existing domains respectively, to develop a neural network that is adapted to recognize speech in the new domain. The data in the new domain may lack transcription labels; instead, they are parallel to the existing-domain data analyzed by the teacher model. The outputs of the teacher model are compared with the outputs of the student model, and the differences are used to adjust the parameters of the student model to better recognize speech in the second domain.
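The student/teacher loop above can be illustrated with a toy model. This is a minimal sketch, not the patented system: it assumes a hypothetical linear-softmax student, treats the teacher's posteriors on the parallel existing-domain input as given soft targets (so no transcription is needed), and measures the teacher/student difference with KL divergence, a common choice that the abstract does not name.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q); assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def student_step(W: Matrix, x_new: List[float], teacher_probs: List[float],
                 lr: float = 0.1) -> Tuple[Matrix, float]:
    """One SGD step of a hypothetical linear-softmax student.

    x_new is a new-domain feature vector; teacher_probs are the teacher's
    posteriors on the parallel existing-domain input (no labels used).
    Returns the updated weights and the KL divergence before the update.
    """
    logits = [sum(w * x for w, x in zip(row, x_new)) for row in W]
    p_student = softmax(logits)
    loss = kl_divergence(teacher_probs, p_student)
    # For a softmax output, d KL(teacher || student) / d logits = p_student - p_teacher.
    grad = [ps - pt for ps, pt in zip(p_student, teacher_probs)]
    new_W = [[w - lr * g * x for w, x in zip(row, x_new)]
             for row, g in zip(W, grad)]
    return new_W, loss


W = [[0.0, 0.0] for _ in range(3)]
losses = []
for _ in range(50):
    W, loss = student_step(W, [1.0, 0.5], [0.7, 0.2, 0.1])
    losses.append(loss)
```

Repeating the step drives the student's posteriors on new-domain input toward the teacher's, which is the adaptation signal the abstract describes.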
-
Publication No.: US10019990B2
Publication Date: 2018-07-10
Application No.: US14414621
Application Date: 2014-09-09
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Jinyu Li, Rui Zhao, Yifan Gong
Abstract: Systems and methods for speech recognition incorporating environmental variables are provided. The systems and methods capture speech to be recognized. The speech is then recognized utilizing a variable component deep neural network (DNN). The variable component DNN processes the captured speech by incorporating an environment variable. The environment variable may be any variable that is dependent on environmental conditions or the relation of the user, the client device, and the environment. For example, the environment variable may be based on noise of the environment and represented as a signal-to-noise ratio. The variable component DNN may incorporate the environment variable in different ways. For instance, the environment variable may be incorporated into weighting matrices and biases of the DNN, the outputs of the hidden layers of the DNN, or the activation functions of the nodes of the DNN.
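One of the options listed above is to incorporate the environment variable into the weighting matrices and biases. A minimal sketch of that option follows, assuming (as an illustration, not as the claimed method) that each weight matrix and bias vector is a polynomial in the SNR, W(v) = sum_j H_j v^j. The polynomial form, its order, the sigmoid activation, and all function names are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def polynomial_weights(coeffs: List[Matrix], v: float) -> Matrix:
    """Return W(v) = sum_j coeffs[j] * v**j (hypothetical polynomial form)."""
    rows, cols = len(coeffs[0]), len(coeffs[0][0])
    W = [[0.0] * cols for _ in range(rows)]
    for j, H in enumerate(coeffs):
        scale = v ** j
        for r in range(rows):
            for c in range(cols):
                W[r][c] += H[r][c] * scale
    return W


def vc_layer(x: List[float], weight_coeffs: List[Matrix],
             bias_coeffs: List[List[float]], snr: float) -> List[float]:
    """Sigmoid layer whose weights and biases are polynomials of the SNR."""
    W = polynomial_weights(weight_coeffs, snr)
    b = [sum(bj[i] * snr ** j for j, bj in enumerate(bias_coeffs))
         for i in range(len(W))]
    pre = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [1.0 / (1.0 + math.exp(-z)) for z in pre]
```

With a single (order-0) coefficient matrix, the layer reduces to an ordinary fully connected sigmoid layer; higher-order terms let the same layer shift its behavior as the measured SNR changes.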
-
Publication No.: US11657799B2
Publication Date: 2023-05-23
Application No.: US16840311
Application Date: 2020-04-03
Applicant: Microsoft Technology Licensing, LLC
Inventor: Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong, Hu Hu
CPC classification number: G10L15/063, G06N3/0445, G06N3/08
Abstract: Techniques performed by a data processing system for training a Recurrent Neural Network Transducer (RNN-T) herein include encoder pretraining by: training a neural network-based token classification model using first token-aligned training data representing a plurality of utterances, where each utterance is associated with a plurality of frames of audio data and the tokens representing each utterance are aligned with the frame boundaries of the audio frames; obtaining a first cross-entropy (CE) criterion from the token classification model, where the CE criterion represents a divergence between expected outputs and reference outputs of the model; pretraining an encoder of an RNN-T based on the first CE criterion; and training the RNN-T with second training data after pretraining the encoder of the RNN-T. These techniques also include whole-network pretraining of the RNN-T. An RNN-T pretrained using these techniques may be used to process audio data that includes spoken content to obtain a textual representation.
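The encoder-pretraining step above depends on two pieces: token alignments expanded to frame-level targets, and a frame-level CE criterion. A minimal sketch of both follows, assuming alignments arrive as hypothetical (token_id, start_frame, end_frame) triples and that the token classifier emits one posterior distribution per frame. The function names are illustrative, and the later transfer of encoder weights into the RNN-T is not shown.

```python
import math
from typing import List, Sequence, Tuple


def expand_alignment(segments: Sequence[Tuple[int, int, int]]) -> List[int]:
    """Turn (token_id, start_frame, end_frame) segments into per-frame labels.

    end_frame is exclusive; segments are assumed contiguous and sorted.
    """
    labels: List[int] = []
    for token_id, start, end in segments:
        labels.extend([token_id] * (end - start))
    return labels


def frame_cross_entropy(frame_posteriors: Sequence[Sequence[float]],
                        frame_labels: Sequence[int]) -> float:
    """Average negative log-probability of the aligned token at each frame."""
    if len(frame_posteriors) != len(frame_labels):
        raise ValueError("posteriors and labels must cover the same frames")
    total = -sum(math.log(p[t]) for p, t in zip(frame_posteriors, frame_labels))
    return total / len(frame_labels)
```

For example, `expand_alignment([(5, 0, 2), (7, 2, 3)])` gives one label per frame, `[5, 5, 7]`, which is the per-frame target sequence a token classifier would be trained against.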
-
Publication No.: US11587569B2
Publication Date: 2023-02-21
Application No.: US15931788
Application Date: 2020-05-14
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Guoli Ye, Yan Huang, Wenning Wei, Lei He, Eva Sharma, Jian Wu, Yao Tian, Edward C. Lin, Yifan Gong, Rui Zhao, Jinyu Li, William Maxwell Gale
Abstract: Systems, methods, and devices are provided for generating and using text-to-speech (TTS) data for improved speech recognition models. A main model is trained with keyword-independent baseline training data. In some instances, acoustic and language model sub-components of the main model are modified with new TTS training data. In some instances, the new TTS training data is obtained from a multi-speaker neural TTS system for a keyword that is underrepresented in the baseline training data. In some instances, the new TTS training data is used for pronunciation learning and normalization of keyword-dependent confidence scores in keyword spotting (KWS) applications. In some instances, the new TTS training data is used for rapid speaker adaptation in speech recognition models.
-
Publication No.: US11132992B2
Publication Date: 2021-09-28
Application No.: US16522416
Application Date: 2019-07-25
Applicant: Microsoft Technology Licensing, LLC
Inventor: Emilian Stoimenov, Rui Zhao, Kaustubh Prakash Kalgaonkar, Ivaylo Andreanov Enchev, Khuram Shahid, Anthony Phillip Stark, Guoli Ye, Mahadevan Srinivasan, Yifan Gong, Hosam Adel Khalil
Abstract: Generally discussed herein are devices, systems, and methods for on-device detection of a wake word. A device can include: a memory including model parameters that define a custom wake word detection model, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT indicating a hidden vector to be provided in response to a phoneme of a user-specified wake word; a microphone to capture audio; and processing circuitry to receive the audio from the microphone, determine, using the wake word detection model, whether the audio includes an utterance of the user-specified wake word, and wake up a personal assistant after determining that the audio includes the utterance of the user-specified wake word.
-
Publication No.: US20190188316A1
Publication Date: 2019-06-20
Application No.: US15848929
Application Date: 2017-12-20
Applicant: Microsoft Technology Licensing, LLC
Inventor: Shen Huang, Yongzheng Zhang, Chi-Yi Kuan, Hu Wang, Rui Zhao, Zhou Jin
CPC classification number: G06F16/3329, G06F17/27, G06F17/2715, G06F17/2785, G06K9/6267, G06Q50/01
Abstract: A method and system for identifying user expectations in question answering in an online social network system are described. An automated support system is configured to address the technical problem of optimizing the processing of user input submitted to a computer in natural language. The automated support system uses machine learning algorithms to automatically extract, from the user input, information indicative of the user's expectations, and to obtain data relevant to the input based on that information.
-
Publication No.: US10296530B2
Publication Date: 2019-05-21
Application No.: US15252159
Application Date: 2016-08-30
Applicant: Microsoft Technology Licensing, LLC
Inventor: Yi Zheng, Chi-Yi Kuan, Hu Wang, Rui Zhao, Yongzheng Zhang
Abstract: A topical representative assessment system implements techniques for determining entities that are ambassadors for one or more topics. The ambassadors are determined based on content items that they have authored or content items that are otherwise attributed to them. An ambassador may be any type of entity such as a person, a company, or an organization. Machine analytics may be used to determine whether a content item corresponds to a specific topic, determine a sentiment for a content item, analyze feedback for a content item, or any combination of these.
-
Publication No.: US11527238B2
Publication Date: 2022-12-13
Application No.: US17154956
Application Date: 2021-01-21
Applicant: Microsoft Technology Licensing, LLC
Inventor: Zhong Meng, Sarangarajan Parthasarathy, Xie Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong
IPC: G10L15/16, G06N3/04, G10L15/01, G10L15/06, G10L15/183
Abstract: A computer device is provided that includes one or more processors configured to receive an end-to-end (E2E) model that has been trained for automatic speech recognition with training data from a source domain, and to receive an external language model that has been trained with training data from a target domain. The one or more processors are configured to perform an inference of the probability of an output token sequence given a sequence of input speech features. Performing the inference includes computing an E2E model score, computing an external language model score, and computing an estimated internal language model score for the E2E model. The estimated internal language model score is computed by removing a contribution of an intrinsic acoustic model. The processor is further configured to compute an integrated score based at least on the E2E model score, the external language model score, and the estimated internal language model score.
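The integrated score above combines three log-domain scores. A minimal sketch follows, assuming a log-linear form (an illustration, not necessarily the claimed formula): score = log P_E2E(y|x) + lambda_ext * log P_ext(y) - lambda_ILM * log P_ILM(y), where subtracting the estimated internal LM score discounts the source-domain prior learned by the E2E model. The weight values and function names are hypothetical, and the estimation of the internal LM score itself is taken as given rather than shown.

```python
from typing import List, Tuple

# (hypothesis text, log P_E2E(y|x), log P_ext(y), estimated log P_ILM(y))
Hypothesis = Tuple[str, float, float, float]


def integrated_score(e2e_logp: float, ext_lm_logp: float, ilm_logp: float,
                     ext_weight: float, ilm_weight: float) -> float:
    """Log-linear fusion: add the weighted external LM, subtract the weighted internal LM."""
    return e2e_logp + ext_weight * ext_lm_logp - ilm_weight * ilm_logp


def best_hypothesis(hyps: List[Hypothesis], ext_weight: float = 0.5,
                    ilm_weight: float = 0.5) -> str:
    """Pick the hypothesis with the highest integrated score (hypothetical weights)."""
    return max(hyps, key=lambda h: integrated_score(h[1], h[2], h[3],
                                                    ext_weight, ilm_weight))[0]


hyps = [("a", -1.0, -5.0, -1.0), ("b", -1.2, -1.0, -4.0)]
choice = best_hypothesis(hyps)
```

Here hypothesis "b" wins despite a slightly lower E2E score, because the target-domain external LM favors it and the internal LM (source-domain prior) disfavors it.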