Abstract:
Systems and methods are provided for generating a DNN classifier by "learning" a "student" DNN model from a larger, more accurate "teacher" DNN model. The student DNN may be trained from unlabeled training data by passing the unlabeled training data through the teacher DNN, which may be trained from labeled data. In one embodiment, an iterative process is applied to train the student DNN by minimizing the divergence of the output distributions from the teacher and student DNN models. For each iteration until convergence, the difference in the outputs of these two DNNs is used to update the student DNN model, and outputs are determined again, using the unlabeled training data. The resulting trained student DNN model may be suitable for providing accurate signal processing applications on devices having limited computational or storage resources, such as mobile or wearable devices. In an embodiment, the teacher DNN model comprises an ensemble of DNN models.
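The iterative procedure described in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not taken from the patent: the teacher is a tiny fixed network with one tanh hidden layer, the student is a single linear softmax layer, the divergence is the Kullback-Leibler divergence, and the update is plain full-batch gradient descent. The helper names (`teacher_forward`, `student_forward`, `train_student`) are hypothetical.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(rows, x):
    return [sum(w * v for w, v in zip(row, x)) for row in rows]

def teacher_forward(teacher, x):
    # Assumed teacher: one tanh hidden layer followed by a softmax layer.
    hidden = [math.tanh(h) for h in matvec(teacher["hidden"], x)]
    return softmax(matvec(teacher["out"], hidden))

def student_forward(student_w, x):
    # Assumed student: a single (smaller) linear softmax layer.
    return softmax(matvec(student_w, x))

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def mean_kl(targets, student_w, data):
    pairs = zip(targets, data)
    return sum(kl(t, student_forward(student_w, x)) for t, x in pairs) / len(data)

def train_student(teacher, student_w, data, lr=0.5, tol=1e-6, max_iters=2000):
    # Pass the unlabeled data through the teacher to obtain soft targets.
    targets = [teacher_forward(teacher, x) for x in data]
    cur = mean_kl(targets, student_w, data)
    for _ in range(max_iters):
        prev = cur
        grad = [[0.0] * len(row) for row in student_w]
        for t, x in zip(targets, data):
            s = student_forward(student_w, x)
            for k in range(len(student_w)):
                diff = s[k] - t[k]  # difference between student and teacher outputs
                for j, xj in enumerate(x):
                    grad[k][j] += diff * xj / len(data)
        for k, row in enumerate(student_w):
            for j in range(len(row)):
                row[j] -= lr * grad[k][j]
        cur = mean_kl(targets, student_w, data)  # outputs are determined again
        if prev - cur < tol:  # treat a negligible improvement as convergence
            break
    return cur

rng = random.Random(0)
teacher = {
    "hidden": [[rng.gauss(0, 2) for _ in range(3)] for _ in range(4)],
    "out": [[rng.gauss(0, 2) for _ in range(4)] for _ in range(3)],
}
student_w = [[0.0] * 3 for _ in range(3)]
unlabeled = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(50)]

soft_targets = [teacher_forward(teacher, x) for x in unlabeled]
kl_before = mean_kl(soft_targets, student_w, unlabeled)
kl_after = train_student(teacher, student_w, unlabeled)
```

No labels appear anywhere in the loop: the student only ever sees the teacher's output distributions. Because the student here has less capacity than the teacher, `kl_after` is expected to be lower than `kl_before` but not zero.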
Abstract:
Described herein are various technologies pertaining to a multilingual deep neural network (MDNN). The MDNN includes a plurality of hidden layers, wherein values for weight parameters of the plurality of hidden layers are learned during a training phase based upon training data in terms of acoustic raw features for multiple languages. The MDNN further includes softmax layers that are trained for each target language separately, making use of the hidden layer values trained jointly with multiple source languages. The MDNN is adaptable, such that a new softmax layer may be added on top of the existing hidden layers, where the new softmax layer corresponds to a new target language.
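The layer arrangement described in this abstract can be sketched as a small data structure. This is an illustrative sketch only: the class name `MultilingualDNN`, its methods, the sigmoid activation, and the random initialization are assumptions, and training is omitted entirely. It shows only that the hidden layers are shared while each language has its own softmax layer, and that a new language can be added without touching the shared layers.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(rows, x):
    return [sum(w * v for w, v in zip(row, x)) for row in rows]

class MultilingualDNN:
    """Shared hidden layers plus one softmax layer per language (hypothetical sketch)."""

    def __init__(self, input_dim, hidden_dims, seed=0):
        self.rng = random.Random(seed)
        self.hidden = []
        prev = input_dim
        for width in hidden_dims:
            self.hidden.append(self._init_matrix(width, prev))
            prev = width
        self.top_dim = prev
        self.softmax_layers = {}  # language name -> weight matrix

    def _init_matrix(self, rows, cols):
        return [[self.rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

    def add_language(self, language, num_classes):
        # A new softmax layer sits on top of the existing hidden layers,
        # which are left unchanged.
        self.softmax_layers[language] = self._init_matrix(num_classes, self.top_dim)

    def shared_features(self, x):
        for weights in self.hidden:
            x = [1.0 / (1.0 + math.exp(-z)) for z in matvec(weights, x)]
        return x

    def posteriors(self, language, x):
        top = self.shared_features(x)
        return softmax(matvec(self.softmax_layers[language], top))

model = MultilingualDNN(input_dim=6, hidden_dims=[8, 8])
model.add_language("source_a", num_classes=4)
model.add_language("source_b", num_classes=3)

frame = [0.1, -0.2, 0.3, 0.0, 0.5, -0.4]  # stand-in for one acoustic feature vector
features_before = model.shared_features(frame)

model.add_language("new_target", num_classes=5)
features_after = model.shared_features(frame)
p_new = model.posteriors("new_target", frame)
```

In a real system the new softmax layer would then be trained on data for the new target language; here it is only initialized.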
Abstract:
A high-dimensional posterior-based feature with partial distance elimination may be utilized for speech recognition. The log likelihood values of a large number of Gaussians are needed to generate the high-dimensional posterior feature. Gaussians with very small log likelihoods are associated with zero posterior values. Log likelihoods for Gaussians for a speech frame may be evaluated with a partial distance elimination method. If the partial distance of a Gaussian is already too small, the Gaussian will have a zero posterior value. The partial distance may be calculated by sequentially adding individual dimensions in a group of dimensions. The partial distance elimination occurs when less than all of the dimensions in the group are sequentially added.
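A minimal sketch of partial distance elimination follows, under several assumptions that the abstract does not state: the Gaussians have diagonal covariances, the elimination threshold is the best complete log likelihood seen so far minus a fixed `beam`, and survivors are normalized into posteriors. The function and parameter names are hypothetical. The key property is that each dimension can only lower the running log likelihood, so once the partial value falls below the threshold the remaining dimensions need not be added.

```python
import math

def make_gaussian(weight, mean, var):
    # Precompute the part of the log likelihood that does not depend on x.
    log_const = math.log(weight) - 0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
    return {"mean": mean, "var": var, "log_const": log_const}

def log_likelihood_pde(x, gaussian, threshold):
    """Return (log likelihood or None if eliminated, number of dimensions added)."""
    partial = gaussian["log_const"]
    for d, xd in enumerate(x):
        diff = xd - gaussian["mean"][d]
        partial -= 0.5 * diff * diff / gaussian["var"][d]
        if partial < threshold:
            # Remaining terms are all non-positive, so the final value
            # can only be smaller: stop before adding them.
            return None, d + 1
    return partial, len(x)

def posteriors_with_pde(x, gaussians, beam=10.0):
    best = -math.inf
    scores = []
    dims_added = 0
    for g in gaussians:
        score, used = log_likelihood_pde(x, g, best - beam)
        dims_added += used
        scores.append(score)
        if score is not None and score > best:
            best = score
    # Final pass: apply the same beam against the overall best score.
    kept = [(i, s) for i, s in enumerate(scores) if s is not None and s >= best - beam]
    total = sum(math.exp(s - best) for _, s in kept)
    post = [0.0] * len(gaussians)
    for i, s in kept:
        post[i] = math.exp(s - best) / total
    return post, dims_added

gaussians = [
    make_gaussian(1 / 3, [0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]),
    make_gaussian(1 / 3, [0.5, 0.5, 0.5, 0.5], [1.0, 1.0, 1.0, 1.0]),
    make_gaussian(1 / 3, [10.0, 10.0, 10.0, 10.0], [1.0, 1.0, 1.0, 1.0]),
]
frame = [0.0, 0.0, 0.0, 0.0]

post, dims_added = posteriors_with_pde(frame, gaussians, beam=10.0)
exact, dims_full = posteriors_with_pde(frame, gaussians, beam=math.inf)
```

In this toy example the third Gaussian is far from the frame, so it is eliminated after its first dimension and receives a zero posterior, while the posteriors of the remaining Gaussians stay very close to the values computed without elimination.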
Abstract:
Architecture for correcting incorrect recognition results in an Asian language speech recognition system. A spelling mode can be launched in response to receiving speech input, the spelling mode for correcting incorrect spelling of the recognition results or generating new words. Correction can be obtained using speech and/or manual selection and entry. The architecture facilitates correction in a single pass, rather than multiple times as in conventional systems. Words corrected using the spelling mode are corrected as a unit and treated as a word. The spelling mode applies to languages of at least the Asian continent, such as Simplified Chinese, Traditional Chinese, and/or other Asian languages such as Japanese.