Abstract:
Methods and systems for i-vector based clustering of training data in speech recognition are described. An i-vector may be extracted from a speech segment of speech training data to represent acoustic information. The i-vectors extracted from the speech training data may be grouped into multiple clusters using a hierarchical divisive clustering algorithm. An acoustic model may be trained using one of the multiple clusters. The trained acoustic model may then be used in speech recognition.
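The clustering step can be illustrated with a minimal sketch. The abstract does not say how a cluster is split or which cluster is split next, so the code below assumes a bisecting scheme: the largest cluster is repeatedly divided in two with plain 2-means under Euclidean distance. The function names `two_means` and `divisive_cluster`, the fixed seed, and the iteration count are hypothetical choices for illustration, and the i-vectors are assumed to be already extracted and stored as rows of an array.

```python
import numpy as np


def two_means(x, n_iter=20, seed=0):
    """Split the rows of x into two groups with plain 2-means (assumed criterion)."""
    rng = np.random.default_rng(seed)
    # Start from two distinct rows chosen at random (fancy indexing returns a copy).
    centers = x[rng.choice(len(x), size=2, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Euclidean distance from every row to both centers.
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean(axis=0)
    return labels


def divisive_cluster(ivectors, n_clusters):
    """Top-down clustering: start with one cluster and keep splitting the largest."""
    x = np.asarray(ivectors, dtype=float)
    clusters = [np.arange(len(x))]  # each cluster is an array of row indices
    while len(clusters) < n_clusters:
        i = max(range(len(clusters)), key=lambda j: len(clusters[j]))
        idx = clusters[i]
        if len(idx) < 2:
            break  # nothing left to split
        labels = two_means(x[idx])
        left, right = idx[labels == 0], idx[labels == 1]
        if len(left) == 0 or len(right) == 0:
            break  # degenerate split, for example identical vectors
        clusters[i:i + 1] = [left, right]
    return clusters
```

Each returned index array selects the speech segments of one cluster, so the segments of a chosen cluster could then be passed to acoustic model training. Splitting the largest cluster first is only one possible policy; a variance-based or likelihood-based choice would fit the same loop.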
Abstract:
A high-dimensional posterior-based feature with partial distance elimination may be utilized for speech recognition. Generating the high-dimensional posterior feature requires the log likelihood values of a large number of Gaussians. Gaussians with very small log likelihoods are associated with zero posterior values. The log likelihoods of the Gaussians for a speech frame may be evaluated with a partial distance elimination method. If the partial distance of a Gaussian is already too small, the Gaussian will have a zero posterior value. The partial distance may be calculated by sequentially adding individual dimensions in a group of dimensions. Partial distance elimination occurs when fewer than all of the dimensions in the group have been sequentially added.
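The elimination idea can be sketched as follows, under assumptions the abstract does not state: diagonal-covariance Gaussians, all dimensions treated as a single group, and a pruning threshold equal to the best full log likelihood seen so far minus a fixed `beam`. Because each added dimension can only lower the running log likelihood, the partial value is an upper bound on the final one, so a Gaussian can be dropped as soon as the partial value falls below the threshold. The function name `posteriors_with_pde` and its parameters are hypothetical.

```python
import numpy as np


def posteriors_with_pde(x, means, variances, log_weights, beam=10.0):
    """Posteriors of diagonal Gaussians for one frame, with early pruning.

    Returns the posterior vector and the number of per-dimension terms evaluated.
    """
    x = np.asarray(x, dtype=float)
    n_gauss, n_dim = means.shape
    # Part of the log likelihood that does not depend on the frame.
    const = log_weights - 0.5 * (n_dim * np.log(2 * np.pi)
                                 + np.log(variances).sum(axis=1))
    log_liks = np.full(n_gauss, -np.inf)
    best = -np.inf
    dims_used = 0
    for g in range(n_gauss):
        partial = const[g]
        pruned = False
        for d in range(n_dim):
            # Add one dimension at a time; each term is non-positive.
            partial -= 0.5 * (x[d] - means[g, d]) ** 2 / variances[g, d]
            dims_used += 1
            if partial < best - beam:
                pruned = True  # cannot recover, so stop before the last dimension
                break
        if not pruned:
            log_liks[g] = partial
            best = max(best, partial)
    # Gaussians scored before a better one appeared may also fall outside the beam.
    log_liks[log_liks < best - beam] = -np.inf
    post = np.exp(log_liks - best)  # pruned Gaussians become exactly zero
    return post / post.sum(), dims_used


# Usage: the second Gaussian is far from the frame and is pruned early.
means = np.array([[0.0, 0.0, 0.0], [100.0, 100.0, 100.0]])
variances = np.ones((2, 3))
log_weights = np.log(np.array([0.5, 0.5]))
post, dims_used = posteriors_with_pde(np.zeros(3), means, variances, log_weights)
```

In this sketch the order in which Gaussians are visited affects how much work is saved, since a good early candidate tightens the threshold for the rest. How the threshold and dimension groups are actually chosen is not given in the abstract.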