Abstract:
A device (100) includes a voice recognition system (204, 206, 207, 208) that generates a signal representative of a speech utterance. The utterance is divided into frames (Ft) representative of the utterance. Frames are allocated to states (S1-S5) using an alignment algorithm. A path representing frame-to-state allocations is stored in memory (110) using state transition types, each identifying the state transition into a state. Lattice traceback information for the voice recognition system is stored and updated by generating a traceback array having a plurality of rows and one or more columns, with each row corresponding to one of a plurality of states in which a traceback path terminates, and each column containing one or more dwell counts for states in the traceback path. An optimal state transition path into a given state of the plurality of states is determined, and the generated traceback array is updated in response to the determined optimal state transition path.
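The traceback update described above can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes a left-to-right model in which the only transition types into a state are "stay" (self-loop) and "advance" (from the previous state), and it assumes a precomputed table of local costs where lower is better. The function name `align_with_dwell_counts` and the `cost` table are hypothetical. Row s of `trace` holds the dwell counts of the best path currently terminating in state s.

```python
def align_with_dwell_counts(cost):
    """Align frames to states; cost[t][s] is the local cost of frame t in state s.

    Returns (best_score, dwell_counts) for the path ending in the last state.
    Assumption: only self-loop and single-step transitions are allowed.
    """
    n_frames, n_states = len(cost), len(cost[0])
    INF = float("inf")

    # score[s]: best accumulated cost of a path ending in state s.
    score = [INF] * n_states
    score[0] = cost[0][0]

    # trace[s][j]: frames spent in state j on the best path ending in state s.
    trace = [[0] * n_states for _ in range(n_states)]
    trace[0][0] = 1

    for t in range(1, n_frames):
        new_score = [INF] * n_states
        new_trace = [[0] * n_states for _ in range(n_states)]
        for s in range(n_states):
            stay = score[s]
            advance = score[s - 1] if s > 0 else INF
            if stay == INF and advance == INF:
                continue  # state s is not reachable at frame t
            if stay <= advance:
                # Self-loop: keep the row and lengthen the dwell in state s.
                new_trace[s] = trace[s][:]
                new_trace[s][s] += 1
            else:
                # Entered from s - 1: inherit its row, start a dwell of 1.
                new_trace[s] = trace[s - 1][:]
                new_trace[s][s] = 1
            new_score[s] = min(stay, advance) + cost[t][s]
        score, trace = new_score, new_trace

    return score[-1], trace[-1]
```

Because each row already carries the dwell counts of its path, the frame-to-state allocation can be read from the final row without storing a back-pointer for every frame.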
Abstract:
A method of reconstructing speech input at a communication device comprises receiving, at the communication device, encoded data that includes encoded spectral data and encoded energy data of the speech input, the encoded spectral data being encoded as a series of mel-frequency cepstral coefficients. The method further comprises decoding, at the communication device, the encoded spectral data and encoded energy data to determine the spectral data and energy data, wherein decoding comprises: performing an inverse discrete cosine transform on the mel-frequency cepstral coefficients at harmonic mel-frequencies corresponding to a pitch period of the speech input to determine log-spectral magnitudes of the speech input at the harmonic mel-frequencies, and exponentiating the log-spectral magnitudes to determine the spectral magnitudes of the speech input. The method also comprises combining the spectral data and energy data to reconstruct the speech input at the communication device. A communication device for use in a distributed speech recognition system is also disclosed.
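The decoding step can be sketched as follows. This is an illustration under stated assumptions, not the disclosed method: the abstract does not give the transform normalization, the mel-scale formula, or the sampling rate. The sketch assumes the common mel mapping 2595·log10(1 + f/700), an 8 kHz sampling rate, a pitch period expressed in samples, and a cosine series in which coefficient k weights cos(πkx), with x the harmonic's mel-frequency normalized by the mel value at the Nyquist frequency. The names `hz_to_mel` and `harmonic_magnitudes` are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to the mel scale (assumed formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def harmonic_magnitudes(cepstra, pitch_period, sample_rate=8000.0):
    """Estimate spectral magnitudes at the pitch harmonics from cepstra.

    cepstra: mel-frequency cepstral coefficients c0, c1, ...
    pitch_period: pitch period in samples (assumed unit).
    """
    f0 = sample_rate / pitch_period   # fundamental frequency in Hz
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)

    magnitudes = []
    h = 1
    while h * f0 < nyquist:
        # Normalized mel position of the h-th harmonic, in [0, 1).
        x = hz_to_mel(h * f0) / mel_max
        # Inverse cosine transform evaluated at the harmonic mel-frequency.
        log_mag = sum(c * math.cos(math.pi * k * x)
                      for k, c in enumerate(cepstra))
        # Exponentiate the log-spectral magnitude.
        magnitudes.append(math.exp(log_mag))
        h += 1
    return magnitudes
```

Evaluating the cosine series directly at the harmonic positions, rather than at fixed filter-bank centers, yields one magnitude per harmonic of the pitch, which is what a harmonic-based reconstruction would need.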