Abstract:
A digital speech coder includes a long-term filter (124) whose improved long-term predictor allows sub-sample resolution for the lag parameter L. A frame of N samples of the input speech vector s(n) is applied to an adder (510). The adder (510) produces the output vector b(n) of the long-term filter (124). The output vector b(n) is fed back to a delayed vector generator block (530) of the long-term predictor. The nominal long-term predictor lag parameter L is also input to the delayed vector generator block (530). The lag parameter L can take on non-integer values, which may be multiples of one half, one third, one fourth or any other rational fraction. The delayed vector generator (530) includes a memory which holds past samples of b(n). In addition, the delayed vector generator (530) calculates interpolated samples of b(n) and stores them in its memory, with at least one interpolated sample calculated and stored between each pair of adjacent past samples of b(n). The delayed vector generator (530) provides the output vector q(n) to the long-term multiplier block (520), which scales the long-term predictor response by the long-term predictor coefficient β. The scaled output βq(n) is then applied to the adder (510) to complete the feedback loop of the recursive filter (124).
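The feedback loop described above, b(n) = s(n) + βq(n) with q(n) approximating b(n - L) for a fractional lag L, can be illustrated with a minimal sketch. It assumes linear interpolation between the two neighbouring past samples, computed on demand rather than stored in memory as the abstract describes; the abstract does not specify the interpolation method, and the function and parameter names are hypothetical.

```python
import math

def long_term_filter(s, lag, beta, history):
    """Recursive long-term filter sketch: b(n) = s(n) + beta * q(n).

    q(n) approximates b(n - lag) for a possibly fractional lag by linear
    interpolation (an assumption; the abstract does not name the method).
    `history` holds past samples of b(n), oldest first.
    """
    if lag < 1.0:
        raise ValueError("lag must be at least one sample")
    whole = math.floor(lag)
    frac = lag - whole
    if len(history) < whole + 1:
        raise ValueError("history must hold at least floor(lag) + 1 samples")
    buf = list(history)
    out = []
    for x in s:
        n = len(buf)                  # index of the sample being produced
        newer = buf[n - whole]        # b(n - whole)
        older = buf[n - whole - 1]    # b(n - whole - 1)
        # b(n - lag) lies a fraction `frac` of the way from newer to older.
        q = (1.0 - frac) * newer + frac * older
        b = x + beta * q
        buf.append(b)                 # feed the output back into the memory
        out.append(b)
    return out
```

For example, `long_term_filter([1.0, 0.0, 0.0, 0.0], 2.5, 0.5, [0.0, 0.0, 0.0])` spreads the echo of the initial impulse across the two samples that straddle the half-sample lag.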
Abstract:
A speech coder (300) that performs analysis-by-synthesis coding of a signal determines gain parameters for each constituent component of multiple constituent components of a synthetic excitation signal (ex(n)). The speech coder generates a target vector (p(n)) based on an input signal (s(n)). The speech coder further generates multiple constituent components associated with the synthetic excitation signal, wherein one constituent component of the multiple constituent components is based on a shifted version of another constituent component of the multiple constituent components. The speech coder further evaluates an error criterion based on the target vector and the multiple constituent components to determine a gain associated with each constituent component of the multiple constituent components.
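As an illustration of determining one gain per constituent component, the sketch below assumes just two components, the second being a delayed copy of the first, and assumes a squared-error criterion ||p - g1·c1 - g2·c2||², minimized by solving the 2×2 normal equations. The abstract names neither the criterion nor the number of components, so these choices and all function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def shift(v, k):
    """Delay v by k samples (0 <= k <= len(v)), zero-filling the start."""
    if not 0 <= k <= len(v):
        raise ValueError("shift must be between 0 and len(v)")
    return [0.0] * k + list(v[:len(v) - k])

def joint_gains(p, c1, c2):
    """Gains (g1, g2) minimizing ||p - g1*c1 - g2*c2||^2 (assumed criterion)."""
    r11, r22, r12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    d1, d2 = dot(p, c1), dot(p, c2)
    det = r11 * r22 - r12 * r12
    if abs(det) < 1e-12:
        raise ValueError("components are (nearly) linearly dependent")
    # Cramer's rule on the 2x2 normal equations.
    g1 = (d1 * r22 - d2 * r12) / det
    g2 = (d2 * r11 - d1 * r12) / det
    return g1, g2

c1 = [1.0, 0.0, 2.0, 0.0]
c2 = shift(c1, 1)                              # shifted version of c1
p = [0.5 * a + 0.25 * b for a, b in zip(c1, c2)]
g1, g2 = joint_gains(p, c1, c2)
```

Solving for both gains jointly, rather than one after the other, accounts for any correlation between a component and its shifted copy.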
Abstract:
A method of reconstructing speech input at a communication device comprises receiving, at the communication device, encoded data that includes encoded spectral data and encoded energy data of the speech input, the encoded spectral data being encoded as a series of mel-frequency cepstral coefficients. The method further comprises decoding, at the communication device, the encoded spectral data and encoded energy data to determine the spectral data and energy data, wherein decoding comprises: performing an inverse discrete cosine transform on the mel-frequency cepstral coefficients at harmonic mel-frequencies corresponding to a pitch period of the speech input to determine log-spectral magnitudes of the speech input at the harmonic mel-frequencies, and exponentiating the log-spectral magnitudes to determine the spectral magnitudes of the speech input. The method also comprises combining the spectral data and energy data to reconstruct the speech input at the communication device. A communication device for use in a distributed speech recognition system is also disclosed.
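The decoding step, an inverse cosine transform evaluated at harmonic mel-frequencies followed by exponentiation, might look roughly like the sketch below. It assumes the common mel mapping 2595·log10(1 + f/700), a cosine-series convention log|S| = c0 + 2·Σ c_i·cos(iθ) with θ proportional to the mel-frequency, and a pitch period given in samples. None of these conventions is stated in the abstract, and the function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Common mel-scale mapping (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def harmonic_magnitudes(cepstra, pitch_period_samples, sample_rate_hz):
    """Spectral magnitudes at the pitch harmonics below the Nyquist frequency."""
    f0 = sample_rate_hz / pitch_period_samples       # fundamental frequency, Hz
    nyquist_hz = sample_rate_hz / 2.0
    nyquist_mel = hz_to_mel(nyquist_hz)
    mags = []
    k = 1
    while k * f0 < nyquist_hz:
        # Map the k-th harmonic to an angle in [0, pi) on the mel scale.
        theta = math.pi * hz_to_mel(k * f0) / nyquist_mel
        # Inverse cosine transform evaluated at this single mel-frequency.
        log_mag = cepstra[0] + 2.0 * sum(
            c * math.cos(i * theta) for i, c in enumerate(cepstra) if i > 0
        )
        mags.append(math.exp(log_mag))                # undo the logarithm
        k += 1
    return mags
```

Because the cosine series is evaluated directly at each harmonic, no fixed frequency grid is needed; the number of output magnitudes follows from the pitch period.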
Abstract:
A method for detecting and attenuating inhalation noise in a communication system coupled to a pressurized air delivery system, the method including the steps of: generating an inhalation noise model (912, 1012) based on inhalation noise; receiving an input signal (802) that includes inhalation noise; comparing (810) the input signal to the noise model to obtain a similarity measure; determining (854) a gain factor based on the similarity measure; and modifying (852) the input signal based on the gain factor, wherein the inhalation noise in the input signal is attenuated based on the gain factor.
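The compare, determine-gain, and modify steps can be sketched as follows. The abstract does not say how the noise model is represented or how similarity maps to gain, so this sketch assumes feature vectors compared by cosine similarity and a linear gain ramp above a threshold; every name and constant here is hypothetical.

```python
import math

def similarity(frame_features, model_features):
    """Cosine similarity between frame and model features (assumed measure)."""
    num = sum(a * b for a, b in zip(frame_features, model_features))
    den = math.sqrt(sum(a * a for a in frame_features)) * math.sqrt(
        sum(b * b for b in model_features)
    )
    return num / den if den > 0.0 else 0.0

def gain_from_similarity(sim, threshold=0.8, min_gain=0.1):
    """Unity gain below the threshold, ramping down to min_gain at sim = 1."""
    if sim <= threshold:
        return 1.0
    t = min((sim - threshold) / (1.0 - threshold), 1.0)
    return 1.0 - t * (1.0 - min_gain)

def attenuate(frame, frame_features, model_features):
    """Scale the frame's samples by the similarity-derived gain factor."""
    g = gain_from_similarity(similarity(frame_features, model_features))
    return [g * x for x in frame]
```

A frame whose features match the model closely is scaled toward `min_gain`, while a frame that does not resemble inhalation noise passes through unchanged.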
Abstract:
A method for characterizing inhalation noise within a pressurized air delivery system, the method including the steps of: generating an inhalation noise model (912, 1012) based on inhalation noise; receiving an input signal (802) that includes inhalation noise comprising at least one inhalation noise burst; comparing (810) the input signal to the noise model to obtain a similarity measure; comparing the similarity measure to at least one threshold (832, 834) to detect the at least one inhalation noise burst; and characterizing (1354, 1356) the at least one detected inhalation noise burst.
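One way to picture the threshold comparison and the characterization step is the sketch below. It assumes the two thresholds act as a hysteresis pair (a burst starts when the similarity rises to the upper threshold and ends when it falls below the lower one) and that a burst is characterized by its start time and duration. The abstract states neither detail, and the function names are hypothetical.

```python
def detect_bursts(similarity, on_threshold, off_threshold):
    """Return (start, end) frame indices of detected bursts, end exclusive."""
    bursts = []
    start = None
    for i, value in enumerate(similarity):
        if start is None and value >= on_threshold:
            start = i                       # burst begins
        elif start is not None and value < off_threshold:
            bursts.append((start, i))       # burst ends
            start = None
    if start is not None:                   # burst still open at end of input
        bursts.append((start, len(similarity)))
    return bursts

def characterize(bursts, frame_ms):
    """Describe each burst by start time and duration in milliseconds."""
    return [
        {"start_ms": s * frame_ms, "duration_ms": (e - s) * frame_ms}
        for s, e in bursts
    ]

sims = [0.1, 0.9, 0.7, 0.6, 0.2, 0.1, 0.95, 0.3]
bursts = detect_bursts(sims, on_threshold=0.8, off_threshold=0.5)
summary = characterize(bursts, frame_ms=20)
```

Using a lower exit threshold keeps a burst from being split when the similarity dips briefly in the middle of an inhalation.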
Abstract:
A method for equalizing a speech signal generated within a pressurized air delivery system, the method including the steps of: generating an inhalation noise model (1152) based on inhalation noise; receiving an input signal (802) that includes a speech signal; and equalizing the speech signal (1156) based on the noise model.
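A rough sketch of equalizing with a noise model follows. It assumes the inhalation noise model is an all-pole filter 1/A(z) with A(z) = 1 + a1·z⁻¹ + ... and that equalization applies the inverse filter A(z) to the speech; the abstract does not state the form of the model or of the equalizer, so both are assumptions, as are the function names.

```python
def inverse_filter(x, a):
    """Apply the FIR filter A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ... to x."""
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * x[n - k]
        y.append(acc)
    return y

def all_pole_filter(x, a):
    """Apply 1/A(z) to x; used here only to build a demonstration signal."""
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y

model = [-0.9]                                      # hypothetical one-pole model
coloured = all_pole_filter([1.0, 0.0, 0.0, 0.0], model)
equalized = inverse_filter(coloured, model)         # recovers the impulse
```

The idea behind this assumed design is that spectral colouring captured by the noise model is removed from any signal passing through the same acoustic path.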
Abstract:
A method (100) includes receiving (101) an input digital audio signal comprising a narrow-band signal. The input digital audio signal is processed (102) to generate a processed digital audio signal. A high-band energy level corresponding to the input digital audio signal is calculated (103) based on a calculated energy of a transition band of the processed digital audio signal within a predetermined upper frequency range of a narrow-band bandwidth. A high-band digital audio signal is generated (104) based on the high-band energy level and a calculated high-band spectrum corresponding to the high-band energy level.
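The energy-estimation step (103) can be illustrated with a minimal sketch: sum the power in a transition band near the top of the narrow band, then map that energy to a high-band level. The band edges (2500 to 3400 Hz) and the fixed dB offset below are purely hypothetical placeholders; the abstract gives neither the frequency range nor the mapping.

```python
import math

def band_energy(power_spectrum, bin_hz, lo_hz, hi_hz):
    """Sum the power of the bins whose frequency lies in [lo_hz, hi_hz)."""
    return sum(
        p for i, p in enumerate(power_spectrum) if lo_hz <= i * bin_hz < hi_hz
    )

def estimate_high_band_energy_db(power_spectrum, bin_hz,
                                 lo_hz=2500.0, hi_hz=3400.0, offset_db=-6.0):
    """High-band level in dB from transition-band energy (assumed mapping)."""
    energy = band_energy(power_spectrum, bin_hz, lo_hz, hi_hz)
    # Floor the energy to avoid log10(0) on silent frames.
    return 10.0 * math.log10(max(energy, 1e-12)) + offset_db

flat = [1.0] * 40                        # 40 bins of 100 Hz, flat spectrum
level_db = estimate_high_band_energy_db(flat, bin_hz=100.0)
```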
Abstract:
A method for equalizing a speech signal generated within a pressurized air delivery system, the method including the steps of: generating an inhalation noise model (1152) based on inhalation noise; receiving an input signal (802) that includes a speech signal; and equalizing the speech signal (1156) based on the noise model.
Abstract:
A method for protecting information bits wherein input data bits, at least some of which are to be protected, are sorted based upon information determined from a subset of the input data bits. An error control coding technique is applied to at least some of the sorted bits. In the preferred embodiment, an input data stream of voice coder bits is separated into arrays of bits. A first array (302) comprises voice coder bits needing error protection, with the bits arranged in order of importance determined by voicing mode. The second array (303) comprises bits that will not be error protected. The bits from the first array are provided to the input of an encoder (304), then the encoded bits are combined (305) with the bits from the second array (303) to form a bit stream.
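The sort, encode, and combine flow of the preferred embodiment can be sketched as below. The frame layout (one voicing-mode bit followed by six payload bits), the per-mode importance tables, the inclusion of the mode bit in the protected array, and the 3x repetition code standing in for the encoder (304) are all hypothetical; the abstract specifies none of them.

```python
# Hypothetical importance orderings: payload indices, most important first.
IMPORTANCE_ORDER = {
    0: [0, 1, 2, 3, 4, 5],   # e.g. unvoiced mode
    1: [3, 0, 4, 1, 5, 2],   # e.g. voiced mode
}

def repetition_encode(bits, repeat=3):
    """Stand-in error control code: repeat every bit `repeat` times."""
    return [b for b in bits for _ in range(repeat)]

def protect(frame, n_protected=3):
    """Sort payload bits by mode-dependent importance, then encode and combine."""
    mode = frame[0]                        # mode read from a subset of the bits
    payload = frame[1:]
    ordered = [payload[i] for i in IMPORTANCE_ORDER[mode]]
    first_array = [mode] + ordered[:n_protected]    # bits to be protected
    second_array = ordered[n_protected:]            # bits left unprotected
    return repetition_encode(first_array) + second_array

stream = protect([1, 1, 0, 1, 0, 0, 1])
```

Because the ordering depends on the voicing mode, a decoder would need to recover the mode bit first in order to undo the sort, which is why this sketch places it among the protected bits.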