Abstract:
A method (100) includes receiving (101) an input digital audio signal comprising a narrow-band signal. The input digital audio signal is processed (102) to generate a processed digital audio signal. An estimate of the high-band energy level corresponding to an input digital audio signal of extended bandwidth is determined (103). The estimated high-band energy level is modified based on an estimation accuracy and/or narrow-band signal characteristics (104). A high-band digital audio signal is generated based on the modified estimate of the high-band energy level and an estimated high-band spectrum corresponding to the modified estimate of the high-band energy level (105).
Abstract:
A system or method for modeling a signal, such as a speech signal, in which harmonic frequencies and amplitudes are identified and the harmonic magnitudes are interpolated to obtain spectral magnitudes at a set of fixed frequencies. An inverse transform is applied to the spectral magnitudes to obtain a pseudo auto-correlation sequence, from which linear prediction coefficients are calculated. From the linear prediction coefficients, model harmonic magnitudes are generated by sampling the spectral envelope defined by the linear prediction coefficients. A set of scale factors is then calculated as the ratio of the harmonic magnitudes to the model harmonic magnitudes and interpolated to obtain a second set of scale factors at the set of fixed frequencies. The spectral envelope magnitudes at the set of fixed frequencies are multiplied by the second set of scale factors to obtain new spectral magnitudes, and the process is iterated to obtain final linear prediction coefficients. The signal is modeled by the linear prediction coefficients.
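Two steps of this loop can be sketched in code: deriving linear prediction coefficients from an autocorrelation sequence, and forming scale factors as the ratio of measured harmonic magnitudes to the envelope sampled at the harmonic frequencies. The abstract does not name a solver, so the Levinson-Durbin recursion below is an assumption, and the function names, the `gain` parameter, and the use of radian frequencies are hypothetical choices made for illustration only.

```python
import cmath

def levinson_durbin(r, order):
    """Solve for LPC coefficients a[0..order] (a[0] == 1) from autocorrelation r.

    Assumed solver: the abstract only says the coefficients are "calculated".
    Requires r[0] > 0; returns (coefficients, final prediction error).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

def envelope_magnitude(a, gain, omega):
    """Sample the all-pole envelope gain / |A(e^{j*omega})| at one frequency (radians)."""
    denom = sum(c * cmath.exp(-1j * omega * i) for i, c in enumerate(a))
    return gain / abs(denom)

def scale_factors(harmonic_mags, harmonic_freqs, a, gain):
    """Ratio of measured harmonic magnitudes to model harmonic magnitudes."""
    return [m / envelope_magnitude(a, gain, w)
            for m, w in zip(harmonic_mags, harmonic_freqs)]
```

In a full iteration these scale factors would be interpolated onto the fixed frequency grid and multiplied into the envelope magnitudes before the next pass; that interpolation is omitted here.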
Abstract:
A method and apparatus for noise suppression within a distributed speech recognition system are provided herein. Mel-frequency cepstral coefficient (MFCC) values are converted to filter bank outputs (F'0 through F'22). The filter bank outputs are then used by a noise suppressor (303) for channel energy estimation, noise energy estimation, etc. Noise suppression takes place on F'0 through F'22, and the noise-suppressed filter bank outputs F''0 through F''22 are converted back to MFCC values.
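The conversion between MFCC values and the 23 filter bank outputs can be illustrated with a cosine transform pair. This is a minimal sketch under the assumption that the MFCCs are an unnormalized DCT-II of the log filter bank energies; the abstract does not state the transform convention, and the function names are hypothetical. When fewer than 23 coefficients are available, the inverse yields a smoothed approximation rather than the exact log energies.

```python
import math

N_BANDS = 23  # F0 through F22, as in the abstract

def log_energies_to_mfcc(log_e, n_coeffs):
    """Unnormalized DCT-II of the log filter bank energies (assumed convention)."""
    n = len(log_e)
    return [sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / n) for j in range(n))
            for i in range(n_coeffs)]

def mfcc_to_log_energies(mfcc, n_bands=N_BANDS):
    """Inverse of the transform above; exact only if all n_bands coefficients are given."""
    out = []
    for j in range(n_bands):
        acc = mfcc[0]
        for i in range(1, len(mfcc)):
            acc += 2.0 * mfcc[i] * math.cos(math.pi * i * (j + 0.5) / n_bands)
        out.append(acc / n_bands)
    return out
```

A noise suppressor would operate on the values returned by `mfcc_to_log_energies` (after exponentiation, if it works on linear energies) and then call `log_energies_to_mfcc` on the suppressed result.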
Abstract:
A signal that includes noise (301) is sampled to provide a plurality of digital information samples (303). A predetermined number of the digital information samples are grouped as a set (305). Noise suppression is performed on the signal using the following steps. One or more digital representations of silence are attached to the set, forming an extended set (401). A Fourier transform is performed on the extended set, yielding a set of frequency domain coefficients (403), at least some of which are scaled (405). An inverse Fourier transform is performed on the set of scaled frequency domain coefficients to provide a set of time domain samples (407), which are partially overlapped in time and added with a previously formed set of time domain samples (409 and 411). The result is provided, together with the non-overlapping time domain samples, as a noise-suppressed version of the signal (413).
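The steps above resemble an overlap-add scheme, which can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: silence is modeled as zero-valued samples appended after each set, the per-coefficient `gains` are supplied by the caller (how they are chosen is not covered by the abstract), and a naive DFT stands in for a fast transform to keep the example dependency-free. All names are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def suppress_noise(signal, frame_len, pad_len, gains):
    """Zero-pad each frame, scale its spectrum, and overlap-add the results.

    Requires pad_len <= frame_len and len(gains) == frame_len + pad_len.
    Trailing samples that do not fill a whole frame are ignored, and the
    final overlap tail is discarded.
    """
    if pad_len > frame_len or len(gains) != frame_len + pad_len:
        raise ValueError("inconsistent frame, pad, or gain lengths")
    tail = [0.0] * pad_len                  # overlap carried from the previous frame
    out = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        extended = list(signal[start:start + frame_len]) + [0.0] * pad_len
        spectrum = dft(extended)
        scaled = [g * c for g, c in zip(gains, spectrum)]
        block = [v.real for v in dft(scaled, inverse=True)]
        for i in range(pad_len):            # overlapped part: add the previous tail
            block[i] += tail[i]
        out.extend(block[:frame_len])       # summed samples plus non-overlapping samples
        tail = block[frame_len:]
    return out
```

With all gains equal to 1 the output reproduces the input, which is a convenient sanity check before applying real suppression gains. Gains should be symmetric about the spectrum midpoint so that the scaled spectrum still corresponds to a real signal.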
Abstract:
A system or method for modeling a signal, such as a speech signal, wherein harmonic frequencies and amplitudes are identified (106) and the harmonic magnitudes are interpolated (110) to obtain spectral magnitudes at a set of fixed frequencies. An inverse transform is applied (112) to the spectral magnitudes to obtain a pseudo auto-correlation sequence, from which linear prediction coefficients are calculated (114). From the linear prediction coefficients, model harmonic magnitudes are generated by sampling the spectral envelope (118) defined by the linear prediction coefficients. A set of scale factors is then calculated (120) as the ratio of the harmonic magnitudes to the model harmonic magnitudes and interpolated to obtain a second set of scale factors (122) at the set of fixed frequencies. The spectral envelope magnitudes at the set of fixed frequencies (124) are multiplied by the second set of scale factors (126) to obtain new spectral magnitudes, and the process is iterated to obtain final linear prediction coefficients.
Abstract:
An echo canceling circuit comprising a double talk detector, an upper band signal filter configured to pass only near-end upper band signals to the double talk detector and remove lower band signals, an adaptive filter circuit, a control circuit operatively coupled to the double talk detector and to the adaptive filter circuit, and a threshold estimator configured to iteratively calculate an upper adaptive decision threshold value and a lower adaptive decision threshold value. The double talk detector declares near-end speech to be present if an estimated power level of the upper band signals exceeds the upper adaptive decision threshold value, and declares the near-end speech to be absent if the estimated power level of the upper band signals falls below the lower adaptive decision threshold value for a predetermined number of iterative cycles.
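The two-threshold decision rule can be sketched as a small state machine. This is a minimal illustration, assuming the "predetermined number of iterative cycles" means consecutive cycles and that the thresholds arrive from a separate threshold estimator that is not modeled here; the class and parameter names are hypothetical.

```python
class DoubleTalkDetector:
    """Hysteresis decision on upper-band power (illustrative sketch only)."""

    def __init__(self, hold_cycles):
        self.hold_cycles = hold_cycles      # cycles below the lower threshold before clearing
        self.below_count = 0
        self.near_end_present = False

    def update(self, upper_band_power, upper_threshold, lower_threshold):
        if upper_band_power > upper_threshold:
            # Power above the upper threshold: declare near-end speech present.
            self.near_end_present = True
            self.below_count = 0
        elif upper_band_power < lower_threshold:
            # Count consecutive cycles below the lower threshold (assumption).
            self.below_count += 1
            if self.below_count >= self.hold_cycles:
                self.near_end_present = False
        else:
            # Between the thresholds: keep the current decision, restart the count.
            self.below_count = 0
        return self.near_end_present
```

A control circuit could use the returned flag to freeze adaptation of the adaptive filter while near-end speech is declared present.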
Abstract:
A method and apparatus are provided for reproducing a speech sequence of a user through a communication device of the user. The method includes the steps of detecting a speech sequence from the user through the communication device, recognizing a phoneme sequence within the detected speech sequence and forming a confidence level of each phoneme within the recognized phoneme sequence. The method further includes the steps of audibly reproducing the recognized phoneme sequence for the user through the communication device and gradually highlighting or degrading a voice quality of at least some phonemes of the recognized phoneme sequence based upon the formed confidence level of the at least some phonemes.
Abstract:
A system, computer readable medium, and method for sampling a speech signal; dividing the sampled speech signal into overlapped frames; extracting first pitch information from a frame using frequency domain analysis; providing at least one pitch candidate, each being associated with a spectral score, from the first pitch information, each of the at least one pitch candidate representing a possible pitch estimate for the frame; extracting second pitch information from the frame using a time domain analysis; providing a correlation score for the at least one pitch candidate from the second pitch information; and selecting one of the at least one pitch candidate to represent the pitch estimate of the frame. The system, computer readable medium, and method are suitable for speech coding and for distributed speech recognition.
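The final selection step can be sketched as follows, taking the frequency-domain candidates as given. The abstract does not say how the spectral and correlation scores are combined, so the simple sum used here is an assumption, as are the function names and the representation of each candidate as a `(lag_in_samples, spectral_score)` pair.

```python
import math

def normalized_correlation(frame, lag):
    """Time-domain correlation of the frame with itself shifted by lag samples.

    Requires 1 <= lag < len(frame).
    """
    a = frame[:-lag]
    b = frame[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

def select_pitch(frame, candidates):
    """Pick the candidate lag with the best combined score (assumed: plain sum)."""
    best = max(candidates,
               key=lambda c: c[1] + normalized_correlation(frame, c[0]))
    return best[0]
```

For a clean periodic frame, the correlation term favors the true period over a half-period candidate even when the spectral scores are tied.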
Abstract:
A speech communication system provides a speech encoder [100] that generates a set of coded parameters representative of the desired speech signal characteristics. The speech communication system also provides a speech decoder [200] that receives the set of coded parameters to generate reconstructed speech. The speech decoder includes an equalizer [204] that computes a matching set of parameters from the reconstructed speech [301] generated by the speech decoder [200], undoes the set of characteristics corresponding to the computed set of parameters, and imposes the set of characteristics corresponding to the coded set of parameters, thereby producing equalized reconstructed speech [306].