Abstract:
An echo canceling circuit comprising a double talk detector, an upper band signal filter configured to pass only near-end upper band signals to the double talk detector and remove lower band signals, an adaptive filter circuit, a control circuit operatively coupled to the double talk detector and to the adaptive filter circuit, and a threshold estimator configured to iteratively calculate an upper adaptive decision threshold value and a lower adaptive decision threshold value. The double talk detector declares near-end speech to be present if an estimated power level of the upper band signals exceeds the upper adaptive decision threshold value, and declares the near-end speech to be absent if the estimated power level of the upper band signals falls below the lower adaptive decision threshold value for a predetermined number of iterative cycles.
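The two-threshold decision described above can be sketched as a small state machine. This is only an illustrative sketch: the class name, the `hold_cycles` parameter, and the choice to require consecutive below-threshold cycles are assumptions, and the thresholds are passed in as arguments because the abstract does not say how the threshold estimator computes them.

```python
class DoubleTalkDetector:
    """Hysteresis decision on upper-band power (illustrative sketch only)."""

    def __init__(self, hold_cycles):
        # hold_cycles is a hypothetical name for the abstract's
        # "predetermined number of iterative cycles".
        self.hold_cycles = hold_cycles
        self.below_count = 0
        self.near_end_present = False

    def update(self, power, upper_threshold, lower_threshold):
        """Process one iteration and return True if near-end speech is declared."""
        if power > upper_threshold:
            # Power above the upper threshold: declare near-end speech at once.
            self.near_end_present = True
            self.below_count = 0
        elif power < lower_threshold:
            # Power below the lower threshold: declare absence only after
            # enough cycles have been counted.
            self.below_count += 1
            if self.below_count >= self.hold_cycles:
                self.near_end_present = False
        else:
            # Assumption: the cycles must be consecutive, so a value between
            # the two thresholds resets the counter and keeps the current state.
            self.below_count = 0
        return self.near_end_present
```

With `hold_cycles=3`, one high-power cycle followed by three low-power cycles keeps the "present" decision for the first two low cycles and releases it on the third.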
Abstract:
A system, method and computer readable medium for quantizing class information and pitch information of audio is disclosed. The method on an information processing system includes receiving audio and capturing a frame of the audio. The method further includes determining a pitch of the frame (604) and calculating a codeword representing the pitch of the frame (608), wherein a first codeword value indicates an indefinite pitch. The method further includes determining a class of the frame (610), wherein the class is any one of at least two classes indicating an indefinite pitch (614) and at least one class indicating a definite pitch (618). The method further includes calculating a codeword representing the class of the frame, wherein the codeword length is the maximum of the minimum number of bits required to represent the at least two classes and the minimum number of bits required to represent the at least one class (610).
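The codeword-length rule in the last sentence can be illustrated as follows. This is a sketch under assumptions: the function names are hypothetical, and "minimum number of bits" is taken to mean the smallest fixed-length binary code that distinguishes the given number of classes.

```python
def min_bits(num_values):
    """Smallest number of bits that can distinguish num_values alternatives."""
    if num_values < 1:
        raise ValueError("num_values must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, e.g. 1 -> 0, 2 -> 1, 3 -> 2.
    return (num_values - 1).bit_length()


def class_codeword_length(num_indefinite_classes, num_definite_classes):
    """Class codeword length: the larger of the two per-group bit counts."""
    return max(min_bits(num_indefinite_classes), min_bits(num_definite_classes))
```

For example, two indefinite-pitch classes and two definite-pitch classes need a 1-bit class codeword under this reading, since the pitch codeword already signals which group applies.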
Abstract:
A method and apparatus are provided for reproducing a speech sequence of a user through a communication device of the user. The method includes the steps of detecting a speech sequence from the user through the communication device, recognizing a phoneme sequence within the detected speech sequence and forming a confidence level of each phoneme within the recognized phoneme sequence. The method further includes the steps of audibly reproducing the recognized phoneme sequence for the user through the communication device and gradually highlighting or degrading a voice quality of at least some phonemes of the recognized phoneme sequence based upon the formed confidence level of the at least some phonemes.
Abstract:
A system, computer readable medium, and method for sampling a speech signal; dividing the sampled speech signal into overlapped frames; extracting first pitch information from a frame using frequency domain analysis; providing at least one pitch candidate, each being associated with a spectral score, from the first pitch information, each of the at least one pitch candidate representing a possible pitch estimate for the frame; extracting second pitch information from the frame using a time domain analysis; providing a correlation score for the at least one pitch candidate from the second pitch information; and selecting one of the at least one pitch candidate to represent the pitch estimate of the frame. The system, computer readable medium, and method are suitable for speech coding and for distributed speech recognition.
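The final selection step might look like the sketch below. The abstract does not say how the spectral and correlation scores are combined, so summing them is purely an assumption made for illustration, as are the function names and the use of a normalized correlation at the candidate lag as the time-domain score.

```python
import math


def correlation_score(frame, lag):
    """Normalized correlation between the frame and itself shifted by `lag` samples.

    Requires 0 < lag < len(frame).
    """
    a = frame[:-lag]
    b = frame[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def select_pitch(frame, candidates):
    """Pick one candidate lag from (lag_in_samples, spectral_score) pairs.

    Assumption: the combined score is the plain sum of the spectral score
    and the time-domain correlation score.
    """
    best = max(candidates, key=lambda c: c[1] + correlation_score(frame, c[0]))
    return best[0]
```

For a sinusoid with a 40-sample period and two candidates with equal spectral scores, the correlation term favors the 40-sample lag over a 20-sample lag.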
Abstract:
A speech communication system provides a speech encoder [100] that generates a set of coded parameters representative of the desired speech signal characteristics. The speech communication system also provides a speech decoder [200] that receives the set of coded parameters to generate reconstructed speech. The speech decoder includes an equalizer [204] that computes a matching set of parameters from the reconstructed speech [301] generated by the speech decoder [200], undoes the set of characteristics corresponding to the computed set of parameters, and imposes the set of characteristics corresponding to the coded set of parameters, thereby producing equalized reconstructed speech [306].
Abstract:
A method (Fig. 9) and apparatus (500, 600) for prediction in a speech-coding system extends a first-order long-term predictor (LTP) filter, using a sub-sample resolution delay, to a multi-tap LTP filter (504, 604). From another perspective, a conventional integer-sample resolution multi-tap LTP filter is extended to use sub-sample resolution delay. Such a multi-tap LTP filter offers a number of advantages over the prior art. In particular, defining the lag with sub-sample resolution makes it possible to explicitly model delay values that have a fractional component, within the limits of resolution of the over-sampling factor used by the interpolation filter. The coefficients (the βi's) of the multi-tap LTP filter are thus largely freed from modeling the effect of delays that have a fractional component. Consequently, their main function is to maximize the prediction gain of the LTP filter by modeling the degree of periodicity that is present and by imposing spectral shaping.
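For orientation, the textbook forms of the two filters can be written as below. This is a generic sketch and not necessarily the formulation used in the patent: the lag symbol L, the tap range K, and the symmetric tap layout are assumptions introduced here.

```latex
% First-order LTP filter with lag L and a single coefficient \beta:
P_1(z) = 1 - \beta \, z^{-L}

% Multi-tap LTP filter with 2K + 1 coefficients \beta_i centred on lag L:
P(z) = 1 - \sum_{i=-K}^{K} \beta_i \, z^{-(L + i)}

% With sub-sample resolution, L may take fractional values; the delayed
% samples are then obtained through the interpolation filter rather than
% by direct indexing.
```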
Abstract:
A method and apparatus for suppressing acoustic background noise in a communication system. An operating signal-to-noise ratio (SNR) level is reliably evaluated from channel energy (293) and background noise energy (294) values by an SNR level estimator (295). A minimum gain factor and a gain slope are adapted (290) depending on the operating SNR level. Using these adapted values and the channel SNR, the channel gain is selected (233). When the channel SNR is below a certain threshold, the channel is completely noise-like and the minimum gain factor is selected, so that the channel is maximally attenuated. When the channel SNR is fairly high, the channel gain selected is 0 dB. For intermediate values of channel SNR, the gain factor selected lies between the minimum and 0 dB.
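The three gain regimes described above can be sketched as a clamped linear rule. This is a minimal illustration, not the patented method: the abstract says only that intermediate gains lie between the minimum and 0 dB, so the linear ramp, the function name, and all parameter names are assumptions.

```python
def channel_gain_db(channel_snr_db, snr_threshold_db, min_gain_db, gain_slope):
    """Select a per-channel gain in dB from the channel SNR.

    Assumption: the gain rises linearly with slope `gain_slope` (dB of gain
    per dB of SNR) above `snr_threshold_db`, clamped to [min_gain_db, 0].
    """
    gain = min_gain_db + gain_slope * (channel_snr_db - snr_threshold_db)
    # Below the threshold the channel is treated as noise: maximum attenuation.
    # At high SNR the gain saturates at 0 dB (no attenuation).
    return min(0.0, max(min_gain_db, gain))
```

In the scheme described, `min_gain_db` and `gain_slope` would themselves be adapted from the operating SNR level; here they are simply passed in as arguments.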