METHOD AND APPARATUS FOR SPEECH RECONSTRUCTION IN A DISTRIBUTED SPEECH RECOGNITION SYSTEM
    21.
    Invention publication (status: granted)

    Publication No.: EP1395978A4

    Publication date: 2005-09-21

    Application No.: EP02709089

    Application date: 2002-01-18

    Applicant: MOTOROLA INC

    CPC classification number: G10L15/30 G10L19/00 G10L19/093 G10L25/18

    Abstract: A method of reconstructing speech input at a communication device comprises receiving, at the communication device, encoded data that includes encoded spectral data and encoded energy data of the speech input, the encoded spectral data being encoded as a series of mel-frequency cepstral coefficients. The method further comprises decoding, at the communication device, the encoded spectral data and encoded energy data to determine the spectral data and energy data, wherein decoding comprises: performing an inverse discrete cosine transform on the mel-frequency cepstral coefficients at harmonic mel-frequencies corresponding to a pitch period of the speech input to determine log-spectral magnitudes of the speech input at the harmonic mel-frequencies, and exponentiating the log-spectral magnitudes to determine the spectral magnitudes of the speech input. The method also comprises combining the spectral data and energy data to reconstruct the speech input at the communication device. A communication device for use in a distributed speech recognition system is also disclosed.
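The decoding step in this abstract (inverse cosine transform evaluated at harmonic mel-frequencies, then exponentiation) can be illustrated with a minimal Python sketch. Everything specific here is an assumption, not taken from the patent: the mel formula, the 8 kHz sample rate, the cosine-series scaling, and the names `hz_to_mel` and `harmonic_magnitudes` are hypothetical. The pitch is passed as a fundamental frequency in Hz rather than as a period.

```python
import math

def hz_to_mel(f_hz):
    # A common mel-scale formula; the abstract does not specify one (assumption).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def harmonic_magnitudes(cepstrum, pitch_hz, sample_rate_hz=8000.0):
    """Evaluate the cosine series defined by the cepstral coefficients at the
    mel-frequencies of the pitch harmonics, then exponentiate."""
    nyquist = sample_rate_hz / 2.0
    mel_max = hz_to_mel(nyquist)
    magnitudes = []
    k = 1
    while k * pitch_hz < nyquist:
        # Position of the k-th harmonic on a mel axis normalized to [0, 1).
        x = hz_to_mel(k * pitch_hz) / mel_max
        # Inverse cosine transform evaluated at a non-grid point (assumed scaling).
        log_mag = cepstrum[0] + 2.0 * sum(
            c * math.cos(math.pi * n * x)
            for n, c in enumerate(cepstrum[1:], start=1)
        )
        magnitudes.append(math.exp(log_mag))
        k += 1
    return magnitudes
```

Because the cosine series is evaluated at arbitrary positions rather than only at the original filter-bank centers, the same set of coefficients can supply magnitudes for whatever harmonic spacing the pitch implies.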

    Method and apparatus for estimating high-band energy in a bandwidth extension system for audio signals

    Publication No.: ES2467966T3

    Publication date: 2014-06-13

    Application No.: ES09707285

    Application date: 2009-02-05

    Abstract: A bandwidth extension method comprising: receiving an input digital audio signal comprising a narrow-band signal in a first frequency range; determining an estimated high-band energy level in a second frequency range, corresponding to the input digital audio signal, wherein the second frequency range is higher in frequency than the first frequency range and the estimated high-band energy lacks information to be estimated and used in the bandwidth extension; and modifying the estimated high-band energy level based on characteristics of the narrow-band signal; wherein the step of modifying the estimated high-band energy level comprises modifying the estimated high-band energy level based on an occurrence of an onset/plosive; wherein the estimated high-band energy levels of a sequence of Kmax frames, starting at a frame in which the onset/plosive has been detected, are modified; wherein the first Kmin frames are set to a lowest possible energy level Emin; wherein the modification of the estimated high-band energy levels continues up to the Kmax-th frame as long as the voicing level of a frame within the sequence of Kmax frames exceeds a threshold; and wherein the modification of the estimated high-band energy level is given by decreasing the high-band energy level by a fixed amount up to a frame KT at which the voicing level of the frame exceeds a threshold, after which it is increased again toward the estimated high-band energy.

    METHOD AND APPARATUS FOR ESTIMATING HIGH-BAND ENERGY IN A BANDWIDTH EXTENSION SYSTEM.

    Publication No.: MX2010008288A

    Publication date: 2010-08-31

    Application No.: MX2010008288

    Application date: 2009-02-05

    Applicant: MOTOROLA INC

    Abstract: A method (100) includes receiving (101) an input digital audio signal comprising a narrow-band signal; the input digital audio signal is processed (102) to generate a processed digital audio signal; an estimate of the high-band energy level corresponding to an extended-bandwidth input digital audio signal is determined (103); the estimated high-band energy level is modified based on an estimation accuracy and/or characteristics of the narrow-band signal (104); a high-band digital audio signal is generated based on the modified estimate of the high-band energy level and an estimated high-band spectrum corresponding to the modified estimate of the high-band energy level (105).

    25.
    Invention patent
    Title unknown

    Publication No.: DE60305907D1

    Publication date: 2006-07-20

    Application No.: DE60305907

    Application date: 2003-02-14

    Applicant: MOTOROLA INC

    Abstract: A system or method for modeling a signal, such as a speech signal, in which harmonic frequencies and amplitudes are identified and the harmonic magnitudes are interpolated to obtain spectral magnitudes at a set of fixed frequencies. An inverse transform is applied to the spectral magnitudes to obtain a pseudo auto-correlation sequence, from which linear prediction coefficients are calculated. From the linear prediction coefficients, model harmonic magnitudes are generated by sampling the spectral envelope defined by the linear prediction coefficients. A set of scale factors are then calculated as the ratio of the harmonic magnitudes to the model harmonic magnitudes and interpolated to obtain a second set of scale factors at the set of fixed frequencies. The spectral envelope magnitudes at the set of fixed frequencies are multiplied by the second set of scale factors to obtain new spectral magnitudes and the process is iterated to obtain final linear prediction coefficients. The signal is modeled by the linear prediction coefficients.

    26.
    Invention patent
    Title unknown

    Publication No.: BRPI0406956A

    Publication date: 2006-01-03

    Application No.: BRPI0406956

    Application date: 2004-02-05

    Applicant: MOTOROLA INC IBM

    Abstract: A system, method and computer readable medium for quantizing pitch information of audio is disclosed. The method includes capturing audio representing a numbered frame of a plurality of numbered frames. The method further includes calculating a class of the frame, wherein a class is any one of a voiced or unvoiced class. If the frame is a voiced class, a pitch is calculated for the frame. If the frame is an even numbered frame and a voiced class, a codeword of a first length is calculated by absolutely quantizing the frame pitch. If the frame is an odd numbered frame and a voiced class and a reliable frame is available, a codeword of a second length is calculated by differentially quantizing the frame pitch. If there is no reliable frame available, a codeword of the second length is calculated by absolutely quantizing the frame pitch.
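The even/odd decision logic in this abstract can be sketched in Python as follows. This is a hedged illustration only: the bit lengths (7 and 5), the pitch range, the log-domain uniform quantizer, the `max_ratio` bound, and the function names are all hypothetical choices, and the abstract does not say how unvoiced frames are coded, so the sketch simply returns `None` for them.

```python
import math

def _uniform_index(value, lo, hi, bits):
    # Uniform scalar quantizer over [lo, hi] with 2**bits levels (assumed design).
    levels = 1 << bits
    value = min(max(value, lo), hi)
    index = int((value - lo) / (hi - lo) * levels)
    return min(index, levels - 1)

def quantize_pitch(frame_number, voiced, pitch, reliable_pitch=None,
                   first_bits=7, second_bits=5,
                   pitch_range=(19.0, 140.0), max_ratio=1.25):
    """Return (mode, codeword_length, index) for a voiced frame, else None."""
    if not voiced:
        return None  # unvoiced handling is not described in the abstract
    lo, hi = math.log(pitch_range[0]), math.log(pitch_range[1])
    if frame_number % 2 == 0:
        # Even, voiced frame: absolute quantization, codeword of the first length.
        return ("absolute", first_bits,
                _uniform_index(math.log(pitch), lo, hi, first_bits))
    if reliable_pitch is not None:
        # Odd, voiced frame with a reliable reference: differential quantization.
        span = math.log(max_ratio)
        return ("differential", second_bits,
                _uniform_index(math.log(pitch / reliable_pitch),
                               -span, span, second_bits))
    # Odd, voiced frame without a reliable reference: absolute, second length.
    return ("absolute", second_bits,
            _uniform_index(math.log(pitch), lo, hi, second_bits))
```

For example, `quantize_pitch(1, True, 50.0, reliable_pitch=50.0)` takes the differential branch, while the same call without `reliable_pitch` falls back to absolute quantization at the shorter codeword length.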

    27.
    Invention patent
    Title unknown

    Publication No.: BRPI0406937A

    Publication date: 2006-01-03

    Application No.: BRPI0406937

    Application date: 2004-01-20

    Applicant: MOTOROLA INC

    Abstract: A method and apparatus for noise suppression within a distributed speech recognition system is provided herein. Mel-frequency cepstral coefficient (MFCC) values are converted to filter bank outputs (F'0 through F'22). The filter bank outputs are then used by a noise suppressor (303) for channel energy estimation, noise energy estimation, etc. Noise suppression takes place on F'0 through F'22, and the noise-suppressed filter bank outputs F''0 through F''22 are converted back to MFCC values.
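The round trip described here (MFCC values to 23 filter bank outputs, suppression, back to MFCC values) can be sketched in Python. The assumptions are significant and labeled: the MFCCs are taken to be an unnormalized DCT-II of the log filter bank outputs, all 23 coefficients are kept so the transform is exactly invertible, and the `suppress` rule is a placeholder subtraction with a floor, not the suppressor (303) of the patent. All function names are hypothetical.

```python
import math

N_BANDS = 23  # F'0 through F'22 in the abstract

def fbank_to_mfcc(fbank, n_ceps=N_BANDS):
    # Unnormalized DCT-II of the log filter bank outputs (assumed convention).
    logs = [math.log(v) for v in fbank]
    return [sum(logs[j] * math.cos(math.pi * i * (j + 0.5) / N_BANDS)
                for j in range(N_BANDS))
            for i in range(n_ceps)]

def mfcc_to_fbank(mfcc):
    # Matching inverse transform followed by exponentiation.
    fbank = []
    for j in range(N_BANDS):
        log_f = mfcc[0] / N_BANDS + (2.0 / N_BANDS) * sum(
            mfcc[i] * math.cos(math.pi * i * (j + 0.5) / N_BANDS)
            for i in range(1, len(mfcc)))
        fbank.append(math.exp(log_f))
    return fbank

def suppress(fbank, noise, floor=0.1):
    # Placeholder suppressor: subtract a noise estimate per band, keep a floor.
    return [max(f - n, floor * f) for f, n in zip(fbank, noise)]

def denoise_mfcc(mfcc, noise):
    # MFCC -> filter bank outputs -> suppression -> MFCC.
    return fbank_to_mfcc(suppress(mfcc_to_fbank(mfcc), noise), n_ceps=len(mfcc))
```

If fewer than 23 coefficients are passed in, `mfcc_to_fbank` still runs but yields only a smoothed approximation of the filter bank outputs, since the truncated coefficients are treated as zero.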

    28.
    Invention patent
    Title unknown

    Publication No.: FR2739481B1

    Publication date: 1999-02-26

    Application No.: FR9611654

    Application date: 1996-09-25

    Applicant: MOTOROLA INC

    Abstract: A signal that includes noise (301) is sampled to provide a plurality of digital information samples (303). A predetermined number of the digital information samples are grouped as a set (305). Noise suppression is performed on the signal using the following steps. One or more digital representations of silence is attached to the set, forming an extended set (401). A Fourier transform is performed on the extended set, yielding a set of frequency domain coefficients (403), at least some of which are scaled (405). An inverse Fourier transform is performed on the set of scaled frequency domain coefficients to provide a set of time domain samples (407), which are partially overlapped in time and added with a previously formed set of time domain samples (409 and 411), which result is provided with the non-overlapping time domain samples as a noise suppressed version of the signal (413).
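The sequence of steps in this abstract (group samples, append silence, transform, scale, inverse transform, overlap and add) can be sketched in pure Python. This is an illustration under stated assumptions: the block and padding sizes, the fixed per-bin `gains`, and the function names are hypothetical, and a naive O(n^2) DFT stands in for a fast Fourier transform to keep the example dependency-free.

```python
import cmath

def dft(x, inverse=False):
    # Naive discrete Fourier transform; adequate for small illustrative frames.
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def suppress_noise(samples, gains, block=8, pad=4):
    """Zero-pad each block, scale its spectrum, and overlap-add the results."""
    size = block + pad
    assert len(gains) == size
    out = [0.0] * (len(samples) + pad)
    for start in range(0, len(samples), block):
        frame = list(samples[start:start + block])
        frame += [0.0] * (size - len(frame))            # attach silence
        spectrum = dft(frame)                           # frequency-domain coefficients
        scaled = [g * c for g, c in zip(gains, spectrum)]
        restored = dft(scaled, inverse=True)            # back to the time domain
        for i, value in enumerate(restored):
            if start + i < len(out):
                out[start + i] += value.real            # tail overlaps the next block
    return out[:len(samples)]
```

The gains should be real and symmetric (`gains[k] == gains[size - k]`) so that each restored frame is real. In a practical suppressor they would vary per frame according to a noise estimate; here they are fixed only to keep the sketch short.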

    METHOD FOR MODELING SPEECH HARMONIC MAGNITUDES
    29.
    Invention publication (status: granted)

    Publication No.: EP1495465A4

    Publication date: 2005-05-18

    Application No.: EP03745516

    Application date: 2003-02-14

    Applicant: MOTOROLA INC

    CPC classification number: G10L19/06 G10L19/087

    Abstract: A system or method for modeling a signal, such as a speech signal, wherein harmonic frequencies and amplitudes are identified (106) and the harmonic magnitudes are interpolated (110) to obtain spectral magnitudes at a set of fixed frequencies. An inverse transform is applied (112) to the spectral magnitudes to obtain a pseudo auto-correlation sequence, from which linear prediction coefficients are calculated (114). From the linear prediction coefficients, model harmonic magnitudes are generated by sampling the spectral envelope (118) defined by the linear prediction coefficients. A set of scale factors are then calculated (120) as the ratio of the harmonic magnitudes to the model harmonic magnitudes and interpolated to obtain a second set of scale factors (122) at the set of fixed frequencies. The spectral envelope magnitudes at the set of fixed frequencies (124) are multiplied by the second set of scale factors (126) to obtain new spectral magnitudes and the process is iterated to obtain final linear prediction coefficients.
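The first pass of the procedure in this abstract (interpolate harmonic magnitudes onto fixed frequencies, inverse-transform to a pseudo auto-correlation sequence, solve for linear prediction coefficients) can be sketched in Python. Several choices are assumptions: linear interpolation, squaring the magnitudes before the inverse cosine transform, a uniform frequency grid from 0 to the Nyquist frequency, and the use of the Levinson-Durbin recursion. The iterative scale-factor refinement is omitted, and all function names are hypothetical.

```python
import math

def interpolate(harm_freqs, harm_mags, grid):
    # Linear interpolation onto the fixed grid, held flat outside the harmonics.
    out = []
    for f in grid:
        if f <= harm_freqs[0]:
            out.append(harm_mags[0])
        elif f >= harm_freqs[-1]:
            out.append(harm_mags[-1])
        else:
            i = next(k for k in range(1, len(harm_freqs)) if harm_freqs[k] >= f)
            w = (f - harm_freqs[i - 1]) / (harm_freqs[i] - harm_freqs[i - 1])
            out.append((1.0 - w) * harm_mags[i - 1] + w * harm_mags[i])
    return out

def pseudo_autocorrelation(mags, order):
    # Inverse cosine transform of the power spectrum sampled on a uniform grid
    # from 0 to pi (trapezoidal weights at the end points).
    last = len(mags) - 1
    r = []
    for lag in range(order + 1):
        acc = 0.0
        for k, m in enumerate(mags):
            weight = 0.5 if k in (0, last) else 1.0
            acc += weight * m * m * math.cos(math.pi * k * lag / last)
        r.append(acc / last)
    return r

def levinson_durbin(r, order):
    # Solves for A(z) = 1 + a1 z^-1 + ... ; returns (coefficients, residual energy).
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a, err = new_a, err * (1.0 - k * k)
    return a, err

def lpc_from_harmonics(harm_freqs, harm_mags, grid, order=10):
    mags = interpolate(harm_freqs, harm_mags, grid)
    return levinson_durbin(pseudo_autocorrelation(mags, order), order)
```

A flat set of harmonic magnitudes gives a pseudo auto-correlation that is zero at every non-zero lag, so the resulting predictor is trivial (all coefficients after the leading 1 vanish), which is a convenient sanity check.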

    PITCH QUANTIZATION FOR DISTRIBUTED SPEECH RECOGNITION
    30.
    Invention application (status: under examination, published)

    Publication No.: WO2004072949A3

    Publication date: 2004-12-09

    Application No.: PCT/US2004003425

    Application date: 2004-02-05

    CPC classification number: G10L19/09 G10L15/30

    Abstract: A system, method and computer readable medium for quantizing pitch information of audio is disclosed. The method includes capturing audio representing a numbered frame of a plurality of numbered frames. The method further includes calculating a class of the frame, wherein a class is any one of a voiced or unvoiced class. If the frame is a voiced class, a pitch is calculated for the frame (903). If the frame is an even numbered frame and a voiced class, a codeword of first length is calculated by absolutely quantizing the frame pitch (910). If the frame is an odd numbered frame and a voiced class and a reliable frame is available, a codeword of a second length is calculated by differentially quantizing the frame pitch (905). If there is no reliable frame available, a codeword of the second length is calculated by absolutely quantizing the frame pitch.
