PITCH ESTIMATION USING LOW-FREQUENCY BAND NOISE DETECTION
    1.
    Invention application
    PITCH ESTIMATION USING LOW-FREQUENCY BAND NOISE DETECTION (under examination, published)

    Publication number: WO2004075571A3

    Publication date: 2005-01-06

    Application number: PCT/IB2004000520

    Filing date: 2004-02-23

    Inventor: SORIN ALEXANDER

    CPC classification number: G10L25/90 G10L21/02 G10L2025/937

    Abstract: A pitch estimation system including a low-frequency band noise detector (LBND) operative to detect the presence of low-frequency band noise in a first audio frame, a frequency-domain pitch estimator operative to calculate a pitch estimation of a second audio frame from at least one spectral peak in the second audio frame, and a pitch estimator controller operative to cause the pitch estimator to exclude from the spectrum of the second audio frame at least one low-frequency spectral peak below a predefined threshold where low-frequency band noise is present in the first audio frame.
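
    Illustrative sketch of the control flow in this abstract, not the patented implementation. It assumes the "predefined threshold" is a frequency cutoff and uses a simple energy-ratio test as a stand-in detector; the function names, the 300 Hz cutoff, and the 0.5 ratio are all hypothetical.

```python
import numpy as np

def detect_low_band_noise(frame, sample_rate, cutoff_hz=300.0, ratio_threshold=0.5):
    """Stand-in detector (assumption): flag the frame when more than
    ratio_threshold of its spectral energy lies below cutoff_hz."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = power.sum()
    if total == 0.0:
        return False
    return bool(power[freqs < cutoff_hz].sum() / total > ratio_threshold)

def usable_peaks(peaks, noise_detected, cutoff_hz=300.0):
    """peaks: list of (frequency_hz, magnitude) from the frame being estimated.
    When noise was detected, drop peaks below the cutoff before pitch estimation."""
    if not noise_detected:
        return list(peaks)
    return [(f, m) for f, m in peaks if f >= cutoff_hz]

# The noise flag comes from one frame and controls peak selection in another.
sample_rate = 8000
t = np.arange(200) / sample_rate
first_frame = np.sin(2 * np.pi * 40 * t)           # low-frequency hum
second_frame_peaks = [(50.0, 9.0), (350.0, 5.0), (700.0, 3.0)]
noisy = detect_low_band_noise(first_frame, sample_rate)
print(usable_peaks(second_frame_peaks, noisy))
```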

    SYSTEM AND METHOD FOR COMBINED FREQUENCY-DOMAIN AND TIME-DOMAIN PITCH EXTRACTION FOR SPEECH SIGNALS
    2.
    Invention publication
    SYSTEM AND METHOD FOR COMBINED FREQUENCY-DOMAIN AND TIME-DOMAIN PITCH EXTRACTION FOR SPEECH SIGNALS (granted)

    Publication number: EP1620844A4

    Publication date: 2008-10-08

    Application number: EP04758762

    Filing date: 2004-03-31

    Applicant: MOTOROLA INC; IBM

    CPC classification number: G10L25/90

    Abstract: A system, computer readable medium, and method for sampling a speech signal; dividing the sampled speech signal into overlapped frames; extracting first pitch information from a frame using frequency domain analysis; providing at least one pitch candidate, each being associated with a spectral score, from the first pitch information, each of the at least one pitch candidate representing a possible pitch estimate for the frame; extracting second pitch information from the frame using a time domain analysis; providing a correlation score for the at least one pitch candidate from the second pitch information; and selecting one of the at least one pitch candidate to represent the pitch estimate of the frame. The system, computer readable medium, and method are suitable for speech coding and for distributed speech recognition.
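
    Illustrative sketch of the final selection step only. It assumes the frequency-domain stage has already produced (pitch, spectral score) candidates with scores in [0, 1], uses normalized autocorrelation as the time-domain correlation score, and combines the two with an equal weight. The function names and the weighting rule are hypothetical; the abstract does not specify them.

```python
import numpy as np

def correlation_score(frame, sample_rate, f0_hz):
    """Normalized autocorrelation of the frame at the lag matching f0_hz."""
    lag = int(round(sample_rate / f0_hz))
    if lag <= 0 or lag >= len(frame):
        return 0.0
    a, b = frame[:-lag], frame[lag:]
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def select_pitch(frame, sample_rate, candidates, weight=0.5):
    """candidates: list of (f0_hz, spectral_score). Returns the f0 whose
    weighted sum of spectral and correlation scores is largest (assumed rule)."""
    def combined(candidate):
        f0_hz, spectral_score = candidate
        return weight * spectral_score + (1.0 - weight) * correlation_score(
            frame, sample_rate, f0_hz)
    return max(candidates, key=combined)[0]

# A 100 Hz tone with a second harmonic; the spectral scores alone are tied.
sample_rate = 8000
t = np.arange(400) / sample_rate
frame = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
print(select_pitch(frame, sample_rate, [(200.0, 0.6), (100.0, 0.6), (150.0, 0.6)]))
```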

    CLASS QUANTIZATION FOR DISTRIBUTED SPEECH RECOGNITION
    3.
    Invention publication
    CLASS QUANTIZATION FOR DISTRIBUTED SPEECH RECOGNITION (granted)

    Publication number: EP1595249A4

    Publication date: 2007-06-20

    Application number: EP04708622

    Filing date: 2004-02-05

    Applicant: MOTOROLA INC; IBM

    CPC classification number: G10L25/93 G10L15/30 G10L25/90 G10L2025/935

    Abstract: A system, method and computer readable medium for quantizing class information and pitch information of audio is disclosed. The method on an information processing system includes receiving audio and capturing a frame of the audio. The method further includes determining a pitch of the frame and calculating a codeword representing the pitch of the frame, wherein a first codeword value indicates an indefinite pitch. The method further includes determining a class of the frame, wherein the class is any one of at least two classes indicating an indefinite pitch and at least one class indicating a definite pitch. The method further includes calculating a codeword representing the class of the frame, wherein the codeword length is the maximum of the minimum number of bits required to represent the at least two classes and the minimum number of bits required to represent the at least one class.
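
    Illustrative sketch of the codeword-length rule in the last sentence of this abstract. A plausible reading, not stated outright in the abstract, is that the pitch codeword already signals whether the pitch is definite or indefinite, so the class codeword only has to distinguish classes within one of the two groups. The function names and the class counts in the example are hypothetical.

```python
def min_bits(n_values):
    """Smallest number of bits that can distinguish n_values alternatives."""
    if n_values < 1:
        raise ValueError("need at least one value")
    return (n_values - 1).bit_length()

def class_codeword_length(n_indefinite_classes, n_definite_classes):
    """Maximum of the minimum bit counts for the two groups of classes."""
    return max(min_bits(n_indefinite_classes), min_bits(n_definite_classes))

# Two indefinite-pitch classes and two definite-pitch classes fit in one bit.
print(class_codeword_length(2, 2))
# Three indefinite-pitch classes force two bits even with one definite class.
print(class_codeword_length(3, 1))
```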

    PITCH QUANTIZATION FOR DISTRIBUTED SPEECH RECOGNITION
    4.
    Invention publication
    PITCH QUANTIZATION FOR DISTRIBUTED SPEECH RECOGNITION (granted)

    Publication number: EP1595244A4

    Publication date: 2006-03-08

    Application number: EP04708630

    Filing date: 2004-02-05

    Applicant: MOTOROLA INC; IBM

    CPC classification number: G10L19/09 G10L15/30

    Abstract: A system, method and computer readable medium for quantizing pitch information of audio is disclosed. The method includes capturing audio representing a numbered frame of a plurality of numbered frames. The method further includes calculating a class of the frame, wherein a class is any one of a voiced or unvoiced class. If the frame is a voiced class, a pitch is calculated for the frame. If the frame is an even numbered frame and a voiced class, a codeword of a first length is calculated by absolutely quantizing the frame pitch. If the frame is an odd numbered frame and a voiced class and a reliable frame is available, a codeword of a second length is calculated by differentially quantizing the frame pitch. If there is no reliable frame available, a codeword of the second length is calculated by absolutely quantizing the frame pitch.
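
    Illustrative sketch of the decision logic for voiced frames described in this abstract. The bit lengths 7 and 5 are assumptions chosen only so that the two lengths differ; the abstract does not give them, and the function name is hypothetical.

```python
FIRST_LENGTH_BITS = 7    # assumed length for even-numbered frames
SECOND_LENGTH_BITS = 5   # assumed length for odd-numbered frames

def quantization_plan(frame_number, reliable_frame_available):
    """Return (mode, codeword_bits) for a voiced frame."""
    if frame_number % 2 == 0:
        # Even-numbered voiced frame: absolute quantization, first length.
        return ("absolute", FIRST_LENGTH_BITS)
    if reliable_frame_available:
        # Odd-numbered voiced frame with a reliable reference: differential.
        return ("differential", SECOND_LENGTH_BITS)
    # Odd-numbered voiced frame without a reference: absolute, second length.
    return ("absolute", SECOND_LENGTH_BITS)

print(quantization_plan(4, reliable_frame_available=False))
print(quantization_plan(5, reliable_frame_available=True))
print(quantization_plan(5, reliable_frame_available=False))
```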

    PITCH QUANTIZATION FOR DISTRIBUTED SPEECH RECOGNITION
    5.
    Invention application
    PITCH QUANTIZATION FOR DISTRIBUTED SPEECH RECOGNITION (under examination, published)

    Publication number: WO2004072949A3

    Publication date: 2004-12-09

    Application number: PCT/US2004003425

    Filing date: 2004-02-05

    CPC classification number: G10L19/09 G10L15/30

    Abstract: A system, method and computer readable medium for quantizing pitch information of audio is disclosed. The method includes capturing audio representing a numbered frame of a plurality of numbered frames. The method further includes calculating a class of the frame, wherein a class is any one of a voiced or unvoiced class. If the frame is a voiced class, a pitch is calculated for the frame (903). If the frame is an even numbered frame and a voiced class, a codeword of first length is calculated by absolutely quantizing the frame pitch (910). If the frame is an odd numbered frame and a voiced class and a reliable frame is available, a codeword of a second length is calculated by differentially quantizing the frame pitch (905). If there is no reliable frame available, a codeword of the second length is calculated by absolutely quantizing the frame pitch.

    CLASS QUANTIZATION FOR DISTRIBUTED SPEECH RECOGNITION
    6.
    Invention application
    CLASS QUANTIZATION FOR DISTRIBUTED SPEECH RECOGNITION (under examination, published)

    Publication number: WO2004072948A3

    Publication date: 2004-12-16

    Application number: PCT/US2004003419

    Filing date: 2004-02-05

    CPC classification number: G10L25/93 G10L15/30 G10L25/90 G10L2025/935

    Abstract: A system, method and computer readable medium for quantizing class information and pitch information of audio is disclosed. The method on an information processing system includes receiving audio and capturing a frame of the audio. The method further includes determining a pitch of the frame (604) and calculating a codeword representing the pitch of the frame (608), wherein a first codeword value indicates an indefinite pitch. The method further includes determining a class of the frame (610), wherein the class is any one of at least two classes indicating an indefinite pitch (614) and at least one class indicating a definite pitch (618). The method further includes calculating a codeword representing the class of the frame, wherein the codeword length is the maximum of the minimum number of bits required to represent the at least two classes and the minimum number of bits required to represent the at least one class (610).

Pitch quantization for distributed speech recognition

    Publication number: ES2395717T3

    Publication date: 2013-02-14

    Application number: ES04708630

    Filing date: 2004-02-05

    Abstract: A method for an information processing system for quantizing pitch information of audio, comprising: capturing audio representing a numbered frame of a plurality of numbered frames; calculating a class of the frame, wherein a class is any one of a voiced class and an unvoiced class; if the frame is a voiced class, calculating a pitch for the frame; if the frame is an even numbered frame and a voiced class, calculating a codeword of a first length by absolutely quantizing the pitch of the frame; if the frame is an even numbered frame and an unvoiced class, calculating a codeword of the first length indicating an unvoiced class frame; if the frame is an odd numbered frame and a voiced class, and at least one of the three frames immediately preceding the frame is reliable, calculating a codeword of a second length by differentially quantizing the pitch of the frame with reference to a quantized pitch of the nearest preceding reliable frame, wherein the first length is greater than the second length; if the frame is an odd numbered frame and a voiced class, and each of the three frames immediately preceding the frame is unreliable, calculating a codeword of the second length by absolutely quantizing the pitch of the frame; and if the frame is an odd numbered frame and an unvoiced class, calculating a codeword of the second length indicating an unvoiced class frame; wherein an even numbered frame is reliable if it is a voiced class, and wherein an odd numbered frame is reliable if it is a voiced class and the pitch of the frame is quantized absolutely or is quantized differentially with reference to a pitch of the immediately preceding frame.
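
    Illustrative sketch of the reliability bookkeeping in this claim: which frames count as reliable, and which earlier frame an odd-numbered voiced frame references. Numbering frames from 0 and the function name are assumptions; the sketch tracks only the mode and reference frame, not the quantized pitch values.

```python
def plan_frames(voiced_flags):
    """For each frame return (mode, reference_frame_index_or_None)."""
    reliable = []   # reliable[k] is True when frame k may serve as a reference
    plan = []
    for n, voiced in enumerate(voiced_flags):
        if not voiced:
            plan.append(("unvoiced", None))
            reliable.append(False)
        elif n % 2 == 0:
            # Even-numbered voiced frame: absolute, and always reliable.
            plan.append(("absolute", None))
            reliable.append(True)
        else:
            # Odd-numbered voiced frame: nearest reliable frame among the
            # three immediately preceding frames, if any.
            ref = next((k for k in range(n - 1, max(n - 4, -1), -1)
                        if reliable[k]), None)
            if ref is None:
                plan.append(("absolute", None))
                reliable.append(True)
            else:
                plan.append(("differential", ref))
                # Reliable only when the reference is the immediately
                # preceding frame.
                reliable.append(ref == n - 1)
    return plan

print(plan_frames([True, True, False, True, False, False, False, True]))
```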

    SYSTEM AND METHOD FOR COMBINED FREQUENCY-DOMAIN AND TIME-DOMAIN PITCH EXTRACTION FOR SPEECH SIGNALS
    8.
    Invention application
    SYSTEM AND METHOD FOR COMBINED FREQUENCY-DOMAIN AND TIME-DOMAIN PITCH EXTRACTION FOR SPEECH SIGNALS (under examination, published)

    Publication number: WO2004090865A3

    Publication date: 2005-12-01

    Application number: PCT/US2004010119

    Filing date: 2004-03-31

    CPC classification number: G10L25/90

    Abstract: A system, computer readable medium, and method for sampling a speech signal; dividing the sampled speech signal into overlapped frames; extracting first pitch information from a frame using frequency domain analysis; providing at least one pitch candidate, each being associated with a spectral score, from the first pitch information, each of the at least one pitch candidate representing a possible pitch estimate for the frame; extracting second pitch information from the frame using a time domain analysis; providing a correlation score for the at least one pitch candidate from the second pitch information; and selecting one of the at least one pitch candidate to represent the pitch estimate of the frame. The system, computer readable medium, and method are suitable for speech coding and for distributed speech recognition.

Statistical enhancement of speech output from a text-to-speech synthesis system

    Publication number: DE112012002524T5

    Publication date: 2014-03-13

    Application number: DE112012002524

    Filing date: 2012-06-28

    Applicant: IBM

    Abstract: A method is described for enhancing speech synthesized by a statistical text-to-speech (TTS) system that employs a parametric representation of speech in a space of acoustic feature vectors. The method includes: defining a parametric family of corrective transformations operating in the space of the acoustic feature vectors and dependent on a set of enhancing parameters; and defining a distortion indicator of a feature vector or a plurality of feature vectors. The method further includes: receiving a feature vector output by the system; and generating an instance of the corrective transformation by: calculating a reference value of the distortion indicator attributed to a statistical model of the phonetic unit emitting the feature vector; calculating an actual value of the distortion indicator attributed to feature vectors emitted by the statistical model of the phonetic unit emitting the feature vector; calculating the enhancing parameter values depending on the reference value of the distortion indicator, the actual value of the distortion indicator and the parametric corrective transformation; and deriving an instance of the corrective transformation corresponding to the enhancing parameter values from the parametric family of the corrective transformations. The instance of the corrective transformation can be applied to the feature vector to provide an enhanced feature vector.
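
    Illustrative sketch of the steps named in this abstract, using deliberately simple stand-ins. The abstract does not say which distortion indicator or transformation family is used; here the indicator is assumed to be the mean per-dimension variance, and the family is assumed to be a one-parameter scaling about a center point. All names and choices are hypothetical.

```python
import numpy as np

def distortion_indicator(vectors):
    """Assumed indicator: mean per-dimension variance of the feature vectors."""
    return float(np.var(vectors, axis=0).mean())

def enhancement_parameter(reference_value, actual_value):
    """Scale factor that maps the actual indicator value onto the reference."""
    if actual_value <= 0:
        return 1.0
    return float(np.sqrt(reference_value / actual_value))

def corrective_transform(vectors, center, alpha):
    """One member of the assumed family: scale deviations from center by alpha."""
    return center + alpha * (vectors - center)

# Emitted vectors are over-smoothed (variance 1) relative to the reference (4).
emitted = np.array([[0.0, 0.0], [2.0, 2.0]])
reference_value = 4.0
alpha = enhancement_parameter(reference_value, distortion_indicator(emitted))
enhanced = corrective_transform(emitted, emitted.mean(axis=0), alpha)
print(alpha, distortion_indicator(enhanced))
```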

    10.
    Invention patent
    Unknown

    Publication number: BRPI0406952A

    Publication date: 2006-01-03

    Application number: BRPI0406952

    Filing date: 2004-02-05

    Applicant: MOTOROLA INC; IBM

    Abstract: A system, method and computer readable medium for quantizing class information and pitch information of audio is disclosed. The method on an information processing system includes receiving audio and capturing a frame of the audio. The method further includes determining a pitch of the frame and calculating a codeword representing the pitch of the frame, wherein a first codeword value indicates an indefinite pitch. The method further includes determining a class of the frame, wherein the class is any one of at least two classes indicating an indefinite pitch and at least one class indicating a definite pitch. The method further includes calculating a codeword representing the class of the frame, wherein the codeword length is the maximum of the minimum number of bits required to represent the at least two classes and the minimum number of bits required to represent the at least one class.
