Abstract:
A pitch estimation system including a low-frequency band noise detector (LBND) operative to detect the presence of low-frequency band noise in a first audio frame, a frequency-domain pitch estimator operative to calculate a pitch estimation of a second audio frame from at least one spectral peak in the second audio frame, and a pitch estimator controller operative to cause the pitch estimator to exclude from the spectrum of the second audio frame at least one low-frequency spectral peak below a predefined threshold where low-frequency band noise is present in the first audio frame.
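The control flow described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the energy-ratio detection rule, the function names, and the values of `cutoff_hz` and `ratio_threshold` are all assumptions introduced here. The abstract only states that a detector exists and that peaks below a predefined threshold are excluded.

```python
def low_band_noise_present(freqs_hz, magnitudes, cutoff_hz=100.0, ratio_threshold=0.5):
    """Hypothetical detector: flag a frame when the share of spectral
    energy below cutoff_hz exceeds ratio_threshold (assumed rule)."""
    total = sum(m * m for m in magnitudes)
    if total == 0.0:
        return False
    low = sum(m * m for f, m in zip(freqs_hz, magnitudes) if f < cutoff_hz)
    return low / total > ratio_threshold


def peaks_for_pitch_estimation(peaks_hz, noise_flag, cutoff_hz=100.0):
    """Controller step: when the first frame was flagged as noisy, drop the
    second frame's spectral peaks that lie below the cutoff frequency."""
    if not noise_flag:
        return list(peaks_hz)
    return [f for f in peaks_hz if f >= cutoff_hz]


# Magnitude spectrum of the first frame (made-up numbers) and the
# spectral peaks found in the second frame.
first_frame_freqs = [0.0, 50.0, 100.0, 150.0, 200.0]
first_frame_mags = [0.0, 3.0, 0.5, 0.2, 1.0]
flag = low_band_noise_present(first_frame_freqs, first_frame_mags)
usable_peaks = peaks_for_pitch_estimation([50.0, 200.0, 400.0], flag)
```

The pitch estimator would then work only from `usable_peaks`, so a strong hum component at 50 Hz cannot be mistaken for the fundamental.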
Abstract:
A system, computer readable medium, and method for sampling a speech signal; dividing the sampled speech signal into overlapped frames; extracting first pitch information from a frame using frequency domain analysis; providing at least one pitch candidate, each being associated with a spectral score, from the first pitch information, each of the at least one pitch candidate representing a possible pitch estimate for the frame; extracting second pitch information from the frame using a time domain analysis; providing a correlation score for the at least one pitch candidate from the second pitch information; and selecting one of the at least one pitch candidate to represent the pitch estimate of the frame. The system, computer readable medium, and method are suitable for speech coding and for distributed speech recognition.
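The two-stage selection described above can be illustrated with a short sketch. It assumes that the frequency-domain stage has already produced candidates as `(f0_hz, spectral_score)` pairs, that the time-domain score is a normalized autocorrelation at the candidate's lag, and that the two scores are combined by a weighted sum. The abstract does not specify any of these details, so the function names, the `weight` parameter, and the combination rule are hypothetical.

```python
import math


def normalized_correlation(frame, lag):
    """Normalized correlation between the frame and a copy of itself
    shifted by `lag` samples (time-domain evidence for that period)."""
    a = frame[:-lag]
    b = frame[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def select_pitch(frame, sample_rate, candidates, weight=0.5):
    """Pick one candidate using both scores.

    candidates: list of (f0_hz, spectral_score) from the frequency-domain
    stage. The weighted sum below is an assumed combination rule."""
    best = None
    for f0_hz, spectral_score in candidates:
        lag = round(sample_rate / f0_hz)
        if lag < 1 or lag >= len(frame):
            continue  # lag does not fit inside the frame
        corr = normalized_correlation(frame, lag)
        combined = weight * spectral_score + (1.0 - weight) * corr
        if best is None or combined > best[1]:
            best = (f0_hz, combined)
    return None if best is None else best[0]


# Synthetic 100 Hz frame sampled at 8 kHz. The 150 Hz candidate has the
# higher spectral score, but the time-domain score favors 100 Hz.
rate = 8000
frame = [math.sin(2.0 * math.pi * 100.0 * n / rate) for n in range(400)]
chosen = select_pitch(frame, rate, [(150.0, 0.7), (100.0, 0.6)])
```

Here the correlation score overrides a misleading spectral score, which is the kind of cross-check the combined analysis is meant to provide.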
Abstract:
A system, method and computer readable medium for quantizing class information and pitch information of audio are disclosed. The method on an information processing system includes receiving audio and capturing a frame of the audio. The method further includes determining a pitch of the frame and calculating a codeword representing the pitch of the frame, wherein a first codeword value indicates an indefinite pitch. The method further includes determining a class of the frame, wherein the class is any one of at least two classes indicating an indefinite pitch and at least one class indicating a definite pitch. The method further includes calculating a codeword representing the class of the frame, wherein the codeword length is the maximum of the minimum number of bits required to represent the at least two classes and the minimum number of bits required to represent the at least one class.
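The codeword-length rule can be made concrete with a small worked example. The sketch below assumes two indefinite-pitch classes and two definite-pitch classes with made-up names, and assumes that pitch codeword value 0 is the reserved "indefinite pitch" value. One reading of the abstract is that the reserved pitch value already separates the two groups, so the class codeword only has to distinguish classes within a group; that reading, like the class names and the reserved value, is an assumption rather than something the abstract states.

```python
def min_bits(count):
    """Smallest number of bits that can distinguish `count` values."""
    return (count - 1).bit_length()


def class_codeword_length(num_indefinite_classes, num_definite_classes):
    """Length rule from the abstract: the larger of the two minimum widths."""
    if num_indefinite_classes < 2 or num_definite_classes < 1:
        raise ValueError("need at least two indefinite and one definite class")
    return max(min_bits(num_indefinite_classes), min_bits(num_definite_classes))


# Hypothetical class names, used only for illustration.
INDEFINITE = ["non_speech", "unvoiced"]
DEFINITE = ["mixed_voiced", "fully_voiced"]
INDEFINITE_PITCH_CODEWORD = 0  # assumed reserved pitch codeword value


def encode(frame_class, pitch_index):
    """Return (pitch_codeword, class_codeword) for one frame.

    pitch_index is an assumed nonzero quantizer index, used only when
    the frame has a definite pitch."""
    if frame_class in INDEFINITE:
        return INDEFINITE_PITCH_CODEWORD, INDEFINITE.index(frame_class)
    if pitch_index == INDEFINITE_PITCH_CODEWORD:
        raise ValueError("pitch index collides with the reserved value")
    return pitch_index, DEFINITE.index(frame_class)


bits = class_codeword_length(len(INDEFINITE), len(DEFINITE))
```

With two classes in each group, `bits` is 1, whereas numbering all four classes independently would take 2 bits.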
Abstract:
A system, method and computer readable medium for quantizing pitch information of audio are disclosed. The method includes capturing audio representing a numbered frame of a plurality of numbered frames. The method further includes calculating a class of the frame, wherein a class is either a voiced class or an unvoiced class. If the frame is a voiced class, a pitch is calculated for the frame. If the frame is an even numbered frame and a voiced class, a codeword of a first length is calculated by absolutely quantizing the frame pitch. If the frame is an odd numbered frame and a voiced class and a reliable frame is available, a codeword of a second length is calculated by differentially quantizing the frame pitch. If there is no reliable frame available, a codeword of the second length is calculated by absolutely quantizing the frame pitch.
Abstract:
A system, method and computer readable medium for quantizing pitch information of audio are disclosed. The method includes capturing audio representing a numbered frame of a plurality of numbered frames. The method further includes calculating a class of the frame, wherein a class is either a voiced class or an unvoiced class. If the frame is a voiced class, a pitch is calculated for the frame (903). If the frame is an even numbered frame and a voiced class, a codeword of a first length is calculated by absolutely quantizing the frame pitch (910). If the frame is an odd numbered frame and a voiced class and a reliable frame is available, a codeword of a second length is calculated by differentially quantizing the frame pitch (905). If there is no reliable frame available, a codeword of the second length is calculated by absolutely quantizing the frame pitch.
Abstract:
A system, method and computer readable medium for quantizing class information and pitch information of audio are disclosed. The method on an information processing system includes receiving audio and capturing a frame of the audio. The method further includes determining a pitch of the frame (604) and calculating a codeword representing the pitch of the frame (608), wherein a first codeword value indicates an indefinite pitch. The method further includes determining a class of the frame (610), wherein the class is any one of at least two classes indicating an indefinite pitch (614) and at least one class indicating a definite pitch (618). The method further includes calculating a codeword representing the class of the frame, wherein the codeword length is the maximum of the minimum number of bits required to represent the at least two classes and the minimum number of bits required to represent the at least one class (610).
Abstract:
A method for an information processing system for quantizing pitch information of audio, comprising: capturing audio representing a numbered frame of a plurality of numbered frames; calculating a class of the frame, wherein a class is either a voiced class or an unvoiced class; if the frame is a voiced class, calculating a pitch for the frame; if the frame is an even numbered frame and a voiced class, calculating a codeword of a first length by absolutely quantizing the pitch of the frame; if the frame is an even numbered frame and an unvoiced class, calculating a codeword of the first length indicating an unvoiced class frame; if the frame is an odd numbered frame and a voiced class, and at least one of the three frames immediately preceding the frame is reliable, calculating a codeword of a second length by differentially quantizing the pitch of the frame with reference to a quantized pitch of the nearest preceding reliable frame, wherein the first length is greater than the second length; if the frame is an odd numbered frame and a voiced class, and each of the three frames immediately preceding the frame is not reliable, calculating a codeword of the second length by absolutely quantizing the pitch of the frame; and if the frame is an odd numbered frame and an unvoiced class, calculating a codeword of the second length indicating an unvoiced class frame; wherein an even numbered frame is reliable if it is a voiced class, and wherein an odd numbered frame is reliable if it is a voiced class and the pitch of the frame is quantized absolutely or is quantized differentially with reference to a pitch of the immediately preceding frame.
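The per-frame decision rules and the reliability bookkeeping in this claim can be traced with a short sketch. It covers only the choice of quantization mode, codeword length, and reference frame; the quantizers themselves are omitted. Numbering frames from 0 and the function and label names are assumptions made for illustration.

```python
def plan_pitch_quantization(voiced_flags):
    """Return one (mode, length, reference_frame) tuple per frame.

    voiced_flags[n] is True when frame n is a voiced class. Frames are
    assumed to be numbered from 0, so frame 0 counts as even."""
    plan = []
    reliable = []  # reliable[n] follows the reliability rules in the claim
    for n, voiced in enumerate(voiced_flags):
        even = n % 2 == 0
        length = "first" if even else "second"
        ref = None
        if not voiced:
            # Codeword only signals an unvoiced frame; never reliable.
            mode, is_reliable = "unvoiced", False
        elif even:
            mode, is_reliable = "absolute", True
        else:
            # Nearest reliable frame among the three immediately preceding.
            ref = next(
                (k for k in range(n - 1, max(n - 4, -1), -1) if reliable[k]),
                None,
            )
            if ref is None:
                mode, is_reliable = "absolute", True
            else:
                # Reliable only if the reference is the immediate predecessor.
                mode, is_reliable = "differential", ref == n - 1
        plan.append((mode, length, ref))
        reliable.append(is_reliable)
    return plan


plan = plan_pitch_quantization(
    [True, True, False, True, False, False, False, True]
)
```

In this trace, frame 3 is quantized differentially against frame 1 because frame 2 is unvoiced, and frame 7 falls back to absolute quantization at the second length because none of frames 4 to 6 is reliable.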
Abstract:
A method is described for enhancing speech synthesized by a statistical text-to-speech (TTS) system that uses a parametric representation of speech in a space of acoustic feature vectors. The method includes: defining a parametric family of corrective transformations that operates in the space of the acoustic feature vectors and depends on a set of enhancement parameters; and defining a distortion indicator of a feature vector or a plurality of feature vectors. The method further includes: receiving a feature vector output by the system; and generating an instance of the corrective transformation by: calculating a reference value of the distortion indicator attributed to a statistical model of the phonetic unit emitting the feature vector; calculating an actual value of the distortion indicator attributed to feature vectors emitted by the statistical model of the phonetic unit emitting the feature vector; calculating the enhancement parameter values, which depend on the reference value of the distortion indicator, the actual value of the distortion indicator, and the parametric corrective transformation; and deriving an instance of the corrective transformation corresponding to the enhancement parameter values from the parametric family of corrective transformations. The instance of the corrective transformation can be applied to the feature vector to provide an enhanced feature vector.
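The sequence of steps (distortion indicator, reference and actual values, parameter calculation, transformation instance) can be illustrated with a deliberately simple stand-in. The sketch below assumes that the distortion indicator is the variance of a vector's components, that the corrective family is a scaling about the vector's mean with a single gain parameter, and that the actual value is measured on the one vector being corrected. None of these choices comes from the abstract; they are hypothetical and were picked only to keep the example small.

```python
import math


def distortion_indicator(vector):
    """Assumed indicator: variance of the vector's components."""
    mean = sum(vector) / len(vector)
    return sum((x - mean) ** 2 for x in vector) / len(vector)


def enhancement_parameter(reference_value, actual_value):
    """Gain that maps the actual indicator value onto the reference value
    under the assumed scaling family (variance scales with gain squared)."""
    if actual_value <= 0.0:
        return 1.0  # nothing to scale; leave the vector unchanged
    return math.sqrt(reference_value / actual_value)


def corrective_transform(vector, gain):
    """One member of the assumed parametric family: scale about the mean."""
    mean = sum(vector) / len(vector)
    return [mean + gain * (x - mean) for x in vector]


emitted = [1.0, 2.0, 3.0]      # feature vector output by the system
reference = 8.0 / 3.0          # reference value taken from the unit's model
actual = distortion_indicator(emitted)
gain = enhancement_parameter(reference, actual)
enhanced = corrective_transform(emitted, gain)
```

After the transformation, the indicator computed on `enhanced` matches the reference value, which is the property the parameter calculation is designed to achieve in this toy setting.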