SPEECH FEATURE EXTRACTING APPARATUS, SPEECH FEATURE EXTRACTING METHOD, AND SPEECH FEATURE EXTRACTING PROGRAM

    Publication No.: GB2485926B

    Publication date: 2013-06-05

    Application No.: GB201202741

    Application date: 2010-07-12

    Applicant: IBM

    Abstract: A speech feature extraction apparatus, speech feature extraction method, and speech feature extraction program. The speech feature extraction apparatus includes: a first difference calculation module that (i) receives, as input, a spectrum of a speech signal segmented into frames for each frequency bin, and (ii) calculates a delta spectrum for each frame, where the delta spectrum is the difference of the spectrum between consecutive frames for the frequency bin; and a first normalization module that normalizes the delta spectrum of each frame for the frequency bin by dividing it by a function of an average spectrum, where the average spectrum is the average of the spectra over all frames of the entire speech for the frequency bin. The output of the first normalization module is defined as a first delta feature.
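
    The abstract leaves open both the exact difference formula and the function applied to the average spectrum. The minimal Python sketch below shows the data flow under stated assumptions: the delta is a central difference with clamped edges, every frame counts as speech, and the normalizing function defaults to the identity. `delta_spectrum`, `average_spectrum`, `first_delta_feature`, `func`, and `floor` are hypothetical names, not taken from the patent.

```python
from typing import Callable, List

Spectrogram = List[List[float]]  # spec[t][j]: power of frame t in frequency bin j


def delta_spectrum(spec: Spectrogram) -> Spectrogram:
    """Per-bin difference between neighbouring frames.

    Assumption: a central difference (s[t+1] - s[t-1]) / 2 with the first and
    last frames clamped; the abstract only says "difference between
    consecutive frames".
    """
    T = len(spec)
    out = []
    for t in range(T):
        prev_frame = spec[max(t - 1, 0)]
        next_frame = spec[min(t + 1, T - 1)]
        out.append([(n - p) / 2.0 for n, p in zip(next_frame, prev_frame)])
    return out


def average_spectrum(spec: Spectrogram) -> List[float]:
    """Mean power per bin over all frames (assumption: all frames are speech)."""
    T = len(spec)
    return [sum(frame[j] for frame in spec) / T for j in range(len(spec[0]))]


def first_delta_feature(
    spec: Spectrogram,
    func: Callable[[float], float] = lambda m: m,  # hypothetical choice: identity
    floor: float = 1e-12,  # guards against division by zero in silent bins
) -> Spectrogram:
    """Divide each frame's delta spectrum, bin by bin, by func(average spectrum)."""
    delta = delta_spectrum(spec)
    mean = average_spectrum(spec)
    denom = [func(max(m, floor)) for m in mean]
    return [[d / q for d, q in zip(row, denom)] for row in delta]
```

    With the identity function, a constant gain applied to one frequency bin scales both that bin's delta and its average by the same factor, so the gain cancels in the feature.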

    Apparatus for Extracting Speech Features, Method for Extracting Speech Features, and Program for Extracting Speech Features

    Publication No.: DE112010003461B4

    Publication date: 2019-09-05

    Application No.: DE112010003461

    Application date: 2010-07-12

    Applicant: IBM

    Abstract: A speech feature extraction apparatus, the apparatus comprising: a first difference calculation unit (600, 700, 800) for receiving a spectrum for each of a plurality of frequency bins of a speech signal, the speech signal being segmented into frames for each frequency bin, and for calculating, for each frame of each frequency bin, a difference of the spectrum between consecutive frames for the frequency bin as a delta spectrum; and a first normalization unit (605, 710, 810) for normalizing the delta spectrum for each frame of each frequency bin by dividing the delta spectrum by a function of the mean spectrum, which is given by an average of the spectra over all frames representing speech.

    SYSTEM, PROGRAM, AND CONTROL METHOD FOR SPEECH SYNTHESIS

    Publication No.: CA2614840A1

    Publication date: 2007-01-18

    Application No.: CA2614840

    Application date: 2006-07-10

    Applicant: IBM

    Abstract: The present invention relates to providing natural-sounding phonemes and accents for text. A system is provided that outputs the phonemes and accents of texts. The system has a storage section storing a first corpus in which the spellings, phonemes, and accents of a previously input text are recorded separately for each segmentation of the words contained in that text. A text whose phonemes and accents are to be output is acquired, and the first corpus is searched to retrieve, from among sets of contiguous spellings, at least one set of spellings that matches the spellings in the text. Then, a combination of phonemes and accent whose probability of occurrence in the first corpus is higher than a predetermined reference probability is selected as the phonemes and accent of the text.
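
    The corpus lookup and probability test can be sketched in a few lines of Python. This is an illustration under assumptions, not the patented procedure: the input text is assumed to be already segmented into words, the whole segmented span is matched as one contiguous run, and the probability of occurrence is taken as a relative frequency among the matching runs. `select_reading` and the toy corpus are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Entry = Tuple[str, str, str]  # (spelling, phonemes, accent) for one word segment
Reading = Tuple[Tuple[str, str], ...]  # (phonemes, accent) per segment


def select_reading(
    corpus: List[List[Entry]],
    spellings: Sequence[str],
    reference_prob: float,
) -> Optional[Reading]:
    """Pick the phoneme/accent combination seen most often for a run of spellings.

    Returns None when no contiguous run in the corpus matches, or when the best
    combination's relative frequency does not exceed reference_prob.
    """
    n = len(spellings)
    target = tuple(spellings)
    counts: Counter = Counter()
    for sentence in corpus:
        for k in range(len(sentence) - n + 1):
            window = sentence[k:k + n]
            if tuple(e[0] for e in window) == target:
                counts[tuple((e[1], e[2]) for e in window)] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    reading, count = counts.most_common(1)[0]
    return reading if count / total > reference_prob else None


corpus = [
    [("I", "ay", "H"), ("read", "r iy d", "H")],
    [("I", "ay", "H"), ("read", "r eh d", "H")],
    [("I", "ay", "H"), ("read", "r iy d", "H")],
    [("they", "dh ey", "H"), ("read", "r iy d", "L")],
]
```

    Matching the longer run ("I", "read") narrows the candidates to the contexts in which that run actually occurred, which is why its best reading can clear a threshold that the single word "read" does not.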

    VOICE RECOGNITION APPARATUS

    Publication No.: CA1336458C

    Publication date: 1995-07-25

    Application No.: CA612649

    Application date: 1989-09-22

    Applicant: IBM

    Abstract: The invention independently vector-quantizes the spectrum, which represents the static features of speech along the frequency axis, and the variation pattern of the spectrum along the time axis. Based on the knowledge that the correlation between the two resulting label trains is small, the pair is evaluated by the equation:

    $$
    \begin{aligned}
    P(L_a, L_c \mid W) &= \sum_{I} P(L_a, L_c \mid I, W)\, P(I \mid W) \\
    &= \sum_{I} P(L_a(1) \mid M_a(i_1))\, P(L_c(1) \mid M_c(i_1))\, P(B_{i_1,i_2} \mid M_a(i_1), M_c(i_1)) \\
    &\qquad \times P(L_a(2) \mid M_a(i_2))\, P(L_c(2) \mid M_c(i_2))\, P(B_{i_2,i_3} \mid M_a(i_2), M_c(i_2)) \cdots \\
    &\qquad \times P(L_a(T) \mid M_a(i_T))\, P(L_c(T) \mid M_c(i_T))\, P(B_{i_T,i_{T+1}} \mid M_a(i_T), M_c(i_T))
    \end{aligned}
    $$

    where $W$ designates a Markov model representing a word; $I = i_1, i_2, i_3, \ldots, i_T$ is a state train; $M_a$ and $M_c$ are the Markov models, by label, corresponding to the spectrum and the spectrum variation, respectively; and $B_{i,j}$ is the transition from state $i$ to state $j$. $P(L_a, L_c \mid W)$ is calculated for each Markov model $W$ representing a word, and the $W$ giving the maximum value is determined as the recognition result.
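
    The sum over all state trains $I$ need not be enumerated: it can be computed with the standard forward recursion, multiplying the two label-output probabilities at each state because the streams are treated as uncorrelated. The Python sketch below assumes that every state train starts in state 0 and must end in the last state, and that the transition probability depends only on the current state; `WordModel`, `likelihood`, and `recognize` are hypothetical names.

```python
from typing import Dict, List, Sequence


class WordModel:
    """Hypothetical container for one word's Markov model W."""

    def __init__(self, trans: List[List[float]], emit_a: List[List[float]],
                 emit_c: List[List[float]]):
        self.trans = trans    # trans[i][j]: P(B_ij), transition from state i to state j
        self.emit_a = emit_a  # emit_a[i][k]: P(spectrum label k | M_a(i))
        self.emit_c = emit_c  # emit_c[i][k]: P(variation label k | M_c(i))


def likelihood(model: WordModel, la: Sequence[int], lc: Sequence[int]) -> float:
    """P(La, Lc | W), summed over all state trains by the forward recursion."""
    n = len(model.trans)
    alpha = [0.0] * n
    alpha[0] = 1.0  # assumption: every state train starts in state 0
    for a_label, c_label in zip(la, lc):
        nxt = [0.0] * n
        for i in range(n):
            if alpha[i] == 0.0:
                continue
            # Small correlation between the streams: emit both labels independently.
            out = model.emit_a[i][a_label] * model.emit_c[i][c_label]
            for j in range(n):
                nxt[j] += alpha[i] * out * model.trans[i][j]
        alpha = nxt
    return alpha[-1]  # assumption: the train must end in the last state


def recognize(models: Dict[str, WordModel], la: Sequence[int],
              lc: Sequence[int]) -> str:
    """Return the word whose model gives the largest P(La, Lc | W)."""
    return max(models, key=lambda w: likelihood(models[w], la, lc))


m1 = WordModel([[0.5, 0.5], [0.0, 1.0]],
               [[0.9, 0.1], [0.2, 0.8]],
               [[0.7, 0.3], [0.4, 0.6]])
m2 = WordModel([[0.5, 0.5], [0.0, 1.0]],
               [[0.1, 0.9], [0.8, 0.2]],
               [[0.3, 0.7], [0.6, 0.4]])
```

    Each recursion step costs $O(N^2)$ for $N$ states, so scoring a word is linear in $T$ rather than exponential in the number of possible state trains.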

    Voice recognition system and method

    Publication No.: JP2010139963A

    Publication date: 2010-06-24

    Application No.: JP2008318403

    Application date: 2008-12-15

    Abstract: PROBLEM TO BE SOLVED: To provide a practical system etc. for voice recognition, in which recognition performance is improved by considering utterance variation.
    SOLUTION: The system includes a voice recognition device 200 and a pre-processor 100 that creates the recognition graph used by the voice recognition device 200 for voice recognition processing. The pre-processor 100 comprises: a language model estimation section 110 that estimates a language model; a recognition word dictionary section 130 that holds correspondence information associating each word with a phoneme string that follows the word's written form and with information on a phoneme string describing utterance variation; and a recognition graph creating section 140 that creates a recognition graph on the basis of the language model estimated by the language model estimation section 110 and the correspondence information held by the recognition word dictionary section 130 for the words included in the language model. The recognition graph creating section 140 creates the recognition graph by applying, to each word contained in a word string composed of more than a fixed number of words, the phoneme string that accounts for utterance variation of that word.
    COPYRIGHT: (C)2010,JPO&INPIT
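
    The abstract does not say how the language model or the recognition graph is represented. As a sketch under that gap, the Python below assumes the language model yields a list of word strings and reduces the "graph" to one list of alternative phoneme strings per word position. It shows only the selection rule: variant pronunciations are added for words in strings longer than a fixed number of words. `build_recognition_graph`, `Lexicon`, and `min_words` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical lexicon: word -> (phoneme string as written, utterance-variation phoneme strings)
Lexicon = Dict[str, Tuple[str, List[str]]]


def build_recognition_graph(
    word_strings: Sequence[Sequence[str]],
    lexicon: Lexicon,
    min_words: int,
) -> Dict[Tuple[str, ...], List[List[str]]]:
    """Map each word string to per-word pronunciation alternatives.

    Variant phoneme strings are added only when the word string has more than
    `min_words` words; shorter strings keep the written-form phonemes only.
    """
    graph: Dict[Tuple[str, ...], List[List[str]]] = {}
    for ws in word_strings:
        use_variants = len(ws) > min_words
        slots = []
        for word in ws:
            canonical, variants = lexicon[word]
            slots.append([canonical] + (list(variants) if use_variants else []))
        graph[tuple(ws)] = slots
    return graph


lexicon: Lexicon = {
    "going": ("g ow ih ng", ["g ow ih n"]),
    "to": ("t uw", ["t ax"]),
    "go": ("g ow", []),
}
graph = build_recognition_graph([["going"], ["going", "to", "go"]], lexicon, 2)
```

    Here the one-word string keeps only its written-form pronunciation, while the three-word string gets two alternatives for "going" and for "to", giving four pronunciation paths.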

    Abstract translation: 要解决的问题:提供语音识别的实用系统等,其中通过考虑话语变化来提高识别性能。 解决方案:该系统包括用于创建用于由语音识别装置200进行语音识别处理的识别图形的语音识别装置200和预处理器100.预处理器100包括:语言模型估计部分110,用于 估计语言模型; 将对应的信息保存到单词的识别词典部分130,与该单词相同的描述中的音素串,以及描述话音变化的音素串的信息; 以及用于基于由语言模型估计部分110估计的语言模型创建识别图形的识别图形创建部分140,以及由识别词典词典部分130保持的关于语言模型中包含的词语的对应信息。 识别图形创建部分140通过应用音素串来考虑与包含在由多于固定数量的单词组成的单词串中的单词相关的单词的发音变化来应用音素串来创建识别图。 版权所有(C)2010,JPO&INPIT

    Voice activity detection system, method and program

    Publication No.: JP2009210617A

    Publication date: 2009-09-17

    Application No.: JP2008050537

    Application date: 2008-02-29

    CPC classification number: G10L25/93

    Abstract: PROBLEM TO BE SOLVED: To provide a highly accurate voice activity detection method in a low S/N environment.
    SOLUTION: Voice activity detection is performed by extracting, as feature vectors from a speech signal, a long-term spectrum variation component, or both a long-term spectrum variation component and a harmonic structure, which increases the difference in feature vectors between the speech and non-speech portions of the signal. Using a long-term spectrum variation component whose window length exceeds the average phoneme duration of an utterance in the speech signal improves both the correct rate and the accuracy rate of voice activity detection over conventional methods. The voice activity detection system and method enable speech processing, automatic speech recognition, and speech output with highly accurate voice activity detection.
    COPYRIGHT: (C)2009,JPO&INPIT
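
    The abstract does not define the long-term spectrum variation component precisely. The Python sketch below uses one plausible measure, an assumption rather than the patented definition: for each frame, the variance of log power across a window of neighbouring frames, averaged over frequency bins and then thresholded to mark speech. The harmonic-structure feature is omitted. Following the abstract, `window` (in frames) is meant to span more than the average phoneme duration. `long_term_variation`, `detect_voice`, and `threshold` are hypothetical names.

```python
import math
from typing import List

Spectrogram = List[List[float]]  # spec[t][j]: power of frame t in frequency bin j


def long_term_variation(spec: Spectrogram, window: int) -> List[float]:
    """For each frame: variance of log power over `window` frames centred on it,
    averaged over frequency bins (hypothetical measure)."""
    T, J = len(spec), len(spec[0])
    logs = [[math.log(max(p, 1e-12)) for p in frame] for frame in spec]
    half = window // 2
    scores = []
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)  # window clipped at the edges
        n = hi - lo
        total = 0.0
        for j in range(J):
            vals = [logs[k][j] for k in range(lo, hi)]
            mean = sum(vals) / n
            total += sum((v - mean) ** 2 for v in vals) / n
        scores.append(total / J)
    return scores


def detect_voice(spec: Spectrogram, window: int, threshold: float) -> List[bool]:
    """Mark a frame as speech when its long-term variation exceeds `threshold`."""
    return [s > threshold for s in long_term_variation(spec, window)]


noise = [[1.0, 1.0]] * 5                      # stationary: no spectral change
speech = [[1.0, 100.0], [100.0, 1.0]] * 3    # rapidly changing spectrum
flags = detect_voice(noise + speech + noise, window=3, threshold=0.1)
```

    A stationary spectrum, as with steady background noise, yields zero variation however loud it is, while a spectrum that keeps changing across the window scores high.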

    Abstract translation: 要解决的问题:在低S / N环境中提供高精度的语音活动检测方法。 解决方案:通过从语音信号中提取长期频谱变化分量和谐波结构作为特征向量并且通过使用语音信号增加语音信号中包括的语音和非语音之间的特征向量的差异来执行语音活动 长期光谱变化分量特征,或长期光谱变化分量提取和谐波结构特征提取。 通过使用具有在语音信号中的话语的平均音素持续时间上的窗口长度的长期频谱变化分量,语音活动检测的正确率和准确率比常规方法得到改进。 语音活动检测系统和方法提供能够进行非常精确的语音活动检测的语音处理,自动语音识别和语音输出。 版权所有(C)2009,JPO&INPIT

    Technology for creating high quality synthesis voice

    Publication No.: JP2008185805A

    Publication date: 2008-08-14

    Application No.: JP2007019433

    Application date: 2007-01-30

    CPC classification number: G10L13/07

    Abstract: PROBLEM TO BE SOLVED: To efficiently create high quality synthesis voice by connecting a plurality of phonemes.
    SOLUTION: The system comprises: a phoneme storage section that stores a plurality of pieces of phoneme data; a synthesis section that creates voice data representing synthesized speech of an input text by reading from the phoneme storage section, and concatenating, the phoneme data corresponding to each phoneme in the pronunciation of the text; a calculation section that calculates, based on the voice data, an index value indicating the unnaturalness of the synthesized speech of the text; a paraphrase storage section that stores, for each of a plurality of first notations, a second notation that is a paraphrase of that first notation; a replacing section that searches the text for a notation matching any of the first notations and replaces it with the corresponding second notation; and a determination section that outputs the created voice data on condition that the calculated index value is smaller than a reference value, and, on condition that the index value is equal to or greater than the reference value, inputs the text to the synthesis section so that voice data for the replaced text is further created.
    COPYRIGHT: (C)2008,JPO&INPIT
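
    The determination loop can be sketched in Python with the synthesis and scoring steps passed in as functions, since the abstract does not specify how voice data is built or how unnaturalness is measured. Two behaviours are assumptions not stated in the abstract: the loop gives up when no stored first notation remains in the text, and it stops after a bounded number of rounds. `synthesize_with_paraphrase` and the toy stand-ins are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


def synthesize_with_paraphrase(
    text: str,
    synthesize: Callable[[str], object],      # stands in for the synthesis section
    unnaturalness: Callable[[object], float],  # stands in for the calculation section
    paraphrases: Dict[str, str],               # first notation -> second notation
    reference: float,
    max_rounds: int = 10,
) -> Optional[Tuple[str, object]]:
    """Resynthesize with paraphrased notations until the index drops below `reference`."""
    for _ in range(max_rounds):
        voice = synthesize(text)
        if unnaturalness(voice) < reference:
            return text, voice  # determination section: output the voice data
        for first, second in paraphrases.items():
            if first in text:
                text = text.replace(first, second, 1)  # replacing section
                break
        else:
            return None  # assumption: nothing left to paraphrase, so give up
    return None  # assumption: bound the number of resynthesis rounds


# Toy stand-ins: the "voice data" is the text itself, and each "zz" counts as
# one hard-to-join spot (purely illustrative).
toy_synth = lambda t: t
toy_score = lambda v: float(v.count("zz"))
result = synthesize_with_paraphrase(
    "a fizzy buzzword", toy_synth, toy_score,
    {"buzzword": "jargon", "fizzy": "sparkling"}, reference=1.0,
)
```

    With these stand-ins, the text is paraphrased twice ("buzzword", then "fizzy") before its index falls below the reference and the voice data is output.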

    Abstract translation: 要解决的问题:通过连接多个音素来有效地创建高质量的合成声音。 解决方案:系统包括:音素存储部分,用于存储多个音素数据; 合成部分,用于通过从音素存储部分读取和连接指示对应于每个音素的音素数据来指示文本的综合语音,该语音数据指示输入文本的发音; 计算部分,用于基于语音数据计算表示文本的合成语音的不自然度的指标值; 一个释义存储部分,用于存储通过将其与多个第一符号中的每一个相关联而将第一符号改写为第二符号; 替换部分,用与第一符号相对应的第二符号替换搜索到的符号,通过从文本中搜索对应于任何第一符号的符号; 以及确定部分,其中在所计算的索引值小于参考值的条件下输出创建的语音数据,并且其中文本被输入到合成部分,使得可以进一步创建替换的文本的语音数据 ,条件是指标值为参考值或更多。 版权所有(C)2008,JPO&INPIT
