SPEECH RECOGNITION METHOD
    21.
    Invention patent

    Publication No.: CA1256562A

    Publication date: 1989-06-27

    Application No.: CA528993

    Filing date: 1987-02-04

    Applicant: IBM

    Abstract: Speaker adaptation is provided that lets a person easily use a hidden Markov model (HMM) recognizer previously trained by one or more other speakers. During training, the parameters of the Markov models are calculated iteratively, for example with the Forward-Backward algorithm. The adaptation stores and reuses the intermediate results (probabilistic frequencies) of the last training iteration. During adaptation, the parameters are calculated by interpolation, as a weighted sum of the stored probabilistic frequencies and those obtained from the new training data.
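
The interpolation step described in this abstract can be illustrated with a minimal sketch. It assumes the stored "probabilistic frequencies" are expected event counts per HMM state, and that a single interpolation weight is used; the function name, the dictionary layout, and the weight value are hypothetical and not taken from the patent.

```python
def adapt_parameters(stored_counts, new_counts, weight):
    """Interpolate stored and new expected counts, then renormalize.

    stored_counts, new_counts: dict state -> dict event -> expected count
    weight: share given to the stored counts, in [0, 1] (assumed scheme)
    """
    adapted = {}
    for state in set(stored_counts) | set(new_counts):
        old = stored_counts.get(state, {})
        new = new_counts.get(state, {})
        # Weighted sum of the two sets of counts for every observed event.
        mixed = {
            event: weight * old.get(event, 0.0) + (1.0 - weight) * new.get(event, 0.0)
            for event in set(old) | set(new)
        }
        total = sum(mixed.values())
        # Normalizing the mixed counts gives the adapted probabilities.
        adapted[state] = {
            event: (count / total if total > 0 else 0.0)
            for event, count in mixed.items()
        }
    return adapted


# Counts kept from the last training iteration (original speakers).
stored = {"s1": {"a": 8.0, "b": 2.0}}
# Counts obtained from the new speaker's adaptation data.
new = {"s1": {"a": 1.0, "b": 3.0}}
probs = adapt_parameters(stored, new, weight=0.5)
```

With equal weighting, the adapted probability of event "a" in state "s1" is 4.5 / 7, between the original estimate (0.8) and the new speaker's estimate (0.25).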

    Method, device and program for objective voice extraction
    22.
    Invention patent
    Status: granted

    Publication No.: JP2011113044A

    Publication date: 2011-06-09

    Application No.: JP2009271890

    Filing date: 2009-11-30

    CPC classification number: G10L25/78 G10L15/20 G10L21/028 G10L2021/02166

    Abstract: PROBLEM TO BE SOLVED: To provide technology for extracting an objective voice from a plurality of voices arriving from different directions, by efficiently suppressing the mixing-in of voices other than the objective voice.
    SOLUTION: The objective voice is extracted by performing at least one of gain adjustment processing and utterance-section segmentation processing on the voice signals obtained by first and second voice input units arranged a predetermined distance apart. The processing uses a weighted Cross-Power Spectrum Phase (CSP) coefficient that takes a small value in frequency bands likely to be influenced by voices other than the objective voice.
    COPYRIGHT: (C)2011,JPO&INPIT
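
A weighted CSP coefficient of the kind named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the per-bin weights are hypothetical (the abstract only says they are small in bands affected by other voices), a naive DFT stands in for an FFT, and the test signals are single impulses.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def weighted_csp(x1, x2, weights):
    """Weighted CSP coefficient for every circular lag d.

    The peak lag estimates how many samples x2 lags behind x1.
    weights[k] is a hypothetical per-frequency-bin weight in [0, 1].
    """
    n = len(x1)
    s1, s2 = dft(x1), dft(x2)
    spectrum = []
    for k in range(n):
        cross = s1[k].conjugate() * s2[k]
        mag = abs(cross)
        # Keep only the phase of the cross spectrum, scaled by the bin weight.
        spectrum.append(weights[k] * cross / mag if mag > 1e-12 else 0j)
    # Inverse DFT; the real part is the CSP coefficient at lag d.
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * d / n)
                for k in range(n)).real / n
            for d in range(n)]


n = 16
x1 = [1.0] + [0.0] * (n - 1)               # impulse at sample 0
x2 = [0.0] * 3 + [1.0] + [0.0] * (n - 4)   # same impulse, 3 samples later
flat = [1.0] * n
# Hypothetical weights: de-emphasize the middle bins (assumed to be disturbed).
weights = [1.0 if k <= 4 or k >= 12 else 0.2 for k in range(n)]

csp_flat = weighted_csp(x1, x2, flat)
csp_weighted = weighted_csp(x1, x2, weights)
delay = max(range(n), key=lambda d: csp_weighted[d])
```

With flat weights the coefficient at the true lag is 1; down-weighting bins lowers the peak to the mean of the weights but leaves its position unchanged, which is what makes the coefficient usable for direction-based gain control or segmentation.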


    Speech synthesis system, program, and method
    23.
    Invention patent
    Status: granted

    Publication No.: JP2009063869A

    Publication date: 2009-03-26

    Application No.: JP2007232395

    Filing date: 2007-09-07

    CPC classification number: G10L13/00 G10L13/07 G10L13/10

    Abstract: PROBLEM TO BE SOLVED: To synthesize speech with high sound quality when many phonemes are available, using the advantages of waveform-concatenation speech synthesis, and to synthesize speech with accurate accents even when few phonemes are available.
    SOLUTION: Prosody that is both accurate and high in sound quality is obtained by a two-pass search: a phoneme search followed by a search for prosody correction amounts. In a preferred embodiment, both passes evaluate the consistency of the prosody using a statistical model of the amount of change of the prosody (the slope of the fundamental frequency) to secure accurate accents. The correction search looks for the sequence of prosody correction amounts with the minimum corrected-prosody cost. It thereby finds a correction sequence that raises the likelihood under the statistical models of both the change amount and the absolute value of the prosody while keeping the corrections as small as possible.
    COPYRIGHT: (C)2009,JPO&INPIT
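
The second pass, the search for prosody correction amounts, can be sketched as a small dynamic-programming search. Everything specific here is an assumption made for illustration: the quadratic cost terms, the discrete set of candidate corrections, the target slopes standing in for the statistical model, and the function name. The absolute-value model mentioned in the abstract is left out to keep the sketch short.

```python
def search_corrections(f0, target_slopes, candidates, correction_weight=1.0):
    """Viterbi-style search for per-unit F0 corrections with minimum cost.

    Cost (assumed form): squared deviation of each corrected F0 slope from
    the modeled slope, plus correction_weight * correction**2 per unit.
    """
    n = len(f0)
    # best[i][c] = (cost of the best path ending with correction c, previous c)
    best = [{c: (correction_weight * c * c, None) for c in candidates}]
    for i in range(1, n):
        layer = {}
        for c in candidates:
            options = []
            for p in candidates:
                slope = (f0[i] + c) - (f0[i - 1] + p)
                cost = best[i - 1][p][0] + (slope - target_slopes[i - 1]) ** 2
                options.append((cost, p))
            cost, prev = min(options)
            layer[c] = (cost + correction_weight * c * c, prev)
        best.append(layer)
    # Backtrack from the cheapest final correction.
    c = min(candidates, key=lambda k: best[-1][k][0])
    total = best[-1][c][0]
    path = [c]
    for i in range(n - 1, 0, -1):
        c = best[i][c][1]
        path.append(c)
    path.reverse()
    return path, total


candidates = [-10, 0, 10]
# F0 contour that already matches the modeled slopes: no correction needed.
clean_path, clean_cost = search_corrections([100, 110, 110], [10, 0], candidates)
# F0 contour with an outlier in the middle unit.
fixed_path, fixed_cost = search_corrections([100, 130, 110], [10, 0], candidates)
```

Because each correction is penalized, the search leaves a consistent contour untouched and changes an inconsistent one only as much as the slope model justifies.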


    SPEECH SYNTHESIZING METHOD AND SYSTEM THEREFOR

    Publication No.: JPH0895589A

    Publication date: 1996-04-12

    Application No.: JP22666794

    Filing date: 1994-09-21

    Applicant: IBM JAPAN

    Abstract: PURPOSE: To provide stable speech synthesis with reduced pitch tremble in a speech synthesizer system that uses a pitch-synchronous waveform superposition method.
    CONSTITUTION: The glottal closure point is used as the reference point for superposition (pitch mark). Because the glottal closure point is extracted stably and accurately using a dynamic wavelet transformation, speech with little trembling and little gurgling is synthesized. Setting the reference point for superposition at a different position from the reference point for waveform cutout allows the waveform to be cut out more softly. The glottal closure point is extracted by searching for local peaks of the dynamic wavelet transformation; preferably, the threshold for this peak search is adaptively controlled each time a dynamic wavelet transformation is obtained.
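
The adaptive peak search in this abstract can be sketched as below. The sketch assumes the wavelet transform of a frame has already been computed and is passed in as a list of values; setting the threshold to a fixed fraction of the frame maximum is a hypothetical adaptation rule, and the function name is invented for illustration.

```python
def pick_pitch_marks(wavelet_frame, ratio=0.5):
    """Return indices of local peaks that reach an adaptive threshold.

    The threshold is recomputed for every frame as ratio * max(frame),
    an assumed rule standing in for the patent's adaptive control.
    """
    peak = max(wavelet_frame)
    if peak <= 0:
        return []
    threshold = ratio * peak
    marks = []
    for i in range(1, len(wavelet_frame) - 1):
        value = wavelet_frame[i]
        is_local_peak = value > wavelet_frame[i - 1] and value >= wavelet_frame[i + 1]
        if is_local_peak and value >= threshold:
            marks.append(i)
    return marks


loud = [0.0, 1.0, 0.2, 0.1, 0.9, 0.3, 0.4, 0.1, 1.2, 0.0]
quiet = [0.1 * v for v in loud]   # same shape at one tenth of the level
loud_marks = pick_pitch_marks(loud)
quiet_marks = pick_pitch_marks(quiet)
```

Because the threshold follows the level of each frame, the loud and quiet frames yield the same pitch marks, while the small bump at index 6 is rejected in both.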

    METHOD FOR CONSTITUTING SPEECH MODEL AND SPEECH RECOGNITION DEVICE

    Publication No.: JPH06110493A

    Publication date: 1994-04-22

    Application No.: JP25930192

    Filing date: 1992-09-29

    Applicant: IBM JAPAN

    Abstract: PURPOSE: To provide a speech recognition device that efficiently represents various vocalization deformations using a statistical combination of a small number of kinds of HMMs.
    CONSTITUTION: A feature extracting device 4 analyzes the features of an input word to obtain a corresponding feature vector sequence, or a label sequence by way of a labeling device 8. For each vocalization deformation candidate of a subword, a phonemic hidden Markov model is given an N-gram relation (N: an integer larger than two) with the speech deformation candidates of the preceding subword in the word, and the relation is held in a parameter table 18. The recognition device 16 applies the HMM for each speech deformation candidate on the basis of the N-gram relation, according to the description of a candidate word in a recognition-object word vocalization dictionary 13; constitutes a speech model by connecting the HMMs of the respective speech deformation candidates in parallel between the subwords; finds, for each candidate word, the probability that the constituted speech model outputs the label sequence or feature vector sequence of the spoken input word; and outputs the candidate word whose speech model has the highest probability to a display device 19 as the recognition result.

    SPEECH RECOGNIZING APPARATUS
    29.
    Invention patent

    Publication No.: JPH05127692A

    Publication date: 1993-05-25

    Application No.: JP27889691

    Filing date: 1991-10-01

    Applicant: IBM

    Abstract: PURPOSE: To make it possible for a signal processing card mounted on the bus of a personal computer to execute speech recognition that uses signal processing data, in addition to speech signal processing.
    CONSTITUTION: The signal processing card 5 mounted on the bus 2 of the personal computer 1 has a bus master 6, which accesses the main memory 4 of the computer 1. A large probability value table needed for speech recognition is stored in the main memory 4. Each time a label to be processed arrives, the necessary part of the table is read from the main memory 4 into a memory on the card 5 by DMA transfer through the bus master 6, and speech recognition processing is executed.

    System, method and program for extracting an off-topic part from a conversation

    Publication No.: DE102012224488A1

    Publication date: 2013-07-18

    Application No.: DE102012224488

    Filing date: 2012-12-28

    Applicant: IBM

    Abstract: PROBLEM: To provide a technology for extracting an off-topic part from a conversation.
    SOLUTION: The system for extracting an off-topic part from a conversation includes: a first corpus containing documents from a plurality of fields; a second corpus containing only documents from the field to which the conversation belongs; a determination means that identifies, as a lower-limit target word, any word whose IDF value for the first corpus and IDF value for the second corpus are both below a first predetermined threshold; a score calculation part that calculates a TF-IDF value as the score of each word contained in the second corpus, using a constant that sets a lower limit in place of the TF-IDF value for the lower-limit target words; a clipping part that sequentially cuts out the intervals to be processed from the text data representing the content of the conversation; and an extraction part that extracts, as an off-topic part, any interval in which the average score of the words contained in the clipped interval is greater than a second predetermined threshold.
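
The scoring and extraction steps of this abstract can be sketched as follows. The sketch rests on several assumptions that the abstract does not settle: TF and IDF are both taken from the second (domain) corpus, IDF is smoothed, the window slides one word at a time, and words without a score receive a default value. All names and numeric values are hypothetical.

```python
import math
from collections import Counter


def idf(word, corpus):
    """Smoothed inverse document frequency; corpus is a list of token lists."""
    df = sum(1 for doc in corpus if word in doc)
    return math.log((1 + len(corpus)) / (1 + df))


def word_scores(general, domain, idf_threshold, floor):
    """Score each word of the domain corpus.

    Words whose IDF is below idf_threshold in BOTH corpora get the constant
    floor; all others get TF * IDF (both from the domain corpus, which is
    an assumption of this sketch).
    """
    tf = Counter(w for doc in domain for w in doc)
    total = sum(tf.values())
    scores = {}
    for word, count in tf.items():
        if idf(word, general) < idf_threshold and idf(word, domain) < idf_threshold:
            scores[word] = floor
        else:
            scores[word] = (count / total) * idf(word, domain)
    return scores


def off_topic_intervals(tokens, scores, window, threshold, default=0.0):
    """Slide a window over the transcript and return (start, end) index pairs
    whose average word score exceeds threshold. Words without a score get
    default (a hypothetical choice)."""
    hits = []
    for start in range(0, len(tokens) - window + 1):
        chunk = tokens[start:start + window]
        average = sum(scores.get(w, default) for w in chunk) / window
        if average > threshold:
            hits.append((start, start + window))
    return hits


general = [["the", "cat"], ["the", "dog"], ["the", "stock"]]
domain = [["the", "stock", "price"], ["the", "stock", "bond"]]
scores = word_scores(general, domain, idf_threshold=0.1, floor=0.05)

# Hand-made scores to show the windowing step in isolation.
demo_scores = {"a": 0.1, "b": 0.1, "x": 0.9, "y": 0.8}
transcript = ["a", "a", "b", "x", "y", "x", "b", "a"]
intervals = off_topic_intervals(transcript, demo_scores, window=3, threshold=0.5)
```

In the toy corpora, "the" appears in every document of both corpora, so it receives the constant floor instead of a TF-IDF value. In the windowing demo, only the three windows that contain both high-scoring words exceed the threshold.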
