31.
    Invention patent
    Unknown

    Publication No.: BRPI0614034A2

    Publication date: 2011-03-01

    Application No.: BRPI0614034

    Filing date: 2006-07-10

    Applicant: IBM

    Abstract: A system that outputs phonemes and accents of texts. The system has a storage section storing a first corpus in which spellings, phonemes, and accents of a text input beforehand are recorded separately for individual segmentations of the words that are contained in the text. A text for which phonemes and accents are to be output is acquired and the first corpus is searched to retrieve at least one set of spellings that match the spellings in the text from among sets of contiguous spellings. Then, the combination of a phoneme and an accent that has a higher probability of occurrence in the first corpus than a predetermined reference probability is selected as the phonemes and accent of the text.
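
The selection step in this abstract can be illustrated with a minimal sketch. It simplifies the described system: it looks up single segments rather than sets of contiguous spellings, and the names `build_index`, `select_reading`, and the toy corpus with its phoneme and accent labels are hypothetical, not taken from the patent.

```python
from collections import Counter, defaultdict

def build_index(corpus):
    """Map each spelling to a Counter of the (phoneme, accent) pairs recorded for it."""
    index = defaultdict(Counter)
    for spelling, phoneme, accent in corpus:
        index[spelling][(phoneme, accent)] += 1
    return index

def select_reading(index, spelling, reference_probability):
    """Return the most frequent (phoneme, accent) pair for a spelling if its
    relative frequency in the corpus exceeds reference_probability, else None."""
    counts = index.get(spelling)
    if not counts:
        return None  # spelling never seen in the corpus
    total = sum(counts.values())
    pair, count = counts.most_common(1)[0]
    return pair if count / total > reference_probability else None

# Hypothetical toy corpus: one (spelling, phoneme, accent) triple per segment.
toy_corpus = [
    ("read", "R IY D", "H"),
    ("read", "R IY D", "H"),
    ("read", "R EH D", "H"),
    ("lead", "L IY D", "H"),
]
index = build_index(toy_corpus)
```

With this corpus, "read" is pronounced "R IY D" in two of three occurrences, so it is selected for a reference probability of 0.5 but rejected for 0.7.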

    33.
    Invention patent
    Unknown

    Publication No.: DE69010722T2

    Publication date: 1995-03-16

    Application No.: DE69010722

    Filing date: 1990-03-07

    Applicant: IBM

    Abstract: The present invention relates to a speech recognition system comprising means (4) for performing a frequency analysis of an input speech in a succession of time periods to obtain feature vectors, means (8) for producing a corresponding label train using a vector quantization code book (9), means (11) for matching a plurality of word baseforms, expressed by a train of Markov models each corresponding to labels, with said label train, means (14) for recognizing the input speech on the basis of the matching result, and means (5, 6, 7, 9) for performing an adaptation operation on said system to improve its ability to recognise speech. According to the invention, the speech recognition system is characterised in that the means for performing an adaptation operation comprises: means (4) for dividing each of a plurality of input speech words into N segments (N being an integer greater than 1) and producing a representative value of the feature vector of each segment of each input speech word; means for dividing into segments the word baseforms, each corresponding to one of said input speech words, and for producing a representative value of each segment feature vector of each word baseform on the basis of the prototype vectors of the vector quantization code book; means for producing a displacement vector indicating the displacement between the representative value of each segment of each input speech word and the representative value of the corresponding segment of the corresponding word baseform; means for storing the degree of relation between each segment of each input speech word and each label in a label group of the vector quantization code book; and prototype adaptation means for correcting the prototype vector of each label in the label group of the vector quantization code book by each displacement vector in accordance with the degree of relation between the label and the displacement vector.
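
The adaptation steps listed in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented procedure: the representative value is taken to be the segment mean, the degree of relation is taken to be the share of a label within the baseform segment, and all function names are hypothetical. Each word is assumed to have at least `n_segments` feature vectors and labels.

```python
def split_even(seq, n):
    """Split seq into n contiguous chunks whose lengths differ by at most one."""
    k, r = divmod(len(seq), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def adapt_codebook(codebook, words, n_segments):
    """codebook: list of prototype vectors, indexed by label.
    words: list of (features, labels) pairs, where features are the input
    speech feature vectors and labels is the word baseform as label indices."""
    dim = len(codebook[0])
    num = [[0.0] * dim for _ in codebook]   # relation-weighted displacement sums
    den = [0.0] * len(codebook)             # relation totals per label
    for features, labels in words:
        feature_chunks = split_even(features, n_segments)
        label_chunks = split_even(labels, n_segments)
        for f_chunk, l_chunk in zip(feature_chunks, label_chunks):
            speech_rep = mean_vector(f_chunk)
            base_rep = mean_vector([codebook[label] for label in l_chunk])
            displacement = [s - b for s, b in zip(speech_rep, base_rep)]
            for label in set(l_chunk):
                # Assumed degree of relation: share of this label in the segment.
                relation = l_chunk.count(label) / len(l_chunk)
                den[label] += relation
                for d in range(dim):
                    num[label][d] += relation * displacement[d]
    # Shift each prototype by the relation-weighted average displacement.
    return [
        [p + (n / den[i] if den[i] else 0.0) for p, n in zip(proto, num[i])]
        for i, proto in enumerate(codebook)
    ]
```

For example, with prototypes `[[0, 0], [10, 10]]` and one word whose features sit one unit above the prototypes its baseform points to, both prototypes move up by one unit.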

    34.
    Invention patent
    Unknown

    Publication No.: DE3773039D1

    Publication date: 1991-10-24

    Application No.: DE3773039

    Filing date: 1987-03-25

    Applicant: IBM

    Abstract: The present invention relates to a speech recognition system of the type comprising a plurality of probabilistic finite state models each having a number of states and the ability of undergoing transitions from one state to another and producing a corresponding output representing a speech element, and means for defining for each model probability parameters each representing the probability that the model will undergo a transition from one predetermined state to another predetermined state and produce a corresponding output. Such a system can be used to recognise input speech data by initially dividing the input speech data into individual speech elements (4, 5, 6) and then applying the input speech elements to the models, and utilising the probability parameters of the models to recognise the input speech elements. According to the invention the speech recognition system is characterised in that it comprises training means (8) for supplying training speech data to the models in order to train the models and to define initial values for the probability parameters for each of the models, and adaptation means (9) for supplying adaptation speech data to the models in order to adapt the models and to define adapted values of the probability parameters for each of the models. The adapted values of the probability parameters are used to recognise the input speech elements (10).

    Speech feature extraction apparatus, speech feature extraction method, and speech feature extraction program
    35.
    Invention patent
    Granted

    Publication No.: JP2013178575A

    Publication date: 2013-09-09

    Application No.: JP2013109608

    Filing date: 2013-05-24

    CPC classification number: G10L15/02 G10L15/20 G10L25/24

    Abstract: PROBLEM TO BE SOLVED: To provide a technique for extracting, from a speech signal, features that are even more robust to reverberation, noise, and the like.
    SOLUTION: A speech feature extraction apparatus is configured to: receive, as an input, values obtained by adding, to the spectrum of each frame of a speech signal segmented into frames, an average spectrum that is the average of the spectra over all frames of the overall speech; and, for each frame, multiply said values by the weights of a mel filter bank and sum the products, apply the discrete cosine transform to the logarithm of the sum, and calculate, as a delta feature, the difference in the discrete cosine transform between the preceding and following frames.
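
The processing chain in this abstract can be sketched in a few lines. This is a minimal illustration, assuming that every supplied frame counts as speech, that the mel filter bank weights are passed in ready-made, and that the delta is taken between the frames immediately before and after the current one; `dct_ii` and `delta_features` are hypothetical names.

```python
import math

def dct_ii(values):
    """Unnormalised DCT-II of a sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n)
    ]

def delta_features(spectra, mel_weights):
    """spectra: one power spectrum (list of positive floats) per frame.
    mel_weights: one weight list per mel filter, as long as a spectrum."""
    n_frames = len(spectra)
    # Average spectrum over all frames (all frames assumed to be speech).
    average = [sum(col) / n_frames for col in zip(*spectra)]
    cepstra = []
    for spectrum in spectra:
        shifted = [s + a for s, a in zip(spectrum, average)]
        mel = [sum(w * x for w, x in zip(weights, shifted)) for weights in mel_weights]
        cepstra.append(dct_ii([math.log(m) for m in mel]))
    # Delta of frame t: DCT output of frame t+1 minus that of frame t-1.
    return [
        [later - former for former, later in zip(cepstra[t - 1], cepstra[t + 1])]
        for t in range(1, n_frames - 1)
    ]
```

Because the average spectrum is added before the logarithm, low-energy bins are lifted away from zero, which is one plausible reading of why the resulting deltas would be less sensitive to noise and reverberation.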

    Idle talk extraction system, method and program for extracting idle talk parts from conversation
    36.
    Invention patent
    Granted

    Publication No.: JP2013145429A

    Publication date: 2013-07-25

    Application No.: JP2012004802

    Filing date: 2012-01-13

    CPC classification number: G06F17/3053 G06F17/2785 Y10S707/99933

    Abstract: PROBLEM TO BE SOLVED: To provide a technique for extracting idle talk parts from a conversation.
    SOLUTION: An idle talk extraction system for extracting idle talk from a conversation comprises: a first corpus including documents in a plurality of fields; a second corpus including only documents in the field to which the conversation belongs; a determination part that, for the words included in the second corpus, determines as a lower limit subject word any word whose idf value for the first corpus and idf value for the second corpus are both below a first prescribed threshold value; a score calculation part that calculates, as a score, a tf-idf value for each word included in the second corpus and, for a lower limit subject word, uses a constant set as a lower limit instead of the tf-idf value; a clipping part that sequentially cuts out intervals to be processed from text data of the contents of the conversation; and an extraction part that extracts, as an idle talk part, an interval in which the average score of the words included in the interval is larger than a second prescribed threshold value.

    Device, method and program for detecting ingressive in voice
    37.
    Invention patent
    Granted

    Publication No.: JP2012032557A

    Publication date: 2012-02-16

    Application No.: JP2010171278

    Filing date: 2010-07-30

    Abstract: PROBLEM TO BE SOLVED: To provide a technology capable of detecting ingressive sounds in a voice signal with a high detection rate and a high degree of accuracy.
    SOLUTION: An ingressive detection device refers to acoustic models of ingressive and non-ingressive sounds to determine ingressive candidates, and generates for each candidate a feature vector whose elements are simplex information, meaning information on the ingressive candidate taken by itself, and context information. The context information means information on the relation between the ingressive candidate and the speech section that includes it, the relation between the ingressive candidate and the ingressive candidates before and after it, or both relations. The device obtains, through machine learning with the feature vector as input, classification reference information for classifying an ingressive candidate as either ingressive or non-ingressive, and classifies the ingressive candidate on the basis of that classification reference information.

    System, method and program for processing voice data of dialogue between two persons
    38.
    Invention patent
    Granted

    Publication No.: JP2009216840A

    Publication date: 2009-09-24

    Application No.: JP2008058745

    Filing date: 2008-03-07

    CPC classification number: G10L25/78 G10L17/00 H04M3/5175

    Abstract: PROBLEM TO BE SOLVED: To appropriately process voice data of dialogue between two persons.
    SOLUTION: A system for processing voice data of dialogue between two persons comprises: a first transition calculating section for calculating transition of an utterance rate of a first person from the voice data of dialogue between two persons; a second transition calculating section for calculating transition of an utterance rate of a second person from the voice data of dialogue between two persons; a difference calculating section for calculating a difference data sequence for expressing transition of difference between the utterance rate of the first speaker and the utterance rate of the second speaker; a smoothing section for creating a smooth difference data sequence in which the difference data sequence is smoothed; and a presentation section for presenting transition of the utterance rates of the first speaker and the second speaker, which is expressed by using the smooth difference data sequence.
    COPYRIGHT: (C)2009,JPO&INPIT
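
The difference and smoothing sections described in this abstract can be illustrated with a short sketch. The abstract does not say how the sequence is smoothed, so the centred moving average below is an assumption, and the function names and the per-window rate inputs are hypothetical.

```python
def difference_sequence(rates_first, rates_second):
    """Per-window difference between the two speakers' utterance rates."""
    return [a - b for a, b in zip(rates_first, rates_second)]

def moving_average(values, width):
    """Centred moving average; the window shrinks at the sequence edges."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Hypothetical utterance rates (e.g. syllables per second) per time window.
first_speaker = [4.0, 6.0, 5.0, 7.0]
second_speaker = [5.0, 5.0, 5.0, 5.0]
smooth_difference = moving_average(
    difference_sequence(first_speaker, second_speaker), width=3
)
```

Plotting `smooth_difference` over time would be one way for the presentation section to show when the first speaker talks faster (positive values) or slower (negative values) than the second.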

    Object sound extraction method by removing noise, preprocessing section, voice recognition system and program
    39.
    Invention patent
    Granted

    Publication No.: JP2008275881A

    Publication date: 2008-11-13

    Application No.: JP2007119194

    Filing date: 2007-04-27

    Abstract: PROBLEM TO BE SOLVED: To extract only voice of a target person under noise environment, without requiring a large scale microphone array and a reference signal of noise.
    SOLUTION: An object sound extraction method is disclosed in which practical speech recognition performance is achieved simply by performing gain adjustment between spectral subtraction (SS) processing and flooring processing when processing two-channel input speech obtained from microphones 1 and 2. A CSP (Cross-power Spectrum Phase) coefficient, which is the cross-correlation between the two channel signals, can be used for the gain adjustment. In an indoor environment, including a vehicle, where audio background sound and the like are present, the recognition rate of voice commands in a car navigation system is improved, which in turn improves usability for a speaker such as the driver.
    COPYRIGHT: (C)2009,JPO&INPIT
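
The order of operations in this abstract (spectral subtraction, then CSP-based gain, then flooring) can be sketched as below. This is an illustration only: the naive DFT stands in for an FFT, the noise power estimate and the `alpha` and `beta` constants are hypothetical inputs, and clamping negative CSP values to zero is an assumption.

```python
import cmath

def dft(frame):
    """Naive discrete Fourier transform, adequate for short illustrative frames."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def csp_coefficient(ch1, ch2, lag):
    """Cross-power spectrum phase coefficient of two equal-length frames at a lag:
    the inverse DFT of the magnitude-normalised cross spectrum, evaluated at lag."""
    n = len(ch1)
    s1, s2 = dft(ch1), dft(ch2)
    total = 0j
    for k in range(n):
        cross = s1[k] * s2[k].conjugate()
        if abs(cross) > 0:
            total += cross / abs(cross) * cmath.exp(2j * cmath.pi * k * lag / n)
    return (total / n).real

def enhance_frame(power, noise_power, csp, alpha=1.0, beta=0.1):
    """Per bin: subtract the scaled noise estimate, apply the CSP gain, then floor."""
    gain = max(csp, 0.0)  # assumption: negative correlation gives zero gain
    return [
        max(gain * (p - alpha * n), beta * n)
        for p, n in zip(power, noise_power)
    ]
```

The lag passed to `csp_coefficient` would correspond to the expected arrival-time difference of the target speaker at the two microphones, so frames dominated by sound from other directions receive a small gain before flooring.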

    Technique for acquiring character string or the like to be newly recognized as phrase
    40.
    Invention patent
    Granted

    Publication No.: JP2008216756A

    Publication date: 2008-09-18

    Application No.: JP2007055522

    Filing date: 2007-03-06

    CPC classification number: G10L15/063

    Abstract: PROBLEM TO BE SOLVED: To acquire a character string to be recognized as a phrase, together with its pronunciation, more accurately than before.
    SOLUTION: A system selects, from an input text, a plurality of candidate character strings as candidates to be recognized as phrases; combines predetermined pronunciations of the respective characters in each selected candidate character string to generate a plurality of pronunciation candidates for that character string; combines data in which the generated pronunciation candidates are associated with their candidate character strings with language model data recording numerical values that indicate the frequencies of appearance of phrases in text, thereby generating frequency data that indicates frequencies of appearance for pairs of a character string representing a phrase and its pronunciation; performs speech recognition on an input speech based on the generated frequency data to generate recognition data in which character strings indicating a plurality of phrases contained in the input speech are associated with pronunciations; and selects and outputs those combinations of candidate character string and pronunciation candidate that are included in the recognition data.
    COPYRIGHT: (C)2008,JPO&INPIT
