Speech synthesis system, program, and method
    1.
    Invention patent (granted)

    Publication number: JP2009063869A

    Publication date: 2009-03-26

    Application number: JP2007232395

    Filing date: 2007-09-07

    CPC classification number: G10L13/00 G10L13/07 G10L13/10

    Abstract: PROBLEM TO BE SOLVED: To exploit the advantages of waveform-concatenation speech synthesis so that speech is synthesized with high sound quality when many phonemes are available, and with accurate accent even when few phonemes are available. SOLUTION: Prosody that is both accurate and high in sound quality is obtained by a two-pass search: a phoneme search followed by a search for prosody correction amounts. In a preferred embodiment, both passes evaluate the consistency of the prosody using a statistical model of the amount of change of the prosody (the slope of the fundamental frequency) to secure accurate accent. The correction-amount search looks for the sequence of prosody correction amounts that minimizes the corrected-prosody cost. This yields a sequence of correction amounts that increases the likelihood under the statistical models of both the amount of change and the absolute value of the prosody while keeping the corrections as small as possible. COPYRIGHT: (C)2009,JPO&INPIT
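
The second pass described in this abstract (searching for a sequence of correction amounts with minimum cost) can be illustrated with a small dynamic-programming sketch. Everything below is an assumption made for illustration and is not taken from the patent: the squared-error cost terms stand in for the statistical models, and the names `search_corrections`, `w_abs`, `w_delta`, and `w_corr` are hypothetical.

```python
def search_corrections(f0, target_f0, target_delta, candidates,
                       w_abs=1.0, w_delta=1.0, w_corr=0.5):
    """Viterbi-style search for per-segment F0 correction amounts.

    Assumed cost (illustrative only):
      w_abs   * (corrected F0 - target F0)^2          absolute-value term
      w_delta * (corrected F0 change - target change)^2  slope term
      w_corr  * correction^2                          keeps corrections small
    Returns (total_cost, list_of_corrections).
    """
    def local(i, c):
        return w_abs * (f0[i] + c - target_f0[i]) ** 2 + w_corr * c ** 2

    def transition(i, c_prev, c):
        delta = (f0[i] + c) - (f0[i - 1] + c_prev)
        return w_delta * (delta - target_delta[i - 1]) ** 2

    # best[c] = (cost of the cheapest path ending with correction c, that path)
    best = {c: (local(0, c), [c]) for c in candidates}
    for i in range(1, len(f0)):
        new_best = {}
        for c in candidates:
            cost, path = min(
                ((best[p][0] + transition(i, p, c), best[p][1]) for p in candidates),
                key=lambda item: item[0],
            )
            new_best[c] = (cost + local(i, c), path + [c])
        best = new_best
    return min(best.values(), key=lambda item: item[0])
```

With `w_corr` above zero the search prefers leaving the selected segments untouched unless the slope or absolute-value terms outweigh the penalty, which mirrors the "as small as possible" wording of the abstract.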

    Speech recognition and synthesis system, program and method
    2.
    Invention patent (granted)

    Publication number: JP2009282330A

    Publication date: 2009-12-03

    Application number: JP2008134759

    Filing date: 2008-05-22

    Abstract: PROBLEM TO BE SOLVED: To provide a method, a means, and a program for high-accuracy speech recognition and natural synthesized speech output in a language with large variations in tone. SOLUTION: A statistical model is trained on the F0 slope, which is observed from the F0 at the start point and end point of each phoneme using a linear approximation method or a global smoothing method. At runtime the F0 slope is evaluated, and synthesized speech whose F0 has been corrected on the basis of a cost calculation is output. The change of the F0 slope over time within a syllable is modeled by training a decision tree for each of the equal regions into which the syllable is suitably divided. Likelihood is evaluated by estimating the error range of the observed F0 slope. Combining these operations yields high-accuracy speech recognition and synthesized speech output with natural tone. COPYRIGHT: (C)2010,JPO&INPIT
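
As a rough illustration of observing an F0 slope by linear approximation, the sketch below fits a least-squares line to F0 samples and then repeats the fit for each of several equal regions of a syllable. Treating "linear approximation" as an ordinary least-squares fit is an assumption, and the function names `f0_slope` and `region_slopes` are hypothetical.

```python
def f0_slope(times, f0_values):
    """Least-squares slope of F0 over time (Hz per unit of time)."""
    n = len(times)
    if n < 2 or n != len(f0_values):
        raise ValueError("need at least two matching (time, F0) samples")
    mean_t = sum(times) / n
    mean_f = sum(f0_values) / n
    numerator = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0_values))
    denominator = sum((t - mean_t) ** 2 for t in times)
    if denominator == 0:
        raise ValueError("all samples share the same time stamp")
    return numerator / denominator


def region_slopes(times, f0_values, n_regions):
    """Split a syllable into equal-sized regions and return one slope per region.

    Samples left over when the length is not divisible by n_regions are ignored.
    """
    size = len(times) // n_regions
    if size < 2:
        raise ValueError("each region needs at least two samples")
    return [
        f0_slope(times[i * size:(i + 1) * size], f0_values[i * size:(i + 1) * size])
        for i in range(n_regions)
    ]
```

A rising-then-falling contour split into two regions gives one positive and one negative slope, which is the kind of within-syllable change the per-region models are meant to capture.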

    METHOD AND SYSTEM OF ELECTRONIC WATERMARK OF COMPRESSED AUDIO DATA

    Publication number: JP2001184080A

    Publication date: 2001-07-06

    Application number: JP36462799

    Filing date: 1999-12-22

    Applicant: IBM

    Abstract: PROBLEM TO BE SOLVED: To provide a method and a system for directly manipulating information in compressed digital audio data. SOLUTION: A system that embeds additional information in compressed audio data has (1) a means for restoring MDCT (Modified Discrete Cosine Transform) coefficients from the compressed audio data, (2) a means for finding frequency components of the audio data from the restored MDCT coefficients, (3) a means for embedding the additional information in the found frequency components in the frequency domain, and (5) a means for generating compressed audio data from the MDCT coefficients in which the additional information is embedded.

    Technology for creating high quality synthesis voice
    4.
    Invention patent (under examination, published)

    Publication number: JP2008185805A

    Publication date: 2008-08-14

    Application number: JP2007019433

    Filing date: 2007-01-30

    CPC classification number: G10L13/07

    Abstract: PROBLEM TO BE SOLVED: To efficiently create high-quality synthesized voice by connecting a plurality of phonemes.
    SOLUTION: A system comprises: a phoneme storage section that stores a plurality of phoneme data; a synthesis section that creates voice data representing the synthesized voice of an input text by reading, from the phoneme storage section, the phoneme data corresponding to each phoneme in the pronunciation of the text and connecting them; a calculation section that calculates, from the voice data, an index value indicating the unnaturalness of the synthesized voice of the text; a paraphrase storage section that stores, for each of a plurality of first notations, a second notation that paraphrases it; a replacing section that searches the text for a notation matching any of the first notations and replaces it with the corresponding second notation; and a determination section that outputs the created voice data on condition that the calculated index value is smaller than a reference value, and, on condition that the index value is equal to or greater than the reference value, inputs the replaced text to the synthesis section so that voice data for the replaced text is created.
    COPYRIGHT: (C)2008,JPO&INPIT
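
The control flow of the determination section can be sketched as a simple loop. This is only an illustration under stated assumptions: `synthesize` and `unnaturalness` are caller-supplied placeholders for the synthesis and calculation sections, `paraphrases` is a plain dictionary standing in for the paraphrase storage section, and the `max_rounds` limit is an added safeguard that the abstract does not mention.

```python
def synthesize_with_paraphrase(text, synthesize, unnaturalness, paraphrases,
                               reference, max_rounds=5):
    """Re-synthesize with paraphrased text until the voice sounds natural enough.

    synthesize(text) -> voice data            (placeholder for the synthesis section)
    unnaturalness(voice) -> index value       (placeholder for the calculation section)
    paraphrases: {first_notation: second_notation}
    Returns (final_text, final_voice).
    """
    voice = synthesize(text)
    for _ in range(max_rounds):
        if unnaturalness(voice) < reference:
            break  # index value below the reference: output this voice data
        replaced = text
        for first, second in paraphrases.items():
            replaced = replaced.replace(first, second)
        if replaced == text:
            break  # nothing left to paraphrase; give up and return what we have
        text = replaced
        voice = synthesize(text)
    return text, voice
```

Replacing every matching notation at once is a simplification; a real system might replace one notation per round and compare index values between rounds.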

    System supporting text-to-speech synthesis
    5.
    Invention patent (granted)

    Publication number: JP2008046538A

    Publication date: 2008-02-28

    Application number: JP2006224110

    Filing date: 2006-08-21

    CPC classification number: G10L13/04 G10L15/26

    Abstract: PROBLEM TO BE SOLVED: To provide a system for supporting text-to-speech synthesis that can efficiently generate high-quality synthetic speech. SOLUTION: The system supports text-to-speech synthesis and comprises: a learning data generating part that recognizes input speech and generates first learning data associating the notation of each phrase with its reading; a frequency data generating part that creates, from the first learning data, frequency data showing the appearance frequency of each notation and reading of a phrase; and a setting part that sets the frequency data in a language processing part, which generates a reading from the notation of a text on the basis of the appearance frequencies, so that the output speech of the text-to-speech synthesis approaches the input speech. COPYRIGHT: (C)2008,JPO&INPIT
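
A minimal sketch of the frequency data idea follows: count how often each reading was observed for each notation, then let the language processing step prefer the most frequent reading. The data layout (a list of `(notation, reading)` pairs) and both function names are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def build_frequency_data(learning_data):
    """Count readings per notation from (notation, reading) pairs."""
    frequency = defaultdict(Counter)
    for notation, reading in learning_data:
        frequency[notation][reading] += 1
    return frequency


def most_frequent_reading(frequency, notation):
    """Return the reading seen most often for a notation, or None if unseen."""
    readings = frequency.get(notation)
    if not readings:
        return None
    return readings.most_common(1)[0][0]
```

For example, if recognition of the input speech produced the pair `("今日", "kyou")` twice and `("今日", "konnichi")` once, the lookup returns `"kyou"` for `"今日"`.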

    Information processor, information processing method, information processing system and program
    7.
    Invention patent (granted)

    Publication number: JP2012159596A

    Publication date: 2012-08-23

    Application number: JP2011017986

    Filing date: 2011-01-31

    CPC classification number: G10L15/1807 G10L15/05

    Abstract: PROBLEM TO BE SOLVED: To provide an information processor, an information processing method, an information processing system, and a program for analyzing phrases that reflect information not explicitly expressed in words. SOLUTION: An information processor 120 uses voice data recording dialogs to identify information that is not explicitly expressed in words in the voice data, and comprises: an acoustic analysis unit 208 for executing acoustic analysis of the voice data by using acoustic data; a prosodic information acquisition unit 212 for identifying a region of the voice data delimited before and after by pauses, identifying a phrase in the identified region by using the acoustic analysis of that region, and generating one or more prosodic feature values for the phrase, with each prosodic feature value of the phrase as an element; an appearance frequency acquisition unit 210 for acquiring the appearance frequency, in the voice data, of the phrase acquired by the acoustic analysis unit 208; and a prosodic variation analysis unit 214 for calculating the degree of variation of the prosodic feature values of phrases with a high appearance frequency in the voice data, and determining a feature phrase.
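
The last step (finding frequent phrases whose prosody varies the most) can be approximated as below. This is a sketch under assumptions: each observation is reduced to a single scalar prosodic feature, the degree of variation is taken to be the population standard deviation, and `min_count` and `top_k` are hypothetical parameters; the abstract does not specify any of these.

```python
from collections import defaultdict
from statistics import pstdev


def feature_phrases(observations, min_count=3, top_k=1):
    """Rank frequent phrases by the variation of one prosodic feature.

    observations: iterable of (phrase, prosodic_feature_value) pairs.
    Phrases seen fewer than min_count times are ignored.
    Returns up to top_k phrases, most variable first.
    """
    values = defaultdict(list)
    for phrase, value in observations:
        values[phrase].append(value)
    variation = {
        phrase: pstdev(samples)
        for phrase, samples in values.items()
        if len(samples) >= min_count
    }
    return sorted(variation, key=variation.get, reverse=True)[:top_k]
```

A phrase that appears often but with very different durations or pitch each time would rank first, which is the intuition behind treating it as a feature phrase.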

    Technique for recognizing accent of input voice
    8.
    Invention patent (under examination, published)

    Publication number: JP2008134475A

    Publication date: 2008-06-12

    Application number: JP2006320890

    Filing date: 2006-11-28

    CPC classification number: G10L15/04 G10L13/04

    Abstract: PROBLEM TO BE SOLVED: To efficiently and accurately recognize the accent of input voice. SOLUTION: The system stores notation data for learning, which shows the notation of each phrase of a learning text; utterance data for learning, which shows the characteristics of the utterance of each phrase; and boundary data for learning, which shows whether each phrase is the boundary of an accent phrase. A candidate of the boundary data is input, and a first likelihood that the accent phrase boundaries of the phrases of the input text coincide with the input candidate is calculated from input notation data showing the notation of the input text (the content of the input voice), the notation data for learning, and the boundary data for learning. A second likelihood that the utterance of each phrase of the input text matches the utterance indicated by the input utterance data, given that the input voice has the accent phrase boundaries indicated by the candidate, is calculated from the input utterance data showing the characteristics of the utterance of each phrase of the input voice, the utterance data for learning, and the boundary data for learning. The candidate of the boundary data that maximizes the product of the first likelihood and the second likelihood is searched for, and the result is output. COPYRIGHT: (C)2008,JPO&INPIT
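
The final search step (choosing the boundary candidate that maximizes the product of the two likelihoods) can be sketched with an exhaustive search. The likelihood functions are passed in as placeholders, and `independent_likelihood` is a toy stand-in rather than the trained models the abstract refers to; all names here are hypothetical.

```python
import math
from itertools import product


def independent_likelihood(boundary_probs):
    """Toy model: each phrase is an accent-phrase boundary independently."""
    def likelihood(candidate):
        p = 1.0
        for is_boundary, prob in zip(candidate, boundary_probs):
            p *= prob if is_boundary else 1.0 - prob
        return p
    return likelihood


def best_boundaries(n_phrases, first_likelihood, second_likelihood):
    """Return the boundary candidate maximizing first * second likelihood.

    Candidates are tuples of booleans, one per phrase. Log-likelihoods are
    summed to avoid underflow; candidates with zero likelihood are skipped.
    """
    best, best_score = None, -math.inf
    for candidate in product((False, True), repeat=n_phrases):
        p1 = first_likelihood(candidate)
        p2 = second_likelihood(candidate)
        if p1 <= 0.0 or p2 <= 0.0:
            continue
        score = math.log(p1) + math.log(p2)
        if score > best_score:
            best, best_score = candidate, score
    return best
```

Enumerating all candidates costs 2^n evaluations for n phrases, so this form is only practical for short inputs; a dynamic-programming search would be the natural replacement for longer texts.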

    Device for detecting digital watermark, method therefor and program
    9.
    Invention patent (granted)

    Publication number: JP2005284085A

    Publication date: 2005-10-13

    Application number: JP2004099592

    Filing date: 2004-03-30

    CPC classification number: G10L19/097

    Abstract: PROBLEM TO BE SOLVED: To properly detect a digital watermark by improving the robustness of the digital watermark embedded in variously processed voice contents. SOLUTION: This device is provided with watermark signal detection parts 11 that calculate detected values of a watermark signal by applying two or more keys to the PCM data of the voice contents for each channel; an adding part 12 that adds the detected values corresponding to each channel and each key for every possible combination of channel and key; and a comparison selection part that selects and outputs one of the addition results produced by the adding part 12. The device is further provided with a message reconstruction part 13, which accumulates these detected values over different accumulation cycles, reconstructs the message embedded as a digital watermark from the accumulated detected values, and also performs boundary detection on the voice contents to detect the voice contents in which the digital watermark is embedded; and a detection result output part 14, which synthesizes the results processed by the message reconstruction part 13 and outputs the result. COPYRIGHT: (C)2006,JPO&NCIPI
