-
Publication No.: KR1020090041923A
Publication date: 2009-04-29
Application No.: KR1020070107705
Filing date: 2007-10-25
Applicant: 한국전자통신연구원
IPC: G10L15/06 , G10L15/02 , G10L15/183 , G10L15/197
Abstract: A speech recognition method is provided that statistically models various textual language phenomena drawn from multiple knowledge sources. Morphological analysis is performed on a raw text corpus made up of space-delimited Korean word units (S201), producing a morpheme corpus in which each word unit is split into morphemes. From this morpheme corpus, a trigram language model consisting of morpheme unigrams, bigrams, and trigrams is generated (S202). A first N-best list of up to N recognition candidates is generated for an input utterance (S204). The recognition candidates are re-evaluated by applying morpho-syntactic constraints (S205). The second N-best list produced in that step is re-evaluated (S206), and a final N-best list is generated.
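As a rough illustration of steps S202 to S206 in this abstract, the sketch below trains a morpheme trigram model and uses it to re-rank an N-best list. It is not the patented method: the add-one smoothing, the `lm_weight` parameter, the romanized toy morphemes, and all function names are assumptions made for this example, and the morpho-syntactic constraint step (S205) is not modeled.

```python
import math
from collections import Counter


def train_trigram(corpus):
    """Count morpheme trigrams and their two-morpheme histories.

    corpus: list of sentences, each a list of morpheme strings (hypothetical data).
    """
    tri, hist, vocab = Counter(), Counter(), set()
    for sent in corpus:
        toks = ["<s>", "<s>"] + list(sent) + ["</s>"]
        vocab.update(toks)
        for i in range(2, len(toks)):
            tri[tuple(toks[i - 2:i + 1])] += 1
            hist[tuple(toks[i - 2:i])] += 1
    return tri, hist, len(vocab)


def log_prob(sent, model):
    """Add-one smoothed trigram log-probability (smoothing choice is an assumption)."""
    tri, hist, vocab_size = model
    toks = ["<s>", "<s>"] + list(sent) + ["</s>"]
    total = 0.0
    for i in range(2, len(toks)):
        h = tuple(toks[i - 2:i])
        total += math.log((tri[h + (toks[i],)] + 1) / (hist[h] + vocab_size))
    return total


def rescore(nbest, model, lm_weight=1.0):
    """Re-rank (morphemes, first_pass_score) pairs, best candidate first."""
    scored = [(m, s + lm_weight * log_prob(m, model)) for m, s in nbest]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy usage: the candidate with the better first-pass score has an unlikely
# morpheme order, so the language model moves it down the list.
corpus = [["hakgyo", "e", "ga", "nda"], ["jib", "e", "ga", "nda"]]
model = train_trigram(corpus)
first_pass = [(["hakgyo", "ga", "e", "nda"], -10.0),
              (["hakgyo", "e", "ga", "nda"], -10.5)]
second_pass = rescore(first_pass, model)
```

In practice the language model weight would be tuned on held-out data; here it is fixed at 1.0 only to keep the example short.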
-
Publication No.: KR1020120045582A
Publication date: 2012-05-09
Application No.: KR1020100107205
Filing date: 2010-10-29
Applicant: 한국전자통신연구원
IPC: G10L15/14
CPC classification number: G10L15/144 , G10L15/285
Abstract: PURPOSE: An acoustic model generating apparatus and method are provided that automatically search for the penalty value on acoustic model complexity used in the MDL (Minimum Description Length) criterion. CONSTITUTION: A binary tree generating unit (101) generates a binary tree by repeatedly merging the Gaussian components of an HMM (Hidden Markov Model) state based on a distance criterion. An information generating unit (102) generates maximum-size information for the acoustic model according to the platform (111) that includes a speech recognition unit (112). A binary tree reduction unit (103) prunes the binary tree according to the maximum-size information of the acoustic model.
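The tree building and reduction described in this abstract can be pictured with a much simplified sketch. Everything specific here is an assumption for illustration: the Gaussians are one-dimensional `(weight, mean, variance)` tuples, the distance is the absolute difference of the means, merged components are moment-matched, and `max_components` stands in for the platform's maximum-size information. The automatic search for the MDL penalty value is not shown.

```python
def merge(a, b):
    """Moment-matched merge of two weighted 1-D Gaussians (weight, mean, var)."""
    w = a[0] + b[0]
    mean = (a[0] * a[1] + b[0] * b[1]) / w
    var = (a[0] * (a[2] + (a[1] - mean) ** 2)
           + b[0] * (b[2] + (b[1] - mean) ** 2)) / w
    return (w, mean, var)


def reduce_mixture(components, max_components):
    """Merge the closest pair bottom-up until at most max_components remain.

    Each merge corresponds to one internal node of a binary tree; stopping
    early is equivalent to cutting that tree at max_components nodes.
    """
    if max_components < 1:
        raise ValueError("max_components must be at least 1")
    comps = list(components)
    while len(comps) > max_components:
        # Assumed distance criterion: absolute difference of the means.
        _, i, j = min((abs(comps[i][1] - comps[j][1]), i, j)
                      for i in range(len(comps))
                      for j in range(i + 1, len(comps)))
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps


# Toy usage: a three-component state reduced to fit a two-component budget.
state = [(0.25, 0.0, 1.0), (0.25, 0.2, 1.0), (0.5, 5.0, 1.0)]
reduced = reduce_mixture(state, max_components=2)
```

A real system would use multivariate Gaussians and a likelihood-based distance; the one-dimensional case is used only because it keeps the merge arithmetic easy to check by hand.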
-
Publication No.: KR1020100067727A
Publication date: 2010-06-22
Application No.: KR1020080126244
Filing date: 2008-12-12
Applicant: 한국전자통신연구원
Abstract: PURPOSE: A multi-search-based speech recognition apparatus and method are provided that perform multiple searches on an input speech signal, improving recognition performance by using both an FSN (Finite State Network) mode and an N-gram mode. CONSTITUTION: A speech feature extraction block (102) extracts feature data from the input speech signal. A language model database (108) stores an FSN language model and an N-gram language model. A multi-search block (104) performs a first speech search and a second speech search in parallel, builds an integrated search network, and outputs the speech recognition result according to a third speech search.
-
Publication No.: KR1020100062825A
Publication date: 2010-06-10
Application No.: KR1020090026451
Filing date: 2009-03-27
Applicant: 한국전자통신연구원
Abstract: PURPOSE: A speech synthesis apparatus and method are provided in which the parameter values of those segments of the synthesized speech that a speech recognition unit recognizes with low reliability are readjusted automatically. CONSTITUTION: A text-to-speech synthesis unit (200) outputs a first synthesized sound by synthesizing an input text sentence. A speech recognition unit (202) performs a first speech recognition pass on the first synthesized sound with ambient noise added, and readjusts the speech parameter values for the speech segments whose recognition reliability is below a predetermined threshold. A speech synthesis unit (204) outputs a second synthesized sound using the changed speech parameter values and the recognized speech.
-
Publication No.: KR100930039B1
Publication date: 2009-12-07
Application No.: KR1020070133217
Filing date: 2007-12-18
Applicant: 한국전자통신연구원
CPC classification number: G10L15/01
Abstract: An apparatus for evaluating the performance of speech recognition includes a speech database that stores N test speech signals for evaluation. A speech recognizer located in the actual environment performs speech recognition on the test speech signals, which are reproduced from the speech database through a loudspeaker in that environment, and produces speech recognition results. A performance evaluation module evaluates the performance of the speech recognition by comparing the correct answers with the speech recognition results.
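The abstract does not say which metric the performance evaluation module uses when it compares the correct answers with the recognition results. A minimal sketch of one common choice, word error rate plus sentence accuracy, is shown below; the metric choice, the function names, and the example sentences are assumptions for illustration only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def evaluate(answers, results):
    """Compare reference transcripts with recognizer output, utterance by utterance."""
    if len(answers) != len(results):
        raise ValueError("answers and results must have the same length")
    errors = sum(edit_distance(a.split(), r.split())
                 for a, r in zip(answers, results))
    words = sum(len(a.split()) for a in answers)
    exact = sum(a == r for a, r in zip(answers, results))
    return {"wer": errors / words, "sentence_accuracy": exact / len(answers)}


# Toy usage with two test utterances (hypothetical data).
report = evaluate(["turn on the radio", "call home"],
                  ["turn on radio", "call home"])
```

Here one word is deleted out of six reference words, and one of the two sentences matches exactly.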
-
Publication No.: KR1020090065746A
Publication date: 2009-06-23
Application No.: KR1020070133217
Filing date: 2007-12-18
Applicant: 한국전자통신연구원
CPC classification number: G10L15/01
Abstract: A device and method for evaluating the performance of a speech recognition engine are provided that require no human intervention in any noise environment, by adjusting the SNR (Signal-to-Noise Ratio) through free control of the playback volume of the speech from a loudspeaker. An evaluation speech database (201) stores evaluation speech. An automatic speech recognition evaluator (203) plays the stored evaluation speech and, when speech recognition control for the evaluation data is complete, transmits an answer list and the audio signal files of the evaluation data. A speech recognizer (207) recognizes the speech and stores a recognition result list and the speech signal files used in recognition. A performance evaluation block (209) evaluates the performance of the speech recognizer by comparing the answer list and audio files with the recognition result list and speech signal files.
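The SNR adjustment by playback volume mentioned in this abstract comes down to simple arithmetic: if the speech is currently S dB above the noise and the target is T dB, the playback amplitude is scaled by 10^((T - S)/20). The sketch below works that out on sample lists; the RMS-based SNR estimate and the function names are assumptions, and the patent's actual control procedure is not described in the abstract.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def snr_db(speech, noise):
    """SNR in decibels, estimated from RMS amplitudes (an assumed estimator)."""
    return 20.0 * math.log10(rms(speech) / rms(noise))


def gain_for_target_snr(speech, noise, target_snr_db):
    """Linear amplitude gain to apply to the speech playback to hit the target SNR."""
    return 10.0 ** ((target_snr_db - snr_db(speech, noise)) / 20.0)


# Toy usage: speech is 20 dB above the noise; lower it to a 10 dB test condition.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, -0.1]
gain = gain_for_target_snr(speech, noise, 10.0)
quieter = [gain * s for s in speech]
```

Sweeping `target_snr_db` over a range of values would give a set of test conditions without anyone having to re-record the speech.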
-
Publication No.: KR1020090061566A
Publication date: 2009-06-16
Application No.: KR1020080088318
Filing date: 2008-09-08
Applicant: 한국전자통신연구원
IPC: G10L15/20
Abstract: A microphone-array-based speech recognition system and a target speech extraction method for that system are provided that use an HMM (Hidden Markov Model) and a GMM (Gaussian Mixture Model) to automatically find the one target utterance intended for speech recognition, thereby achieving a higher recognition rate even in the presence of noise. A signal separator (110) separates the mixed signals received through multiple microphones into individual sound source signals by independent component analysis. A target speech extractor (120) extracts, from the separated sound source signals, the one target speech uttered for speech recognition. A speech recognizer (130) recognizes the desired speech from the extracted target speech. An additional information unit transmits additional information used for target speech extraction to the target speech extractor.
-
Publication No.: KR101578766B1
Publication date: 2015-12-22
Application No.: KR1020110090283
Filing date: 2011-09-06
Applicant: 한국전자통신연구원
IPC: G10L15/08
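Abstract: The present invention relates to an apparatus and method for generating a search space for speech recognition that, by constructing the component WFST based on a list of words into which an optional pause is to be inserted, can minimize the growth of the search space without degrading speech recognition performance. To this end, the invention provides a search space generating apparatus for speech recognition comprising: a pronunciation dictionary; a word list database storing the list of words into which an optional pause is to be inserted; a search space implementation unit that generates a search space using the pronunciation sequence of each word read from the pronunciation dictionary, inserting an optional pause into the word when the word is included in the word list database; and a database in which the search space with the optional pauses inserted is stored.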
-
Publication No.: KR1020130057668A
Publication date: 2013-06-03
Application No.: KR1020110123528
Filing date: 2011-11-24
Applicant: 한국전자통신연구원
CPC classification number: G10L15/20
Abstract: PURPOSE: A speech recognition apparatus and method based on cepstrum feature vectors are provided that estimate the reliability of each segment of a noisy input speech signal and apply that reliability as a weight to the acoustic model and the input speech signal in the decoding step of speech recognition. CONSTITUTION: A reliability estimating unit (108) estimates the reliability of the time-frequency segments of the input speech signal. A reliability reflecting unit (110) applies the estimated reliability to the normalized cepstrum feature vector extracted from the input speech signal and to the cepstrum mean vector of the HMM (Hidden Markov Model) state involved in decoding. A cepstrum transforming unit (112) transforms the reliability-weighted cepstrum feature and mean vectors through a cosine transformation matrix. An output probability calculating unit (113) calculates the output probability value of the time-frequency segments. [Reference numerals] (101) Frame-based dividing unit; (102) Filter bank analysis unit; (104, 111) Cosine transform unit (DCT); (105) Cepstrum normalization unit; (106) HMM acoustic model; (107) HMM mean vector; (108) Reliability estimating unit; (109) Inverse cosine transform unit (IDCT); (110) Reliability reflecting unit; (112) Cepstrum transforming unit; (113) Output probability calculating unit; (AA) Background noise sound signal input; (BB) Log filter bank energy
-
Publication No.: KR1020130026855A
Publication date: 2013-03-14
Application No.: KR1020110090283
Filing date: 2011-09-06
Applicant: 한국전자통신연구원
IPC: G10L15/08
Abstract: PURPOSE: A search space generator for speech recognition is provided that improves recognition accuracy by making use of the speech utterance database used to train the speech model. CONSTITUTION: The search space generator includes a pronunciation dictionary (100), a word list database (120), a WFST (Weighted Finite State Transducer) L implementation unit (140), and a WFST L database (160). The WFST L implementation unit reads the pronunciation dictionary to obtain the pronunciation sequence of each word, and generates a WFST L with optional pauses inserted by comparing the words read from the pronunciation dictionary against the word list stored in the word list database. [Reference numerals] (100) Pronunciation dictionary; (120) Word list database; (140) WFST L implementation unit; (160) WFST L database
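To make the idea of selective pause insertion concrete, the sketch below builds the arc list of a toy lexicon transducer L and adds an optional silence arc only for words on a pause list. The arc tuple layout `(source, destination, input_label, output_label)`, the `sil` and `<eps>` symbols, the unweighted arcs, and the example words are all assumptions for illustration; this is not the apparatus described in the abstract and it uses no real WFST library.

```python
def build_lexicon_arcs(lexicon, pause_words, sil="sil", eps="<eps>"):
    """Return the arcs of a toy lexicon transducer with selective optional pauses.

    lexicon: dict mapping each word to its phone sequence (hypothetical data).
    pause_words: set of words after which an optional pause is allowed.
    State 0 is both the start state and the final state.
    """
    arcs = []
    next_state = 1
    for word, phones in lexicon.items():
        src = 0
        for i, phone in enumerate(phones):
            dst = next_state
            next_state += 1
            # The word label is emitted on the first phone arc only.
            arcs.append((src, dst, phone, word if i == 0 else eps))
            src = dst
        arcs.append((src, 0, eps, eps))      # word ends with no pause
        if word in pause_words:
            arcs.append((src, 0, sil, eps))  # word may end with a pause
    return arcs


# Toy usage: only "seoul" may be followed by a pause.
lexicon = {"seoul": ["s", "eo", "u", "l"], "e": ["e"]}
selective = build_lexicon_arcs(lexicon, pause_words={"seoul"})
everywhere = build_lexicon_arcs(lexicon, pause_words=set(lexicon))
```

Comparing the two arc lists shows the trade-off in miniature: restricting pauses to listed words adds fewer silence arcs than allowing a pause after every word, so the resulting search space grows less.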