-
Publication No.: KR1020140076816A
Publication date: 2014-06-23
Application No.: KR1020120145284
Application date: 2012-12-13
Applicant: 한국전자통신연구원
Abstract: The present invention relates to improving the performance of voice endpoint detection and of the feature vector extractor used in a voice recognition system. According to the present invention, a method for detecting an audio signal comprises the steps of: detecting voice segments by performing frame-by-frame endpoint detection on an input signal; extracting a feature value of the signal in at least a partial segment corresponding to a plurality of windows among the detected voice segments; and comparing the extracted feature value with a predetermined threshold to detect the actual voice segments among the detected segments. The method improves the normalization of the feature vectors used in the voice recognition system and thereby improves voice recognition performance in noisy environments.
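As a rough illustration of the frame-by-frame thresholding this abstract describes, the sketch below flags frames whose energy exceeds a threshold as candidate voice frames. The function name, frame length, and the use of mean energy as the feature are illustrative assumptions, not the patent's actual formulation.

```python
import numpy as np

def detect_voice_frames(signal, frame_len=160, threshold=0.01):
    """Split the signal into fixed-size frames and flag frames whose
    mean energy exceeds the threshold as candidate voice frames."""
    n_frames = len(signal) // frame_len
    flags = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        energy = float(np.mean(frame ** 2))  # per-frame feature value
        flags.append(energy > threshold)     # compare with threshold
    return flags

# Example: two silent frames followed by two tone frames.
sig = np.concatenate([np.zeros(320), 0.5 * np.sin(np.linspace(0, 100, 320))])
print(detect_voice_frames(sig))  # [False, False, True, True]
```

In the patented method the per-window feature values and the threshold comparison are what separate the actual voice segments from the candidates found by endpoint detection.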
-
Publication No.: KR1020130068621A
Publication date: 2013-06-26
Application No.: KR1020110135916
Application date: 2011-12-15
Applicant: 한국전자통신연구원
IPC: G10L15/14
Abstract: PURPOSE: An utterance-verification-based automatic processing device and method for mass voice data are provided, which automatically classify mass voice data through a voice recognition system and generate a voice model from the classified data, so that the model can be used for voice modeling data collection and error data verification. CONSTITUTION: An utterance verification unit (160) classifies each item of mass voice data as normally recognized data or abnormally recognized data, using features of the voice extracted by an extraction unit (140), a context-dependent adaptive model, and a context-independent adaptive anti-phoneme model. An acoustic modeling unit (180) classifies the mass voice data and generates an acoustic model based on the classified acoustic modeling data. [Reference numerals] (120) Saving unit; (140) Extraction unit; (160) Utterance verification unit; (180) Acoustic modeling unit
-
Publication No.: KR1020130068598A
Publication date: 2013-06-26
Application No.: KR1020110135888
Application date: 2011-12-15
Applicant: 한국전자통신연구원
CPC classification number: G09B19/06 , G09B5/04 , G09B7/04 , G10L15/005 , G10L15/26
Abstract: PURPOSE: A pronunciation evaluation device and method are provided to evaluate foreign-language pronunciation using an acoustic model of the foreign language learner, pronunciations generated using a pronunciation model that reflects pronunciation errors, and an acoustic model of a native speaker, thereby increasing the accuracy of the pronunciations generated for the foreign language learner's speech. CONSTITUTION: A pronunciation evaluation device (100) includes a sound input part (110), a sentence input part (120), a storage part (130), a pronunciation generation part (140), a pronunciation evaluation part (150), and an output part (160). The sound input part receives the foreign language learner's speech, and the sentence input part receives the sentence corresponding to that speech. The storage part stores an acoustic model and a pronunciation dictionary for the learner's speech. The pronunciation generation part performs speech recognition based on the stored acoustic model and pronunciation dictionary. The pronunciation evaluation part detects vocalization errors by analyzing the pronunciations of the learner's speech, and the output part outputs the detected errors. [Reference numerals] (110) Sound input part; (120) Sentence input part; (130) Storage part; (140) Pronunciation generation part; (150) Pronunciation evaluation part; (160) Output part
-
Publication No.: KR1020120043552A
Publication date: 2012-05-04
Application No.: KR1020100104894
Application date: 2010-10-26
Applicant: 한국전자통신연구원
IPC: G10L15/14
Abstract: PURPOSE: A voice recognition apparatus and method are provided to increase the recognition speed of an input signal and to perform recognition of the input signal in parallel. CONSTITUTION: A global database unit (10) includes a global feature vector (12), a global vocabulary model (14), and a global sound model (16). A recognition unit (20) includes separate recognition units (22a~22n), a plurality of which perform voice recognition in parallel. A separate database unit (30) includes separate language models. A collection and evaluation unit (40) collects and evaluates the recognition results of the separate recognition units.
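The parallel-recognition-plus-collection flow described here might be sketched as follows; the recognizer interface and the scoring are hypothetical stand-ins for the separate recognition units (22a~22n) and the collection and evaluation unit (40):

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_parallel(signal, recognizers):
    """Run each separate recognizer on the same input in parallel,
    collect (hypothesis, score) results, and return the best-scoring
    one -- a rough analogue of the collection and evaluation unit."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda rec: rec(signal), recognizers))
    return max(results, key=lambda hyp_score: hyp_score[1])

# Hypothetical recognizers, each returning a (hypothesis, score) pair.
recs = [
    lambda s: ("cat", 0.4),
    lambda s: ("cart", 0.9),
    lambda s: ("card", 0.7),
]
print(recognize_parallel("dummy-signal", recs))  # ('cart', 0.9)
```

Running the separate recognizers concurrently is what gives the speed-up the PURPOSE section claims; the evaluation step then reconciles their independent hypotheses.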
-
Publication No.: KR1020120042090A
Publication date: 2012-05-03
Application No.: KR1020100103581
Application date: 2010-10-22
Applicant: 한국전자통신연구원
IPC: G10L15/14
Abstract: PURPOSE: A voice recognition system is provided to increase recognition performance for abnormal utterances and to reduce repeated utterances by the user through recognition of abnormal utterances. CONSTITUTION: A determining unit (120) determines whether a user's utterance is segmented speech. A first recognition unit (130) recognizes the user's voice using a phonemic probability model. A second recognition unit (140) recognizes the user's voice according to a comparison between the voice signal and a previously trained probability model.
-
Publication No.: KR101072886B1
Publication date: 2011-10-17
Application No.: KR1020080127707
Application date: 2008-12-16
Applicant: 한국전자통신연구원
Abstract: The present invention relates to a cepstral mean subtraction method and apparatus. In an online voice recognition service, the cepstral mean of the entire actual speech segment is estimated from the cepstral mean of the silence segments, so that channel characteristics can be normalized more accurately. The invention can also estimate an accurate cepstral mean even when the surrounding environment changes, and thus provides excellent channel normalization performance. Furthermore, it overcomes the degradation in voice recognition performance caused by the difference between the cepstral mean of the silence segments estimated during online voice recognition and the cepstral mean of the entire actual speech segment.
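A minimal sketch of silence-based cepstral mean subtraction in the spirit of this abstract; the array layout and function name are assumptions, and the patent's estimator for the full-speech mean is more elaborate than simply reusing the silence mean:

```python
import numpy as np

def cepstral_mean_subtraction(cepstra, silence_mask):
    """Estimate the channel component as the cepstral mean over the
    silence frames and subtract it from every frame, normalizing
    the channel characteristics of the whole utterance.

    cepstra: (n_frames, n_coeffs) array of cepstral vectors
    silence_mask: boolean array marking the silence frames
    """
    silence_mean = cepstra[silence_mask].mean(axis=0)
    return cepstra - silence_mean

# Two silence frames carrying only the channel offset, one speech frame.
c = np.array([[1.0, -2.0],
              [1.0, -2.0],
              [2.0, -1.0]])
mask = np.array([True, True, False])
print(cepstral_mean_subtraction(c, mask))
```

After subtraction the silence frames go to zero and the speech frame keeps only its channel-free component, which is the normalization effect the abstract describes.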
-
Publication No.: KR1020110066622A
Publication date: 2011-06-17
Application No.: KR1020090123354
Application date: 2009-12-11
Applicant: 한국전자통신연구원
Abstract: PURPOSE: An international interpretation device and method based on voice recognition are provided to supply attendees with text data or a synthesized voice interpreted into their native language. CONSTITUTION: A conference participant information registering unit (100) registers conference participant information, including the language each participant uses. A voice recognition unit (200) registers keywords from the participants' presentation contents in advance and outputs voice recognition results in keyword form. A language interpreting unit (300) converts the results into the target language corresponding to each participant's language.
-
Publication No.: KR1020110034360A
Publication date: 2011-04-05
Application No.: KR1020090091867
Application date: 2009-09-28
Applicant: 한국전자통신연구원
IPC: G10L21/0272 , G10L21/0208 , G10L15/20
Abstract: PURPOSE: A location tracking apparatus using user voice and a method thereof are provided to reduce voice recognition degradation caused by echo and to integrate sound source localization technology. CONSTITUTION: A sound source determining unit (10) separates the two channel signals by sound source. A stereo Wiener filter unit (20) removes noise from the sound source signal separated by the sound source determining unit and filters residual signal components. A voice recognition unit (60) recognizes the user's voice from the separated sound source signal and measures the reliability of the voice recognition result. A channel select unit (80) selects a target channel based on that reliability. A sound source location tracking unit (130) analyzes the interference channel signal and the target channel signal.
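The noise-removal stage can be illustrated with the textbook per-bin Wiener gain G = S/(S+N); this is a generic stand-in for intuition, not the patent's stereo Wiener filter:

```python
import numpy as np

def wiener_gain(observed_power, noise_power):
    """Per-frequency-bin Wiener gain G = SNR / (1 + SNR), with the
    clean-speech power estimated by spectral subtraction
    (observed minus noise, floored at zero)."""
    speech_power = np.maximum(observed_power - noise_power, 0.0)
    snr = speech_power / np.maximum(noise_power, 1e-12)
    return snr / (1.0 + snr)

# Bin 0: speech dominates (power 3 vs noise 1) -> gain 2/3.
# Bin 1: noise only (power 1 vs noise 1)       -> gain 0.
print(wiener_gain(np.array([3.0, 1.0]), np.array([1.0, 1.0])))
```

Multiplying each spectral bin by its gain attenuates noise-dominated bins while passing speech-dominated ones, which is the role the stereo Wiener filter unit (20) plays ahead of recognition.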
-
Publication No.: KR101001618B1
Publication date: 2010-12-17
Application No.: KR1020080085095
Application date: 2008-08-29
Applicant: 한국전자통신연구원
IPC: G10L15/08 , G10L15/26 , H04N21/438
Abstract: The present invention relates to a technique for generating voice recognition information for voice recognition and using it to provide a broadcast service through voice input. Dictionary matching is performed according to the character-string information of the broadcast data; the matched strings are divided at segment boundaries to generate voice-recognition target string data; the target strings are normalized through abbreviation processing; and the normalized strings are then combined into utterance-variant string data and stored. As a result, the service can respond effectively to user utterances given as voice input and thereby provide the corresponding broadcast service effectively.
IP TV (Internet Protocol Television), voice recognition
-
Publication No.: KR1020100073161A
Publication date: 2010-07-01
Application No.: KR1020080131755
Application date: 2008-12-22
Applicant: 한국전자통신연구원
CPC classification number: G10L15/187 , G10L15/10
Abstract: PURPOSE: An utterance verification method and device for isolated-word NBEST recognition results are provided to enable more reliable voice recognition by measuring inter-phoneme similarity through DTW (Dynamic Time Warping) and indicating acceptance, rejection, or decision failure for the voice recognition result. CONSTITUTION: A pre-processing unit (104) performs feature extraction and endpoint detection to find the voice section and the noise-processing section. An NBEST voice recognition unit (106) performs NBEST voice recognition through a Viterbi search, taking a context-dependent sound model (26) into account. An NBEST speech verification unit (108) compares the result of an SVM (Support Vector Machine) with the DTW similarity result to measure the similarity of the voice recognition result, and accordingly indicates acceptance, rejection, or decision failure.
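The DTW measure named in this abstract can be sketched with the classic dynamic-programming recurrence; this is generic textbook DTW over 1-D sequences, not the patent's phoneme-level implementation:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences:
    the minimum cumulative |a[i] - b[j]| cost over all monotonic
    alignments, computed with the standard DP recurrence."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Best of: insertion, deletion, or match step.
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return float(cost[n, m])

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0 (time-warped match)
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

Because DTW absorbs timing differences, it gives a similarity score between phoneme sequences of different lengths; thresholding such scores is what lets the verification unit decide among acceptance, rejection, and decision failure.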
-