-
Publication No.: KR1020100066916A
Publication date: 2010-06-18
Application No.: KR1020080125433
Application date: 2008-12-10
Applicant: 한국전자통신연구원
IPC: G10L99/00
Abstract: PURPOSE: A method for separating noise from an audio signal is provided to improve sound source separation performance and to speed up convergence in the weighted learning stage, thereby increasing computational efficiency. CONSTITUTION: A plurality of microphones record an audio signal spoken by a user together with a noise signal. A beamformer (20) performs beamforming and a blind separation procedure on the recorded audio and noise signals, dividing them spatially and statistically. A sound source separator (30) separates the sound source signal and outputs it.
-
52.
Publication No.: KR1020100026187A
Publication date: 2010-03-10
Application No.: KR1020080085095
Application date: 2008-08-29
Applicant: 한국전자통신연구원
IPC: G10L15/08 , G10L15/26 , H04N21/438
Abstract: PURPOSE: A voice recognition information generation device, a method thereof, and a broadcast service method thereof are provided to generate a database from allomorph character strings, thereby offering a broadcast service based on voice recognition. CONSTITUTION: A voice recognition information generation device includes a prior matching unit (302), a section boundary partition unit (308), a normalization unit (310), and an allomorph generation unit (312). The prior matching unit performs prior matching according to character string information of broadcast data. The section boundary partition unit partitions the section boundaries of the prior-matched character string in order to generate voice recognition target character string data. The normalization unit normalizes the generated voice recognition target character string data. The allomorph generation unit generates allomorph character string data from the normalized voice recognition target character string data.
-
Publication No.: KR100930039B1
Publication date: 2009-12-07
Application No.: KR1020070133217
Application date: 2007-12-18
Applicant: 한국전자통신연구원
CPC classification number: G10L15/01
Abstract: An apparatus for evaluating the performance of speech recognition includes a speech database that stores N test speech signals for evaluation. A speech recognizer located in an actual environment performs speech recognition on the test speech signals, which are reproduced from the speech database through a loudspeaker in that environment, and produces speech recognition results. A performance evaluation module evaluates the speech recognition performance by comparing the correct answers with the speech recognition results.
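The comparison step in the abstract above can be illustrated with a minimal sketch. It assumes the answers and the recognition results are plain text strings, one per test utterance, and that a result counts as correct only on an exact match after trimming whitespace and lowercasing. The function name `recognition_accuracy` and the matching rule are illustrative assumptions, not taken from the patent.

```python
def recognition_accuracy(answers, results):
    """Return the fraction of test utterances recognized correctly.

    answers: reference transcripts, one per test utterance.
    results: recognizer outputs, in the same order.
    Exact match after normalization is an assumed scoring rule.
    """
    if len(answers) != len(results):
        raise ValueError("answers and results must have the same length")
    if not answers:
        raise ValueError("at least one test utterance is required")
    correct = sum(
        1
        for ref, hyp in zip(answers, results)
        if ref.strip().lower() == hyp.strip().lower()
    )
    return correct / len(answers)
```

For example, `recognition_accuracy(["turn on", "stop"], ["turn on", "start"])` scores one of two utterances as correct.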
-
54.
Publication No.: KR100911429B1
Publication date: 2009-08-11
Application No.: KR1020070084301
Application date: 2007-08-22
Applicant: 한국전자통신연구원
IPC: G10L21/0208 , G10L15/20 , G10L15/06
CPC classification number: G10L15/20
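Abstract: The present invention relates to a method and apparatus for generating a noise-adaptive acoustic model, including a noise-adaptive discriminative training method. It can provide a noise-adaptive acoustic model generation method comprising the steps of generating base acoustic model parameters from large-scale speech training data containing various environmental noises, and receiving the generated base acoustic model parameters and applying a discriminative training technique to generate adaptive acoustic model parameters suited to the actual application environment.
Acoustic model, environmental noise
-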
Publication No.: KR1020090065746A
Publication date: 2009-06-23
Application No.: KR1020070133217
Application date: 2007-12-18
Applicant: 한국전자통신연구원
CPC classification number: G10L15/01
Abstract: A device and a method for evaluating the performance of a speech recognition engine are provided that require no human intervention in any noise environment, by adjusting the SNR (signal-to-noise ratio) through free control of the speech volume at a loudspeaker. An evaluation speech database (201) stores evaluation speech. An automatic voice recognition evaluator (203) plays the stored evaluation speech and, when voice recognition control for the evaluation data is completed, transmits an answer list and an audio signal file of the evaluation data. A speech recognizer (207) recognizes the voice and stores a voice recognition result list and the voice signal file used in recognition. A performance evaluation block (209) evaluates the performance of the speech recognizer by comparing the answer list and the audio signal file with the voice recognition result list and the voice signal file.
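The idea of reaching a desired SNR purely by changing the playback volume of the speech can be sketched as follows. This is an illustrative calculation only: it assumes the speech and the noise are available as lists of samples and that SNR is defined as the ratio of their mean powers. The function names are hypothetical and do not come from the patent.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a sample sequence."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech, noise):
    """SNR in dB, defined here as the ratio of mean powers."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def speech_gain_for_snr(speech, noise, target_snr_db):
    """Amplitude gain to apply to the speech so that the
    speech-to-noise power ratio equals target_snr_db."""
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # Solve gain^2 * p_speech / p_noise = 10^(target / 10) for gain.
    return math.sqrt(p_noise * 10.0 ** (target_snr_db / 10.0) / p_speech)
```

Multiplying every speech sample by the returned gain leaves the noise untouched, which mirrors adjusting only the loudspeaker volume.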
-
Publication No.: KR1020090061566A
Publication date: 2009-06-16
Application No.: KR1020080088318
Application date: 2008-09-08
Applicant: 한국전자통신연구원
IPC: G10L15/20
Abstract: A microphone array-based voice recognition system and a target voice extraction method for the system are provided to automatically identify the one target voice uttered for voice recognition by using an HMM (hidden Markov model) and a GMM (Gaussian mixture model), thereby obtaining a higher recognition rate even in the presence of noise. A signal separator (110) separates the mixed signals input through a plurality of microphones into sound source signals by independent component analysis. A target voice extractor (120) extracts, from the separated sound source signals, the one target voice uttered for voice recognition. A voice recognizer (130) recognizes the desired voice from the extracted target voice. An additional information unit transmits additional information used for extracting the target voice to the target voice extractor.
-
Publication No.: KR1020080052803A
Publication date: 2008-06-12
Application No.: KR1020060124450
Application date: 2006-12-08
Applicant: 한국전자통신연구원
Inventor: 정호영
IPC: G10L15/20 , G10L21/0272 , G10L21/0216
Abstract: A method for estimating clean speech by using a noise model is provided to improve the handling of background noise in speech recognition. The method includes receiving noisy speech; extracting a noise interval from the received noisy speech (403); identifying the extracted noise interval and the noise corresponding to a previously stored voice model (405); and estimating the clean speech corresponding to the identified noise and the previously stored voice model (409). When the noise interval is extracted, an initial average value and an initial distribution value of the noise, and a linear coefficient of the noisy speech in the noise interval, are calculated. The noise model and the voice model are linear dynamic models based on a GMM (Gaussian mixture model).
-
Publication No.: KR100614932B1
Publication date: 2006-08-25
Application No.: KR1020050037094
Application date: 2005-05-03
Applicant: 한국전자통신연구원
Inventor: 정호영
Abstract: The present invention aims to address channel variation, which affects performance in practical applications of speech recognition. The apparatus of the invention comprises: a feature extraction unit that extracts mel-frequency cepstral coefficient (MFCC) features and outputs a frame sequence over time; a feature parameter mean calculation unit that calculates the mean of the output MFCC feature sequence; a frame-wise channel variation estimation unit that, after a codebook has been constructed from a speech database with reduced channel variation, estimates the channel variation of each frame by computing the distance between the MFCC values of each frame of the channel-distorted input speech and the codebook centroids; and a smoothing-based channel normalization unit that smooths the channel variation obtained by the feature parameter mean calculation unit with the mean of the time-wise channel variation obtained by the frame-wise channel variation estimation unit, then subtracts the smoothed mean from the MFCCs of each frame and outputs a channel-normalized MFCC feature sequence. The invention thus presents a channel normalization method for stable performance of speech recognition systems, and can contribute to improved recognition performance in environments with diverse channel variation, particularly telephone network environments.
Speech recognition, channel normalization, MFCC, mean, channel variation estimation
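The normalization pipeline described in this abstract can be sketched roughly as below. The abstract does not give the formulas, so several points are assumptions made only for illustration: the frame-wise channel estimate is taken as the offset between a frame and its nearest codebook centroid, the smoothing is a linear interpolation with a hypothetical weight `alpha`, and feature frames are plain lists of floats rather than real MFCC vectors.

```python
def column_mean(frames):
    """Per-dimension mean over a list of equal-length feature vectors."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def nearest_centroid(frame, codebook):
    """Codebook entry with the smallest squared Euclidean distance."""
    return min(codebook, key=lambda c: sum((x - y) ** 2 for x, y in zip(frame, c)))


def normalize_channel(frames, codebook, alpha=0.5):
    """Subtract a smoothed channel estimate from every frame.

    alpha = 1.0 reduces to plain mean subtraction over the utterance;
    alpha = 0.0 uses only the codebook-based frame-wise estimate.
    Both the offset definition and alpha are illustrative assumptions.
    """
    utterance_mean = column_mean(frames)
    # Assumed frame-wise estimate: offset from the nearest clean centroid.
    offsets = [
        [x - c for x, c in zip(f, nearest_centroid(f, codebook))]
        for f in frames
    ]
    framewise_mean = column_mean(offsets)
    smoothed = [
        alpha * m + (1.0 - alpha) * e
        for m, e in zip(utterance_mean, framewise_mean)
    ]
    return [[x - s for x, s in zip(f, smoothed)] for f in frames]
```

With a codebook built from channel-free data, the frame-wise term estimates the channel offset itself, whereas the utterance mean also contains the average of the speech; interpolating between the two is one plausible reading of the smoothing step.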
-