Apparatus and Method for Detecting a Speech Endpoint Using a WFST
    21.
    Invention publication
    Apparatus and Method for Detecting a Speech Endpoint Using a WFST - Pending (substantive examination)

    Publication number: KR1020140147587A

    Publication date: 2014-12-30

    Application number: KR1020130071143

    Filing date: 2013-06-20

    CPC classification number: G10L15/05 G10L25/27 G10L25/87

    Abstract: A device and a method for detecting a speech endpoint using a weighted finite-state transducer (WFST) are provided. The invention includes: a speech decision unit that receives frame-level feature vectors converted from a speech signal, analyzes them, and classifies each frame into a speech class or a noise class; a frame-level WFST that receives the classified speech and noise classes and converts them into WFST-type data; a speech-level WFST that analyzes the relation among the classified speech class, the noise class, and predetermined states, and detects a speech endpoint; a WFST combination unit that combines the frame-level WFST and the speech-level WFST; and an optimization unit that minimizes the combined WFST.

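    The abstract above describes composing frame-level classification output with a speech-level state model and minimizing the result. The sketch below is a minimal, self-contained illustration of that idea, not the patent's implementation: hypothetical per-frame speech posteriors stand in for the frame-level scores, a two-state silence/speech machine stands in for the speech-level model, and a Viterbi search over their composition reads off the endpoints. All probabilities and costs are assumed toy values.

```python
import math

# Hypothetical per-frame P(speech) from a frame-level classifier (toy values).
frame_speech_prob = [0.05, 0.1, 0.2, 0.9, 0.95, 0.92, 0.88, 0.3, 0.1, 0.05]

# Speech-level model: two states with transition costs (negative log probabilities).
# "sil" = noise/silence, "sp" = speech; switching states is penalized.
TRANSITIONS = {
    "sil": {"sil": 0.1, "sp": 2.3},
    "sp":  {"sp": 0.1, "sil": 2.3},
}

def viterbi_endpoints(probs):
    """Compose frame-level scores with the speech-level model and return the
    (start_frame, end_frame) of the speech segment on the best path."""
    states = ["sil", "sp"]
    cost = {"sil": 0.0, "sp": 5.0}          # start in silence by default
    backpointers = []
    for p in probs:
        emit = {"sp": -math.log(max(p, 1e-9)), "sil": -math.log(max(1.0 - p, 1e-9))}
        new_cost, ptr = {}, {}
        for s in states:
            prev = min(states, key=lambda q: cost[q] + TRANSITIONS[q][s])
            new_cost[s] = cost[prev] + TRANSITIONS[prev][s] + emit[s]
            ptr[s] = prev
        backpointers.append(ptr)
        cost = new_cost
    # Backtrace the best path and read off where it enters and leaves "sp".
    state = min(states, key=cost.get)
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path = list(reversed(path))[1:]          # drop the artificial initial state
    speech_frames = [i for i, s in enumerate(path) if s == "sp"]
    return (speech_frames[0], speech_frames[-1]) if speech_frames else None

print(viterbi_endpoints(frame_speech_prob))  # prints the detected (start, end) frames
```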

Method for Improving Speech Recognition Performance
    22.
    Invention publication
    Method for Improving Speech Recognition Performance - Pending (substantive examination)

    Publication number: KR1020140077422A

    Publication date: 2014-06-24

    Application number: KR1020120146227

    Filing date: 2012-12-14

    CPC classification number: G10L15/14 G10L19/038 G10L21/02 G10L2015/025

    Abstract: A method for improving speech recognition performance according to an embodiment of the present invention is provided, which improves recognition of speech input under noisy conditions based on one or more speech recognition feature vectors. The method includes the steps of: extracting two or more feature vectors for the input Korean speech according to two or more acoustic models set up for each phoneme; extracting an observation probability value for each phoneme from the previously extracted feature vectors of the two or more acoustic models and from two or more feature vectors preset for an integrated acoustic model activated in a Viterbi decoder; and resetting the integrated acoustic model based on the extracted per-phoneme observation probability values and re-recognizing the Korean speech phoneme by phoneme.

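    Since the abstract turns on combining per-phoneme observation probabilities from several acoustic models into one integrated score, the following sketch shows one common way to do that, a log-linear (weighted geometric) combination. The model values, the weights, and the combination rule are assumptions for illustration and are not taken from the patent.

```python
import math

# Hypothetical per-frame observation probabilities from two acoustic models,
# indexed by phoneme label (toy values).
model_a = {"k": 0.02, "a": 0.30, "n": 0.05}
model_b = {"k": 0.04, "a": 0.25, "n": 0.10}

def combine_observation_probs(models, weights):
    """Log-linear combination of per-phoneme observation probabilities:
    log p(x|ph) = sum_i w_i * log p_i(x|ph)."""
    combined = {}
    for phoneme in models[0]:
        log_p = sum(w * math.log(m[phoneme]) for m, w in zip(models, weights))
        combined[phoneme] = math.exp(log_p)
    return combined

scores = combine_observation_probs([model_a, model_b], weights=[0.5, 0.5])
best = max(scores, key=scores.get)
print(scores, "-> best phoneme:", best)   # "a" wins with these toy numbers
```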

Method and Apparatus for Detecting a Speech Signal
    23.
    Invention publication
    Method and Apparatus for Detecting a Speech Signal - In force

    Publication number: KR1020140076816A

    Publication date: 2014-06-23

    Application number: KR1020120145284

    Filing date: 2012-12-13

    Inventor: 박기영 이윤근

    CPC classification number: G10L15/02 G10L15/05 G10L15/20

    Abstract: The present invention relates to improving the performance of the speech endpoint detection and feature vector extraction used in a speech recognition system. According to the present invention, a method for detecting an audio signal comprises the steps of: detecting voice segments by performing frame-level endpoint detection on an input signal; extracting a feature value of the signal in at least a partial segment corresponding to a plurality of windows among the detected voice segments; and comparing the extracted feature value with a predetermined threshold to identify the actual speech within the voice segments. Using the method provided by the present invention improves the normalization of the feature vectors used in the speech recognition system and thereby improves recognition performance in noisy environments.

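    As a concrete, hedged illustration of the two-stage idea in the abstract (frame-level endpoint detection followed by a feature value, smoothed over windows of frames, compared against a threshold), the sketch below uses frame log-energy as the feature and arbitrary thresholds; the patent does not specify these values.

```python
import numpy as np

def detect_speech_segments(signal, sr, frame_ms=20, hop_ms=10, win_frames=5, thresh_db=-35.0):
    """Toy two-stage detector: frame-level log-energy endpointing, then a value
    averaged over a sliding window of frames is re-compared with a threshold."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n = max(0, 1 + (len(signal) - frame) // hop)
    log_energy = np.array([
        10 * np.log10(np.mean(signal[i * hop:i * hop + frame] ** 2) + 1e-12) for i in range(n)
    ])
    active = log_energy > thresh_db                      # frame-level endpoint decision
    # Smooth over a sliding window of frames and re-threshold (second stage).
    kernel = np.ones(win_frames) / win_frames
    smoothed = np.convolve(active.astype(float), kernel, mode="same")
    speech = smoothed > 0.5
    # Collect contiguous runs of speech frames as (start_frame, end_frame) pairs.
    segments, start = [], None
    for i, s in enumerate(speech):
        if s and start is None:
            start = i
        elif not s and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(speech) - 1))
    return segments

# Synthetic test: half a second of low-level noise followed by half a second of tone.
sr = 16000
t = np.arange(sr) / sr
sig = np.concatenate([0.001 * np.random.randn(sr // 2), 0.3 * np.sin(2 * np.pi * 220 * t[:sr // 2])])
print(detect_speech_segments(sig, sr))
```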

Method for Improving Automatic Speech Recognition Performance Using Intra-Frame Features
    24.
    Invention publication
    Method for Improving Automatic Speech Recognition Performance Using Intra-Frame Features - Pending (substantive examination)

    Publication number: KR1020140059601A

    Publication date: 2014-05-16

    Application number: KR1020120126211

    Filing date: 2012-11-08

    CPC classification number: G10L15/02 G10L21/0272 G10L21/038

    Abstract: Disclosed is a method for improving automatic speech recognition performance using intra-frame features. According to the present invention, the method includes: collecting speech signals and preprocessing the collected signals by boosting or attenuating them; dividing the preprocessed speech signals into threshold bands using a gamma-tone filter bank and channelizing the signal in each band; frame-blocking the channelized speech signals with a frame shift of 10 ms and a frame size of 20-25 ms; Hamming-windowing each blocked channel and extracting a predefined amount of data from the predefined section; estimating signal intensity from the extracted data on a time-frequency basis and estimating energy from the estimated intensity; obtaining cepstral coefficients and their derivatives by applying a logarithmic operation and a discrete cosine transform to the estimated energy; performing sub-frame analysis on the preprocessed speech signals and extracting intra-frame features from the sub-frame-analyzed signals; and obtaining speech recognition features by combining the cepstral coefficients, their derivatives, and the intra-frame features.

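    To make the feature pipeline concrete, here is a minimal sketch of the generic steps named in the abstract: frame blocking with a 10 ms shift, Hamming windowing, log energy, and a DCT to obtain cepstral coefficients, plus a simple sub-frame statistic as an intra-frame feature. The gamma-tone filter bank and the patent's exact intra-frame feature definition are omitted; the sub-frame energy spread used here is only an assumed stand-in.

```python
import numpy as np
from scipy.fftpack import dct

def frame_blocks(signal, sr, frame_ms=25, hop_ms=10):
    """Block the signal into overlapping frames (10 ms shift, 25 ms frames)."""
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    n = 1 + max(0, (len(signal) - frame) // hop)
    return np.stack([signal[i * hop:i * hop + frame] for i in range(n)])

def cepstral_features(frames, n_ceps=13):
    """Hamming window -> power spectrum -> log -> DCT: a generic cepstral pipeline."""
    windowed = frames * np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(windowed, axis=1)) ** 2
    log_energy = np.log(power + 1e-10)
    return dct(log_energy, type=2, axis=1, norm="ortho")[:, :n_ceps]

def intra_frame_features(frames, n_sub=4):
    """Sub-frame analysis: split each frame into sub-frames and use the spread
    of sub-frame log energies as a simple intra-frame feature."""
    sub = np.array_split(frames, n_sub, axis=1)
    sub_energy = np.stack([np.log(np.mean(s ** 2, axis=1) + 1e-10) for s in sub], axis=1)
    return np.std(sub_energy, axis=1, keepdims=True)

sr = 16000
signal = np.random.randn(sr)                      # 1 s of noise as a stand-in signal
frames = frame_blocks(signal, sr)
feats = np.hstack([cepstral_features(frames), intra_frame_features(frames)])
print(feats.shape)                                # (num_frames, 14)
```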

Apparatus and Method for Automatically Processing Large Amounts of Speech Data Based on Utterance Verification
    25.
    Invention publication
    Apparatus and Method for Automatically Processing Large Amounts of Speech Data Based on Utterance Verification - In force

    Publication number: KR1020130068621A

    Publication date: 2013-06-26

    Application number: KR1020110135916

    Filing date: 2011-12-15

    CPC classification number: G10L15/14 G10L15/02 G10L15/05

    Abstract: PURPOSE: An utterance-verification-based apparatus and method for automatically processing large amounts of speech data are provided, which automatically classify the speech data through a speech recognition system and generate an acoustic model from the classified data, so that the model can be used for collecting acoustic-modeling data and verifying error data. CONSTITUTION: An utterance verification unit (160) classifies each item of the speech data as normally recognized or abnormally recognized, using speech features obtained from an extraction unit (140), a context-dependent adaptive model, and a context-independent adaptive anti-phoneme model. An acoustic modeling unit (180) selects the acoustic-modeling data from the classified speech data and generates an acoustic model based on it. [Reference numerals] (120) Storage unit; (140) Extraction unit; (160) Utterance verification unit; (180) Acoustic modeling unit

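    The verification step above hinges on comparing a context-dependent model against a context-independent anti-phoneme model. Below is a minimal sketch of that style of decision, using an average log-likelihood ratio as the confidence score; the per-frame log-likelihoods and thresholds are invented toy values, and the third "uncertain" bin is an illustrative addition rather than something stated in the patent.

```python
def utterance_confidence(frame_loglik_model, frame_loglik_anti):
    """Average log-likelihood ratio between a context-dependent model and a
    context-independent anti-phoneme model, a standard utterance-verification score."""
    llr = [m - a for m, a in zip(frame_loglik_model, frame_loglik_anti)]
    return sum(llr) / len(llr)

def classify_utterance(conf, accept_thresh=1.0, reject_thresh=-1.0):
    """Route an utterance into bins for automatic processing of the speech data."""
    if conf >= accept_thresh:
        return "normally_recognized"      # keep for acoustic model training
    if conf <= reject_thresh:
        return "abnormally_recognized"    # discard or flag as error data
    return "uncertain"                    # illustrative third bin: send to inspection

# Toy per-frame log-likelihoods (assumed values, not from the patent).
model_scores = [-2.1, -1.8, -2.0, -1.7]
anti_scores = [-3.5, -3.2, -3.4, -3.1]
conf = utterance_confidence(model_scores, anti_scores)
print(conf, classify_utterance(conf))
```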

Apparatus and Method for Evaluating the Pronunciation of Foreign Language Learners
    26.
    Invention publication
    Apparatus and Method for Evaluating the Pronunciation of Foreign Language Learners - Invalid

    Publication number: KR1020130068598A

    Publication date: 2013-06-26

    Application number: KR1020110135888

    Filing date: 2011-12-15

    CPC classification number: G09B19/06 G09B5/04 G09B7/04 G10L15/005 G10L15/26

    Abstract: PURPOSE: A pronunciation evaluation device and method are provided, which evaluate foreign-language pronunciation using an acoustic model of the foreign language learner, pronunciations generated with a pronunciation model in which pronunciation errors are reflected, and an acoustic model of a native speaker, thereby increasing the accuracy of the pronunciations generated for the learner's speech. CONSTITUTION: A pronunciation evaluation device (100) includes a sound input part (110), a sentence input part (120), a storage part (130), a pronunciation generation part (140), a pronunciation evaluation part (150), and an output part (160). The sound input part receives the speech of a foreign language learner, and the sentence input part receives the sentence corresponding to that speech. The storage part stores an acoustic model and a pronunciation dictionary for the learner's speech. The pronunciation generation part performs speech recognition based on the stored acoustic model and pronunciation dictionary. The pronunciation evaluation part detects vocalization errors by analyzing the pronunciations recognized for the learner's speech. The output part outputs the vocalization errors detected by the pronunciation evaluation part. [Reference numerals] (110) Sound input part; (120) Sentence input part; (130) Storage part; (140) Pronunciation generation part; (150) Pronunciation evaluation part; (160) Output part

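    The device above detects vocalization errors by comparing the learner's speech against native-speaker models. The sketch below shows one standard way such comparisons are often scored, a goodness-of-pronunciation (GOP) style log-likelihood ratio per phone; the alignment values and the threshold are invented for illustration, and this is not the patent's specific scoring rule.

```python
def goodness_of_pronunciation(target_loglik, best_loglik, n_frames):
    """GOP-style score: per-frame log-likelihood of the prompted phone minus the
    log-likelihood of the best competing phone under the native acoustic model."""
    return (target_loglik - best_loglik) / n_frames

def evaluate_phones(alignment, threshold=-0.5):
    """Flag phones whose GOP score falls below a threshold as pronunciation errors."""
    report = []
    for phone, target_ll, best_ll, frames in alignment:
        gop = goodness_of_pronunciation(target_ll, best_ll, frames)
        report.append((phone, round(gop, 2), "error" if gop < threshold else "ok"))
    return report

# (phone, log-lik of prompted phone, log-lik of best phone, frame count) -- toy values.
alignment = [("s", -42.0, -41.5, 20), ("i", -30.0, -18.0, 15), ("n", -25.0, -24.8, 12)]
print(evaluate_phones(alignment))
```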

Corpus-Based Language Model Discriminative Training Method and Apparatus
    27.
    Invention publication
    Corpus-Based Language Model Discriminative Training Method and Apparatus - Invalid

    Publication number: KR1020130067854A

    Publication date: 2013-06-25

    Application number: KR1020110134848

    Filing date: 2011-12-14

    CPC classification number: G06F17/277 G06F17/18

    Abstract: PURPOSE: A corpus-based language model discriminative training method and apparatus are provided, which make it easy to build and use a training database matching a target domain by constructing the discriminative-training corpus database from a text corpus. CONSTITUTION: A database for language model discriminative training is built (S301), and speech feature vectors are extracted from the corpus database (S302). Continuous speech recognition is performed on the extracted feature vectors (S303). Language model discriminative training is performed using the sentence scores and the recognition results output by the continuous speech recognition (S304), and a discriminative language model is generated (S305). [Reference numerals] (AA) Start; (BB) End; (S301) Build a DB for language model discriminative training; (S302) Extract speech feature vectors; (S303) Recognize continuous speech; (S304) Perform language model discriminative training; (S305) Generate a discriminative language model

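    To illustrate what discriminative training of a language model from recognition results can look like, here is a small perceptron-style sketch: n-grams in the reference transcript are rewarded, n-grams in the recognizer's erroneous hypothesis are penalized, and the learned weights are then used to rescore candidates. The feature set, the update rule, and the example sentences are assumptions for illustration, not the patent's training criterion.

```python
from collections import Counter

def ngram_features(sentence):
    """Bigram count features for a whitespace-tokenized sentence."""
    toks = ["<s>"] + sentence.split() + ["</s>"]
    return Counter(zip(toks, toks[1:]))

def perceptron_lm_update(weights, reference, hypothesis, lr=1.0):
    """One discriminative update: reward n-grams in the reference transcript and
    penalize those in the (wrong) recognizer hypothesis."""
    for feat, cnt in ngram_features(reference).items():
        weights[feat] = weights.get(feat, 0.0) + lr * cnt
    for feat, cnt in ngram_features(hypothesis).items():
        weights[feat] = weights.get(feat, 0.0) - lr * cnt
    return weights

def rescore(weights, sentence):
    """Discriminative LM score of a candidate sentence."""
    return sum(weights.get(f, 0.0) * c for f, c in ngram_features(sentence).items())

weights = {}
weights = perceptron_lm_update(weights, "turn the light on", "turn the light own")
print(rescore(weights, "turn the light on"), rescore(weights, "turn the light own"))
```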

Confusion Network Rescoring Apparatus for Korean Continuous Speech Recognition, and Confusion Network Generation and Rescoring Methods Using the Same
    28.
    Invention publication
    Confusion Network Rescoring Apparatus for Korean Continuous Speech Recognition, and Confusion Network Generation and Rescoring Methods Using the Same - In force

    Publication number: KR1020130011574A

    Publication date: 2013-01-30

    Application number: KR1020110072813

    Filing date: 2011-07-22

    Abstract: PURPOSE: A confusion network rescoring device for Korean continuous speech recognition, a method for generating a confusion network using the device, and a rescoring method are provided, which improve the generation speed of the confusion network by limiting the lattice link probability in the process of converting a lattice structure into a confusion network structure. CONSTITUTION: The confusion network rescoring device receives one or more lattices generated through speech recognition (S105). The device calculates the posterior probabilities of the lattices (S110). Based on the posterior probabilities, the device allocates the nodes included in the lattices to a plurality of equivalence classes (S120, S130, S135). The device generates confusion sets using the equivalence classes (S150, S155) and builds the confusion network from the confusion sets. [Reference numerals] (AA) Start; (BB, DD, FF, HH, JJ) No; (CC, EE, GG, II, KK) Yes; (LL) End; (S105) Input lattices from speech recognition; (S110) Calculate the posterior probabilities of the lattices; (S115) SLF input?; (S120) Allocate the first node (n_0) of the lattices to the first equivalence class (N_0); (S125) Do N_i and n_i links exist?; (S130) Allocate the i-th node (n_i) of the lattices to the j-th equivalence class (N_j); (S135) Allocate the i-th node (n_i) of the lattices to the i-th equivalence class (N_i); (S140) All nodes of the lattices allocated?; (S145) If u∈N_s and n_i∈N_t with t=s+1 for e(u->n_i); (S150) Classify e(u->n_i) as CS(N_s, N_t); (S155) Classify e(u->n_i) as CS(N_k, N_k+1); (S160) Normalize link probabilities within the extracted CS sequence; (S165) Add a null link and assign it the remaining probability mass after normalization; (S170) Probability of the null link > probability of the other links?; (S175) Exclude the CS sequence from the speech recognition result

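    Steps S160-S175 above normalize link probabilities inside a confusion set, add a null link carrying the remaining probability mass, and drop the set if the null link dominates. Below is a minimal sketch of just that slot-level bookkeeping; the word posteriors are toy values, and everything upstream (lattice posterior computation, equivalence-class clustering) is omitted.

```python
def finalize_confusion_set(arc_posteriors):
    """Given word-arc posteriors collected into one confusion set (they need not sum
    to 1, since some lattice links fall outside the set), assign the remaining mass
    to a null link and decide whether the set survives into the recognition result."""
    total = sum(arc_posteriors.values())
    if total > 1.0:                        # renormalize if numerical mass exceeds 1
        arc_posteriors = {w: p / total for w, p in arc_posteriors.items()}
        total = 1.0
    slot = dict(arc_posteriors)
    slot["<eps>"] = 1.0 - total            # null link takes the leftover probability
    best_word = max((w for w in slot if w != "<eps>"), key=slot.get, default=None)
    keep = best_word is not None and slot[best_word] >= slot["<eps>"]
    return slot, keep

# Toy posteriors for one slot; keys are competing Korean word hypotheses.
slot, keep = finalize_confusion_set({"감사": 0.55, "검사": 0.25, "강사": 0.05})
print(slot, "-> keep in result:", keep)
```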

Speech Recognition Apparatus and Method
    29.
    Invention publication
    Speech Recognition Apparatus and Method - In force

    Publication number: KR1020120043552A

    Publication date: 2012-05-04

    Application number: KR1020100104894

    Filing date: 2010-10-26

    Abstract: PURPOSE: A speech recognition apparatus and method are provided, which increase the recognition speed for an input signal by performing recognition of the input signal in parallel. CONSTITUTION: A global database unit (10) includes a global feature vector (12), a global vocabulary model (14), and a global acoustic model (16). A recognition unit (20) consists of separate recognition units (22a-22n), which perform speech recognition in parallel. A separate database unit (30) holds the separate language models. A collection and evaluation unit (40) collects and evaluates the recognition results of the separate recognition units.

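    The abstract describes several separate recognition units running in parallel over separate language models, with a collection and evaluation unit picking the final result. Below is a hedged sketch of that control flow only, using a thread pool and stand-in scoring; the model contents and the evaluation rule are invented for illustration and are not the patent's.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-unit language models: each separate recognition unit only
# scores the vocabulary assigned to it (toy data).
SEPARATE_LMS = [
    {"weather today": 0.7, "weather forecast": 0.3},
    {"play music": 0.6, "play movie": 0.4},
    {"call home": 0.8, "call office": 0.2},
]

def recognize_with_unit(lm, features):
    """One separate recognition unit: return its best hypothesis and score.
    The acoustic match is ignored here; only the toy LM score is used."""
    best = max(lm, key=lm.get)
    return best, lm[best]

def parallel_recognize(features):
    """Run all separate recognition units in parallel, then collect and evaluate
    their results to pick the overall best hypothesis."""
    with ThreadPoolExecutor(max_workers=len(SEPARATE_LMS)) as pool:
        results = list(pool.map(lambda lm: recognize_with_unit(lm, features), SEPARATE_LMS))
    return max(results, key=lambda r: r[1])

print(parallel_recognize(features=None))
```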

Speech Recognition System
    30.
    Invention publication
    Speech Recognition System - In force

    Publication number: KR1020120042090A

    Publication date: 2012-05-03

    Application number: KR1020100103581

    Filing date: 2010-10-22

    Abstract: PURPOSE: A speech recognition system is provided, which increases the recognition performance for abnormal utterances and, by recognizing them, reduces the need for the user to repeat an utterance. CONSTITUTION: A determining unit (120) determines whether the user's speech is a segmented utterance. A first recognition unit (130) recognizes the user's speech using a phoneme probability model. A second recognition unit (140) recognizes the user's speech based on a comparison of the speech signal with a previously trained probability model.

