Abstract:
A feature extraction apparatus for speech recognition according to the present invention comprises: a frame forming unit which divides an input speech signal into frames of a predetermined size; a static feature extraction unit which extracts a static feature vector for each frame of the speech signal; a dynamic feature extraction unit which uses basis functions or basis vectors to extract a dynamic feature vector representing the variation of the extracted static feature vectors over time; and a feature vector combining unit which combines the extracted static feature vectors and dynamic feature vectors to form a feature vector stream.
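The four units above form a straightforward pipeline, which can be sketched as follows. The frame size, the crude band log-energy static features, and the linear-regression delta basis are illustrative assumptions, not details taken from the abstract; a real system would typically use MFCC static features.

```python
import numpy as np

def split_frames(signal, frame_size=400, hop=160):
    """Frame forming unit: split the signal into frames of a fixed size."""
    n = 1 + max(0, (len(signal) - frame_size) // hop)
    return np.stack([signal[i * hop : i * hop + frame_size] for i in range(n)])

def static_features(frames):
    """Static feature extraction unit (assumption: 8 log band energies of the
    magnitude spectrum as a stand-in for real static features such as MFCCs)."""
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    bands = np.array_split(spectrum, 8, axis=1)
    return np.log(np.stack([b.sum(axis=1) for b in bands], axis=1) + 1e-8)

def dynamic_features(static, basis=None):
    """Dynamic feature extraction unit: project each feature trajectory onto a
    basis vector over a sliding window. The default basis is the linear-slope
    basis of the standard delta-feature formula (an assumption)."""
    if basis is None:
        basis = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
        basis /= (basis ** 2).sum()
    half = len(basis) // 2
    padded = np.pad(static, ((half, half), (0, 0)), mode="edge")
    return np.stack([basis @ padded[t : t + len(basis)]
                     for t in range(len(static))])

def feature_stream(signal):
    """Feature vector combining unit: concatenate static and dynamic vectors
    frame by frame into one stream."""
    s = static_features(split_frames(signal))
    d = dynamic_features(s)
    return np.concatenate([s, d], axis=1)
```

With a 16000-sample input, the defaults above yield 98 frames of 16-dimensional combined features (8 static + 8 dynamic).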
Abstract:
Disclosed are a device and a method for voice recognition using user location information. The device and method of the present invention can improve the performance of a voice recognition service by using user location information to provide a customized acoustic model and language model. The voice recognition device using user location information according to the present invention comprises: a voice receiving unit which receives a user's voice to be recognized; a location information identifying unit which identifies the user's location information; an acoustic model extracting unit which analyzes the noise environment of the place where the user is located by using the location information and extracts an acoustic model corresponding to that noise environment from an acoustic model database; a vocabulary/language model extracting unit which extracts, from a vocabulary/language model database, a vocabulary/language model corresponding to the place where the user is located; and a voice recognizing unit which recognizes the user's voice using the extracted acoustic model and vocabulary/language model.
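The unit structure above can be sketched as a simple lookup chain. The database contents, place categories, noise labels, and the stubbed location and decoding functions are all illustrative assumptions; only the flow of units follows the abstract.

```python
ACOUSTIC_MODEL_DB = {          # noise environment -> acoustic model id
    "quiet": "am_quiet",
    "street": "am_street_noise",
    "crowd": "am_babble_noise",
}
VOCAB_LM_DB = {                # place type -> vocabulary/language model id
    "restaurant": "lm_food_terms",
    "station": "lm_transit_terms",
    "office": "lm_business_terms",
}
PLACE_INFO = {                 # location -> (place type, noise environment)
    "gangnam_station": ("station", "crowd"),
    "city_park": ("unknown", "quiet"),
}

def identify_location(user_id):
    """Location information identifying unit (stubbed lookup)."""
    return "gangnam_station"

def extract_models(location):
    """Acoustic-model and vocabulary/language-model extracting units: map the
    place to a noise environment and place type, then pull the matching models
    from the databases, falling back to defaults for unknown places."""
    place_type, noise_env = PLACE_INFO.get(location, ("unknown", "quiet"))
    am = ACOUSTIC_MODEL_DB.get(noise_env, "am_quiet")
    lm = VOCAB_LM_DB.get(place_type, "lm_general")
    return am, lm

def recognize(audio, user_id):
    """Voice recognizing unit: decode with the selected models (decoding is
    stubbed; a real unit would run the recognizer with these models)."""
    am, lm = extract_models(identify_location(user_id))
    return {"acoustic_model": am, "language_model": lm, "text": "<decoded>"}
```

The point of the design is that model selection is a cheap database lookup keyed on location, done before any decoding starts.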
Abstract:
The present invention relates to an audio signal recognition method which strengthens recognition of the ill-formed words that may occur in dialogue when performing dialogical continuous speech recognition. According to an embodiment of the present invention, the audio signal recognition method based on an ill-formed word model comprises the following steps: matching an input audio signal against predetermined well-formed word models and ill-formed word models to determine whether the input audio signal is an ill-formed word or a well-formed word; and outputting the matching result for the input audio signal. According to the present invention, recognition performance can be improved because the n-gram language model probability values can be maintained for everything other than the ill-formed parts, such as abbreviated expressions, repeated vocalizations, and hesitations, which occur in everyday conversation.
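The matching step above can be sketched as scoring the input against each well-formed word model and against one broad ill-formed ("garbage") model, then outputting whichever wins. The scalar-per-frame features, the Gaussian scoring, and the model parameters are toy assumptions used only to illustrate the comparison.

```python
import math

# word -> (mean, variance) of a toy single-Gaussian well-formed word model
WELL_FORMED = {"yes": (1.0, 0.2), "no": (-1.0, 0.2)}
# one broad model meant to absorb hesitations, repetitions, etc. (assumption)
ILL_FORMED = (0.0, 4.0)

def log_gaussian(x, mean, var):
    """Frame log-likelihood under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def match(features):
    """Match a feature sequence against well-formed and ill-formed models and
    output the result: either the winning word or the label 'ill-formed'."""
    scores = {w: sum(log_gaussian(f, m, v) for f in features)
              for w, (m, v) in WELL_FORMED.items()}
    ill = sum(log_gaussian(f, *ILL_FORMED) for f in features)
    best_word, best = max(scores.items(), key=lambda kv: kv[1])
    if ill > best:
        # Marking the span ill-formed lets the decoder keep the n-gram
        # language model probabilities intact for the surrounding words.
        return ("ill-formed", ill)
    return (best_word, best)
```

Frames near a word model's mean match that word; erratic frames fall to the broad ill-formed model instead of being forced into the vocabulary.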
Abstract:
Disclosed are an acoustic model generation method and a device thereof. According to one embodiment of the present invention, the acoustic model generation method includes: generating an acoustic model using pre-collected training acoustic data; performing tree-based state clustering based on the generated acoustic model and the training acoustic data; forming a state tree through the tree-based state clustering; and generating a final acoustic model using log acoustic data acquired from a user's voice together with the generated state tree. The clustering step performs the tree-based state clustering based on statistical values of context-dependent phonemes obtained from the training acoustic data and on question sets obtained from a phonetic knowledge base, so that an acoustic model optimized for the actual usage environment can be generated, thereby improving voice recognition performance.
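The core of the clustering step is choosing, at each tree node, the phonetic question whose yes/no split of the context-dependent states most increases the likelihood of the pooled statistics. A minimal sketch of that choice follows; the question set, the triphone naming scheme (`left-center+right`), and the single-Gaussian state statistics are illustrative assumptions.

```python
import math

# context-dependent state -> (occupancy count, mean, variance), toy 1-D stats
STATES = {
    "a-b+n": (10, 1.0, 0.5),
    "a-b+m": (12, 1.1, 0.5),
    "i-b+t": (8, -0.9, 0.5),
    "u-b+k": (9, -1.0, 0.5),
}
# phonetic knowledge base: question -> (context position, matching phones)
QUESTIONS = {
    "right-is-nasal": ("right", {"n", "m"}),
    "left-is-front-vowel": ("left", {"i"}),
}

def context(name):
    """Split a triphone name 'l-c+r' into its left and right contexts."""
    left, rest = name.split("-")
    _, right = rest.split("+")
    return left, right

def pooled_loglik(states):
    """Log-likelihood of modelling a cluster of states with one shared
    Gaussian, computed from occupancy-weighted pooled statistics."""
    total = sum(c for c, _, _ in states)
    if total == 0:
        return 0.0
    mean = sum(c * m for c, m, _ in states) / total
    var = sum(c * (v + (m - mean) ** 2) for c, m, v in states) / total
    return -0.5 * total * (math.log(2 * math.pi * var) + 1)

def best_split(states):
    """Pick the question with the largest likelihood gain over not splitting;
    applying this recursively to each branch grows the state tree."""
    base = pooled_loglik(list(states.values()))
    best_q, best_gain = None, 0.0
    for q, (pos, members) in QUESTIONS.items():
        yes, no = [], []
        for name, stats in states.items():
            left, right = context(name)
            phone = left if pos == "left" else right
            (yes if phone in members else no).append(stats)
        gain = pooled_loglik(yes) + pooled_loglik(no) - base
        if gain > best_gain:
            best_q, best_gain = q, gain
    return best_q, best_gain
```

Here the nasal question cleanly separates the positive-mean states from the negative-mean ones, so it yields the larger gain and would form the root split of this (toy) state tree.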