Abstract:
PURPOSE: A method for converting the time axis of a voice signal is provided, which applies three-level center clipping and a level-crossing method to a synthesis voice signal and an analysis voice signal before synchronization, thereby reducing the amount of calculation by eliminating the normalization step and shortening the search period. CONSTITUTION: An analysis voice frame is initialized as a synthesis voice frame(S1). If all voice data have been input(S2), the time axis conversion method ends. If not all voice data have been input, a clipping level for the synthesis voice frame and the analysis voice frame is determined(S3). The synthesis voice frame and the analysis voice frame are divided into three levels using the determined clipping level(S4). Level-crossing points of the synthesis voice frame and the analysis voice frame are searched(S5). A synchronization point between the synthesis voice frame and the analysis voice frame is searched using the analysis voice signal, the synthesis voice signal, and the level-crossing points obtained through the three-level center clipping process(S6). Based on the searched synchronization point, the synthesis voice signal and the analysis voice signal are realigned, and the two signals are overlapped and added(S7).
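The abstract above does not give an explicit decision rule, so the following is only a minimal sketch of three-level center clipping, level-crossing detection, and a synchronization-point search over the clipped signals; the function names, the clip_ratio and max_lag parameters, and the use of an unnormalized dot product as the matching score are illustrative assumptions, chosen to reflect the claim that clipping to three levels removes the need for normalization and that level crossings restrict the search.

```python
import numpy as np

def three_level_clip(frame, clip_level):
    """Map each sample to +1, 0, or -1 relative to the clipping level."""
    out = np.zeros_like(frame, dtype=np.int8)
    out[frame > clip_level] = 1
    out[frame < -clip_level] = -1
    return out

def level_crossings(clipped):
    """Indices where the three-level signal changes value."""
    return np.nonzero(np.diff(clipped) != 0)[0] + 1

def find_sync_point(analysis, synthesis, clip_ratio=0.6, max_lag=160):
    """Pick the lag that best aligns the clipped analysis and synthesis frames.

    Matching three-level signals with a plain dot product needs no
    normalization, and only lags at level-crossing points are searched
    (both are assumptions about the method described in the abstract).
    """
    clip_level = clip_ratio * min(np.max(np.abs(analysis)), np.max(np.abs(synthesis)))
    ca = three_level_clip(analysis, clip_level)
    cs = three_level_clip(synthesis, clip_level)
    candidate_lags = level_crossings(cs)
    candidate_lags = candidate_lags[candidate_lags <= max_lag]
    best_lag, best_score = 0, -np.inf
    for lag in candidate_lags:
        n = min(len(ca), len(cs) - lag)
        score = np.dot(ca[:n].astype(int), cs[lag:lag + n].astype(int))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Once the synchronization point is found, the analysis and synthesis frames would be realigned at that lag and overlap-added, as in step S7.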
Abstract:
A speech detection apparatus using basis functions trained by independent component analysis (ICA), and a method thereof, are provided. The speech detection method includes the steps of training basis functions of speech signals and basis functions of noise signals according to a predetermined learning rule, adapting the basis functions of noise signals to the current environment by using the characteristics of the noise signals input into a microphone, extracting decision information for detecting speech activity from the basis functions of speech signals and the basis functions of noise signals, and detecting a speech starting point and a speech ending point of the microphone signals fed into a speech recognition unit from the decision information.
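The abstract does not state how the decision information is computed, so the sketch below assumes one plausible instantiation: score each frame under the speech basis and under the noise basis with an ICA log-likelihood (Laplacian source prior), and mark frames where the speech score dominates. The function names, the likelihood-ratio rule, and the threshold are assumptions, not the patented method.

```python
import numpy as np

def basis_log_likelihood(frame, basis_inv):
    """Log-likelihood of a frame under an ICA basis, assuming Laplacian sources.

    basis_inv is the unmixing matrix (inverse of the learned basis functions).
    """
    sources = basis_inv @ frame
    # Laplacian prior on the independent components plus the Jacobian term.
    return -np.sum(np.abs(sources)) + np.log(np.abs(np.linalg.det(basis_inv)))

def detect_endpoints(frames, speech_basis_inv, noise_basis_inv, threshold=0.0):
    """Return (start, end) frame indices where the speech score exceeds the noise score."""
    scores = np.array([
        basis_log_likelihood(f, speech_basis_inv) - basis_log_likelihood(f, noise_basis_inv)
        for f in frames
    ])
    active = np.nonzero(scores > threshold)[0]
    if active.size == 0:
        return None
    return int(active[0]), int(active[-1])
```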
Abstract:
PURPOSE: A tree-search-based method for recognizing speech and a large-capacity speech recognition system for continuous recognition using the same are provided, improving the recognition rate by excluding low-probability words from the search. CONSTITUTION: A voice signal input through a voice input part(100) is passed to a feature extracting part(200), which extracts feature parameters and provides them to a voice recognizing part(300). The voice recognizing part determines the corresponding word by applying the input features to an acoustic model and a language model. The input features are first applied to a tree-based searching part(320) through a K delay(310). The tree-based searching part searches for the word sequence matching the input voice by using the acoustic model and the language model. A language model look-ahead processing part(340) reads a trained language model to calculate, for each path, the expected probability of the word following the preceding word, and removes paths with low expectations.
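As a rough illustration of language model look-ahead pruning in a lexical prefix tree, the sketch below propagates, for each tree node, the best language-model log-probability over all words reachable from that node, and drops hypotheses whose combined score falls outside a beam. The tree/node/hypothesis structures, the lm_prob function, and the beam width are hypothetical; the patent abstract only states that paths with low expectations are removed.

```python
import math

def lm_lookahead_scores(tree, lm_prob, history):
    """For each node of the lexical prefix tree, the best LM log-probability over
    all words reachable from that node given the preceding word(s)."""
    scores = {}
    def visit(node):
        best = -math.inf
        if node.word is not None:                  # this node completes a word
            best = lm_prob(node.word, history)
        for child in node.children.values():
            best = max(best, visit(child))
        scores[node] = best
        return best
    visit(tree.root)
    return scores

def prune(hypotheses, lookahead, beam=10.0):
    """Drop hypotheses whose acoustic + look-ahead score falls outside the beam."""
    scored = [(h.acoustic_score + lookahead[h.node], h) for h in hypotheses]
    best = max(score for score, _ in scored)
    return [h for score, h in scored if score >= best - beam]
```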
Abstract:
According to the present invention, a feature extraction apparatus for speech recognition includes: a frame forming unit that divides an input speech signal into frames of a predetermined size; a static feature extraction unit that extracts a static feature vector for each frame of the speech signal; a dynamic feature extraction unit that extracts, using basis functions or basis vectors, a dynamic feature vector representing the change of the extracted static feature vectors over time; and a feature vector combining unit that constructs a feature vector stream by combining the extracted static feature vectors and the dynamic feature vectors.
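A minimal sketch of the static/dynamic combination follows, assuming the static features are already extracted per frame (e.g., as MFCC-like vectors) and that the dynamic features are obtained by projecting a temporal window of static vectors onto given basis vectors, a generalization of delta coefficients. The window size, padding, and function names are illustrative assumptions; the abstract specifies only that dynamic features are derived with basis functions or basis vectors and concatenated with the static features.

```python
import numpy as np

def dynamic_features(static_feats, basis, context=4):
    """Project a window of static feature frames onto basis vectors to capture
    their change over time.

    static_feats: (T, D) static feature vectors, one per frame.
    basis:        (n_basis, 2*context+1) temporal basis vectors.
    """
    T, D = static_feats.shape
    padded = np.pad(static_feats, ((context, context), (0, 0)), mode="edge")
    dyn = np.empty((T, basis.shape[0] * D))
    for t in range(T):
        window = padded[t:t + 2 * context + 1]          # (2*context+1, D)
        dyn[t] = (basis @ window).reshape(-1)           # (n_basis, D) flattened
    return dyn

def feature_stream(static_feats, basis):
    """Concatenate static and dynamic feature vectors frame by frame."""
    return np.hstack([static_feats, dynamic_features(static_feats, basis)])
```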