Abstract:
A robust speech recognition system according to the present invention enhances the sound source signal with an MPDR (minimum power distortionless response) beamformer in a pre-processing stage, applies an HIVA learning algorithm to the mixture of the enhanced sound source signals and noise signals, and extracts a feature vector of the sound source signals. When performing the HIVA learning algorithm, the system applies a non-holonomic constraint and the minimal distortion principle to minimize signal distortion and improve convergence of the unmixing matrix. In addition, the system uses the enhanced sound source and the noise source to detect missing features during learning and compensates for them. With these features, the system provides noise-robust speech recognition based on an independent vector analysis algorithm that exploits harmonic frequency dependency. [Reference numerals] (200) Signal input unit; (210) Signal converting unit; (220) Pre-processing unit; (230) Sound source extracting unit; (246) Mask generating unit; (248) Loss property compensation output unit; (250) DCT converting unit; (260) Voice recognition unit; (AA,BB) Log unit
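The abstract does not say how the minimal distortion principle enters the HIVA update. As a rough illustration only, the sketch below shows one widely used post-hoc form of the principle, which rescales an unmixing matrix W to diag(W^-1) W for a single frequency bin. The function name `mdp_rescale` is hypothetical, and applying the rescaling after learning rather than as a constraint during learning is an assumption.

```python
import numpy as np

def mdp_rescale(W):
    """Rescale a square unmixing matrix W (one frequency bin) so that the
    inverse of the result has a unit diagonal.

    Assumption: this is the common post-hoc form W <- diag(W^-1) W of the
    minimal distortion principle, not necessarily the patent's variant.
    """
    A = np.linalg.inv(W)              # estimated mixing matrix
    return np.diag(np.diag(A)) @ W    # keep only the diagonal of A as scaling
```

A useful property of this form is that it removes the arbitrary per-row scaling of W: multiplying W on the left by any invertible diagonal matrix leaves the rescaled result unchanged.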
Abstract:
PURPOSE: A device and method for recognizing a voice are provided that make voice recognition easier by accurately identifying the moments at which a speaker is speaking. CONSTITUTION: A device for recognizing a voice includes an input part (110), a detecting part (150), a saliency map generating part (160), an information obtaining part (170), a voice recognizing part (180), and an output part (120). The input part receives multiple captured images, which include a user, and sound sources. The detecting part detects the user's lip region in each image. The saliency map generating part generates dynamic saliency maps for the lip regions. The information obtaining part obtains lip motion information from the dynamic saliency maps. The voice recognizing part recognizes the voice in the sound sources on the basis of the lip motion information. The output part outputs the voice recognition result. [Reference numerals] (110) Input part; (120) Extracting unit; (130) Storage unit; (140) Location determination unit; (160) Saliency map generating part; (170) Information obtaining part; (180) Voice recognizing part; (190) Control unit; (200) Face detecting unit; (300) Lips detecting unit
Abstract:
PURPOSE: A method and device for blind source separation by independent vector analysis using a feed-forward network are provided to resolve, without heuristic techniques, the problem that arises from independence between frequencies. CONSTITUTION: A short-time (ST) Fourier transformer (100) converts time-domain (TD) mixed signals into frequency-domain (FD) mixed signals. A feed-forward (FF) unmixing filter network (104) separates the FD mixed signals into source signals. An inverse ST Fourier transformer (105) converts the separated source signals into TD source signals. An MPDR beamformer (102) receives the FD mixed signals from the ST Fourier transformer, generates predetermined FD mixed signals, and provides them to the FF unmixing filter network. [Reference numerals] (100) ST Fourier transformer; (102) MPDR beamformer; (104) FF unmixing filter network; (105) Reverse ST Fourier transformer
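The abstract does not describe how the MPDR beamformer (102) computes its output. The sketch below uses the textbook MPDR weights w = R^-1 d / (d^H R^-1 d) for one frequency bin, under stated assumptions: the steering vector `d` is known, the covariance R is estimated by averaging over frames, and a small diagonal loading term is added for numerical stability. The function names and the `loading` parameter are hypothetical.

```python
import numpy as np

def mpdr_weights(X, d, loading=1e-6):
    """Textbook MPDR weights for one frequency bin.

    X: (channels, frames) complex STFT coefficients of the mixed signals.
    d: (channels,) steering vector toward the target (assumed known).
    loading: relative diagonal loading, an assumption added for stability.
    """
    n_ch, n_frames = X.shape
    R = X @ X.conj().T / n_frames                        # sample covariance
    R = R + loading * np.trace(R).real / n_ch * np.eye(n_ch)
    Rinv_d = np.linalg.solve(R, d)                       # R^-1 d without explicit inverse
    return Rinv_d / (d.conj() @ Rinv_d)                  # enforce w^H d = 1

def mpdr_output(X, d):
    """Apply the beamformer: y[t] = w^H x[t] for every frame t."""
    w = mpdr_weights(X, d)
    return w.conj() @ X
```

The normalization keeps the response toward `d` at unity (the distortionless constraint) while the R^-1 term suppresses power arriving from other directions.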
Abstract:
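The present invention relates to a sound source localization method based on acoustic channel estimation. The method comprises: (a) initializing adaptive channel filters; (b) receiving signals from each sensor; (c) passing the signals input from each sensor through the previously updated adaptive channel filters and then detecting an error signal from the difference between the filtered signals; (d) using the error signal to update the adaptive channel filters between the sound source and each sensor by the multichannel least mean squares method; (e) applying a final update to the updated adaptive channel filters using a priori information about the actual channel filters; and (f) identifying the direct-path time delay from the adaptive channel filters and estimating the sound source location from the time delay differences between sensors. Acoustic channel characteristics are applied when the adaptive channel filters are updated in step (e). Specifically, the method exploits the characteristic that the channel filter coefficients have a sparse distribution, which allows the direct-path time delay to be estimated more accurately. Keywords: sound source localization, acoustic channel, estimation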
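Step (f) can be illustrated with a minimal sketch, assuming the adaptive channel filters have already converged to impulse-response estimates `h1` and `h2` for two sensors. The direct path is taken here as the first tap whose magnitude reaches a fraction of the largest tap, and the angle conversion assumes a far-field source and two sensors a known distance apart. All names, the `rel_threshold` rule, and the far-field model are assumptions and are not taken from the abstract.

```python
import numpy as np

def direct_path_delay(h, rel_threshold=0.5):
    """Index (in samples) of the first tap whose magnitude reaches
    rel_threshold times the largest tap magnitude."""
    mag = np.abs(np.asarray(h, dtype=float))
    return int(np.argmax(mag >= rel_threshold * mag.max()))

def tdoa_seconds(h1, h2, fs):
    """Time difference of arrival between two sensors, from the
    direct-path delays of their estimated channel filters."""
    return (direct_path_delay(h1) - direct_path_delay(h2)) / fs

def far_field_angle(tdoa, spacing_m, c=343.0):
    """Arrival angle in degrees for a two-sensor, far-field model."""
    s = np.clip(c * tdoa / spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```

A sparse channel estimate helps here because the direct-path tap stands out clearly from the surrounding near-zero coefficients, so a simple threshold rule is less likely to lock onto estimation noise.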
Abstract:
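A mask estimation method for a cluster-based missing-feature reconstruction algorithm according to the present invention comprises: receiving an observed signal and detecting a target sound source; receiving the observed signal and the target sound source and computing a per-frequency SIR; and estimating, on the basis of the per-frequency SIR, a binary mask having a different threshold for each frequency.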
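The masking step can be sketched as follows, under several assumptions: the observed signal and the detected target are available as spectrograms of shape (frequencies, frames), the interference is approximated as their difference, and the per-frequency thresholds are supplied by the caller. The abstract does not state how the thresholds are chosen, so `thresholds_db` and both function names are hypothetical.

```python
import numpy as np

def sir_db(target, observed, eps=1e-12):
    """Signal-to-interference ratio in dB for each time-frequency cell.

    Assumption: interference is approximated as observed minus target.
    """
    interference = observed - target
    return 10.0 * np.log10((np.abs(target) ** 2 + eps)
                           / (np.abs(interference) ** 2 + eps))

def binary_mask(target, observed, thresholds_db):
    """Binary mask that is True where the SIR exceeds the threshold
    assigned to that cell's frequency (one threshold per row)."""
    sir = sir_db(target, observed)
    return sir > np.asarray(thresholds_db, dtype=float)[:, None]
```

Cells marked False would be treated as unreliable and handed to the missing-feature reconstruction stage; using one threshold per frequency lets bands with different noise characteristics be judged differently.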
Abstract:
The present invention relates to a method and a system for recognizing a voice using three-dimensional geometry information. The voice recognition system comprises a learning module and a recognizing module. The learning module generates a recognition unit from three-dimensional geometry information for training and the three-dimensional features for training extracted from it. The recognizing module performs voice recognition by applying to the recognition unit either three-dimensional geometry information acquired from a physical object related to or associated with speech, or three-dimensional features extracted from that geometry information. The method and system recognize a voice using three-dimensional geometry information on the lips, the area around the lips, or one or more arbitrary regions of the human body during speech. In addition, the final voice recognition is performed by combining two-dimensional features with sound features and three-dimensional features with sound features during speech, and by combining the recognition results of the two-dimensional features or sound features with the recognition results of the three-dimensional geometry information or three-dimensional features. The accuracy of voice recognition is thereby improved.