Abstract:
An automatic translation apparatus is provided, comprising: an acoustic model analyzer that analyzes first-language speech input from the outside according to acoustic features and a predetermined acoustic model to infer high-probability word sequences; a first-language language model analyzer that analyzes, according to a predetermined first-language language model, the probability that the word sequences inferred by the acoustic model analyzer appear consecutively, to infer high-probability word sequences; a statistical model analyzer that analyzes, according to a predetermined statistical model, the probability that the word sequences inferred by the first-language language model analyzer are statistically translated into second-language word sequences, to infer high-probability word sequences; a second-language language model analyzer that analyzes, according to a predetermined second-language language model, the probability that the word sequences inferred by the statistical model analyzer appear consecutively, to infer high-probability word sequences; and a final translation unit that integrates the probabilities of the word sequences inferred by the acoustic model analyzer, the first-language language model analyzer, the statistical model analyzer, and the second-language language model analyzer to determine the most probable word sequence. Keywords: automatic, translation, speech recognition
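The final translation unit described above integrates the component probabilities. A minimal sketch of that integration, assuming log-probabilities are simply summed (the hypothesis names, scores, and unweighted sum are illustrative, not the patent's formula):

```python
# Each hypothesis carries the log-probabilities assigned by the four
# analyzers; the final translation unit picks the hypothesis whose
# combined score is highest. All values below are illustrative.

def best_translation(hypotheses):
    def total(h):
        return (h["acoustic"] + h["source_lm"]
                + h["translation"] + h["target_lm"])
    return max(hypotheses, key=total)

hyps = [
    {"words": "hello world", "acoustic": -3.2, "source_lm": -1.1,
     "translation": -2.0, "target_lm": -0.9},
    {"words": "hollow word", "acoustic": -2.9, "source_lm": -4.5,
     "translation": -3.1, "target_lm": -2.2},
]
best = best_translation(hyps)  # first hypothesis wins: -7.2 > -12.7
```

A real system would weight each model score (as in log-linear SMT decoders) rather than summing them equally.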
Abstract:
A method and an apparatus for distributed speech recognition using phonemes are provided: phonemes are recognized on the terminal and speech recognition is performed on the phoneme recognition result, so that information search through speech recognition can be realized without defining additional communication standards. When a natural-language query is input to a terminal, the phonemes of the query are recognized (401). The terminal sends the recognized phoneme signal to a speech recognition server (403). The speech recognition server performs speech recognition using the phoneme signal (405, 407), confirms the search identifier to be sent to the search server (409), and sends a search query signal using the confirmed search identifier (411). The search server sends a query response signal in response to the search query signal (413).
Abstract:
A method and an apparatus for adaptive speaking-screen analysis are provided: the speaking screen is determined through screen analysis, and the reference feature value and reference edge feature value are updated with the feature values of the determined speaking screen, so that the speaking screen is analyzed adaptively to the environment. The method comprises the following steps: receiving the speaking screen of a speaker from the outside (501); selecting at least one moving region in the received speaking screen (503); extracting the contrast-ratio distribution value and feature value of each moving region (505); comparing the extracted contrast-ratio distribution values and feature values with a preset reference feature value, and selecting the moving regions whose values correspond to the reference (507); determining the region located a predetermined distance above each selected moving region as a comparison-target candidate region (509); selecting, among the candidate regions, a speaking region that matches a preset reference screen beyond a threshold (511, 515); extracting the feature value and contrast ratio of the speaking region (519); and updating the preset reference feature value with the extracted feature value and contrast ratio (521).
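The last two steps are the adaptive part: the winning region's features are folded back into the reference. A toy sketch of that select-then-update loop, assuming a scalar feature, a distance-based match score, and an exponential blend (all illustrative choices, not the patent's):

```python
# Score candidate regions against the reference feature value, keep the
# best one within the threshold, then blend its feature back into the
# reference so the analyzer adapts to changing conditions.

def pick_speaking_region(candidates, reference, threshold):
    """candidates: list of (region_id, feature); returns best match or None."""
    best = None
    for region_id, feature in candidates:
        score = -abs(feature - reference)  # closer to reference = higher
        if score >= -threshold and (best is None or score > best[1]):
            best = (region_id, score, feature)
    return best

def update_reference(reference, new_feature, alpha=0.3):
    # alpha is an assumed adaptation rate, not specified by the source
    return (1 - alpha) * reference + alpha * new_feature

best = pick_speaking_region([("mouth", 0.52), ("hand", 0.90)], 0.50, 0.10)
ref = update_reference(0.50, best[2])  # reference drifts toward 0.52
```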
Abstract:
The present invention relates to an apparatus and method for generating high-frequency word-sequence recognition units for a large-vocabulary continuous speech recognition system for conversational and read speech, in which a high-frequency pseudo-morpheme sequence is used as a single recognition unit, producing recognition units intermediate between pseudo-morphemes and word phrases (eojeol). The invention comprises: a frequency information extractor (301) that extracts frequency information of consecutive word pairs from a pseudo-morpheme-tagged text corpus; a merge vocabulary set selector (302) that selects the vocabulary set to be merged based on the frequency information extracted by the frequency information extractor (301) and the length information of each word pair; a pseudo-morpheme merge information modifier (303) that revises the text corpus based on the vocabulary set selected by the merge vocabulary set selector (302), merging each high-frequency consecutive word pair into one unit to generate a modified text corpus; and a recognition unit generator (304) that generates high-frequency word-sequence recognition units from the text corpus generated by the pseudo-morpheme merge information modifier (303). Keywords: conversational and read-speech large vocabulary, text corpus, vocabulary dictionary, language model, pronunciation dictionary
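The extract-select-merge pipeline above can be sketched as follows, assuming a simple frequency cutoff as the selection criterion and a "+" joiner for merged units (both illustrative; the patent also weighs word-pair length, which is omitted here):

```python
from collections import Counter

# Count consecutive token pairs in the corpus, select high-frequency
# pairs, and rewrite each sentence joining selected pairs into one
# recognition unit.

def count_pairs(sentences):
    pairs = Counter()
    for sent in sentences:
        pairs.update(zip(sent, sent[1:]))
    return pairs

def merge_pairs(sentences, selected):
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) in selected:
                out.append(sent[i] + "+" + sent[i + 1])  # merged unit
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged

corpus = [["I", "want", "to", "go"], ["want", "to", "stay"]]
selected = {p for p, c in count_pairs(corpus).items() if c >= 2}
modified = merge_pairs(corpus, selected)  # "want to" becomes one unit
```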
Abstract:
The present invention relates to a neural-network-based speech recognition apparatus and method that improve the recognition rate by efficiently fusing audio and visual information with a neural network for robust recognition in noisy environments, and by performing integrated recognition of speech, video, and context using context information (the user's command-usage patterns on a mobile terminal) together with a post-processing method. The integrated speech recognition method of the invention comprises: a feature extraction step of extracting feature vectors from the input audio and video signals; a bimodal neural network recognition step of recognizing the user's speech by fusing the audio and visual information with a neural network; a context information recognition step of recognizing the user's command patterns on the mobile terminal; and a post-processing step of integrating the bimodal neural network result and the context information recognition result to output the final recognition result. Keywords: speech recognition, bimodal recognition, neural network recognizer, BMNN, backpropagation learning algorithm, context information recognition
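The post-processing step combines two score sources. A hedged sketch of one way that integration could work, assuming a fixed linear interpolation between bimodal scores and a context prior (the weight and all scores are illustrative, not the patent's formula):

```python
# Combine per-command scores from the bimodal (audio+visual) recognizer
# with a context prior derived from the user's past command usage, and
# output the command with the highest fused score.

def fuse(bimodal_scores, context_prior, weight=0.7):
    return max(
        bimodal_scores,
        key=lambda c: weight * bimodal_scores[c]
                      + (1 - weight) * context_prior.get(c, 0.0),
    )

bimodal = {"call": 0.45, "camera": 0.40, "music": 0.15}
prior   = {"call": 0.10, "camera": 0.60, "music": 0.30}
final = fuse(bimodal, prior)  # context tips the result toward "camera"
```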
Abstract:
PURPOSE: A stereo image display apparatus and method are provided to prevent eye fatigue by allowing the user to gaze at a desired point and change the gaze point freely. CONSTITUTION: The stereo image display apparatus comprises: a three-dimensional model storage unit (11) for creating and storing a three-dimensional model of an object to be displayed in a virtual reality space; a head and eye movement detection unit (16) for detecting the position of the user's head (face) and extracting images of the user's eyes; a gaze direction and distance measurement unit (12) for extracting information on the user's current gaze point from the head position and eye images output by the head and eye movement detection unit; an image creating unit (13) for generating the stereo image corresponding to the current gaze point extracted by the gaze direction and distance measurement unit, based on the three-dimensional model of the object stored in the three-dimensional model storage unit; and display units (14, 15) for displaying the left and right images created by the image creating unit.
Abstract:
PURPOSE: An apparatus and a method for setting a signal tone using sound and voice are provided so that a user can actively generate a desired signal tone according to need or preference, eliminating the problem of confusingly similar signal tones. CONSTITUTION: A frequency analyzing part and an energy analyzing part (12) analyze the frequency and energy components of a melody signal received from a microphone (11) and converted into digital format. A voice recognizing part, a tone recognizing part, and a rhythm recognizing part (13) recognize the voice, tone, and rhythm of the input melody. A basic melody generating part (14) generates the recognized melody as a basic melody. A rhythm adding part (16) and a tone color and melody adding part (17) add a rhythm, tone color, and melody specified by the user to the basic melody. A signal tone generating and storing part (19) generates and stores the resulting signal tone.
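One piece of this pipeline, turning the frequency analysis into a basic melody, can be sketched by quantizing detected fundamental frequencies to the nearest musical notes. This uses the standard MIDI convention (A4 = 440 Hz = note 69); the patent does not specify this representation, so it is an illustrative assumption:

```python
import math

# Map detected fundamental frequencies (from the frequency analyzing
# part) to the nearest MIDI note numbers, forming a basic melody.

def freq_to_midi(freq_hz):
    return round(69 + 12 * math.log2(freq_hz / 440.0))

# ~C4, E4, G4 hummed into the microphone
melody = [freq_to_midi(f) for f in (262.0, 330.0, 392.0)]
```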
Abstract:
PURPOSE: An automatic word-spacing method for Korean using syllable-unit conditional probability is provided to space sentences written with partial spacing, or with no spaces at all, using a statistical method instead of lexical knowledge or heuristics. CONSTITUTION: A hypothesis for the optimal spacing-pattern search is set (400). The maximum accumulated log probability is calculated from the set hypotheses (402). An output string is obtained by tracing the optimal spacing pattern of the input syllables using the maximum accumulated log probability and the back pointers (404). In the hypothesis process, a space is generated when a transition stays in the same state, and a syllable is generated when a transition moves to a different state. Each hypothesis holds the latest n-1 syllables, an accumulated log probability, and a back pointer. The back pointer identifies the previous hypothesis from which the current hypothesis was extended, and stores the time, state, and pointer of that previous hypothesis.
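The search described above can be sketched as a small Viterbi-style trellis: each hypothesis carries an accumulated log probability and a back pointer, and the spacing pattern is recovered by backtracking. The toy probability table, the two-state simplification, and the greedy predecessor choice are all illustrative assumptions, not the patent's model:

```python
import math

# P(space between previous syllable and current syllable), toy values
P_SPACE = {("은", "좋"): 0.9}

def space_sentence(syllables):
    # states: 0 = no space before this syllable, 1 = space before it
    trellis = [[(0.0, None, 0)]]  # (accumulated log prob, back pointer, state)
    for t in range(1, len(syllables)):
        p = P_SPACE.get((syllables[t - 1], syllables[t]), 0.1)
        prev_best = max(range(len(trellis[-1])),
                        key=lambda i: trellis[-1][i][0])
        prev_lp = trellis[-1][prev_best][0]
        trellis.append([
            (prev_lp + math.log(1 - p), prev_best, 0),  # no space here
            (prev_lp + math.log(p), prev_best, 1),      # insert a space
        ])
    # backtrack from the best final hypothesis using the back pointers
    out = []
    i = max(range(len(trellis[-1])), key=lambda k: trellis[-1][k][0])
    for t in range(len(syllables) - 1, 0, -1):
        _, back, state = trellis[t][i]
        out.append(syllables[t])
        if state == 1:
            out.append(" ")
        i = back
    out.append(syllables[0])
    return "".join(reversed(out))

spaced = space_sentence(list("날은좋다"))  # inserts the one likely space
```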
Abstract:
PURPOSE: A voice language translation system using a network and a method therefor are provided so that a user can communicate naturally with a counterpart using the user's own language. CONSTITUTION: The voice language translation system comprises an input unit (11), a voice recognition unit (12), an intermediate language generator (13), a language generator (14), a network call processor (15), and a user interface controller (18). The input unit receives voice signals from the user and transfers them to the user interface controller. The voice recognition unit recognizes the voice signals from the user interface controller, converts them into a character-type sentence, and transfers it back to the user interface controller. The intermediate language generator converts the character-type sentence into a sentence of semantic structure. The language generator translates the semantic-structure sentence into the prescribed language. The network call processor handles call connection, disconnection, and data communication. The user interface controller controls all components.
Abstract:
PURPOSE: A frame compression method using representative feature sequences and a speech recognition method using the same are provided to reduce the number of frames without degrading performance, by extracting feature sequences from frames at constant time intervals and obtaining a representative frame for similar frames. CONSTITUTION: The frame compression method using representative feature sequences comprises the steps of: dividing the input signal into frames of a prescribed time interval; extracting a feature sequence for each frame; computing the similarity between the extracted feature sequences; and obtaining a representative feature sequence for similar feature sequences using the computed similarity.
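The compression steps above can be sketched as follows, assuming cosine similarity between consecutive frame feature vectors, a 0.99 grouping threshold, and the per-dimension mean as the representative (all illustrative choices; the patent does not fix these):

```python
import math

# Group consecutive frames whose feature vectors are similar enough,
# then replace each group by its mean as the representative feature.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def compress(frames, threshold=0.99):
    groups = [[frames[0]]]
    for f in frames[1:]:
        if cosine(groups[-1][-1], f) >= threshold:
            groups[-1].append(f)   # similar enough: merge into current group
        else:
            groups.append([f])     # dissimilar: start a new group
    # representative = per-dimension mean of each group
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]

frames = [(1.0, 0.0), (0.99, 0.01), (0.0, 1.0)]
result = compress(frames)  # three frames reduced to two representatives
```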