Abstract:
PURPOSE: A rule-based syntactic parsing device and method are provided that combine high parsing performance with the efficiency of a rule-based approach by resolving ambiguity with lexical dependency information drawn from a corpus annotated with syntax trees. CONSTITUTION: A rule-based parsing module (103) parses an input sentence according to syntax rules and selects the optimal syntax tree. A rule weight calculating module (105) computes a weight for each rule applied to the input sentence from that rule's lexical weight and rule probability, and supplies the weight to the rule-based parsing module.
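The abstract above does not say how the lexical weight and the rule probability are combined. The following is a minimal sketch that assumes a log-linear mix; the function names, the `alpha` parameter, and the toy numbers are all hypothetical and not taken from the patent.

```python
import math

def rule_weight(rule_prob, lexical_weight, alpha=0.5):
    """Assumed combination: log-linear mix of rule probability and lexical weight."""
    return alpha * math.log(rule_prob) + (1 - alpha) * math.log(lexical_weight)

def best_tree(candidates, alpha=0.5):
    """candidates: list of (tree_label, [(rule_prob, lexical_weight), ...])."""
    def score(candidate):
        return sum(rule_weight(p, w, alpha) for p, w in candidate[1])
    return max(candidates, key=score)[0]

# Two competing analyses of an ambiguous attachment (toy numbers).
candidates = [
    ("attach-PP-to-verb", [(0.6, 0.9), (0.5, 0.8)]),
    ("attach-PP-to-noun", [(0.7, 0.2), (0.5, 0.8)]),
]
print(best_tree(candidates))
```

In this toy case the noun attachment has the higher rule probability, but its low lexical weight lets the verb attachment win, which is the kind of ambiguity resolution the abstract describes.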
Abstract:
PURPOSE: A post-processing knowledge generation apparatus is provided to improve translation performance by correcting translation errors on the basis of post-processing knowledge. CONSTITUTION: An original text extracting unit (204) extracts the original text from a parallel corpus. A machine translation part (206) machine-translates the original text to create a machine translation corpus. An automatic alignment part (210) statistically aligns the machine translation corpus with the correct translation corpus extracted from the parallel corpus. An extracting unit (212) extracts text alignment information from the alignment result. A filter (214) corrects errors in the text alignment information and creates the post-processing knowledge.
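One way to picture the extraction and filtering steps above is the sketch below. It swaps the statistical aligner for Python's `difflib.SequenceMatcher` as a simple stand-in, and assumes the filter is a frequency threshold; the function names and `min_count` are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

def substitution_pairs(mt_tokens, ref_tokens):
    """Yield (mt_phrase, ref_phrase) wherever the two token sequences disagree."""
    matcher = SequenceMatcher(None, mt_tokens, ref_tokens, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "replace":
            yield " ".join(mt_tokens[i1:i2]), " ".join(ref_tokens[j1:j2])

def build_post_edit_rules(pairs, min_count=2):
    """pairs: iterable of (machine_translation, correct_translation) strings."""
    counts = Counter()
    for mt, ref in pairs:
        counts.update(substitution_pairs(mt.split(), ref.split()))
    rules = {}
    # Assumed filter: keep substitutions seen at least min_count times,
    # preferring the most frequent replacement for each source phrase.
    for (src, tgt), n in counts.most_common():
        if n >= min_count:
            rules.setdefault(src, tgt)
    return rules

corpus = [
    ("he go to school", "he goes to school"),
    ("she go to work", "she goes to work"),
    ("they is happy", "they are happy"),
]
rules = build_post_edit_rules(corpus)
print(rules)
```

Only the substitution seen twice survives the threshold; the single "is" to "are" observation is dropped as too weakly supported.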
Abstract:
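The present invention relates to a technique for specializing the bilingual dictionary of an automatic translation system when the translation domain changes. Co-occurring vocabulary is extracted from a source-language corpus and a target-language corpus belonging to the target domain and mapped onto the bilingual dictionary to obtain target-word candidates. Erroneous translation relations among the candidates are filtered out, and a representative target word is then determined and reflected in the bilingual dictionary. The bilingual dictionary of the automatic translation system can thus be specialized automatically, which reduces the cost of building it. Keywords: automatic translation, bilingual dictionary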
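The following sketch illustrates the candidate mapping, filtering, and representative-word selection described above. It assumes the domain corpora are sentence-aligned and that the filter is a minimum co-occurrence count; neither detail is stated in the abstract, and all names and data are hypothetical.

```python
from collections import Counter

def specialize(dictionary, aligned_pairs, min_count=2):
    """Pick one representative target word per source word for a domain.

    dictionary:    {source_word: [candidate target words]}
    aligned_pairs: iterable of (source_tokens, target_tokens) from the domain
    """
    cooc = Counter()
    for src_tokens, tgt_tokens in aligned_pairs:
        tgt_set = set(tgt_tokens)
        for s in set(src_tokens):
            for t in dictionary.get(s, []):
                if t in tgt_set:
                    cooc[(s, t)] += 1

    specialized = {}
    for s, candidates in dictionary.items():
        # Assumed filter: drop candidates with too little domain evidence.
        kept = [(cooc[(s, t)], t) for t in candidates if cooc[(s, t)] >= min_count]
        if kept:
            specialized[s] = max(kept)[1]
    return specialized

dictionary = {"cell": ["세포", "전지"]}
biology_pairs = [
    (["the", "cell", "divides"], ["세포", "가", "분열한다"]),
    (["cell", "membrane"], ["세포", "막"]),
    (["battery", "cell"], ["전지"]),
]
print(specialize(dictionary, biology_pairs))
```

Source words with no sufficiently supported candidate are left out of the result, so the general-purpose dictionary entry would still apply to them.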
Abstract:
PURPOSE: A statistical HMM part-of-speech tagging apparatus and method are provided to improve morpheme part-of-speech tagging accuracy on documents of a given domain by using learned information. CONSTITUTION: A real-time-learning-based statistical part-of-speech tagging unit (103) uses the lexical probabilities stored in a vocabulary probability information DB and the context probabilities stored in a context probability information DB. A real-time-learning-based tagging error correction unit (105) corrects errors by means of a tagging error modification DB. A real-time document information learning unit (101) extracts context probability information and builds a real-time context probability information DB.
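For readers unfamiliar with HMM tagging, the sketch below shows a standard bigram Viterbi decoder driven by lexical and context probability tables, followed by a lookup-based correction pass. It is a generic textbook formulation, not the patent's implementation; the table layouts, the `floor` smoothing value, and the toy probabilities are assumptions.

```python
import math

def viterbi(words, tags, lexical, context, floor=1e-6):
    """Most probable tag sequence under a bigram HMM.

    lexical[(word, tag)] ~ P(word | tag)
    context[(prev, tag)] ~ P(tag | prev), with "<s>" as the start symbol
    """
    if not words:
        return []

    def lp(table, key):
        return math.log(table.get(key, floor))

    # best[i][t] = (log-score of the best path ending in t at position i, backpointer)
    best = [{t: (lp(context, ("<s>", t)) + lp(lexical, (words[0], t)), None)
             for t in tags}]
    for w in words[1:]:
        prev, col = best[-1], {}
        for t in tags:
            p = max(tags, key=lambda q: prev[q][0] + lp(context, (q, t)))
            col[t] = (prev[p][0] + lp(context, (p, t)) + lp(lexical, (w, t)), p)
        best.append(col)

    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for col in reversed(best[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]

def apply_corrections(words, tags, corrections):
    """Override tags using a (word, tag) -> corrected_tag table."""
    return [corrections.get((w, t), t) for w, t in zip(words, tags)]

TAGS = ["N", "V"]
lexical = {("time", "N"): 0.8, ("time", "V"): 0.2,
           ("flies", "N"): 0.4, ("flies", "V"): 0.6}
context = {("<s>", "N"): 0.7, ("<s>", "V"): 0.3,
           ("N", "N"): 0.4, ("N", "V"): 0.6,
           ("V", "N"): 0.7, ("V", "V"): 0.3}
print(viterbi(["time", "flies"], TAGS, lexical, context))
```

Under this reading, the real-time learning unit would supply a domain-specific `context` table at decode time, and the error modification DB would play the role of `corrections`.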
Abstract:
PURPOSE: A neologism selection device and method are provided to prioritize neologism candidates by topic and to select neologisms according to that priority. CONSTITUTION: A morpheme analyzer (102) performs morphological analysis on an input web document. A keyword extractor (106) extracts keywords corresponding to neologism candidates from the analyzed sentences. A subject detecting and tracking unit (108) performs topic detection and topic tracking using the extracted keywords. A priority determination unit (110) determines the priority of each keyword from its subject keyword weight and selects neologisms according to that priority.
Abstract:
PURPOSE: A method and a device for generating translated sentences in an automatic English-Korean translation system are provided to automatically generate the Korean prefinal ending that corresponds to the information an English auxiliary verb or temporal adverbial phrase conveys to the verb. CONSTITUTION: An English morpheme analyzer (100) analyzes an input English sentence into morphemes. A structure analyzer (300) analyzes the structure of the English sentence. An English transforming unit (400) converts the structural analysis result into a Korean structure. A Korean generator (500) generates the Korean structure morpheme by morpheme. A prefinal-ending processor (600) passes the Korean prefinal-ending information corresponding to the English auxiliary verb group to the Korean generator.
Abstract:
PURPOSE: An apparatus and a method for translation-error post-editing are provided to automatically recognize errors in a final translation and correct them. CONSTITUTION: A correction word candidate generator (304) generates correction word candidates for suspected translation errors based on the translation system's analysis of the original sentence. A correction word selector (306) selects the final correction word from the generated candidates using a language model specialized for the error type, then applies the final correction word to the translation result to correct the error.
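The selection step above can be illustrated with a small sketch. It assumes one add-one-smoothed bigram model per error type and scores each candidate by the probability of the whole sentence with that candidate substituted in; the model type, class name, error-type label, and training sentences are all hypothetical.

```python
import math
from collections import Counter

class BigramLM:
    """Add-one-smoothed bigram model (an assumed stand-in for the error-specific LM)."""

    def __init__(self, sentences):
        self.unigrams, self.bigrams = Counter(), Counter()
        for s in sentences:
            toks = ["<s>"] + s.split() + ["</s>"]
            self.unigrams.update(toks)
            self.bigrams.update(zip(toks, toks[1:]))
        self.vocab = len(self.unigrams)

    def logprob(self, tokens):
        toks = ["<s>"] + tokens + ["</s>"]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.unigrams[a] + self.vocab))
            for a, b in zip(toks, toks[1:])
        )

def select_correction(tokens, position, candidates, error_type, models):
    """Return the candidate that makes the sentence most probable under the
    language model registered for this error type."""
    lm = models[error_type]
    def score(c):
        return lm.logprob(tokens[:position] + [c] + tokens[position + 1:])
    return max(candidates, key=score)

models = {"preposition": BigramLM(
    ["I go to school", "we go to work", "they go to school"])}
tokens = "I go at school".split()
print(select_correction(tokens, 2, ["at", "to", "in"], "preposition", models))
```

Keeping a separate model per error type means each one can be trained on text that is informative for that kind of error, which appears to be the motivation for the error-specialized model in the abstract.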
Abstract:
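The present invention relates to a method and an apparatus for automatically building the translation knowledge used by a translator. When a source-language sentence and the target-language sentence that translates it are input, the invention transforms both sentences by attaching to each morpheme its part of speech, base form, relative position within its base phrase, and syntactic information. It then performs word alignment and phrase alignment on the transformed source-language and target-language sentences, and extracts word and phrase translation knowledge, bilingual predicate subcategorization translation knowledge, and bilingual sentence pattern translation knowledge from the word alignment and phrase alignment results. Keywords: bilingual dictionary, word alignment, phrase alignment, dependency parser, word/phrase bilingual dictionary construction, bilingual predicate subcategorization pattern extraction, bilingual sentence pattern extraction, bilingual word/phrase cluster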
Abstract:
A method and an apparatus for generating Korean quantifiers are provided to produce quantifiers that fit Korean syntactic structure for expressions combining a number and a noun when sentences are translated through morphological and syntactic analysis in machine translation. A Korean corpus collecting unit (100) collects a large Korean corpus and stores it in a database as the source corpus. A Korean morpheme analyzing and tagging unit (104) performs morpheme analysis and tagging on all sentences in the corpus and stores the result as a Korean tagged corpus. A quantifier extracting unit extracts quantifier information from the part-of-speech information of all sentences in the Korean corpus, and a quantifier filtering part (114) filters the quantifier information. A feature value assigning unit (116) assigns only high-frequency quantifier information, as feature values, to the Korean target words in a dictionary database. A quantifier generating unit (120) generates quantifiers based on the quantifier feature values assigned to each target word.
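The extraction, filtering, and generation steps above can be sketched as follows. The sketch assumes Sejong-style tags (NNG for common nouns, NNB for bound nouns used as quantifiers, SN/NR/MM for numbers), a noun-number-quantifier word order, a frequency threshold as the filter, and 개 as the fallback quantifier; none of these details come from the abstract.

```python
from collections import Counter, defaultdict

NUMBER_TAGS = {"SN", "NR", "MM"}  # assumed tags for digits, numerals, determiners

def extract_quantifiers(tagged_sentences, min_count=2):
    """Map each noun to its most frequent quantifier in the tagged corpus."""
    counts = defaultdict(Counter)
    for sent in tagged_sentences:
        # Slide a three-morpheme window: noun, number, quantifier.
        for (noun, nt), (_, qt), (quant, ct) in zip(sent, sent[1:], sent[2:]):
            if nt == "NNG" and qt in NUMBER_TAGS and ct == "NNB":
                counts[noun][quant] += 1
    features = {}
    for noun, ctr in counts.items():
        quant, freq = ctr.most_common(1)[0]
        if freq >= min_count:  # assumed filter: keep only high-frequency pairs
            features[noun] = quant
    return features

def generate(noun, number, features, default="개"):
    """Render noun + number + quantifier, falling back to a general quantifier."""
    return f"{noun} {number}{features.get(noun, default)}"

corpus = [
    [("책", "NNG"), ("3", "SN"), ("권", "NNB")],
    [("책", "NNG"), ("7", "SN"), ("권", "NNB")],
    [("책", "NNG"), ("1", "SN"), ("개", "NNB")],
    [("개", "NNG"), ("2", "SN"), ("마리", "NNB")],
    [("개", "NNG"), ("5", "SN"), ("마리", "NNB")],
]
features = extract_quantifiers(corpus)
print(generate("책", 5, features))
```

Storing the chosen quantifier as a per-noun feature mirrors the abstract's idea of attaching feature values to dictionary target words, so generation reduces to a lookup.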