Abstract:
PURPOSE: An automatic translation system based on translation memory (TM), and a method thereof, are provided to increase TM coverage and improve translation quality by converting a TM consisting of character strings into a structured TM. CONSTITUTION: A TM building module(106) converts a language pattern into a partial translation pattern and registers the partial translation pattern in a TM database. A partial combination translation module(20) analyzes the structure of a language pattern with reference to the TM database and searches for the matching partial translation patterns. The partial combination translation module then combines the partial translation patterns and outputs a translation corresponding to the input sentence.
Abstract:
PURPOSE: An apparatus and a method for automatically correcting Chinese structural postpositions, which select the structural postposition automatically according to conversion rules for structural postpositions, are provided to improve Chinese translation quality by efficiently correcting Chinese sentences in an automatic translation system. CONSTITUTION: An error detection unit(112) receives a Chinese sentence as input and detects usage errors of structural postpositions by using a distribution rule DB of structural postpositions. An error correction unit(122) corrects the sentence with reference to a translation rule DB of structural postpositions.
Abstract:
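The present invention relates to a technique for building English lexical patterns for automatic translation, in which lexical pattern candidates are automatically extracted from a source corpus and a person is supported in building lexical patterns from the extracted candidates, so that the required lexical patterns can be built with little effort. The invention comprises a tagging step of performing part-of-speech tagging on the source corpus, a pattern range candidate recognition step, a filtering step, a pattern storage step, a frequency adjustment step, and a lexical pattern building support step. Keywords: lexical pattern, knowledge building, machine translation, syntactic analysis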
Abstract:
An automatic Korean/English translation method and an apparatus thereof, including a pattern-based automatic translation technique, are provided to generate the final translation result by accurately dividing a Korean sentence. Based on the results of morphological analysis and phrase analysis, the Korean original sentence is divided into two or more partial original sentences(S208). A pattern-based machine translation result and a statistical machine translation result are generated from the divided result(S210, S212). The partial translations of the Korean original sentence are synthesized into one English sentence by using the translation results(S218).
Abstract:
A method for constructing English vocabulary patterns in an automatic translation system and an apparatus thereof are provided to improve the performance of syntactic analysis and automatic translation by utilizing the vocabulary patterns in automatic translation. Part-of-speech tagging is performed on the input corpora(S202). By using the tagged parts of speech and the base forms of words, pattern range candidates that can become vocabulary patterns are recognized(S204). The extracted patterns are filtered(S206). The patterns that pass the filtering process are stored together with frequency information(S208). The frequency information is adjusted(S210). The final vocabulary pattern candidates are extracted(S212).
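The steps S202 to S212 above can be illustrated with a minimal Python sketch. It assumes the corpus has already been tagged (S202) and arrives as lists of (base form, tag) pairs. The edge-tag filter, the n-gram lengths, and the rule that discounts a pattern by the frequency of longer patterns containing it are all hypothetical choices for illustration; the abstract does not state how filtering or frequency adjustment is actually done.

```python
from collections import Counter

# Hypothetical filter: a candidate may not start or end with these tags.
EDGE_STOP_TAGS = {"DT", "IN", "CC"}


def extract_candidates(tagged_sentences, min_len=2, max_len=3):
    """S204-S208: collect n-gram candidates, filter them, count frequencies."""
    counts = Counter()
    for sent in tagged_sentences:
        for n in range(min_len, max_len + 1):
            for i in range(len(sent) - n + 1):
                span = sent[i:i + n]
                if span[0][1] in EDGE_STOP_TAGS or span[-1][1] in EDGE_STOP_TAGS:
                    continue  # S206: drop candidates with a function word at an edge
                counts[tuple(base for base, _ in span)] += 1
    return counts


def adjust_frequencies(counts):
    """S210 (assumed rule): subtract from each pattern the frequency of
    every longer pattern that contains it."""
    adjusted = Counter(counts)
    for longer, freq in counts.items():
        for n in range(2, len(longer)):
            for i in range(len(longer) - n + 1):
                sub = longer[i:i + n]
                if sub in adjusted:
                    adjusted[sub] -= freq
    return adjusted


def final_candidates(counts, min_freq=2):
    """S212: keep patterns whose adjusted frequency reaches the threshold."""
    return sorted(p for p, f in counts.items() if f >= min_freq)


if __name__ == "__main__":
    corpus = [
        [("machine", "NN"), ("translation", "NN"), ("system", "NN")],
        [("machine", "NN"), ("translation", "NN"), ("system", "NN")],
        [("machine", "NN"), ("translation", "NN")],
    ]
    print(final_candidates(adjust_frequencies(extract_candidates(corpus))))
```

In this toy corpus the two-word pattern is mostly explained by the three-word pattern, so only the longer one survives the assumed adjustment. The surviving candidates would then be handed to a person for review, as the abstract describes.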
Abstract:
A real-time interactive machine translation apparatus and a method thereof are provided to estimate translation errors based on information generated during the translation process, and to offer, in real time, a re-translated result in which the errors have been corrected by a user. A machine translation engine(210) comprises a morphological/syntactic analyzer(211) and a translation generator(212). The morphological/syntactic analyzer analyzes the morphemes and syntax of the original text, and the translation generator produces a translation based on the analysis result. An original text error estimator(220) receives the morphological and syntactic analysis result of the original text from the machine translation engine, and determines the portions of the original text that are likely to cause translation errors. A translation error estimator(230) receives the generated translation information from the machine translation engine and determines the portions of the translation that are likely to contain translation errors.
Abstract:
A device and a method for segmenting an English sentence are provided to improve translation accuracy of machine translation and build a full text database from a simple English raw corpus for the machine translation for an English patent document. An input processor(100) segments paragraphs from an inputted English patent document. A token segmentation part(200) segments each word included in the paragraph into a token and sets a type of the token. A sentence segmentation part(300) segments a patent sentence by using the segmented token and the token type as input for an abbreviation database(610) and a proper noun database(620). A sentence segmentation knowledge builder(700) builds the abbreviation database and the proper noun database from a patent document raw corpus automatically. A sentence transformer(400) transforms an asyntactic patent sentence segmented in the sentence segmentation part by using a sentence transformation rule database(630). An output processor(500) outputs the segmented and transformed patent sentence as a result.
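The role of the abbreviation database in the sentence segmentation part can be sketched in a few lines of Python. This is only an illustration: the `ABBREVIATIONS` set is a hypothetical stand-in for the abbreviation database(610), tokens are split on whitespace, and the proper noun database and the sentence transformation rules are left out.

```python
# Hypothetical stand-in for the abbreviation database(610).
ABBREVIATIONS = {"fig.", "figs.", "e.g.", "i.e.", "no.", "etc."}


def segment_sentences(paragraph, abbreviations=ABBREVIATIONS):
    """Split a paragraph at sentence-final punctuation, except after
    tokens that are listed as abbreviations."""
    sentences, current = [], []
    for token in paragraph.split():
        current.append(token)
        is_terminal = token.endswith((".", "?", "!"))
        if is_terminal and token.lower() not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep trailing text that lacks final punctuation
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    text = "FIG. 1 shows a device. The device includes a sensor, e.g. a camera."
    for sentence in segment_sentences(text):
        print(sentence)
```

Without the abbreviation lookup, "FIG." and "e.g." would each end a sentence, which is the kind of error an automatically built abbreviation database is meant to prevent.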
Abstract:
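The present invention relates to an apparatus and a method for increasing the efficiency of building a translation equivalent dictionary, which is built to select translation equivalents for the technical terms that appear frequently in patent documents written in Korean. The information for the dictionary is generated and presented automatically, so that dictionary construction work previously done by hand is semi-automated. The method comprises: extracting compound-noun technical term entries and their translation equivalents by using translation equivalent information for the unit nouns and affixes that make up technical terms in patent documents; selecting translation equivalent candidates for unregistered single-noun technical terms from the extracted compound-noun technical term entries and translation equivalents; and, when no translation equivalent candidate exists, extracting and presenting example sentences of the technical term for manual construction. Keywords: automatic translation, technical term extraction, patent document translation, translation equivalent selection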
Abstract:
A statistical HMM(Hidden Markov Model) part-of-speech tagging apparatus and method that can be applied to a new domain without a tagged domain corpus are provided. Words whose lexical probability varies with the application domain are selected, and their lexical probabilities are updated for that domain, improving tagging accuracy without a tagged corpus for the specific domain. Tagging probability information is learned from a previously tagged corpus to construct a lexical/part-of-speech/contextual probability information database and a lexical probability information database(S210). The lexical probability information database is learned and updated domain-dependently based on a raw corpus of the application domain(S220). Morpheme analysis is performed on an input sentence on the basis of a morpheme analysis dictionary database(S240). Statistical part-of-speech tagging is carried out on the morpheme analysis result based on the lexical/part-of-speech/contextual probability information database and the updated lexical probability information database(S250). Errors in the tagging result are corrected according to a tagging error correction rule database(S260).
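The statistical tagging step (S250) can be illustrated with a standard first-order HMM decoded by the Viterbi algorithm. The sketch below is a textbook formulation, not the patented procedure: the tag set, the probability tables, and the floor value for unseen events are invented for illustration, and the abstract does not say how the lexical probabilities are re-estimated from the raw domain corpus. Here the lexical table (`emit_p`) is simply a parameter, so a domain-updated table can be swapped in.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence under a first-order HMM.

    start_p[t]: probability of tag t at the start of a sentence
    trans_p[p][t]: contextual probability of tag t after tag p
    emit_p[t][w]: lexical probability of word w given tag t
    Unseen events get a small floor probability (an assumed smoothing).
    """
    def logp(p):
        return math.log(p if p > 0 else floor)

    # best[i][t] = (log-probability of the best path ending in t, previous tag)
    best = [{t: (logp(start_p.get(t, 0)) + logp(emit_p[t].get(words[0], 0)), None)
             for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (best[-1][p][0] + logp(trans_p[p].get(t, 0)), p) for p in tags
            )
            column[t] = (score + logp(emit_p[t].get(word, 0)), prev)
        best.append(column)

    # Trace back from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


if __name__ == "__main__":
    tags = ["N", "V"]
    start_p = {"N": 0.8, "V": 0.2}
    trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
    general = {"N": {"device": 0.5, "claims": 0.5}, "V": {"claims": 0.6}}
    # Hypothetical domain-updated lexical table: "claims" is mostly a noun.
    patent = {"N": {"device": 0.5, "claims": 0.9}, "V": {"claims": 0.1}}
    print(viterbi(["device", "claims"], tags, start_p, trans_p, general))
    print(viterbi(["device", "claims"], tags, start_p, trans_p, patent))
```

With the toy numbers above, swapping only the lexical table changes the tag chosen for "claims" while the contextual probabilities stay fixed, which is the effect a domain-dependent lexical update aims for.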
Abstract:
A method and a device for automatically generating a compound noun translation using translation co-occurrence/probability information from a translation dictionary are provided to resolve semantic ambiguity and choose among synonymous translations by automatically extracting the translation co-occurrence/probability information from the dictionary and selecting the translation based on the extracted information. A translation co-occurrence information extractor and a translation probability information extractor(107,108) respectively extract the translation co-occurrence information and the translation probability information from the translation database(106). A compound noun extractor(102) extracts the compound noun and decomposes it into noun-unit words. A context-based translation selector(103) selects the translation with the highest context probability for each word based on the translation co-occurrence information. A probability-based translation selector(104) selects the translation with the highest probability for each word based on the translation probability information. A compound noun translation generator(105) generates the translation of the extracted compound noun by combining the selected translations.
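The interplay of the two selectors can be sketched in Python as follows. The abstract does not say how the context-based and probability-based choices are reconciled, so this sketch assumes one simple policy: use co-occurrence evidence when any exists, otherwise fall back to the most probable translation. The romanized source nouns, the candidate tables, and the co-occurrence counts are all invented for illustration, and the output simply joins the selected translations in source order.

```python
def select_translation(word, others, candidates, cooc):
    """Choose a translation for `word` given the other nouns in the compound.

    candidates[w]: dict mapping each candidate translation of w to its probability
    cooc: dict mapping frozenset({t1, t2}) to a co-occurrence count
    """
    def context_score(cand):
        return sum(
            cooc.get(frozenset((cand, other_cand)), 0)
            for other in others
            for other_cand in candidates[other]
        )

    scores = {cand: context_score(cand) for cand in candidates[word]}
    if any(scores.values()):
        # Context-based selection: co-occurrence evidence is available.
        return max(scores, key=scores.get)
    # Probability-based selection: assumed fallback when there is no evidence.
    return max(candidates[word], key=candidates[word].get)


def translate_compound(nouns, candidates, cooc):
    """Combine the per-noun selections into one translation."""
    selected = [
        select_translation(n, [m for m in nouns if m != n], candidates, cooc)
        for n in nouns
    ]
    return " ".join(selected)


if __name__ == "__main__":
    candidates = {
        "eunhaeng": {"ginkgo nut": 0.6, "bank": 0.4},
        "gyejwa": {"account": 1.0},
    }
    cooc = {frozenset(("bank", "account")): 12}
    print(translate_compound(["eunhaeng", "gyejwa"], candidates, cooc))
```

In this example the probability table alone would pick "ginkgo nut", and the co-occurrence evidence overrides it. When the co-occurrence table is empty, the sketch returns the most probable candidate for each noun.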