Abstract:
PURPOSE: An apparatus and method for automatically correcting Chinese structural postpositions are provided. The appropriate structural postposition is selected automatically according to conversion rules for structural postpositions, which improves Chinese translation quality by efficiently correcting Chinese sentences in an automatic translation system. CONSTITUTION: An error detection unit(112) receives a Chinese sentence and detects misuse of a structural postposition by using a distribution rule DB for structural postpositions. An error correction unit(122) corrects the sentence with reference to a translation rule DB for structural postpositions.
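The detect-then-correct flow described above can be illustrated with a minimal sketch. It assumes the sentence is already segmented and part-of-speech tagged, and it merges the distribution rules and the correction rules into one small hypothetical table keyed on the neighbouring parts of speech. The tag names, the `RULES` table, and both function names are illustrative assumptions and are not taken from the abstract.

```python
PARTICLES = {"的", "地", "得"}

# Hypothetical rule table (an assumption, not the patent's rule DB):
# (POS before the particle, POS after the particle) -> expected particle
RULES = {
    ("ADJ", "NOUN"): "的",   # modifier + 的 + noun
    ("NOUN", "NOUN"): "的",
    ("ADJ", "VERB"): "地",   # adverbial + 地 + verb
    ("VERB", "ADJ"): "得",   # verb + 得 + complement
}


def detect_errors(tagged):
    """Return (index, expected_particle) for each particle that violates a rule."""
    errors = []
    for i in range(1, len(tagged) - 1):
        word, _ = tagged[i]
        if word not in PARTICLES:
            continue
        expected = RULES.get((tagged[i - 1][1], tagged[i + 1][1]))
        # Only flag an error when a rule applies and disagrees with the text.
        if expected is not None and expected != word:
            errors.append((i, expected))
    return errors


def correct(tagged):
    """Replace each misused particle with the one the rule table expects."""
    words = [word for word, _ in tagged]
    for index, expected in detect_errors(tagged):
        words[index] = expected
    return "".join(words)


print(correct([("跑", "VERB"), ("的", "PART"), ("快", "ADJ")]))
```

A particle is left unchanged when no rule matches its context, so the sketch never rewrites text it has no evidence about.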
Abstract:
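The present invention relates to a technique for constructing English vocabulary patterns for automatic translation. Vocabulary pattern candidates are automatically extracted from a source corpus, and a person is supported in constructing vocabulary patterns from the extracted candidates, so that the required vocabulary patterns can be built with little effort. The invention comprises a tagging step that performs part-of-speech tagging on the source corpus, a pattern range candidate recognition step, a filtering step, a pattern storage step, a frequency adjustment step, and a vocabulary pattern construction support step. Keywords: vocabulary pattern, knowledge construction, machine translation, syntactic analysis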
Abstract:
An automatic Korean/English translation method and apparatus that incorporate a pattern-based automatic translation technique are provided to generate a final translation result by accurately dividing a Korean sentence. Based on the results of morphological analysis and phrase analysis, the original sentence is divided into two or more partial original sentences(S208). A pattern-based machine translation and a statistical machine translation are generated from the divided result(S210, S212). The partial translations of the Korean original sentence are synthesized into one English sentence by using the translation results(S218).
Abstract:
A method and apparatus for constructing English vocabulary patterns in an automatic translation system are provided to improve the performance of syntactic analysis and automatic translation by using the vocabulary patterns during translation. Part-of-speech tagging is performed on the input corpora(S202). Using the tagged parts of speech and the base forms of the words, pattern range candidates that can become vocabulary patterns are recognized(S204). The extracted patterns are filtered(S206). Each pattern that passes the filter is stored together with its frequency information(S208). The frequency information is adjusted(S210). The final vocabulary pattern candidates are extracted(S212).
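The steps S204 to S212 can be sketched as a small pipeline over an already tagged corpus. The abstract does not say how candidates are recognized, filtered, or frequency-adjusted, so every concrete rule below is an assumption made for illustration: candidates are 2- and 3-word lemma sequences starting with a content word, the filter drops spans ending in a determiner or pronoun, and the adjustment subtracts from each pattern the counts of longer stored patterns that contain it. The corpus, tag names, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical pre-tagged corpus: each sentence is a list of (lemma, POS) pairs.
TAGGED = [
    [("take", "VB"), ("care", "NN"), ("of", "IN"), ("it", "PRP")],
    [("take", "VB"), ("care", "NN"), ("of", "IN"), ("them", "PRP")],
    [("take", "VB"), ("a", "DT"), ("look", "NN")],
]
CONTENT_TAGS = {"VB", "NN"}  # assumed: a pattern must start with a content word


def candidate_ranges(sentence, min_len=2, max_len=3):
    """Candidate recognition: yield spans that start with a content word."""
    for start, (_, tag) in enumerate(sentence):
        if tag not in CONTENT_TAGS:
            continue
        for length in range(min_len, max_len + 1):
            span = sentence[start:start + length]
            if len(span) == length:
                yield span


def passes_filter(span):
    """Filtering (assumed rule): drop spans ending in a determiner or pronoun."""
    return span[-1][1] not in {"DT", "PRP"}


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous sub-sequence of `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def build_patterns(corpus, min_freq=2):
    counts = Counter()
    for sentence in corpus:
        for span in candidate_ranges(sentence):
            if passes_filter(span):
                # Storage: keep each pattern with its raw frequency.
                counts[tuple(lemma for lemma, _ in span)] += 1
    # Frequency adjustment (assumed): discount occurrences that are
    # already explained by a longer stored pattern.
    adjusted = Counter(counts)
    for longer, freq in counts.items():
        for shorter in counts:
            if len(shorter) < len(longer) and contains(longer, shorter):
                adjusted[shorter] -= freq
    # Final candidates: patterns whose adjusted frequency reaches the threshold.
    return {p: f for p, f in adjusted.items() if f >= min_freq}


print(build_patterns(TAGGED))
```

With this toy corpus only the longest recurring span survives, since its sub-spans never occur outside it. In the described system the final candidates would still go to a human for confirmation.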
Abstract:
A real-time interactive machine translation apparatus and method are provided to estimate translation errors based on information generated during the translation process and to provide, in real time, a re-translated result that reflects the user's correction of the error. A machine translation engine(210) comprises a morphological/syntactic analyzer(211) and a translation generator(212). The morphological/syntactic analyzer analyzes the morphemes and syntax of the original text, and the translation generator produces a translation based on the analysis result. An original text error estimator(220) receives the morphological and syntactic analysis result of the original text from the machine translation engine and determines the portion of the original text that is likely to cause a translation error. A translation error estimator(230) receives the generated translation information from the machine translation engine and determines the portion of the translation that is likely to contain a translation error.
Abstract:
A device and a method for segmenting an English sentence are provided to improve translation accuracy of machine translation and build a full text database from a simple English raw corpus for the machine translation for an English patent document. An input processor(100) segments paragraphs from an inputted English patent document. A token segmentation part(200) segments each word included in the paragraph into a token and sets a type of the token. A sentence segmentation part(300) segments a patent sentence by using the segmented token and the token type as input for an abbreviation database(610) and a proper noun database(620). A sentence segmentation knowledge builder(700) builds the abbreviation database and the proper noun database from a patent document raw corpus automatically. A sentence transformer(400) transforms an asyntactic patent sentence segmented in the sentence segmentation part by using a sentence transformation rule database(630). An output processor(500) outputs the segmented and transformed patent sentence as a result.
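The role of the token segmentation part and the abbreviation database can be illustrated with a minimal sketch. It assumes whitespace tokenization, three made-up token types, and a tiny hand-written abbreviation set. The proper noun database and the sentence transformation rules are left out, and none of the names below come from the abstract.

```python
# Hypothetical abbreviation database entries (lower-cased).
ABBREVIATIONS = {"fig.", "no.", "e.g.", "i.e."}


def token_type(token, abbreviations=ABBREVIATIONS):
    """Assign an assumed token type used by the sentence splitter."""
    if token.lower() in abbreviations:
        return "ABBREV"
    if token.endswith("."):
        return "SENT_END"
    return "WORD"


def split_sentences(paragraph, abbreviations=ABBREVIATIONS):
    """Split a paragraph into sentences, ignoring periods inside abbreviations."""
    sentences, current = [], []
    for token in paragraph.split():
        current.append(token)
        if token_type(token, abbreviations) == "SENT_END":
            sentences.append(" ".join(current))
            current = []
    if current:  # keep trailing text that lacks a final period
        sentences.append(" ".join(current))
    return sentences


text = "The device is shown in FIG. 1. It comprises a sensor, e.g. a camera."
for sentence in split_sentences(text):
    print(sentence)
```

Because the period after "FIG" is typed as an abbreviation, the first sentence ends only at "1.", which a naive split on every period would get wrong.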
Abstract:
A statistical HMM(Hidden Markov Model) part-of-speech tagging apparatus and method that can be applied to a new domain without a tagged domain corpus are provided. Words whose lexical probability varies with the application domain are selected, and their lexical probabilities are updated for that domain, which improves tagging accuracy in a specific domain without requiring a tagged corpus for it. Tagging probability information is learned from a previously tagged corpus to construct a lexical/part-of-speech/contextual probability information database and a lexical probability information database(S210). The lexical probability information database is learned and updated in a domain-dependent manner based on a raw corpus of the application domain(S220). Morphological analysis is performed on an input sentence on the basis of a morphological analysis dictionary database(S240). Statistical part-of-speech tagging is carried out on the morphological analysis result based on the lexical/part-of-speech/contextual probability information database and the updated lexical probability information database(S250). Errors in the tagging result are corrected according to a tagging error correction rule database(S260).
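The effect of swapping in domain-updated lexical probabilities can be shown with a small bigram HMM decoded by the standard Viterbi algorithm. This is only a sketch: the abstract does not describe how the domain probabilities are estimated from the raw corpus, so the two lexical tables below are hand-written, and all probabilities, tag names, and the `viterbi` signature are assumptions.

```python
import math

TAGS = ["NN", "VB"]
START = {"NN": 0.6, "VB": 0.4}                 # P(tag at sentence start)
TRANS = {"NN": {"NN": 0.3, "VB": 0.7},         # P(next tag | current tag)
         "VB": {"NN": 0.8, "VB": 0.2}}

# P(word | tag) learned from a general tagged corpus (made-up numbers).
LEXICAL = {"NN": {"time": 0.7, "flies": 0.3},
           "VB": {"time": 0.1, "flies": 0.9}}
# Assumed result of a domain update: same structure, different values.
DOMAIN_LEXICAL = {"NN": {"time": 0.2, "flies": 0.8},
                  "VB": {"time": 0.6, "flies": 0.4}}


def viterbi(words, lexical, start=START, trans=TRANS, tags=TAGS, floor=1e-6):
    """Return the most probable tag sequence under a bigram HMM."""
    def emit(tag, word):
        # Unknown words get a small floor probability instead of zero.
        return math.log(lexical[tag].get(word, floor))

    best = {t: math.log(start[t]) + emit(t, words[0]) for t in tags}
    back = []
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] + math.log(trans[p][t]))
            scores[t] = best[prev] + math.log(trans[prev][t]) + emit(t, word)
            pointers[t] = prev
        best = scores
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=best.get)
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


print(viterbi(["time", "flies"], LEXICAL))
print(viterbi(["time", "flies"], DOMAIN_LEXICAL))
```

Only the lexical table changes between the two calls. The start and transition probabilities stay fixed, which mirrors the idea of adapting to a domain without retraining the contextual model.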
Abstract:
A method and a device for automatically generating a compound noun translation using translation co-occurrence/probability information from a translation dictionary are provided to resolve semantic ambiguity and choose among synonymous translations by automatically extracting the translation co-occurrence/probability information from the dictionary and selecting the translation based on the extracted information. A translation co-occurrence information extractor and a translation probability information extractor(107,108) respectively extract the translation co-occurrence information and the translation probability information from the translation database(106). A compound noun extractor(102) extracts the compound noun and decomposes it into noun-unit words. A context-based translation selector(103) selects, for each word, the translation with the highest context probability based on the translation co-occurrence information. A probability-based translation selector(104) selects, for each word, the translation with the highest probability based on the translation probability information. A compound noun translation generator(105) generates the translation of the extracted compound noun by combining the selected translations.
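One way the two selectors could interact is sketched below. The abstract does not state how their outputs are combined, so the sketch assumes that co-occurrence evidence is preferred and that dictionary probability acts as a tie-breaker and fallback. The dictionary entries, counts, and function names are hypothetical, the compound is assumed to arrive already split by spaces, and the translations are simply joined in source order.

```python
# Hypothetical translation probabilities extracted from a dictionary.
TRANSLATION_PROB = {
    "은행": {"bank": 0.7, "ginkgo": 0.3},
    "나무": {"tree": 0.6, "wood": 0.4},
    "계좌": {"account": 1.0},
}
# Hypothetical co-occurrence counts between target-language translations.
COOCCURRENCE = {("bank", "account"): 12, ("ginkgo", "tree"): 5}


def cooc(a, b):
    """Symmetric lookup of the co-occurrence count for two translations."""
    return COOCCURRENCE.get((a, b), 0) + COOCCURRENCE.get((b, a), 0)


def translate_compound(compound):
    words = compound.split()  # assumed: the compound is already decomposed
    chosen = []
    for i, word in enumerate(words):
        candidates = TRANSLATION_PROB[word]

        def score(cand):
            # Context score: co-occurrence with any candidate of the other words.
            context = sum(cooc(cand, other)
                          for j, w in enumerate(words) if j != i
                          for other in TRANSLATION_PROB[w])
            # Dictionary probability breaks ties and covers missing context.
            return (context, candidates[cand])

        chosen.append(max(candidates, key=score))
    return " ".join(chosen)


print(translate_compound("은행 나무"))
print(translate_compound("은행 계좌"))
```

The same source word receives different translations in the two compounds because the co-occurrence evidence outweighs its higher-probability default.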
Abstract:
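A method and apparatus for re-analyzing compound noun terminology dictionary entries according to the present invention comprise: separating single noun terms and compound noun terms in a terminology dictionary; generating a partial sentence by attaching a word having a predetermined part of speech to the compound noun term; analyzing the morphemes of the partial sentence based on the single noun terms and a basic morphological analysis dictionary; and determining whether to register the compound noun according to whether the analysis result shows that the generated partial sentence could be interpreted as a part of speech other than a single noun. By re-analyzing the compound noun term entries that are candidates for registration in the morphological analysis dictionary, the invention determines whether deleting a compound noun term would introduce analysis ambiguity and selects the term entries to be registered in the analysis dictionary accordingly. This effectively reduces the size of the analysis dictionary, which grows with large volumes of terminology, while maintaining analysis accuracy, and thereby improves system efficiency.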