Abstract:
The present invention relates to an apparatus and method for improving Chinese word segmentation performance. More particularly, it relates to an apparatus and method that correct Chinese word segmentation errors by automatically recognizing accurate word boundaries from the corresponding sentence in another language of a parallel corpus, for example English or Korean, in which word boundaries are clear, in order to reduce the unregistered-word errors and ambiguity errors that frequently occur in a Chinese word segmenting device. According to the present invention, unregistered-word errors and ambiguity errors, which are difficult to handle when segmenting a Chinese sentence, are continuously extracted through the parallel corpus and the corrected word segmentation information is stored. This overcomes the limitation that identifying the errors of a word segmenting device consumes a great deal of manpower and time.
Abstract:
According to one embodiment, an automatic learning-based artificial intelligence dialog system includes: a database that stores personalized expression learning data, including sentence expressions classified by personal intention and classification tags for the sentence expressions; a learning unit that analyzes the sentence expressions in the personalized expression learning data at the morpheme level and learns personal profiling data expressions tagged with the classification tags; a language analysis unit that analyzes the currently input dialog sentence at the morpheme level; an extraction unit that extracts user profiling data based on the morpheme-level analysis result of the language analysis unit and the personal profiling data expressions; a personal history database in which the user profiling data is classified by personal preference and accumulated as personal profiling data; an intention analysis unit that determines the intention of the dialog sentence based on the morpheme-level analysis result of the language analysis unit; and a response generation unit that determines the dialog flow of the dialog sentence based on the personal profiling data accumulated in the personal history database and generates a response sentence. [Reference numerals] (10) Database; (20) Learning unit; (30) Language analysis unit; (40) Extraction unit; (50) Database; (60) Intention analysis unit; (70) Response generation unit
Abstract:
PURPOSE: A parenthesis processing device for rule-based automatic translation and a method thereof are provided to improve automatic translation performance by processing sentences that include parentheses and by widening the range of application of existing patterns. CONSTITUTION: A pattern expanding unit(200) expands a pattern by adding a selective adverbial phrase at each position where one can be added. A selective adverbial phrase processing unit(350) adds an active chart to the pattern expanded with the selective adverbial phrase in order to process a selective adverbial phrase node. A parsing unit(300) performs chart parsing by using the processing result of the selective adverbial phrase node. A converting unit(400) converts an input text by using a conversion pattern corresponding to the chart parsing result. A generating unit(500) generates a translation result text based on the conversion result and provides it to a user. [Reference numerals] (100) Tagging unit; (150) Dictionary storage unit; (200) Pattern expanding unit; (250) Rule/pattern storage unit; (300) Parsing unit; (350) Selective adverbial phrase processing unit; (400) Converting unit; (500) Generating unit; (AA) Input text; (BB) Translation result text
Abstract:
PURPOSE: An apparatus and a method for measuring translation reliability of a multi-language model are provided to measure translation reliability by calculating the reliability of an automatic translation result based on the probabilities of language models. CONSTITUTION: A vocabulary based determining unit(110) calculates the probability of a vocabulary information language model for a translation from a machine translation result. A content word based determining unit(130) calculates the probability of a content information language model for the translation from the machine translation result. A reliability calculating unit(140) calculates the translation reliability of the translation by using the probabilities of the language models.
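The abstract does not say how the two language model probabilities are combined, so the following is only a minimal sketch under stated assumptions: toy unigram models stand in for the vocabulary and content word language models, each score is a length-normalized log-probability, and the two are mixed with a hypothetical `weight` parameter. All function and parameter names are illustrative and do not come from the patent.

```python
import math

def sentence_log_prob(tokens, unigram_probs, floor=1e-6):
    """Average log-probability of tokens under a toy unigram model.

    Unknown tokens receive the small `floor` probability (an assumption).
    """
    if not tokens:
        return math.log(floor)
    total = sum(math.log(unigram_probs.get(t, floor)) for t in tokens)
    return total / len(tokens)

def translation_reliability(tokens, content_tokens, vocab_lm, content_lm, weight=0.5):
    """Combine two language model scores into one reliability value in (0, 1].

    `weight` balances the vocabulary score against the content word score;
    the weighted log-linear mix is an assumed choice, not the patent's formula.
    """
    vocab_score = sentence_log_prob(tokens, vocab_lm)
    content_score = sentence_log_prob(content_tokens, content_lm)
    combined = weight * vocab_score + (1.0 - weight) * content_score
    return math.exp(combined)

# Toy models: probabilities are made up for illustration only.
vocab_lm = {"the": 0.5, "cat": 0.5}
content_lm = {"cat": 0.5}

fluent = translation_reliability(["the", "cat"], ["cat"], vocab_lm, content_lm)
odd = translation_reliability(["the", "zzz"], ["zzz"], vocab_lm, content_lm)
```

In this sketch a translation made of words both models know well scores higher than one containing words neither model has seen, which is the behavior a reliability measure of this kind is meant to capture.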
Abstract:
PURPOSE: A method and an apparatus for learning translation knowledge by phrase are provided to extend translation knowledge reliably and accurately, since a user corrects the translation knowledge noun phrase by noun phrase. CONSTITUTION: Syntax analysis is performed on the source language of a bilingual corpus(110). The bilingual corpus is aligned at the word level(120). Target language noun phrase candidates are extracted from the word-aligned bilingual corpus(130). Noun phrase translation knowledge construction candidates are selected by filtering the target language noun phrase candidates(140). Noun phrase translation knowledge is collected based on the machine translation result of the source language(160). Noun phrase pattern candidates are selected by searching a pattern database for generalized patterns(180). The result of correction or selection by the user is saved in the corresponding database(200).
Abstract:
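The present invention selects neologisms by assigning each candidate a priority according to whether it is a neologism that can represent a topic. Conventional methods extract every neologism of every type, select neologisms using the entire corpus, or have a person manually review each selected neologism candidate one by one. In contrast, the present invention extracts keywords for the neologism candidates, detects and tracks the topics of the extracted keywords, determines a priority according to whether each candidate can serve as the representative word of its topic, and then selects neologisms according to the determined priority, so that neologism selection can be performed effectively. Automatic translation system, neologism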
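The priority step can be illustrated with a small sketch. It assumes that topic detection and tracking has already grouped documents into topics, and it uses a hypothetical "coverage" score (the share of a topic's documents that contain the candidate) as a stand-in for whether a candidate can represent a topic. The abstract does not define the actual measure, so the function name and the scoring rule are assumptions.

```python
def rank_neologisms(candidates, topics):
    """Rank neologism candidates by how well they represent some topic.

    candidates: iterable of candidate strings.
    topics: dict mapping a topic id to a list of documents, where each
            document is a list of tokens (assumed output of topic tracking).
    Returns (candidate, score) pairs, highest priority first.
    """
    scored = []
    for cand in candidates:
        best = 0.0
        for docs in topics.values():
            if not docs:
                continue
            # Share of this topic's documents that mention the candidate.
            coverage = sum(1 for doc in docs if cand in doc) / len(docs)
            best = max(best, coverage)
        scored.append((cand, best))
    # Sort by descending score, then alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored

# Toy data, invented for illustration.
topics = {
    "t1": [["metaverse", "platform"], ["metaverse", "game"]],
    "t2": [["weather", "rain"], ["weather", "zoomer"]],
}
ranked = rank_neologisms(["zoomer", "metaverse", "blorp"], topics)
```

Here `metaverse` appears in every document of one topic and ranks first, `zoomer` appears in half the documents of another topic, and `blorp` appears nowhere and ranks last.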
Abstract:
PURPOSE: An apparatus for extracting vocabulary patterns that include a syntactic node and a method thereof are provided to effectively extract vocabulary patterns suited to grammatical units by extracting vocabulary patterns that include a syntactic node from a massive body of text by statistical and linguistic means. CONSTITUTION: Sentences that exceed a frequency threshold are removed from a text document(202). Sentences that do not exceed the frequency threshold are tagged(203). Lexical patterns are generated from the tagged sentences, and the frequency of the vocabulary patterns is calculated(204). The vocabulary patterns are filtered. An example sentence is added for each vocabulary pattern(206). The vocabulary patterns are output in order of priority(207).
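The steps above can be sketched as a short pipeline. This is only an illustration under several assumptions: a hypothetical toy lexicon `TOY_TAGS` replaces a real tagger, a "pattern" is a word trigram in which one noun is replaced by its tag as a stand-in for a syntactic node, filtering is a simple minimum-frequency cut, and priority is pattern frequency. None of these choices is specified in the abstract.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_TAGS = {"the": "DT", "a": "DT", "cat": "NN", "dog": "NN", "book": "NN"}

def tag(sentence):
    """Tag each lowercased word; unknown words get the placeholder tag 'X'."""
    return [(w, TOY_TAGS.get(w, "X")) for w in sentence.lower().split()]

def extract_patterns(sentences, sentence_freq_threshold=2, min_pattern_freq=2, n=3):
    # (202) Drop sentences repeated more often than the threshold.
    sent_counts = Counter(sentences)
    kept = [s for s, c in sent_counts.items() if c <= sentence_freq_threshold]

    pattern_counts = Counter()
    examples = {}
    for s in kept:
        tagged = tag(s)  # (203) tag the remaining sentences
        # (204) build n-gram patterns with one noun abstracted to a node
        for i in range(len(tagged) - n + 1):
            window = tagged[i:i + n]
            for j, (_, t) in enumerate(window):
                if t != "NN":
                    continue
                pattern = tuple(
                    f"<{t}>" if k == j else w for k, (w, _) in enumerate(window)
                )
                pattern_counts[pattern] += 1
                examples.setdefault(pattern, s)  # (206) keep an example sentence

    # Filter rare patterns, then (207) output by priority (frequency here).
    return [
        (p, c, examples[p])
        for p, c in pattern_counts.most_common()
        if c >= min_pattern_freq
    ]

sentences = [
    "I read the book today",
    "I read the cat today",
    "spam spam spam",
    "spam spam spam",
    "spam spam spam",
]
patterns = extract_patterns(sentences)
```

With this toy input, the repeated "spam spam spam" sentence is discarded, and the two remaining sentences both yield the patterns `read the <NN>` and `the <NN> today`, each with frequency 2 and an example sentence attached.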
Abstract:
PURPOSE: A compound noun decision apparatus and a method thereof are provided to determine the scope of a compound noun by determining the semantic relation between nouns according to the semantic relations within a sentence. CONSTITUTION: A noun range recognition unit(102) selects the nouns to be bound into a phrase according to the morpheme analysis result of an input sentence. A semantic relation decision unit(106) analyzes the semantic relation between a noun and a verb by using a semantic constraint condition. A noun range decision unit(110) determines the phrase range according to the decision result.