-
Publication number: KR1020110066467A
Publication date: 2011-06-17
Application number: KR1020090123136
Filing date: 2009-12-11
Applicant: 한국전자통신연구원
CPC classification number: G06F17/2854 , G06F17/00 , G06F17/27
Abstract: PURPOSE: A method and device for post-processing correction of translation errors using a factored language model are provided to deliver high-quality translation by automatically recognizing and correcting errors in the final sentence produced by a machine translation system. CONSTITUTION: A factored language model is constructed from a target-language corpus (S202). A reliability value is calculated, based on the factored language model, for each word that appears in an error pattern in the translated sentence (S204). If the reliability value of a word is below a threshold, the word is recognized as an error. Candidate corrections for the error are generated (S208). A reliability value is calculated for each candidate, and the candidate with the highest reliability value is selected as the corrected word (S210).
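Note: the S202 to S210 flow in this abstract (build a model, score each word, flag words below a threshold, score candidates, keep the best) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: a plain add-one-smoothed bigram model stands in for the factored language model, and the names `build_bigram_model`, `reliability`, `correct_sentence`, the candidate table, and the threshold value are all hypothetical.

```python
from collections import Counter


def build_bigram_model(corpus):
    """Count unigrams and bigrams over whitespace-tokenised sentences.

    Assumption: a bigram model is used only as a stand-in for the
    factored language model named in the abstract.
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def reliability(prev, word, model):
    """Add-one smoothed P(word | prev), used here as the reliability value."""
    unigrams, bigrams = model
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def correct_sentence(sentence, model, candidates, threshold):
    """Replace low-reliability words with their highest-scoring candidate."""
    prev, output = "<s>", []
    for word in sentence.split():
        if reliability(prev, word, model) < threshold:
            # The word is treated as an error; look up hypothetical candidates.
            options = candidates.get(word, [])
            if options:
                word = max(options, key=lambda c: reliability(prev, c, model))
        output.append(word)
        prev = word
    return " ".join(output)
```

With a toy corpus such as `["he goes to school", "she goes to work", "he goes home"]`, a candidate table `{"go": ["goes", "went"]}`, and a threshold of 0.15, the input `"he go to school"` is rewritten to `"he goes to school"`.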
-
Publication number: KR1020110066466A
Publication date: 2011-06-17
Application number: KR1020090123135
Filing date: 2009-12-11
Applicant: 한국전자통신연구원
CPC classification number: G06F17/2881 , G06F17/2836
Abstract: PURPOSE: A foreign language writing service and system are provided to improve a learner's foreign language ability during composition in the foreign language by using an automatic translation function and an error estimation/correction function. CONSTITUTION: A language input unit (100) accepts a learner's sentence that mixes the foreign and native languages. A native language translation unit (102) recognizes the native-language portions of the mixed sentence and translates them. A combined sentence completion unit (104) combines the native-language translation result with the foreign-language portion of the mixed sentence. A combined sentence output unit (106) outputs the sentence assembled by the combined sentence completion unit. An error estimation unit (108) estimates errors in the combined sentence and outputs the estimation result through the combined sentence output unit.
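Note: the recognize, translate, and combine steps attributed to units 102 and 104 can be sketched as below. The abstract does not name the languages, so this sketch assumes Korean as the native language (detected via the Unicode Hangul Syllables range U+AC00 to U+D7A3) and takes the translator as a caller-supplied function; `complete_mixed_sentence` and the `translate` parameter are hypothetical names.

```python
import re

# Assumption: native-language spans are runs of Hangul syllables,
# optionally joined by whitespace.
HANGUL_RUN = re.compile(r"[\uac00-\ud7a3]+(?:\s+[\uac00-\ud7a3]+)*")


def complete_mixed_sentence(sentence, translate):
    """Translate each native-language span and splice it back in place.

    `translate` is any callable mapping a native-language string to a
    foreign-language string; the foreign-language text is left untouched.
    """
    return HANGUL_RUN.sub(lambda match: translate(match.group(0)), sentence)
```

For example, with a toy lookup `{"도서관": "library"}` as the translator, `"I went to the 도서관 yesterday."` becomes `"I went to the library yesterday."`. Error estimation (unit 108) is not covered by this sketch.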
-
Publication number: KR1020110057583A
Publication date: 2011-06-01
Application number: KR1020090114046
Filing date: 2009-11-24
Applicant: 한국전자통신연구원
CPC classification number: G06F17/2863 , G06F17/271 , G06F17/2755 , G06F17/2785 , G06F17/2809 , G06F17/30616 , G06F17/30625
Abstract: PURPOSE: A method and apparatus for generating and analyzing predicates for Chinese-Korean machine translation are provided to generate accurate Korean sentences by distinguishing past and present tenses. CONSTITUTION: A Chinese morpheme analyzing unit (101) segments the Chinese words included in a Chinese sentence. A Chinese morpheme analyzing unit (102) analyzes the Chinese sentence and generates a Chinese syntactic tree. A Chinese/Korean selecting unit (103) converts the Chinese syntactic tree into a Korean syntactic tree. A Korean generating unit (105) rearranges the Korean words by using the Korean syntactic tree.
-
Publication number: KR1020110050296A
Publication date: 2011-05-13
Application number: KR1020090107214
Filing date: 2009-11-06
Applicant: 한국전자통신연구원
CPC classification number: G06F17/2827 , G06F17/30876
Abstract: PURPOSE: A parallel corpus extracting system and method are provided to automatically extract a parallel corpus from web documents at predetermined intervals and to reduce the cost of building a parallel corpus database. CONSTITUTION: A text extractor (130) extracts text from a translated document and an original document (20). A paragraph extractor (140) extracts translated paragraphs and also extracts original paragraphs from the text of the original document. A sentence extractor (150) extracts translated sentences and original sentences. A corpus extractor (160) extracts the parallel corpus.
-
Publication number: KR1020100073181A
Publication date: 2010-07-01
Application number: KR1020080131775
Filing date: 2008-12-22
Applicant: 한국전자통신연구원
CPC classification number: G06F17/2863 , G06F17/2765 , G06F17/289
Abstract: PURPOSE: A substitute adverb generating device and method are provided to determine where an adverb is generated within a predicate phrase in machine translation with Chinese as the target language, and to determine the substitute word for the determined position, thereby resolving ambiguity of meaning. CONSTITUTION: A multi-adverb pattern applying unit (102) recognizes multiple adverbs within an input predicate phrase and selects substitute multi-adverbs. A negative type recognizing and generating unit (108) recognizes negative adverbs and selects substitute negative adverbs. An adverb-predicate pattern applying unit (112) determines the adverb generation position through adverb-predicate patterns and selects the substitute adverb.
-
Publication number: KR1020100072388A
Publication date: 2010-07-01
Application number: KR1020080130781
Filing date: 2008-12-22
Applicant: 한국전자통신연구원
IPC: G06F17/40
CPC classification number: G06F17/3089 , G06F17/2705
Abstract: PURPOSE: A translation word extractor device is provided to collect web news and extract translations of neologisms or foreign words that appear within brackets and quotation marks in the collected web news, thereby constructing a translation dictionary from the extracted translations. CONSTITUTION: A web news collection unit (102) collects RSS (Really Simple Syndication) news lists in real time and extracts the web news corresponding to the collected RSS news lists. A translation word extractor (104) separates sentences based on the brackets and quotation marks in the extracted web news. The translation word extractor extracts the word boundary corresponding to the bracket through an LCS (Longest Common Substring) algorithm and extracts a translation pair according to the extracted word boundary.
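Note: the abstract names the LCS (Longest Common Substring) algorithm but does not say which two strings are compared or how the result maps to a word boundary. The sketch below therefore shows only the generic dynamic-programming algorithm; the function name `longest_common_substring` is illustrative and not taken from the patent.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by a and b.

    Uses the standard O(len(a) * len(b)) dynamic programme, keeping only
    the previous row. Ties are resolved in favour of the earliest match in a.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]
```

For instance, comparing `"machine translation"` with `"translation memory"` yields `"translation"`.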
-
Publication number: KR1020100072384A
Publication date: 2010-07-01
Application number: KR1020080130777
Filing date: 2008-12-22
Applicant: 한국전자통신연구원
CPC classification number: G06F17/2705 , G06F17/18 , G06F17/2755 , G06F17/2863 , G06F17/289
Abstract: PURPOSE: A method and device for generating Korean connectives for Chinese-Korean machine translation are provided to generate connectives that are clear from the Korean point of view and to generate Korean sentences, thereby producing high-quality Korean. CONSTITUTION: A Chinese morpheme analyzer (101) selects the optimum morpheme part for each Chinese word in a Chinese input sentence. A Chinese construction analyzer generates a Chinese construction tree. A Chinese-Korean converter (103) converts the Chinese construction tree into a Korean construction tree. If the logical connection between short sentences is not explicitly marked, a connective ending determining unit (104) generates a connective ending between the short sentences by using a connective knowledge DB (106). A Korean generator (105) generates natural Korean.
-
Publication number: KR100956794B1
Publication date: 2010-05-11
Application number: KR1020080084626
Filing date: 2008-08-28
Applicant: 한국전자통신연구원
IPC: G06F17/28
CPC classification number: G06F17/2827 , G06F17/2775
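Abstract: The present invention relates to a translation apparatus that applies multi-level predicate phrase patterns, and to an application method and an extraction method therefor. Translation performance is improved by applying a multi-level predicate phrase pattern matching technique, predicate phrase patterns for the translation apparatus are applied in multiple levels, and a high-performance machine translation apparatus can be built by automatically extracting the multi-level predicate phrase patterns. In addition, by providing the translation apparatus applying multi-level predicate phrase patterns together with the application and extraction methods therefor, the invention is used for conversion from a source language to a target language and can resolve ambiguity that arises between different languages in lexical aspects and in structural aspects such as word order.
Keywords: predicate phrase pattern, multi-level, matching, application
-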
Publication number: KR100911621B1
Publication date: 2009-08-12
Application number: KR1020070133677
Filing date: 2007-12-18
Applicant: 한국전자통신연구원
IPC: G06F17/28
CPC classification number: G06F17/2872 , G06F17/2818 , G06F17/2863
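Abstract: The present invention relates to a hybrid machine translation technique that combines the advantages of Pattern Based Machine Translation (PBMT) and Statistical Machine Translation (SMT). The invention comprises the steps of: generating a morphological analysis result for a Korean sentence using a morphological analyzer; generating a parsing result with a syntactic parser that takes the morphological analysis result as input; correcting the source analysis result using a source-side translation manager; performing sentence segmentation within the source-side translation manager; performing sentence pattern matching within the source-side translation manager; performing paraphrasing within the source-side translation manager; generating a translation result in a PBMT generator; calling the SMT translation result from the PBMT generator; generating a translation result in the SMT using the corrected source analysis result; generating the final translation result in a target-side translation manager; generating final target sentence candidates in a target sentence synthesizer using the PBMT and SMT translation results; and evaluating the target sentence candidates generated by the target sentence synthesizer to select the most appropriate one. According to the invention, first, Korean sentences can be segmented accurately; second, translation speed can be improved through segmentation; third, translation performance can be improved through segmentation; fourth, analysis and translation performance can be improved by paraphrasing the input sentence; and fifth, a better final translation result can be produced by developing a target sentence selector.
Keywords: statistical machine translation, pattern-based machine translation, paraphrasing, sentence segmentation, target sentence selection
-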
Publication number: KR100886687B1
Publication date: 2009-03-04
Application number: KR1020070129360
Filing date: 2007-12-12
Applicant: 한국전자통신연구원
CPC classification number: G06F17/2863 , G06F17/218 , G06F17/2247 , G06F17/2715 , G06F17/2755 , G06F17/277
Abstract: A method and an apparatus for automatically detecting unregistered words in Chinese are provided to extract unregistered words from a web document that is a translation target, using HTML tag information, statistical information, monosyllable token information, and the like. On receiving a web document that includes Chinese sentences, a removing unit (102) removes the HTML tags of the input web document, and a tag classification unit (104) classifies each sentence in the document according to whether meta tag processing or general tag processing applies. An extracting unit (106) using general tags includes a monosyllable-based extracting module (116), which extracts unregistered words on the basis of monosyllable tokens, and a verb-based extracting module (118), which extracts unregistered verbs consisting of four syllables. An extracting unit (108) using meta tags extracts unregistered words by using the words included in the meta tag information, and a morpheme analyzing unit (110) analyzes morphemes and outputs the analysis results. A radix-based extracting module (114) extracts unregistered words based on radixes by using the analysis results.
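Note: the first two stages in this abstract (stripping HTML tags and separating meta tag content from general text) can be sketched with Python's standard `html.parser` module. This is an illustration only: the class `TagSplitter`, the helper `split_meta_and_text`, and the choice of `keywords` and `description` as the relevant meta names are assumptions, and the monosyllable, verb, and radix based extraction modules are not covered.

```python
from html.parser import HTMLParser


class TagSplitter(HTMLParser):
    """Collect meta tag content and tag-free text from an HTML document."""

    # Assumption: these meta names carry candidate words.
    META_NAMES = ("keywords", "description")

    def __init__(self):
        super().__init__()
        self.meta = []   # content attributes of selected <meta> tags
        self.text = []   # visible text with all tags removed

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            content = attributes.get("content")
            if attributes.get("name") in self.META_NAMES and content:
                self.meta.append(content)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def split_meta_and_text(html):
    """Return (meta_contents, text_segments) for one HTML document."""
    parser = TagSplitter()
    parser.feed(html)
    parser.close()
    return parser.meta, parser.text
```

Each list could then be passed to a separate extraction step, mirroring the split between the meta tag unit (108) and the general tag unit (106).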