Publication number: KR1020080039009A
Publication date: 2008-05-07
Application number: KR1020060106635
Application date: 2006-10-31
Applicant: 포항공과대학교 산학협력단 (POSTECH Academy-Industry Foundation)
CPC classification numbers: G06F17/273, G06F17/2715
Abstract: A device and method are provided for simultaneously correcting word-spacing errors and spelling errors using syllable n-grams. A syllable n-gram language model is built from an error-free corpus; grapheme-unit conversion probabilities and syllable conversion patterns are extracted; grapheme-unit and syllable-unit candidates are generated for the corpus to be corrected; and the optimal path through those candidates is found with the language model. A syllable n-gram database builder (10) builds a syllable n-gram database (S2) by extracting syllable n-grams from a refined corpus database (S1). A grapheme-unit/syllable conversion database builder (20) builds a grapheme-unit conversion probability database (S5) and a syllable conversion pattern database (S6) by extracting grapheme-unit conversion probabilities and syllable conversion patterns from an error-containing corpus (S3) and a corrected corpus (S4). A grapheme dividing/candidate generating part (30) splits an input sentence into graphemes, looks them up in the grapheme-unit conversion probability database and the syllable conversion pattern database, and generates candidates from the results. An optimal path estimator (40) uses the syllable n-gram database to estimate the optimal path through the generated candidates.
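The abstract does not state how a sentence is split into grapheme units or how the conversion probabilities in (S5) are estimated. Assuming Korean text, as the KR filing suggests, one plausible reading is sketched below: each precomposed Hangul syllable is decomposed into its initial, medial, and final jamo with the standard Unicode arithmetic, and jamo substitutions between erroneous and corrected text are counted per slot to give relative-frequency estimates. This is a minimal sketch, not the patented method; the names `decompose` and `conversion_probs` are hypothetical, and it assumes the two corpora are already aligned syllable by syllable, which real data generally are not.

```python
from collections import Counter, defaultdict

# Unicode layout of precomposed Hangul syllables (U+AC00..U+D7A3):
# code = 0xAC00 + (initial * 21 + medial) * 28 + final
S_BASE, V_COUNT, T_COUNT = 0xAC00, 21, 28
N_COUNT = V_COUNT * T_COUNT      # 588 syllables per initial consonant
S_COUNT = 19 * N_COUNT           # 11172 precomposed syllables

# Compatibility jamo, used here only to make the output readable.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ",
          "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]


def is_hangul_syllable(ch):
    """True if ch is a single precomposed Hangul syllable."""
    return 0 <= ord(ch) - S_BASE < S_COUNT


def decompose(ch):
    """Split one precomposed syllable into (initial, medial, final) jamo."""
    s = ord(ch) - S_BASE
    return (INITIALS[s // N_COUNT],
            MEDIALS[(s % N_COUNT) // T_COUNT],
            FINALS[s % T_COUNT])


def conversion_probs(pairs):
    """Hypothetical estimate of P(corrected jamo | observed jamo) per slot.

    Assumes each (erroneous, corrected) pair is aligned syllable by syllable.
    Keys are (slot, observed_jamo) so that an initial ㄱ and a final ㄱ,
    which share one compatibility-jamo symbol, are counted separately.
    """
    counts = defaultdict(Counter)
    for wrong, right in pairs:
        for w_ch, r_ch in zip(wrong, right):
            if is_hangul_syllable(w_ch) and is_hangul_syllable(r_ch):
                for slot, (w_j, r_j) in enumerate(zip(decompose(w_ch), decompose(r_ch))):
                    counts[(slot, w_j)][r_j] += 1
    return {key: {r: c / sum(ctr.values()) for r, c in ctr.items()}
            for key, ctr in counts.items()}
```

For example, `decompose("한")` gives `("ㅎ", "ㅏ", "ㄴ")`, and with the pairs `[("학꾜", "학교"), ("꾜실", "교실"), ("꾸준히", "꾸준히")]` the entry for an observed initial ㄲ maps to ㄱ with probability 2/3 and to ㄲ with probability 1/3. Working at the jamo level lets a single mistyped consonant or vowel be scored without needing every misspelled syllable to appear in the training data.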
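How the syllable n-gram database (S2) and the optimal path estimator (40) can fix spacing and spelling in one pass can be illustrated with a toy Viterbi search. This sketch rests on several assumptions that the abstract does not specify: a bigram order, add-alpha smoothing, a hypothetical `<sp>` token that lets the model score word boundaries as if they were syllables, and a candidate list in which each input position maps candidate syllables to log conversion probabilities (standing in for lookups in (S5) and (S6)). The names `SyllableBigramLM` and `decode` are illustrative only.

```python
import math
from collections import Counter

SPACE, BOS, EOS = "<sp>", "<s>", "</s>"  # hypothetical boundary tokens


def to_tokens(sentence):
    """Split a correctly spaced sentence into syllable tokens; spaces become SPACE."""
    tokens = [BOS]
    for k, word in enumerate(sentence.split()):
        if k > 0:
            tokens.append(SPACE)
        tokens.extend(word)  # one token per syllable (character)
    tokens.append(EOS)
    return tokens


class SyllableBigramLM:
    """Add-alpha smoothed syllable bigram model (stand-in for an n-gram database)."""

    def __init__(self, corpus, alpha=0.1):
        self.alpha = alpha
        self.context = Counter()  # how often each token starts a bigram
        self.bigram = Counter()
        vocab = set()
        for sentence in corpus:
            toks = to_tokens(sentence)
            vocab.update(toks)
            self.context.update(toks[:-1])
            self.bigram.update(zip(toks, toks[1:]))
        self.vocab_size = len(vocab) + 1  # +1 leaves some mass for unseen syllables

    def logprob(self, prev, cur):
        num = self.bigram[(prev, cur)] + self.alpha
        den = self.context[prev] + self.alpha * self.vocab_size
        return math.log(num / den)


def decode(lm, candidates):
    """Viterbi search over candidate syllables and optional word boundaries.

    candidates[i] maps each candidate syllable for input position i to its
    log conversion probability. Returns the best-scoring spaced sentence.
    """
    best = {BOS: (0.0, [])}  # last syllable -> (log score, output so far)
    for i, options in enumerate(candidates):
        nxt = {}
        for prev, (score, out) in best.items():
            for syl, conv_lp in options.items():
                # Option A: attach syl directly to the previous syllable.
                choices = [(score + lm.logprob(prev, syl), out + [syl])]
                # Option B: insert a word boundary first (never before the first syllable).
                if i > 0:
                    spaced = score + lm.logprob(prev, SPACE) + lm.logprob(SPACE, syl)
                    choices.append((spaced, out + [" ", syl]))
                for s, o in choices:
                    s += conv_lp
                    if syl not in nxt or s > nxt[syl][0]:
                        nxt[syl] = (s, o)
        best = nxt
    # Close the sentence and keep the best final state.
    _, out = max(((s + lm.logprob(prev, EOS), o) for prev, (s, o) in best.items()),
                 key=lambda pair: pair[0])
    return "".join(out)


if __name__ == "__main__":
    lm = SyllableBigramLM(["나는 학교에 간다"])
    cands = [{syl: 0.0} for syl in "나는학꾜에간다"]
    cands[3] = {"꾜": math.log(0.7), "교": math.log(0.3)}  # hypothetical conversion probabilities
    print(decode(lm, cands))  # prints "나는 학교에 간다"
```

Even though the made-up conversion probabilities favor the misspelled 꾜, the language model restores both the missing spaces and the correct syllable 교. Because a bigram score depends only on the previous token, keeping one best path per last syllable makes this search exact for the model; a higher-order n-gram would need the last n-1 tokens, including boundaries, as the search state.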