-
Publication No.: KR101492016B1
Publication date: 2015-02-23
Application No.: KR1020130027644
Filing date: 2013-03-15
Applicant: 한국과학기술원
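Abstract: The document analysis method of the present invention includes: classifying a plurality of documents into a plurality of groups according to a predetermined criterion; extracting a meaning word set for each group, together with weights indicating how well the words represent that group; extracting, from the meaning word set of each group, an important word set that is distinctive to that group; and inferring the similarity between the groups by measuring the similarity between their important word sets.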
-
Publication No.: KR1020140112936A
Publication date: 2014-09-24
Application No.: KR1020130027644
Filing date: 2013-03-15
Applicant: 한국과학기술원
CPC classification number: G06F17/2785, G06F17/21
Abstract: A document analysis method according to the present invention includes: classifying a plurality of documents into a plurality of groups according to predetermined criteria; extracting a meaning word set for each group, together with weights indicating how well the words represent that group; extracting, from the meaning word set of each group, an important word set that is distinctive to that group; and inferring the similarity between the groups by measuring the similarity between their important word sets.
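The four steps in this abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the abstract does not say how weights, distinctiveness, or similarity are computed, so relative word frequency, an inverse group-frequency factor, and cosine similarity are assumptions made here, and every function name and the `top_k` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


def classify(docs, key):
    """Step 1: split documents into groups using a caller-supplied criterion."""
    groups = defaultdict(list)
    for doc in docs:
        groups[key(doc)].append(doc["text"])
    return groups


def meaning_words(texts):
    """Step 2: words with weights (assumed: relative frequency in the group)."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def important_words(weights_by_group, top_k=3):
    """Step 3: keep each group's most distinctive words.

    Assumption: a word is less distinctive the more groups it appears in,
    so its weight is scaled by log(1 + n_groups / groups_containing_word).
    """
    n = len(weights_by_group)
    group_freq = Counter(w for ws in weights_by_group.values() for w in ws)
    result = {}
    for group, ws in weights_by_group.items():
        scored = {w: v * math.log(1 + n / group_freq[w]) for w, v in ws.items()}
        top = sorted(scored, key=lambda w: (-scored[w], w))[:top_k]
        result[group] = {w: scored[w] for w in top}
    return result


def cosine(a, b):
    """Cosine similarity between two sparse word-weight dictionaries."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_similarity(docs, key, top_k=3):
    """Step 4: pairwise similarity between groups via their important words."""
    groups = classify(docs, key)
    weights = {g: meaning_words(ts) for g, ts in groups.items()}
    important = important_words(weights, top_k)
    names = sorted(important)
    return {
        (a, b): cosine(important[a], important[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

Calling `group_similarity(docs, key=lambda d: d["year"])` on a list of `{"year": ..., "text": ...}` dictionaries would group by year; any other criterion can be passed as `key`.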
-
Publication No.: KR101399272B1
Publication date: 2014-05-27
Application No.: KR1020130022697
Filing date: 2013-03-04
Applicant: 한국과학기술원
CPC classification number: G06F17/2705, G06F17/21
Abstract: The present invention relates to a method for estimating document similarity, comprising: extracting a plurality of first representative words from a first group including at least one document, together with a weight for each first representative word indicating the degree to which it represents the first group; extracting a plurality of second representative words from a second group including at least one document, together with a weight for each second representative word indicating the degree to which it represents the second group; and estimating the similarity between the first group and the second group by measuring the similarity between the first representative words and the second representative words.
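As a rough illustration of the two-group comparison this abstract describes, the sketch below extracts weighted representative words from each group and compares them. The abstract does not fix the weighting or the similarity measure, so normalized frequency of the top words and a weighted Jaccard ratio are assumptions; the function names and `top_k` are hypothetical.

```python
from collections import Counter


def representative_words(group_docs, top_k=5):
    """Pick the top_k most frequent words of a group and weight them.

    Assumption: the weight is the word's share of the top_k word counts.
    """
    counts = Counter(w for doc in group_docs for w in doc.lower().split())
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    total = sum(c for _, c in top)
    return {w: c / total for w, c in top}


def weighted_jaccard(a, b):
    """Sum of per-word minimum weights divided by sum of maximum weights."""
    words = set(a) | set(b)
    num = sum(min(a.get(w, 0.0), b.get(w, 0.0)) for w in words)
    den = sum(max(a.get(w, 0.0), b.get(w, 0.0)) for w in words)
    return num / den if den else 0.0


def estimate_similarity(first_group, second_group, top_k=5):
    """Estimate group similarity from the two weighted representative word sets."""
    first = representative_words(first_group, top_k)
    second = representative_words(second_group, top_k)
    return weighted_jaccard(first, second)
```

Each group is passed as a list of document strings. The result lies between 0 (no shared representative words) and 1 (identical words and weights).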
-
Publication No.: KR101413444B1
Publication date: 2014-07-01
Application No.: KR1020130037441
Filing date: 2013-04-05
Applicant: 한국과학기술원
CPC classification number: G06F17/30011, G06F17/21, G06F17/2705
Abstract: A document analysis method according to the present invention comprises: collecting a plurality of reference documents for a standard document by tracking its references; extracting a plurality of standard representative words from the standard document, each with a weight indicating the degree to which it represents the standard document; extracting a plurality of reference representative words from each reference document, each with a weight indicating the degree to which it represents the corresponding reference document; and inferring the similarity between the standard document and each reference document by using the standard representative words and the reference representative words of that reference document.
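The reference-tracking flow in this abstract might look roughly like the sketch below. It is an illustration under stated assumptions rather than the patented method: the corpus layout (a dictionary of `{"text", "refs"}` entries), the breadth-first traversal with a `max_depth` limit, frequency-based weights, and the overlap score are all hypothetical choices not taken from the source.

```python
from collections import Counter, deque


def collect_references(corpus, standard_id, max_depth=1):
    """Follow reference links breadth-first from the standard document.

    References that are missing from the corpus are skipped.
    """
    seen = {standard_id}
    queue = deque([(standard_id, 0)])
    found = []
    while queue:
        doc_id, depth = queue.popleft()
        if depth == max_depth:
            continue
        for ref in corpus[doc_id]["refs"]:
            if ref in corpus and ref not in seen:
                seen.add(ref)
                found.append(ref)
                queue.append((ref, depth + 1))
    return found


def representative_words(text, top_k=5):
    """Top_k most frequent words, weighted by their share of those counts."""
    counts = Counter(text.lower().split())
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    total = sum(c for _, c in top)
    return {w: c / total for w, c in top}


def overlap(a, b):
    """Sum of the smaller weight for every word the two sets share."""
    return sum(min(v, b[w]) for w, v in a.items() if w in b)


def rank_references(corpus, standard_id, max_depth=1, top_k=5):
    """Score each collected reference document against the standard document."""
    standard = representative_words(corpus[standard_id]["text"], top_k)
    scores = {
        ref: overlap(standard, representative_words(corpus[ref]["text"], top_k))
        for ref in collect_references(corpus, standard_id, max_depth)
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Raising `max_depth` above 1 also pulls in documents cited by the reference documents, which is one possible reading of "tracking" references.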
-