Invention Grant
- Patent Title: Methods and systems for detecting duplicate document using document similarity measuring model based on deep learning
- Application No.: US17119028
- Application Date: 2020-12-11
- Publication No.: US11631270B2
- Publication Date: 2023-04-18
- Inventor: Sung Min Kim, Byeonghoon Han
- Applicant: NAVER CORPORATION
- Applicant Address: KR Gyeonggi-do
- Assignee: NAVER CORPORATION
- Current Assignee: NAVER CORPORATION
- Current Assignee Address: KR Gyeonggi-do
- Agency: Harness, Dickey & Pierce, P.L.C.
- Priority: KR10-2019-0164926 20191211
- Main IPC: G06V30/418
- IPC: G06V30/418; G06F16/93; G06K9/62; G06F40/194

Abstract:
Disclosed are a method and a system. The method includes: extracting a similar document pair set and a dissimilar document pair set from a document database, the similar document pair set including similar document pairs having a common attribute, and the dissimilar document pair set including dissimilar document pairs extracted randomly; calculating a mathematical similarity for each of the similar and dissimilar document pairs using a mathematical measure to obtain first and second mathematical similarities, respectively; calculating a semantic similarity for each of the similar and dissimilar document pairs to obtain first and second semantic similarities, respectively, the first semantic similarities being higher than the first mathematical similarities, and the second semantic similarities being lower than the second mathematical similarities; training a similarity model based on the similar and dissimilar document pairs and the first and second semantic similarities to obtain a trained similarity model; and detecting a duplicate document using the trained similarity model.
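
The data-preparation steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several choices are assumptions made only for illustration: token-set Jaccard similarity stands in for the unspecified "mathematical measure", the hypothetical `semantic_target` function raises or lowers that score by a fixed fraction `alpha`, and the document tuples and function names are invented. The deep-learning similarity model itself is omitted; the sketch only builds the (pair, target) rows such a model could be trained on.

```python
import random


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (an assumed stand-in for the mathematical measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def semantic_target(math_sim: float, similar: bool, alpha: float = 0.5) -> float:
    """Hypothetical adjustment rule: push similar pairs up and dissimilar pairs down.

    The result is strictly higher (similar) or strictly lower (dissimilar)
    than math_sim, except at the boundaries 1.0 and 0.0 where it is equal.
    """
    if similar:
        return math_sim + alpha * (1.0 - math_sim)
    return math_sim * (1.0 - alpha)


def build_training_set(docs, seed=0, alpha=0.5):
    """docs: list of (text, attribute) tuples. Returns training rows.

    Each row is (text_a, text_b, mathematical_sim, semantic_sim, is_similar).
    """
    rng = random.Random(seed)

    # Similar pairs: documents that share a common attribute.
    by_attr = {}
    for text, attr in docs:
        by_attr.setdefault(attr, []).append(text)
    similar_pairs = [
        (group[i], group[j])
        for group in by_attr.values()
        for i in range(len(group))
        for j in range(i + 1, len(group))
    ]

    # Dissimilar pairs: drawn at random from the whole collection.
    texts = [text for text, _ in docs]
    dissimilar_pairs = [tuple(rng.sample(texts, 2)) for _ in similar_pairs]

    rows = []
    for pairs, is_similar in ((similar_pairs, True), (dissimilar_pairs, False)):
        for a, b in pairs:
            m = jaccard(a, b)
            rows.append((a, b, m, semantic_target(m, is_similar, alpha), is_similar))
    return rows


if __name__ == "__main__":
    docs = [
        ("cheap flights to seoul", "travel"),
        ("seoul flights cheap deals", "travel"),
        ("best kimchi stew recipe", "food"),
        ("kimchi stew recipe easy", "food"),
    ]
    for row in build_training_set(docs):
        print(row)
```

In this sketch the semantic targets, not the raw Jaccard scores, would serve as regression labels for the similarity model, which matches the abstract's requirement that the labels sit above the mathematical score for similar pairs and below it for dissimilar pairs.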