Invention Grant
- Patent Title: Method and system for detecting duplicate document using vector quantization
- Application No.: US17120693
- Application Date: 2020-12-14
- Publication No.: US11550996B2
- Publication Date: 2023-01-10
- Inventor: Sung Min Kim, Byeonghoon Han
- Applicant: NAVER CORPORATION
- Applicant Address: KR Gyeonggi-do
- Assignee: NAVER CORPORATION
- Current Assignee: NAVER CORPORATION
- Current Assignee Address: KR Gyeonggi-do
- Agency: Harness, Dickey & Pierce, P.L.C.
- Priority: KR10-2019-0169132 (2019-12-17)
- Main IPC: G06F40/194
- IPC: G06F40/194; G06F40/289; G06F40/30

Abstract:
Disclosed are a method and system for detecting a duplicate document using vector quantization. A duplicate document detection method may include: acquiring, by processing circuitry, a respective vector expression for each of a plurality of documents using a similarity model, the similarity model being trained to output similar vector expressions for semantically similar documents; generating a key by performing vector quantization on the respective vector expression, the key including a binary character string; and detecting a duplicate document from among the plurality of documents using the key.
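
The three steps in the abstract (vector expression, quantization into a binary-string key, duplicate detection by key) can be illustrated with a minimal sketch. This record does not specify the similarity model or the quantization scheme, so the code below makes assumptions: the vectors are hard-coded stand-ins for model output, and the quantizer is simple per-component thresholding, which is only one possible form of vector quantization. The names `quantize_to_key` and `find_duplicate_groups` are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def quantize_to_key(vector, threshold=0.0):
    """Turn a real-valued vector into a binary character string.

    Assumption: each component becomes '1' if it exceeds the threshold,
    otherwise '0'. The patent's actual quantizer may differ.
    """
    return "".join("1" if x > threshold else "0" for x in vector)


def find_duplicate_groups(doc_vectors):
    """Group document ids that share the same quantized key."""
    buckets = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        buckets[quantize_to_key(vec)].append(doc_id)
    # Only buckets holding more than one document are duplicate candidates.
    return [ids for ids in buckets.values() if len(ids) > 1]


# Hypothetical vectors standing in for the output of a similarity model.
vectors = {
    "doc_a": [0.9, -0.2, 0.4, -0.7],
    "doc_b": [0.8, -0.1, 0.5, -0.6],  # close to doc_a
    "doc_c": [-0.3, 0.6, -0.5, 0.2],
}

groups = find_duplicate_groups(vectors)
print(groups)
```

Because documents with equal keys land in the same bucket, candidate duplicates are found with a single pass and a dictionary lookup instead of comparing every pair of vectors.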
Public/Granted literature
- US20210182479A1 METHOD AND SYSTEM FOR DETECTING DUPLICATE DOCUMENT USING VECTOR QUANTIZATION Public/Granted day: 2021-06-17