Method and system for detecting duplicate document using vector quantization
Abstract:
Disclosed are a method and a system for detecting duplicate documents using vector quantization. The duplicate document detection method may include: acquiring, by processing circuitry, a respective vector expression for each of a plurality of documents using a similarity model, the similarity model being trained to output similar vector expressions for semantically similar documents; generating a key by performing vector quantization on the respective vector expression, the key including a binary character string; and detecting a duplicate document from among the plurality of documents using the key.
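The key-generation and detection steps can be illustrated with a minimal sketch. The abstract does not specify the quantization scheme, so the sketch assumes a simple per-component threshold (sign) quantizer. The names `quantize_to_key` and `find_duplicate_groups`, and the hard-coded vectors standing in for similarity-model output, are hypothetical and not taken from the patent.

```python
from collections import defaultdict


def quantize_to_key(vector, threshold=0.0):
    """Quantize a vector into a binary character string.

    Assumption: each component maps to '1' if it exceeds the threshold,
    otherwise '0'. The patent may use a different quantizer.
    """
    return "".join("1" if x > threshold else "0" for x in vector)


def find_duplicate_groups(doc_vectors):
    """Group document ids by key and return groups with more than one member."""
    buckets = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        buckets[quantize_to_key(vec)].append(doc_id)
    return [ids for ids in buckets.values() if len(ids) > 1]


# Hypothetical vector expressions; in practice a trained similarity model
# would produce these, with similar vectors for semantically similar documents.
doc_vectors = {
    "doc_a": [0.82, -0.10, 0.33, -0.57],
    "doc_b": [0.79, -0.12, 0.35, -0.60],
    "doc_c": [-0.40, 0.65, -0.22, 0.18],
}

groups = find_duplicate_groups(doc_vectors)
print(groups)  # [['doc_a', 'doc_b']]
```

Because nearby vectors tend to fall on the same side of each threshold, near-duplicate documents share a key and can be found with a single dictionary lookup instead of pairwise comparison. Vectors with a component close to the threshold may still land in different buckets, so a real system would likely need a tolerance mechanism beyond this sketch.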