Invention Grant
- Patent Title: Identifying similar documents in a file repository using unique document signatures
- Application No.: US18165489
- Application Date: 2023-02-07
- Publication No.: US12086193B2
- Publication Date: 2024-09-10
- Inventor: Madan Avadhani, Swapnil Sharma
- Applicant: OneTrust LLC
- Applicant Address: US GA Atlanta
- Assignee: OneTrust LLC
- Current Assignee: OneTrust LLC
- Current Assignee Address: US GA Atlanta
- Agency: Keller Preece PLLC
- Main IPC: G06F16/93
- IPC: G06F16/93 ; G06F16/31 ; G06F16/35 ; G06F40/284

Abstract:
Methods, systems, and non-transitory computer readable storage media are disclosed for determining clusters of similar digital documents using unique document signatures. Specifically, the disclosed system processes digital text in a digital document to tokenize character strings (e.g., words) in the digital document by combining a subset of character values and string lengths in the character strings. Additionally, the disclosed system generates a document signature for the digital document by combining subsets of tokens generated for the digital document into a token sequence indicative of the digital text in the digital document. The disclosed system determines a cluster of similar digital documents including the digital document by comparing the document signature of the digital document to document signatures corresponding to a plurality of digital documents.
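
The pipeline described in the abstract (tokenize words, combine a subset of tokens into a signature, compare signatures to form clusters) can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say which character values are used, which tokens are selected, or how signatures are compared. The function names, the "first two characters plus word length" token, the every-second-token selection, and exact-match comparison are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def tokenize_word(word: str, n_chars: int = 2) -> str:
    # Assumed token format: the first n_chars characters (lowercased)
    # combined with the length of the word.
    return f"{word[:n_chars].lower()}{len(word)}"


def document_signature(text: str, stride: int = 2) -> str:
    # Assumed subset rule: keep every `stride`-th token, then join the
    # kept tokens into one sequence that stands in for the document text.
    tokens = [tokenize_word(w) for w in text.split()]
    return "-".join(tokens[::stride])


def cluster_by_signature(docs: dict[str, str]) -> list[list[str]]:
    # Assumed comparison: documents with identical signatures are grouped
    # into the same cluster. A real system might use a looser similarity.
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for name, text in docs.items():
        groups[document_signature(text)].append(name)
    return list(groups.values())


docs = {
    "a.txt": "The quick brown fox jumps",
    "b.txt": "The quiet brown fox jumps",
    "c.txt": "Completely different content here",
}
clusters = cluster_by_signature(docs)
```

With these hypothetical inputs, `a.txt` and `b.txt` share a signature and fall into one cluster: the word that differs between them is skipped by the subset rule, and it would produce the same token in any case. `c.txt` forms a cluster of its own.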
Public/Granted literature
- US20230376542A1 IDENTIFYING SIMILAR DOCUMENTS IN A FILE REPOSITORY USING UNIQUE DOCUMENT SIGNATURES; Public/Granted day: 2023-11-23