Invention Grant
- Patent Title: Computer-implemented system and method for identifying near duplicate documents
- Application No.: US14027141
- Application Date: 2013-09-13
- Publication No.: US09773039B2
- Publication Date: 2017-09-26
- Inventor: William C. Knight, Steve Antoch, Sean M. McNee
- Applicant: FTI Consulting, Inc.
- Applicant Address: US MD Annapolis
- Assignee: FTI Consulting, Inc.
- Current Assignee: FTI Consulting, Inc.
- Current Assignee Address: US MD Annapolis
- Agent: Patrick J. S. Inouye; Krista A. Wittman
- Main IPC: G06F7/00
- IPC: G06F7/00; G06F17/30

Abstract:
A computer-implemented system and method for identifying near duplicate documents is provided. A set of documents is obtained and each document is divided into segments. Each of the segments is hashed. A segment identification and sequence order is assigned to each of the hashed segments. The sequence order is based on an order in which the segments occur in one such document. The segments are compared based on the segment identification and those documents with at least two matching segments are identified. The sequence orders of the matching segments are compared and based on the comparison, a determination is made that the identified documents share a relative sequence of the matching segments. The identified documents are designated as near duplicate documents.
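The steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. Several choices here are assumptions made only for illustration: segments are taken to be non-empty lines, SHA-1 digests serve as segment identifications, and "sharing a relative sequence" is read as at least two matching segments appearing in the same relative order in both documents. The function names `segment_hashes` and `near_duplicates` are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def segment_hashes(text):
    """Divide a document into segments and hash each one.

    Assumption: a segment is a non-empty line. Returns a list of
    (segment_id, sequence_order) pairs, where segment_id is the hash
    digest and sequence_order is the segment's position in the document.
    """
    segments = [line.strip() for line in text.splitlines() if line.strip()]
    return [
        (hashlib.sha1(seg.encode("utf-8")).hexdigest(), order)
        for order, seg in enumerate(segments)
    ]


def near_duplicates(docs):
    """Return pairs of document names designated as near duplicates.

    docs maps a document name to its text. A pair qualifies when it has
    at least two matching segments and some two of those segments occur
    in the same relative order in both documents (an assumed reading of
    "share a relative sequence").
    """
    # segment_id -> {document name: sequence order of first occurrence}
    index = defaultdict(dict)
    for name, text in docs.items():
        for seg_id, order in segment_hashes(text):
            index[seg_id].setdefault(name, order)

    # (doc_a, doc_b) -> list of (order in doc_a, order in doc_b)
    matches = defaultdict(list)
    for positions in index.values():
        for a, b in combinations(sorted(positions), 2):
            matches[(a, b)].append((positions[a], positions[b]))

    result = set()
    for pair, orders in matches.items():
        if len(orders) < 2:
            continue  # fewer than two matching segments
        # Two matches keep their relative order when the sign of the
        # order difference is the same in both documents.
        if any((p[0] - q[0]) * (p[1] - q[1]) > 0
               for p, q in combinations(orders, 2)):
            result.add(pair)
    return result


docs = {
    "a": "Hello\nWorld\nFoo",
    "b": "Intro\nHello\nWorld",
    "c": "World\nBar\nHello",
    "d": "Other",
}
pairs = near_duplicates(docs)
```

In this toy input, documents "a" and "b" share two segments in the same order and are designated near duplicates. Document "c" also shares two segments with each of them, but in reversed order, so it is not paired with either.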
Public/Granted literature
- US20140082006A1 Computer-Implemented System And Method For Identifying Near Duplicate Documents Public/Granted day: 2014-03-20