Invention Grant
- Patent Title: Finding duplicate passages of text in a collection of text
- Application No.: US13411234
- Application Date: 2012-03-02
- Publication No.: US10585975B2
- Publication Date: 2020-03-10
- Inventor: Julian David Tibble
- Applicant: Julian David Tibble
- Applicant Address: GB Oxford
- Assignee: GITHUB SOFTWARE UK LTD.
- Current Assignee: GITHUB SOFTWARE UK LTD.
- Current Assignee Address: GB Oxford
- Agency: Workman Nydegger
- Main IPC: G06F17/22
- IPC: G06F17/22

Abstract:
A novel system and computer-implemented method for quickly and efficiently finding and reporting all clones within a large corpus of text. This is achieved by tokenizing the corpus, computing a rolling hash, filtering for hashes that occur more than once, and constructing an equivalence relation over these hashes in which hashes are equated if they are part of the same instance of duplication. The equivalence relation is then used to report all detected clones.
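The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the tokenizer, the window length `k`, the polynomial hash parameters, and the rule used to equate hashes (two repeated hashes are merged when their windows start at adjacent token positions) are all assumptions chosen for illustration, and the function names are hypothetical. Hash collisions are ignored here; a real system would verify candidate matches.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Assumed tokenizer: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

def rolling_hashes(tokens, k, base=257, mod=(1 << 61) - 1):
    # Polynomial rolling hash over every window of k consecutive tokens.
    ids = {}
    vals = [ids.setdefault(t, len(ids) + 1) for t in tokens]
    if len(vals) < k:
        return []
    h = 0
    for v in vals[:k]:
        h = (h * base + v) % mod
    top = pow(base, k - 1, mod)  # weight of the token leaving the window
    hashes = [h]
    for i in range(k, len(vals)):
        h = ((h - vals[i - k] * top) * base + vals[i]) % mod
        hashes.append(h)
    return hashes

def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def find_clones(text, k=3):
    tokens = tokenize(text)
    hashes = rolling_hashes(tokens, k)

    # Filter: keep only hashes that occur more than once.
    counts = defaultdict(int)
    for h in hashes:
        counts[h] += 1
    repeated = {h for h, n in counts.items() if n > 1}

    # Equivalence relation (assumed rule): merge repeated hashes whose
    # windows start at adjacent positions, since those windows overlap.
    parent = {h: h for h in repeated}
    for a, b in zip(hashes, hashes[1:]):
        if a in repeated and b in repeated:
            parent[find(parent, a)] = find(parent, b)

    # Group window start positions by equivalence class.
    classes = defaultdict(list)
    for i, h in enumerate(hashes):
        if h in repeated:
            classes[find(parent, h)].append(i)

    # Report each class as a list of (start, end) token spans,
    # merging runs of consecutive window starts into one span.
    clones = []
    for starts in classes.values():
        spans = []
        for s in starts:
            if spans and s == spans[-1][1] - k + 1:
                spans[-1] = (spans[-1][0], s + k)
            else:
                spans.append((s, s + k))
        clones.append(spans)
    return clones

text = ("the quick brown fox jumps over the lazy dog . "
        "meanwhile the quick brown fox jumps again")
print(find_clones(text, k=3))  # [[(0, 5), (11, 16)]]
```

In this example the two spans cover the token ranges for "the quick brown fox jumps" in each half of the text. Grouping by equivalence class rather than by individual hash is what lets overlapping repeated windows be reported as one clone instead of three.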
Public/Granted literature
- US20130232160A1 FINDING DUPLICATE PASSAGES OF TEXT IN A COLLECTION OF TEXT Public/Granted day: 2013-09-05