Finding duplicate passages of text in a collection of text

Invention Grant

US10585975B2 Finding duplicate passages of text in a collection of text (Active)


Patent Title: Finding duplicate passages of text in a collection of text
Application No.: US13411234

Application Date: 2012-03-02
Publication No.: US10585975B2

Publication Date: 2020-03-10
Inventor: Julian David Tibble
Applicant: Julian David Tibble
Applicant Address: GB Oxford
Assignee: GITHUB SOFTWARE UK LTD.
Current Assignee: GITHUB SOFTWARE UK LTD.
Current Assignee Address: GB Oxford
Agency: Workman Nydegger
Main IPC: G06F17/22
IPC: G06F17/22

Abstract:

A novel system and computer-implemented method for quickly and efficiently finding and reporting all clones within a large corpus of text. This is achieved by tokenizing the corpus, computing a rolling hash over the tokens, keeping only the hashes that occur more than once, and constructing an equivalence relation over these hashes in which two hashes are equated if they are part of the same instance of duplication. The equivalence relation is then used to report all detected clones.
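The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the whitespace tokenizer, the window size `k`, the CRC32 token codes, the hash constants, and the rule "equate the hashes of two adjacent duplicated windows" are all assumptions chosen for illustration, and hash collisions are ignored rather than verified.

```python
import zlib
from collections import defaultdict

BASE = (1 << 32) + 15   # assumed base, larger than any CRC32 token code
MOD = (1 << 61) - 1     # assumed modulus (a Mersenne prime)


def tokenize(text):
    """Assumed tokenizer: split on whitespace."""
    return text.split()


def rolling_hashes(tokens, k):
    """Polynomial rolling hash of every window of k consecutive tokens."""
    if len(tokens) < k:
        return []
    vals = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    top = pow(BASE, k - 1, MOD)          # weight of the token leaving the window
    h = 0
    for v in vals[:k]:
        h = (h * BASE + v) % MOD
    out = [h]
    for i in range(k, len(vals)):
        h = ((h - vals[i - k] * top) * BASE + vals[i]) % MOD
        out.append(h)
    return out


class UnionFind:
    """Equivalence relation over hashes."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def find_clones(text, k=3):
    """Return clone classes, each a list of (start, end) token ranges."""
    tokens = tokenize(text)
    hashes = rolling_hashes(tokens, k)

    # Filter: keep only hashes that occur more than once.
    positions = defaultdict(list)
    for i, h in enumerate(hashes):
        positions[h].append(i)
    dup = {h for h, ps in positions.items() if len(ps) > 1}

    # Simplified rule (an assumption): two duplicated hashes belong to the
    # same instance of duplication when their windows are adjacent.
    uf = UnionFind()
    for i, h in enumerate(hashes):
        if h in dup:
            uf.find(h)
            if i + 1 < len(hashes) and hashes[i + 1] in dup:
                uf.union(h, hashes[i + 1])

    # Group window starts by equivalence class, then merge consecutive
    # starts into maximal token ranges (end index exclusive).
    classes = defaultdict(list)
    for i, h in enumerate(hashes):
        if h in dup:
            classes[uf.find(h)].append(i)
    report = []
    for starts in classes.values():
        runs = []
        for s in starts:
            if runs and s == runs[-1][1] + 1:
                runs[-1][1] = s
            else:
                runs.append([s, s])
        report.append([(a, b + k) for a, b in runs])
    return report
```

For instance, calling `find_clones` with `k=3` on "the quick brown fox jumps over the lazy dog and the quick brown fox jumps again" places both occurrences of "the quick brown fox jumps" in a single clone class. The adjacency rule can over-merge unrelated duplications that happen to sit next to each other, so a fuller treatment would also check that the neighbouring windows recur together.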

Published/Granted documents

US20130232160A1 FINDING DUPLICATE PASSAGES OF TEXT IN A COLLECTION OF TEXT Publication date: 2013-09-05
