Invention Grant
- Patent Title: Duplicate code section detection for source code
- Application No.: US16842903
- Application Date: 2020-04-08
- Publication No.: US10970066B1
- Publication Date: 2021-04-06
- Inventors: Hervé Le Bars, Jerome Jochem, Thomas Baudel
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Cantor Colburn LLP
- Agent: Noah Sharkan
- Main IPC: G06F9/00
- IPC: G06F9/00 ; G06F8/75

Abstract:
Techniques for duplicate code section detection for source code are described herein. An aspect includes receiving a plurality of input files corresponding to a software project comprising source code written in a computer programming language. Another aspect includes segmenting each of the plurality of input files into a plurality of statements based on instruction boundaries corresponding to the computer programming language, wherein a respective statement start index is determined for each of the plurality of statements. Another aspect includes populating an enhanced generalized suffix array (eGSA) based on the determined statement start indices, wherein each statement start index corresponds to a respective suffix in a row in the eGSA, and wherein each row comprises a longest common prefix (LCP) field and a preceding statement value corresponding to the row's respective suffix. Another aspect includes identifying duplicate code sections in the plurality of input files based on the eGSA.
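The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of the idea, not IBM's implementation. It rests on several assumptions. The statement-boundary rule (split on `;` and newlines, then collapse whitespace) is hypothetical and only loosely fits C-like languages. The generalized suffix array is built by naively sorting statement-ID suffixes from all files. The "preceding statement value" is assumed to serve as a left-maximality filter, which is a common technique in maximal-repeat detection but is not confirmed by the abstract. All names (`segment`, `build_egsa`, `find_duplicates`, `Row`, `min_len`) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Row:
    """One eGSA row (hypothetical layout): a suffix starting at a statement boundary."""
    file_idx: int              # which input file the suffix comes from
    start: int                 # statement start index within that file
    lcp: int                   # longest common prefix, in statements, with the previous row
    prev_stmt: Optional[int]   # ID of the statement preceding this suffix; None at file start


def segment(source: str) -> List[str]:
    # Assumed boundary rule: split on ';' and newlines, drop blanks, normalize whitespace.
    parts = re.split(r"[;\n]", source)
    return [" ".join(p.split()) for p in parts if p.strip()]


def build_egsa(files: List[str]) -> Tuple[List[List[int]], List[Row]]:
    # Map each distinct statement text to an integer ID so suffixes compare cheaply.
    ids: Dict[str, int] = {}
    seqs = [[ids.setdefault(s, len(ids)) for s in segment(f)] for f in files]
    # One suffix per statement start index in every file (a generalized suffix array).
    suffixes = [(f, i) for f, seq in enumerate(seqs) for i in range(len(seq))]
    # Naive sort by the suffix contents; Python compares lists lexicographically.
    suffixes.sort(key=lambda fi: seqs[fi[0]][fi[1]:])
    rows: List[Row] = []
    for k, (f, i) in enumerate(suffixes):
        lcp = 0
        if k > 0:
            pf, pi = suffixes[k - 1]
            a, b = seqs[pf][pi:], seqs[f][i:]
            while lcp < min(len(a), len(b)) and a[lcp] == b[lcp]:
                lcp += 1
        prev_stmt = seqs[f][i - 1] if i > 0 else None
        rows.append(Row(f, i, lcp, prev_stmt))
    return seqs, rows


def find_duplicates(
    files: List[str], min_len: int = 2
) -> List[Tuple[Tuple[int, int], Tuple[int, int], int]]:
    """Return ((file, start), (file, start), length) for duplicated statement runs."""
    _, rows = build_egsa(files)
    dups = []
    for k in range(1, len(rows)):
        prev, cur = rows[k - 1], rows[k]
        # If both suffixes are preceded by the same statement, the match extends at
        # least one statement further left (it is not left-maximal), so skip it here.
        left_maximal = (
            cur.prev_stmt is None
            or prev.prev_stmt is None
            or cur.prev_stmt != prev.prev_stmt
        )
        if cur.lcp >= min_len and left_maximal:
            dups.append(((prev.file_idx, prev.start), (cur.file_idx, cur.start), cur.lcp))
    return dups


if __name__ == "__main__":
    file_a = "x = 1;\ny = 2;\nz = x + y;\nprint(z);"
    file_b = "a = 0;\ny = 2;\nz = x + y;\nprint(z);"
    print(find_duplicates([file_a, file_b]))  # [((0, 1), (1, 1), 3)]
```

In the usage example, the three shared statements starting at index 1 in both files are reported once. The shorter two-statement match starting at index 2 is suppressed because both copies are preceded by the same statement. Sorting whole suffixes is quadratic and only suits small inputs. A production tool would more likely use a linear-time construction such as SA-IS together with a Kasai-style LCP pass. Comparing only adjacent rows is also a simplification: a section repeated three or more times may be reported as several pairs rather than as one group.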