Invention Grant
- Patent Title: Duplicate filtering in a data processing environment
- Patent Title (中): 在数据处理环境中重复过滤
- Application No.: US13437017
- Application Date: 2012-04-02
- Publication No.: US08484171B2
- Publication Date: 2013-07-09
- Inventor: Joel Arditi, David Harold Berk, Dagan Gilat, Sergey Krutyolkin, Ariel Landau, Uri Shani
- Applicant: Joel Arditi, David Harold Berk, Dagan Gilat, Sergey Krutyolkin, Ariel Landau, Uri Shani
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Century IP Group
- Agent: F. Jason Far-hadian
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
A data processing method is provided. The method comprises collecting a stream of data records received from one or more data sources connected in a communications network; dividing the stream of data records into sets of data records for parallel processing by a plurality of concurrently running tasks, wherein a first task loads a first persistent index associated with a first set of data records into memory to generate an in-memory version of the first persistent index for the first set of data records; and identifying duplicate and non-duplicate data records in the first set of data records, based on searching the in-memory version of the first persistent index.
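
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, only one way the described flow might look. Everything specific here is an assumption made for illustration: records are (key, payload) pairs, the stream is divided by hashing the key, each persistent index is a plain text file with one key per line (so keys must not contain newlines), and the function names partition, filter_set, and filter_stream are hypothetical. Writing newly seen keys back to the index file is likewise an assumption; the abstract does not describe how the index is updated.

```python
import os
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_sets):
    """Divide the record stream into sets; equal keys always land in the same set."""
    sets = [[] for _ in range(num_sets)]
    for key, payload in records:
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        sets[zlib.crc32(key.encode("utf-8")) % num_sets].append((key, payload))
    return sets


def filter_set(index_path, record_set):
    """One task: load its persistent index into memory, then classify records."""
    seen = set()
    if os.path.exists(index_path):
        with open(index_path, encoding="utf-8") as f:
            seen = {line.rstrip("\n") for line in f}  # in-memory version of the index

    duplicates, fresh = [], []
    for key, payload in record_set:
        if key in seen:
            duplicates.append((key, payload))
        else:
            seen.add(key)
            fresh.append((key, payload))

    # Assumption: append newly seen keys so later runs also treat them as known.
    with open(index_path, "a", encoding="utf-8") as f:
        for key, _ in fresh:
            f.write(key + "\n")
    return duplicates, fresh


def filter_stream(records, index_dir, num_sets=4):
    """Run one concurrent task per record set and merge the results."""
    sets = partition(records, num_sets)
    paths = [os.path.join(index_dir, f"index_{i}.txt") for i in range(num_sets)]
    with ThreadPoolExecutor(max_workers=num_sets) as pool:
        results = list(pool.map(filter_set, paths, sets))
    duplicates = [rec for dups, _ in results for rec in dups]
    fresh = [rec for _, new in results for rec in new]
    return duplicates, fresh
```

Because each set has its own index file and equal keys always hash to the same set, the tasks in this sketch never touch each other's index and need no locking. Threads are used only to keep the example short; a real deployment would more likely use separate processes or machines.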
Public/Granted literature
- US20120191734A1 DUPLICATE FILTERING IN A DATA PROCESSING ENVIRONMENT, Public/Granted day: 2012-07-26