Invention Grant
US08484171B2 Duplicate filtering in a data processing environment (Expired)

Duplicate filtering in a data processing environment
Abstract:
A data processing method is provided. The method comprises collecting a stream of data records received from one or more data sources connected in a communications network; dividing the stream of data records into sets of data records for parallel processing by a plurality of concurrently running tasks, wherein a first task loads a persistent index associated with a first set of data records into memory to generate an in-memory version of the first persistent index for the first set of data records; and identifying duplicate and non-duplicate data records in the first set of data records, based on searching the in-memory version of the first persistent index.
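The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. Several details are assumptions made only for illustration: each record carries a "key" field, records are assigned to sets by a CRC32 hash of that key, each set's persistent index is a JSON file of previously seen keys, and the index is written back after filtering. The function names (split_into_sets, filter_set, filter_duplicates) are hypothetical.

```python
import json
import zlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_into_sets(records, num_sets):
    """Divide the collected records into sets (assumption: by hash of the key)."""
    sets = [[] for _ in range(num_sets)]
    for rec in records:
        # A stable hash sends the same key to the same set on every run,
        # so each set only ever needs its own index.
        sets[zlib.crc32(rec["key"].encode()) % num_sets].append(rec)
    return sets


def filter_set(set_id, records, index_dir):
    """One task: load the set's persistent index into memory, then filter."""
    index_path = Path(index_dir) / f"index_{set_id}.json"
    # In-memory version of the persistent index (empty on first run).
    seen = set(json.loads(index_path.read_text())) if index_path.exists() else set()
    unique, duplicates = [], []
    for rec in records:
        if rec["key"] in seen:
            duplicates.append(rec)
        else:
            seen.add(rec["key"])
            unique.append(rec)
    # Assumption: persist the updated index so later runs see these keys.
    index_path.write_text(json.dumps(sorted(seen)))
    return unique, duplicates


def filter_duplicates(records, index_dir, num_sets=4):
    """Run one filtering task per set concurrently and merge the results."""
    sets = split_into_sets(records, num_sets)
    with ThreadPoolExecutor(max_workers=num_sets) as pool:
        results = list(
            pool.map(lambda args: filter_set(*args, index_dir), enumerate(sets))
        )
    unique = [rec for u, _ in results for rec in u]
    duplicates = [rec for _, d in results for rec in d]
    return unique, duplicates
```

Because every task reads and writes only the index file for its own set, the tasks in this sketch need no locking between them. A thread pool is used here for brevity; a real deployment would more likely distribute the sets across separate processes or machines.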