Invention Grant
- Patent Title: Outlier detection in textual data
- Application No.: US17362783
- Application Date: 2021-06-29
- Publication No.: US11768859B2
- Publication Date: 2023-09-26
- Inventor: Viliam Holub, Eoin Shanley, Trevor Parsons
- Applicant: Rapid7, Inc.
- Applicant Address: US MA Boston
- Assignee: Rapid7, Inc.
- Current Assignee: Rapid7, Inc.
- Current Assignee Address: US MA Boston
- Agent: Ashwin Anand; Lei Sun
- Main IPC: G06F16/28
- IPC: G06F16/28; G06F16/23; G06F16/2457; G06F16/22; G06F16/25

Abstract:
Systems and methods are disclosed to implement an outlier detection system for text records. In embodiments, the detection system generates a fingerprint for each incoming record so that similar records map to similar fingerprints. Each record is assigned to the closest cluster in a set of clusters based on computed distances between the record's fingerprint and the respective cluster fingerprints of the clusters. Each cluster fingerprint is dynamically updated to maintain a representative fingerprint of the cluster's member records. When a new record is received that is not sufficiently close to any cluster, a new cluster is added to the set for the new record. In embodiments, the creation of the new cluster triggers an alert that the new record is a potential outlier. Advantageously, the disclosed detection system can be used to detect outliers in records in near real time, without the need to pre-specify outlier characteristics.
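
The flow described in the abstract (fingerprint, closest-cluster assignment, cluster fingerprint update, new cluster as an alert) can be illustrated with a minimal sketch. This is not the patented implementation: the token-hashing fingerprint, the cosine distance, the running-mean update, and the names `fingerprint`, `cosine_distance`, `OutlierDetector`, `DIM`, and `THRESHOLD` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
import zlib

DIM = 256         # fingerprint length (assumed, not from the patent)
THRESHOLD = 0.5   # max cosine distance for joining a cluster (assumed)


def fingerprint(text):
    """Hash each lowercase token into a fixed-size count vector, so that
    records sharing many tokens get similar vectors (assumed scheme)."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    return vec


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


class OutlierDetector:
    def __init__(self, threshold=THRESHOLD):
        self.threshold = threshold
        self.clusters = []  # each cluster: {"fp": [...], "count": int}

    def observe(self, text):
        """Assign a record to its closest cluster. Return True when no
        cluster is close enough and a new one is created, which is the
        event treated here as a potential-outlier alert."""
        fp = fingerprint(text)
        best, best_dist = None, float("inf")
        for cluster in self.clusters:
            dist = cosine_distance(fp, cluster["fp"])
            if dist < best_dist:
                best, best_dist = cluster, dist
        if best is None or best_dist > self.threshold:
            self.clusters.append({"fp": fp, "count": 1})
            return True
        # Keep the cluster fingerprint representative of its members by
        # updating it as a running mean of member fingerprints.
        best["count"] += 1
        n = best["count"]
        best["fp"] = [c + (x - c) / n for c, x in zip(best["fp"], fp)]
        return False
```

Calling `observe` once per incoming record processes the stream one record at a time, with no predefined outlier patterns. Note that in this sketch the very first record always creates a cluster, so a real deployment would presumably suppress alerts during an initial warm-up period; that detail is also an assumption.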
Public/Granted literature
- US20210326364A1 DETECTION OF OUTLIERS IN TEXT RECORDS Public/Granted day: 2021-10-21