Invention Grant
- Patent Title: Systems and methods for a scalable continuous active learning approach to information classification
- Application No.: US15186387
- Application Date: 2016-06-17
- Publication No.: US10671675B2
- Publication Date: 2020-06-02
- Inventor: Gordon V. Cormack, Maura R. Grossman
- Applicant: Gordon V. Cormack, Maura R. Grossman
- Agency: Bryan Cave Leighton Paisner LLP
- Main IPC: G06F16/93
- IPC: G06F16/93; G06N20/00; G06F16/23; G06F16/35

Abstract:
Systems and methods for classifying electronic information are provided by way of a Technology-Assisted Review (“TAR”) process. In certain embodiments, the TAR process is a Scalable Continuous Active Learning (“S-CAL”) approach. In certain embodiments, S-CAL selects an initial sample from a document collection, trains a classifier by using a default classification for a portion of the initial sample, scores the initial sample, selects a sub-sample from the initial sample for review, removes the reviewed sub-sample from the initial sample, and repeats the process by re-training the classifier until the initial sample is exhausted. In certain embodiments, a classification threshold is determined using a calculated estimate of the prevalence of relevant information such that the threshold classifies the information in accordance with a determined target criteria. In certain embodiments, the estimate of prevalence is determined from the results of iterations of a TAR process such as S-CAL.
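The loop described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the token-count classifier, the seed document, the fixed batch size, and every function and parameter name (`train`, `score`, `scal_sketch`, `estimate_prevalence`, `review`, `seed_doc`) are assumptions made for illustration. As a simplification, each selected batch is reviewed in full, so the fraction of relevant documents in the exhausted sample serves as the prevalence estimate.

```python
import random
from collections import Counter


def train(labeled):
    """Toy classifier (an assumption, not the patent's learner): each token's
    weight is the number of relevant documents containing it minus the number
    of non-relevant documents containing it."""
    weights = Counter()
    for tokens, is_relevant in labeled:
        for tok in set(tokens):
            weights[tok] += 1 if is_relevant else -1
    return weights


def score(weights, tokens):
    """Sum the weights of the distinct tokens in a document."""
    return sum(weights[tok] for tok in set(tokens))


def scal_sketch(collection, review, seed_doc, sample_size, batch_size, rng):
    """collection: dict of doc_id -> token list.
    review: hypothetical callable doc_id -> bool standing in for a human reviewer.
    seed_doc: token list assumed to be relevant (needed to start training)."""
    # Select the initial sample from the document collection.
    sample = set(rng.sample(sorted(collection), sample_size))
    reviewed = {}
    while sample:
        # Default classification: presume a random portion of the
        # still-unreviewed sample to be non-relevant for training purposes.
        presumed = rng.sample(sorted(sample), min(len(sample), batch_size))
        training = [(seed_doc, True)]
        training += [(collection[d], label) for d, label in reviewed.items()]
        training += [(collection[d], False) for d in presumed]
        weights = train(training)
        # Score the remaining sample and pick the top-scoring sub-sample.
        ranked = sorted(sample, key=lambda d: (-score(weights, collection[d]), d))
        batch = ranked[:batch_size]
        for d in batch:
            reviewed[d] = review(d)
        # Remove the reviewed sub-sample; repeat until the sample is exhausted.
        sample.difference_update(batch)
    return reviewed


def estimate_prevalence(reviewed):
    """Fraction of reviewed sample documents judged relevant."""
    return sum(reviewed.values()) / len(reviewed)
```

The resulting estimate could then be used to pick a score cutoff for the full collection that meets a chosen target; that step is omitted here because the abstract does not specify how the threshold is computed.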
Public/Granted literature
- US20160371262A1 SYSTEMS AND METHODS FOR A SCALABLE CONTINUOUS ACTIVE LEARNING APPROACH TO INFORMATION CLASSIFICATION; Public/Granted day: 2016-12-22