Systems and methods for a scalable continuous active learning approach to information classification
Abstract:
Systems and methods for classifying electronic information are provided by way of a Technology-Assisted Review (“TAR”) process. In certain embodiments, the TAR process is a Scalable Continuous Active Learning (“S-CAL”) approach. In certain embodiments, S-CAL selects an initial sample from a document collection, trains a classifier by using a default classification for a portion of the initial sample, scores the initial sample, selects a sub-sample from the initial sample for review, removes the reviewed sub-sample from the initial sample, and repeats the process, re-training the classifier each time, until the initial sample is exhausted. In certain embodiments, a classification threshold is determined using a calculated estimate of the prevalence of relevant information, such that the threshold classifies the information in accordance with a determined target criterion. In certain embodiments, the estimate of prevalence is determined from the results of iterations of a TAR process such as S-CAL.
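The loop described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the token-count scorer stands in for whatever classifier an embodiment would use, and every name here (`scal_sketch`, `train`, `score`, `review`, `seed_doc`, `default_n`, `batch_size`) is hypothetical. The sketch assumes a seed document known to be relevant, a `review` callback that returns a human judgment, and a default classification of "not relevant" for a random portion of the unreviewed sample. The prevalence estimate is simply the fraction of reviewed documents judged relevant.

```python
import random
from collections import Counter


def train(labeled):
    """Toy classifier: +1 per token seen in a relevant doc, -1 otherwise."""
    weights = Counter()
    for tokens, is_relevant in labeled:
        for token in tokens:
            weights[token] += 1 if is_relevant else -1
    return weights


def score(weights, tokens):
    """Sum the learned weights of a document's tokens (unseen tokens count 0)."""
    return sum(weights[token] for token in tokens)


def scal_sketch(collection, review, seed_doc, sample_size,
                batch_size, default_n, rng):
    """Run the sample / train / score / review / remove loop until exhausted.

    collection: dict mapping doc id -> set of tokens
    review:     callable(doc_id) -> bool, the reviewer's judgment
    seed_doc:   set of tokens assumed relevant (an assumption of this sketch)
    """
    # Select the initial sample from the document collection.
    remaining = set(rng.sample(sorted(collection), sample_size))
    reviewed = {}  # doc id -> reviewer judgment

    while remaining:
        pool = sorted(remaining)
        # Default classification: presume a random portion is not relevant.
        presumed = rng.sample(pool, min(default_n, len(pool)))
        labeled = [(seed_doc, True)]
        labeled += [(collection[d], label) for d, label in reviewed.items()]
        labeled += [(collection[d], False) for d in presumed]
        weights = train(labeled)

        # Score what is left of the sample and pick the top-scoring sub-sample.
        ranked = sorted(pool, key=lambda d: score(weights, collection[d]),
                        reverse=True)
        batch = ranked[:batch_size]

        # Review the sub-sample, then remove it from the sample.
        for d in batch:
            reviewed[d] = review(d)
        remaining.difference_update(batch)

    # Prevalence estimate: fraction of reviewed documents judged relevant.
    prevalence = sum(reviewed.values()) / len(reviewed)
    return reviewed, prevalence
```

A threshold step would follow from here: with the prevalence estimate and the collection size, one can estimate how many relevant documents exist and choose a score cutoff that meets a target such as a recall level. That step is omitted because the abstract does not say how the cutoff is computed.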