Invention Grant
- Patent Title: Automatic extraction of a training corpus for a data classifier based on machine learning algorithms
- Application No.: US15977665
- Application Date: 2018-05-11
- Publication No.: US11409779B2
- Publication Date: 2022-08-09
- Inventor: Fang Hou, Yikai Wu, Xiaopei Cheng, Sifei Ding
- Applicant: Accenture Global Solutions Limited
- Applicant Address: IE Dublin
- Assignee: Accenture Global Solutions Limited
- Current Assignee: Accenture Global Solutions Limited
- Current Assignee Address: IE Dublin
- Agency: Crowell & Moring LLP
- Main IPC: G06F16/35
- IPC: G06F16/35; G06K9/62; G06N20/00; G06F16/36; G06N20/10; G06N20/20; G06V30/40; G06V30/148; G06V30/242; G06N5/00; G06N5/04

Abstract:
An iterative classifier for unsegmented electronic documents is based on machine learning algorithms. The textual strings in the electronic document are segmented using a composite dictionary that combines a conventional dictionary and an adaptive dictionary developed based on the context and nature of the electronic document. The classifier is built using a corpus of training and testing samples automatically extracted from the electronic document by detecting signatures for a set of pre-established classes for the textual strings. The classifier is further iteratively improved by automatically expanding the corpus of training and testing samples in real-time when textual strings in new electronic documents are processed and classified.
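The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes greedy forward maximum matching as the segmentation method and regular expressions as the class signatures, neither of which the abstract specifies. The function names, the sample word lists, and the `INV`/`SHP` signature patterns are all hypothetical.

```python
import re

def build_composite_dictionary(conventional, adaptive):
    """Combine a general-purpose word list with a document-specific one."""
    return set(conventional) | set(adaptive)

def segment(text, dictionary):
    """Split an unsegmented string by greedy forward maximum matching.

    Assumption: this is one common dictionary-based segmentation method,
    chosen for illustration. Characters that start no dictionary word are
    emitted as single-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, falling back to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Hypothetical signatures: a pattern whose presence marks a string as
# belonging to one of the pre-established classes.
SIGNATURES = {
    "invoice": re.compile(r"INV\d+"),
    "shipment": re.compile(r"SHP\d+"),
}

def extract_corpus(strings, dictionary):
    """Return (tokens, label) pairs for strings that carry a signature.

    The signature is removed before segmentation so the label does not
    leak into the features. Strings without a signature are skipped.
    """
    corpus = []
    for raw in strings:
        for label, pattern in SIGNATURES.items():
            if pattern.search(raw):
                corpus.append((segment(pattern.sub("", raw), dictionary), label))
                break
    return corpus

composite = build_composite_dictionary(
    conventional={"fresh", "steel", "pipe"},
    adaptive={"apples", "pipes"},
)
corpus = extract_corpus(
    ["freshapplesINV42", "steelpipesSHP7", "unknownstring"], composite
)
```

In this sketch, only the two strings carrying a signature enter the corpus. The iterative improvement described in the abstract would correspond to appending newly classified strings to `corpus` and retraining, which is left out here.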
Public/Granted literature
- US20180365322A1 AUTOMATIC EXTRACTION OF A TRAINING CORPUS FOR A DATA CLASSIFIER BASED ON MACHINE LEARNING ALGORITHMS
- Public/Granted day: 2018-12-20