Automatic extraction of a training corpus for a data classifier based on machine learning algorithms

Invention Grant

US11409779B2 Automatic extraction of a training corpus for a data classifier based on machine learning algorithms (Status: In force)

Patent Title: Automatic extraction of a training corpus for a data classifier based on machine learning algorithms
Application No.: US15977665

Application Date: 2018-05-11
Publication No.: US11409779B2

Publication Date: 2022-08-09
Inventors: Fang Hou, Yikai Wu, Xiaopei Cheng, Sifei Ding
Applicant: Accenture Global Solutions Limited
Applicant Address: IE Dublin
Assignee: Accenture Global Solutions Limited
Current Assignee: Accenture Global Solutions Limited
Current Assignee Address: IE Dublin
Agency: Crowell & Moring LLP
Main IPC: G06F16/35
IPC: G06F16/35; G06K9/62; G06N20/00; G06F16/36; G06N20/10; G06N20/20; G06V30/40; G06V30/148; G06V30/242; G06N5/00; G06N5/04

Abstract:

An iterative classifier for unsegmented electronic documents is based on machine learning algorithms. The textual strings in the electronic document are segmented using a composite dictionary that combines a conventional dictionary and an adaptive dictionary developed based on the context and nature of the electronic document. The classifier is built using a corpus of training and testing samples automatically extracted from the electronic document by detecting signatures for a set of pre-established classes for the textual strings. The classifier is further iteratively improved by automatically expanding the corpus of training and testing samples in real-time when textual strings in new electronic documents are processed and classified.
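The abstract names the pipeline stages but not the algorithms behind them. The Python sketch below is one minimal reading, not the patented method: it assumes forward maximum matching for segmentation over the composite dictionary, treats each class "signature" as a set of keyword tokens, and expands the corpus by applying the same signature detection to strings from new documents. The function names (`composite_dictionary`, `segment`, `extract_corpus`, `expand_corpus`) and the example data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Labeled = List[Tuple[str, str]]  # (text string, class label) pairs


def composite_dictionary(conventional: Set[str], adaptive: Set[str]) -> Set[str]:
    """Combine a general-purpose dictionary with a document-specific one."""
    return conventional | adaptive


def segment(text: str, dictionary: Set[str]) -> List[str]:
    """Forward maximum matching (an assumed choice): at each position take the
    longest dictionary word that matches, else emit a single character."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def extract_corpus(
    strings: List[str], signatures: Dict[str, Set[str]], dictionary: Set[str]
) -> Tuple[Labeled, List[str]]:
    """Label a string with a class when its tokens hit exactly one class's
    signature keywords (hypothetical rule); other strings stay unlabeled."""
    corpus: Labeled = []
    unlabeled: List[str] = []
    for s in strings:
        tokens = set(segment(s, dictionary))
        matches = [c for c, sig in signatures.items() if tokens & sig]
        if len(matches) == 1:
            corpus.append((s, matches[0]))
        else:
            unlabeled.append(s)
    return corpus, unlabeled


def expand_corpus(
    corpus: Labeled, new_strings: List[str],
    signatures: Dict[str, Set[str]], dictionary: Set[str]
) -> Labeled:
    """Grow the corpus with strings labeled from a new document, skipping
    strings that are already present."""
    added, _ = extract_corpus(new_strings, signatures, dictionary)
    seen = {s for s, _ in corpus}
    return corpus + [(s, c) for s, c in added if s not in seen]


# Hypothetical example data: unsegmented product-description strings
words = composite_dictionary(
    {"steel", "pipe", "valve", "pvc", "rubber", "seal"}, {"10mm"}
)
sigs = {"metal": {"steel"}, "plastic": {"pvc"}}
corpus, unlabeled = extract_corpus(
    ["steelpipe10mm", "pvcvalve", "rubberseal"], sigs, words
)
corpus = expand_corpus(corpus, ["steelvalve", "pvcvalve"], sigs, words)
```

Here "10mm" stands in for a term an adaptive dictionary might contribute. Strings that match no signature, or more than one, are left unlabeled in this sketch; presumably such strings would be passed to the trained classifier rather than labeled automatically.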

Pre-grant publication:

US20180365322A1 AUTOMATIC EXTRACTION OF A TRAINING CORPUS FOR A DATA CLASSIFIER BASED ON MACHINE LEARNING ALGORITHMS (publication date: 2018-12-20)

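IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models, see G06N)
G06F16/00	Information retrieval; database structures therefor; file system structures therefor
G06F16/30	• of unstructured textual data (document management systems, see G06F 16/93)
G06F16/35	•• Clustering; classification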