High-speed scanning parser for scalable collection of statistics and use in preparing data for machine learning

Invention Grant

US11556840B2 High-speed scanning parser for scalable collection of statistics and use in preparing data for machine learning (In force)

Patent Title: High-speed scanning parser for scalable collection of statistics and use in preparing data for machine learning
Application No.: US16408764

Application Date: 2019-05-10
Publication No.: US11556840B2

Publication Date: 2023-01-17
Inventor: Gwyn Rhys Jones, Nicola Lazzarini, Charikleia Eleftherochorinou, Karolina Katarzyna Dluzniak, Tomass Bernots
Applicant: IQVIA Inc.
Applicant Address: US CT Danbury
Assignee: IQVIA Inc.
Current Assignee: IQVIA Inc.
Current Assignee Address: US CT Danbury
Agency: Stevens & Lee
Agent: John Maldjian
Main IPC: G06N20/00
IPC: G06N20/00; G06F12/02; G06F9/54

Abstract:

A parser is deployed early in a machine learning pipeline to read raw data and collect useful statistics about the raw data's content to determine which items of raw data exhibit a proxy for feature importance for the machine learning model. The parser operates at high speeds that approach the disk's absolute throughput while utilizing a small memory footprint. Utilization of the parser enables the machine learning pipeline to receive a fraction of the total raw data that would otherwise be available. Several scans through the data are performed, by which proxies for feature importance are indicated and irrelevant features may be discarded and thereby not forwarded to the machine learning pipeline. This reduces the amount of memory and other hardware resources used at the server and also expedites the machine learning process.
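The multi-scan idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the CSV input, the use of fill rate and distinct-value count as proxies for feature importance, the 0.5 threshold, the cap on tracked distinct values, and all names (ColumnStats, scan_statistics, select_features, project) are assumptions made here for illustration only.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ColumnStats:
    """Running statistics for one column; memory stays bounded per column."""
    non_empty: int = 0
    distinct: set = field(default_factory=set)
    distinct_cap: int = 1000  # hypothetical cap to keep the footprint small

    def update(self, value: str) -> None:
        if value == "":
            return
        self.non_empty += 1
        if len(self.distinct) < self.distinct_cap:
            self.distinct.add(value)


def scan_statistics(stream):
    """First scan: read rows one at a time and accumulate per-column stats."""
    reader = csv.reader(stream)
    header = next(reader)
    stats = {name: ColumnStats() for name in header}
    rows = 0
    for row in reader:
        rows += 1
        for name, value in zip(header, row):
            stats[name].update(value)
    return header, stats, rows


def select_features(header, stats, rows, min_fill=0.5):
    """Keep columns that are filled often enough and are not constant.

    Both criteria are assumed stand-ins for a "proxy for feature importance".
    """
    kept = []
    for name in header:
        s = stats[name]
        fill_rate = s.non_empty / rows if rows else 0.0
        if fill_rate >= min_fill and len(s.distinct) > 1:
            kept.append(name)
    return kept


def project(stream, kept):
    """Second scan: yield only the kept columns, row by row."""
    reader = csv.DictReader(stream)
    for row in reader:
        yield {name: row[name] for name in kept}


# Tiny made-up dataset: "site" is constant and "notes" is mostly empty.
RAW = "id,age,site,notes\n1,34,A,\n2,51,A,\n3,,A,x\n4,29,A,\n"

header, stats, rows = scan_statistics(io.StringIO(RAW))
kept = select_features(header, stats, rows)
reduced = list(project(io.StringIO(RAW), kept))
print(kept)
```

Because each scan handles one row at a time and keeps only small per-column counters, memory use in this sketch depends on the number of columns rather than the number of rows, and only the retained columns would be forwarded to the downstream pipeline.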

IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computer systems based on specific computational models
G06N20/00	Machine learning