Invention Grant
- Patent Title: High-speed scanning parser for scalable collection of statistics and use in preparing data for machine learning
- Application No.: US16408764
- Application Date: 2019-05-10
- Publication No.: US11556840B2
- Publication Date: 2023-01-17
- Inventor: Gwyn Rhys Jones, Nicola Lazzarini, Charikleia Eleftherochorinou, Karolina Katarzyna Dluzniak, Tomass Bernots
- Applicant: IQVIA Inc.
- Applicant Address: US CT Danbury
- Assignee: IQVIA Inc.
- Current Assignee: IQVIA Inc.
- Current Assignee Address: US CT Danbury
- Agency: Stevens & Lee
- Agent: John Maldjian
- Main IPC: G06N20/00
- IPC: G06N20/00; G06F12/02; G06F9/54

Abstract:
A parser is deployed early in a machine learning pipeline to read raw data and collect useful statistics about the raw data's content to determine which items of raw data exhibit a proxy for feature importance for the machine learning model. The parser operates at high speeds that approach the disk's absolute throughput while utilizing a small memory footprint. Utilization of the parser enables the machine learning pipeline to receive a fraction of the total raw data that would otherwise be available. Several scans through the data are performed, by which proxies for feature importance are indicated and irrelevant features may be discarded and thereby not forwarded to the machine learning pipeline. This reduces the amount of memory and other hardware resources used at the server and also expedites the machine learning process.
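
The scan-then-discard idea in the abstract can be illustrated with a minimal Python sketch. This is not the patented parser: the input format (delimited text with a header row), the statistics collected (fill rate and a capped distinct-value count), the thresholds, and the names scan_column_stats and select_features are all assumptions made for illustration. The abstract does not state which statistics serve as the proxy for feature importance.

```python
import csv
import io
from collections import defaultdict


def scan_column_stats(lines, delimiter=",", max_distinct=1000):
    """Stream rows once, keeping only small per-column counters in memory.

    Assumption: the raw data is delimited text with a header row.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    non_empty = defaultdict(int)
    distinct = defaultdict(set)
    total = 0
    for row in reader:
        total += 1
        for name, value in zip(header, row):
            if value != "":
                non_empty[name] += 1
                # Cap the set size so memory stays bounded per column.
                if len(distinct[name]) < max_distinct:
                    distinct[name].add(value)
    return {
        name: {
            "fill_rate": non_empty[name] / total if total else 0.0,
            "n_distinct": len(distinct[name]),
        }
        for name in header
    }


def select_features(stats, min_fill_rate=0.5, min_distinct=2):
    """Keep columns whose statistics pass hypothetical proxy thresholds."""
    return [
        name
        for name, s in stats.items()
        if s["fill_rate"] >= min_fill_rate and s["n_distinct"] >= min_distinct
    ]


# Tiny in-memory sample standing in for a raw data file.
raw = "age,country,notes\n34,US,\n51,US,\n29,US,follow-up\n45,US,\n"
stats = scan_column_stats(io.StringIO(raw))
kept = select_features(stats)
print(kept)
```

In this sample, only the age column passes both thresholds: country is constant and notes is mostly empty, so neither would be forwarded to the downstream pipeline. A real implementation aiming for near-disk throughput would likely read raw bytes in large blocks rather than rely on a general-purpose CSV reader.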