Invention Grant
- Patent Title: System for automated data engineering for large scale machine learning
- Application No.: US17009883
- Application Date: 2020-09-02
- Publication No.: US11301438B2
- Publication Date: 2022-04-12
- Inventor: Wei Dai, Weiren Yu, Eric Xing
- Applicant: Petuum Inc.
- Applicant Address: US PA Pittsburgh
- Assignee: Petuum Inc.
- Current Assignee: Petuum Inc.
- Current Assignee Address: US PA Pittsburgh
- Agency: MagStone Law, LLP
- Agent: Enshan Hong
- Main IPC: G06F16/21
- IPC: G06F16/21; G06N20/00; G06F16/27; G06F15/76

Abstract:
Accordingly, a data engineering system for machine learning at scale is disclosed. In one embodiment, the data engineering system includes an ingest processing module having a schema update submodule and a feature statistics update submodule, wherein the schema update submodule is configured to discover new features and add them to a schema, and wherein the feature statistics update submodule collects statistics for each feature to be used in an online transformation, a record store to store data from a data source, and a transformation module, to receive a low dimensional data instance from the record store and to receive the schema and feature statistics from the ingest processing module, and to transform the low dimensional data instance into a high dimensional representation. One embodiment provides a method for data engineering for machine learning at scale, the method including calling a built-in feature transformation or defining a new transformation, specifying a data source and compressing and storing the data, providing ingest-time processing by automatically analyzing necessary statistics for features, and then generating a schema for a dataset for subsequent data engineering. Other embodiments are disclosed herein.
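The data flow described in the abstract (ingest-time schema discovery and statistics collection, a record store, and a transformation step that expands a low dimensional record into a higher dimensional vector) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `IngestProcessor` and `FeatureStats` classes, the `transform` function, the use of a plain list as the record store, and the choice of standardization and one-hot encoding as the transformations. The abstract does not say which statistics or transformations the system uses.

```python
import math
from dataclasses import dataclass


@dataclass
class FeatureStats:
    """Running mean/variance for one numeric feature (Welford's algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation; 0.0 before any data is seen.
        return math.sqrt(self.m2 / self.count) if self.count else 0.0


class IngestProcessor:
    """Hypothetical ingest module: schema update plus feature statistics update."""

    def __init__(self):
        self.schema = {}         # feature name -> "numeric" or "categorical"
        self.numeric_stats = {}  # numeric feature -> FeatureStats
        self.vocab = {}          # categorical feature -> values in first-seen order

    def ingest(self, record: dict) -> None:
        # Assumption: a feature keeps the type it had when first discovered.
        for name, value in record.items():
            if name not in self.schema:  # schema update: discover new feature
                numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
                self.schema[name] = "numeric" if numeric else "categorical"
            if self.schema[name] == "numeric":  # statistics update
                self.numeric_stats.setdefault(name, FeatureStats()).update(float(value))
            else:
                values = self.vocab.setdefault(name, [])
                if str(value) not in values:
                    values.append(str(value))


def transform(record: dict, ingest: IngestProcessor) -> list:
    """Expand a low dimensional record into a wider numeric vector."""
    out = []
    for name in sorted(ingest.schema):
        if ingest.schema[name] == "numeric":
            stats = ingest.numeric_stats[name]
            x = float(record.get(name, stats.mean))  # missing value -> mean
            out.append((x - stats.mean) / stats.std if stats.std > 0 else 0.0)
        else:
            value = record.get(name)  # missing value -> all-zero one-hot block
            out.extend(
                1.0 if value is not None and str(value) == v else 0.0
                for v in ingest.vocab[name]
            )
    return out


# A plain list stands in for the record store.
record_store = [
    {"age": 30, "country": "US"},
    {"age": 40, "country": "DE", "device": "mobile"},
]
ingest = IngestProcessor()
for rec in record_store:
    ingest.ingest(rec)
vectors = [transform(rec, ingest) for rec in record_store]
```

In this sketch the second record introduces a `device` feature that the first record lacks, so the schema grows during ingest and the first record's vector receives a zero in that position. Keeping the statistics separate from the stored records is what lets the transformation run online, at read time, rather than rewriting the stored data.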
Public/Granted literature
- US20210026818A1 System for Automated Data Engineering for Large Scale Machine Learning, Public/Granted day: 2021-01-28