Invention Grant
- Patent Title: System and method for generating a synthetic dataset from an original dataset
- Application No.: US17407181
- Application Date: 2021-08-19
- Publication No.: US11640446B2
- Publication Date: 2023-05-02
- Inventor: Mandis Beigi, Jacob Aptekar, Afrah Shafquat, Jason Mezey
- Applicant: Medidata Solutions, Inc.
- Applicant Address: US NY New York
- Assignee: Medidata Solutions, Inc.
- Current Assignee: Medidata Solutions, Inc.
- Current Assignee Address: US NY New York
- Agency: Steptoe & Johnson LLP
- Agent: Carl B. Wischhusen
- Main IPC: G06F21/62
- IPC: G06F21/62; G06F16/27; G06F18/214; G06F18/2133; G06F18/2135; G06F18/21; G06F18/2137

Abstract:
A method for generating a synthetic dataset from an original dataset includes encoding categorical features of the original dataset, embedding the encoded dataset in a low-dimensional space, selecting a seed record from the embedded dataset, identifying a plurality of nearest neighbor records to the seed record, generating a new record by randomly selecting features from the plurality of nearest neighbor records, and concatenating the new record into the synthetic dataset. For a synthetic dataset that contains N records, which may be the same as or different from the number of records in the original dataset, the selecting, identifying, generating, and concatenating operations operate a total of N times on the records in the embedded dataset.
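The steps named in the abstract (encode, embed, select a seed, find nearest neighbors, mix features, concatenate, repeat N times) can be illustrated with a minimal sketch. This is not the patented implementation. It assumes one-hot encoding for the categorical features, uses a Gaussian random projection as a stand-in for the unspecified low-dimensional embedding, picks seeds uniformly at random, measures proximity by Euclidean distance, and counts the seed itself among its own neighbors. All function and parameter names (`one_hot_encode`, `embed`, `nearest_neighbors`, `generate_synthetic`, `k`, `dim`) are hypothetical.

```python
import random


def one_hot_encode(records, categorical_cols):
    """Replace each categorical column with 0/1 indicators; keep numeric columns as floats."""
    categories = {c: sorted({r[c] for r in records}) for c in categorical_cols}
    encoded = []
    for r in records:
        row = []
        for c, value in enumerate(r):
            if c in categories:
                row.extend(1.0 if value == cat else 0.0 for cat in categories[c])
            else:
                row.append(float(value))
        encoded.append(row)
    return encoded


def embed(encoded, dim, rng):
    """Project rows to `dim` dimensions with a random Gaussian matrix.

    Assumption: the abstract does not name an embedding method; a random
    projection is used here only because it needs no external libraries.
    """
    width = len(encoded[0])
    proj = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
    return [
        [sum(x * proj[i][j] for i, x in enumerate(row)) for j in range(dim)]
        for row in encoded
    ]


def nearest_neighbors(embedded, seed_idx, k):
    """Return indices of the k records closest to the seed (the seed is included)."""
    seed = embedded[seed_idx]

    def sqdist(i):
        return sum((a - b) ** 2 for a, b in zip(embedded[i], seed))

    return sorted(range(len(embedded)), key=sqdist)[:k]


def generate_synthetic(records, categorical_cols, n_records, k=3, dim=2, seed=0):
    """Build n_records synthetic rows; n_records may differ from len(records)."""
    rng = random.Random(seed)
    embedded = embed(one_hot_encode(records, categorical_cols), dim, rng)
    n_features = len(records[0])
    synthetic = []
    for _ in range(n_records):
        seed_idx = rng.randrange(len(records))            # select a seed record
        neighbors = nearest_neighbors(embedded, seed_idx, k)
        # Draw each feature independently from a randomly chosen neighbor,
        # taking the value from the original (un-encoded) record.
        new_record = tuple(
            records[rng.choice(neighbors)][c] for c in range(n_features)
        )
        synthetic.append(new_record)                      # concatenate
    return synthetic
```

For example, `generate_synthetic(rows, categorical_cols=[1, 2], n_records=7)` on a list of tuples `rows` returns seven tuples whose values in each column are drawn from values present in that column of `rows`. Numeric columns are not scaled in this sketch, so they can dominate the distance calculation; a practical version would likely normalize them first.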
Public/Granted literature
- US20230060848A1: SYSTEM AND METHOD FOR GENERATING A SYNTHETIC DATASET FROM AN ORIGINAL DATASET (Public/Granted day: 2023-03-02)