METHOD FOR GENERATING SYNTHETIC DATA SETS AT SCALE WITH NON-REDUNDANT PARTITIONING

    Publication number: US20180107729A1

    Publication date: 2018-04-19

    Application number: US15294142

    Filing date: 2016-10-14

    Applicant: Red Hat, Inc.

    CPC classification number: G06F16/285 G06N20/00

    Abstract: An example system includes a first machine and a second machine, a clustering module, and a training module. The clustering module receives a plurality of data sets, each including attributes. The clustering module partitions the plurality of data sets into a first clustered data set and a second clustered data set. Each data set of the plurality of data sets is partitioned. The training module assigns a first stochastic model to the first clustered data set and a second stochastic model to the second clustered data set. The first machine selects the first clustered data set and the first stochastic model and generates a first synthetic data set having generated data for each one of the attributes. The second machine selects the second clustered data set and the second stochastic model and generates a second synthetic data set having generated data for each one of the attributes.
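The flow in the abstract (partition every data set into exactly one cluster, assign a stochastic model per cluster, then let separate machines generate synthetic data independently) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the attribute names, the threshold-based `partition` function, the per-attribute Gaussian used as the "stochastic model", and the use of two worker threads as stand-ins for the first and second machine are all hypothetical choices made for the example.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

# Hypothetical attribute names; the abstract only says each data set has attributes.
ATTRIBUTES = ("age", "income")


def partition(data_sets, threshold=40):
    """Place every data set in exactly one of two clusters (no overlap).

    A simple threshold on one attribute stands in for the clustering module.
    """
    first = [d for d in data_sets if d["age"] < threshold]
    second = [d for d in data_sets if d["age"] >= threshold]
    return first, second


def fit_model(cluster):
    """Stand-in stochastic model: an independent Gaussian per attribute.

    Assumes the cluster is non-empty.
    """
    model = {}
    for attr in ATTRIBUTES:
        values = [d[attr] for d in cluster]
        model[attr] = (statistics.mean(values), statistics.pstdev(values))
    return model


def generate(model, count, seed):
    """Generate `count` synthetic records with a value for every attribute."""
    rng = random.Random(seed)
    return [
        {attr: rng.gauss(mu, sigma) for attr, (mu, sigma) in model.items()}
        for _ in range(count)
    ]


def run(data_sets, count=3):
    """Cluster, fit one model per cluster, and generate in parallel."""
    clusters = partition(data_sets)
    models = [fit_model(c) for c in clusters]
    # Two worker threads play the role of the first and second machine.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(generate, model, count, seed)
            for seed, model in enumerate(models)
        ]
        synthetic = [f.result() for f in futures]
    return clusters, synthetic


# Made-up sample input for illustration only.
sample = [
    {"age": 25, "income": 30000},
    {"age": 31, "income": 42000},
    {"age": 47, "income": 58000},
    {"age": 52, "income": 61000},
]
clusters, synthetic = run(sample)
```

Because each input data set lands in exactly one cluster, the two workers never process the same source data, which is what lets generation scale out without coordination between machines.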

    Method for generating synthetic data sets at scale with non-redundant partitioning

    Publication number: US10891311B2

    Publication date: 2021-01-12

    Application number: US15294142

    Filing date: 2016-10-14

    Applicant: Red Hat, Inc.

    Abstract: An example system includes a first machine and a second machine, a clustering module, and a training module. The clustering module receives a plurality of data sets, each including attributes. The clustering module partitions the plurality of data sets into a first clustered data set and a second clustered data set. Each data set of the plurality of data sets is partitioned. The training module assigns a first stochastic model to the first clustered data set and a second stochastic model to the second clustered data set. The first machine selects the first clustered data set and the first stochastic model and generates a first synthetic data set having generated data for each one of the attributes. The second machine selects the second clustered data set and the second stochastic model and generates a second synthetic data set having generated data for each one of the attributes.
