Invention Grant
- Patent Title: Method for synthetic data generation for query workloads
- Application No.: US13934232
- Application Date: 2013-07-03
- Publication No.: US09244950B2
- Publication Date: 2016-01-26
- Inventor: Atreyee Dey, Prasan Roy
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee Address: US NY Armonk
- Agency: North Shore Patents, P.C.
- Agent: Michele Liu Baillie; Lesley Leonessa
- Main IPC: G06F17/30
- IPC: G06F17/30; G06F11/36

Abstract:
Generation of synthetic database data includes annotated query subplans for a multiple table query workload that include a desired cardinality for nodes (v) in the subplans. The subplans may be merged and represented by a directed acyclic graph (DAG). The maximum entropy joint probability distribution for each attribute (x) for each node (v) is determined as $p(x) = \frac{\exp\left(\sum_{v} w_v f_v(x)\right)}{Z}$, where the sum runs over the nodes v, $w_v$ is a weight of node v, $f_v$ is a conjunct of predicates in a subplan rooted at node v, and $Z$ is a normalization factor. This distribution is determined such that the desired cardinality, and the selectivities for each node v determined from the desired cardinality, are satisfied. The data for a plurality of tables are generated by sampling the maximum entropy joint probability distribution for a domain of attributes (x) of a plurality of tables. Data may be efficiently generated for multiple table queries and for DAGs.
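The abstract's procedure (derive selectivities from desired cardinalities, fit a maximum entropy distribution of the form exp(sum of w_v f_v(x)) / Z, then sample it) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the two-attribute domain, the two predicates, the table size and cardinalities, and the use of plain gradient ascent to fit the weights.

```python
import itertools
import math
import random

# Hypothetical domain: two integer attributes (a, b), each in 0..9.
DOMAIN = list(itertools.product(range(10), range(10)))

# Hypothetical nodes. Each feature f_v is 1 when a tuple satisfies every
# predicate in the subplan rooted at node v, and 0 otherwise.
FEATURES = [
    lambda x: 1.0 if x[0] < 3 else 0.0,                # v1: a < 3
    lambda x: 1.0 if x[0] < 3 and x[1] >= 5 else 0.0,  # v2: a < 3 AND b >= 5
]

# Assumed annotations: desired cardinality per node, for a 1000-row table.
TABLE_SIZE = 1000
DESIRED_CARDINALITY = [500, 200]
# Selectivity of each node = desired cardinality / table size.
TARGETS = [c / TABLE_SIZE for c in DESIRED_CARDINALITY]


def distribution(weights):
    """Return p(x) = exp(sum_v w_v * f_v(x)) / Z for every x in DOMAIN."""
    scores = [
        math.exp(sum(w * f(x) for w, f in zip(weights, FEATURES)))
        for x in DOMAIN
    ]
    z = sum(scores)  # normalization factor Z
    return [s / z for s in scores]


def expected_features(probs):
    """Return E_p[f_v], the selectivity each node has under p."""
    return [sum(p * f(x) for p, x in zip(probs, DOMAIN)) for f in FEATURES]


def fit_weights(steps=2000, lr=1.0):
    """Gradient ascent on the weights until selectivities match TARGETS."""
    weights = [0.0] * len(FEATURES)
    for _ in range(steps):
        current = expected_features(distribution(weights))
        weights = [w + lr * (t - c) for w, t, c in zip(weights, TARGETS, current)]
    return weights


def generate_rows(weights, n, seed=0):
    """Sample n synthetic rows from the fitted distribution."""
    rng = random.Random(seed)
    return rng.choices(DOMAIN, weights=distribution(weights), k=n)


weights = fit_weights()
rows = generate_rows(weights, TABLE_SIZE)
```

In this sketch the fitted distribution matches the target selectivities in expectation, so the counts observed in any one sample of `rows` only approximate the desired cardinalities. Enumerating the full domain is practical only for tiny examples like this one.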
Public/Granted literature
- US20150012522A1 METHOD FOR SYNTHETIC DATA GENERATION FOR QUERY WORKLOADS, Public/Granted day: 2015-01-08