Invention Grant
US09244950B2 Method for synthetic data generation for query workloads (in force)

Method for synthetic data generation for query workloads
Abstract:
Generation of synthetic database data includes annotated query subplans for a multiple-table query workload, the annotations including a desired cardinality for nodes (v) in the subplans. The subplans may be merged and represented by a directed acyclic graph (DAG). The maximum entropy joint probability distribution for each attribute (x) for each node (v) is determined as $p(x) = \frac{1}{Z} \exp\left( \sum_{v} w_v f_v(x) \right)$, where the sum runs over the nodes v, $w_v$ is the weight of node v, $f_v$ is the conjunction of predicates in the subplan rooted at node v, and $Z$ is a normalization factor. This distribution is determined such that the desired cardinality of each node v, and the selectivity derived from that cardinality, are satisfied. The data for a plurality of tables are generated by sampling the maximum entropy joint probability distribution over a domain of attributes (x) of the plurality of tables. Data may thus be generated efficiently for multiple-table queries and for DAGs.
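The fitting and sampling steps described in the abstract can be illustrated with a small sketch. This is not the patented procedure: it assumes a single toy table with two integer attributes, two hypothetical nodes `v1` and `v2` whose predicates are invented for illustration, selectivity computed as desired cardinality divided by an assumed table size, and plain gradient ascent as a stand-in for whatever solver the patent actually uses to find the weights. All names (`DOMAIN`, `FEATURES`, `fit_weights`, and so on) are hypothetical.

```python
import itertools
import math
import random

# Hypothetical toy domain: every (a, b) pair with a, b in 0..3.
DOMAIN = list(itertools.product(range(4), range(4)))

# Hypothetical predicate indicators f_v(x): 1 if x satisfies the conjunction
# of predicates in the subplan rooted at node v, else 0.
FEATURES = {
    "v1": lambda x: int(x[0] >= 2),                # a >= 2
    "v2": lambda x: int(x[0] >= 2 and x[1] == 0),  # a >= 2 AND b = 0
}


def distribution(weights):
    """Return p(x) = exp(sum_v w_v * f_v(x)) / Z for every x in DOMAIN."""
    scores = [
        math.exp(sum(weights[v] * f(x) for v, f in FEATURES.items()))
        for x in DOMAIN
    ]
    z = sum(scores)  # normalization factor Z
    return [s / z for s in scores]


def expected(p, f):
    """Selectivity of predicate f under distribution p."""
    return sum(pi * f(x) for pi, x in zip(p, DOMAIN))


def fit_weights(targets, steps=5000, lr=0.5):
    """Gradient ascent (an assumed solver): adjust each w_v until the
    selectivity E_p[f_v] matches its target."""
    weights = {v: 0.0 for v in FEATURES}
    for _ in range(steps):
        p = distribution(weights)
        for v, f in FEATURES.items():
            weights[v] += lr * (targets[v] - expected(p, f))
    return weights


TABLE_ROWS = 1000                            # assumed table size
DESIRED_CARDINALITY = {"v1": 500, "v2": 50}  # assumed node annotations
TARGETS = {v: c / TABLE_ROWS for v, c in DESIRED_CARDINALITY.items()}

weights = fit_weights(TARGETS)
p = distribution(weights)

# Generate the synthetic rows by sampling the fitted distribution.
rng = random.Random(0)
rows = rng.choices(DOMAIN, weights=p, k=TABLE_ROWS)
for v, f in FEATURES.items():
    print(v, "desired:", DESIRED_CARDINALITY[v], "sampled:", sum(f(x) for x in rows))
```

Because the rows are drawn at random, the sampled cardinalities match the desired ones only in expectation; the fitted distribution itself reproduces the target selectivities up to solver tolerance.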