Methods and apparatus for transforming and statistically modeling relational databases to synthesize privacy-protected anonymized data
Abstract:
A relational database is transformed so as to obfuscate secure and/or private aspects of data contained in the database, while preserving salient elements of the data to facilitate data analysis. A restructured database is generatively modeled, and the model is sampled to create synthetic data that maintains sufficiently similar (or the same) mathematical properties and relations as the original data stored in the database. In one example, various statistics at the intersection of related database tables are determined by modeling data using an iterative multivariate approach. Synthetic data may be sampled from any part of the modeled database, wherein the synthesized data is “realistic” in that it statistically mimics the original data in the database. The generation of such synthetic data allows publication of bulk data freely and on-demand (e.g., for data analysis purposes), without the risk of security/privacy breaches.
Information query
Patent Agency Ranking
0/0