Semantic data type classification in rectangular datasets
Abstract:
Provided is a method, computer program product, and system for automatically predicting unknown semantic data types in a rectangular dataset using a holistic knowledge of said dataset. A processor may receive one or more rectangular datasets, the one or more rectangular datasets comprising a plurality of columns having a set of known semantic data types. The processor may extract a set of features from the plurality of columns, where the set of features is used to determine a relationship among each column of the plurality of columns. The processor may construct a set of training data based on the extracted set of features. Using the training data, the processor may train a machine learning model to predict a semantic data type of a target column in a rectangular dataset having an unknown semantic data type.
Public/Granted literature
Information query
Patent Agency Ranking
0/0