Ranking datasets based on data attributes

    Publication No.: GB2603609A

    Publication date: 2022-08-10

    Application No.: GB202117333

    Filing date: 2021-12-01

    Applicant: IBM

    Abstract: A method for sorting and ranking datasets. The method identifies target data fields from process documents that indicate a user's data field preferences (112); identifies target attributes from data use documents that indicate the user's data scope preferences (114); generates metadata sets for the associated datasets (116); determines candidate datasets whose field suitability value exceeds a threshold value, where the field suitability value represents the degree of similarity between the fields associated with a dataset and the target data fields (118); assesses the metadata of each dataset against the target attributes and generates an attribute score for each dataset, where the score indicates the likelihood that the dataset's content exhibits certain dataset attributes (120); and, finally, generates a list of candidate datasets ranked by attribute score (122). Data use documents may be in a format such as Business Process Execution Language (BPEL) or Unified Modelling Language (UML). Process documents may be class, activity, sequence or component diagrams. Ranking may also be based on historic use of the datasets. In this way, the method may identify which datasets would be a good match for a user's various goals (e.g., launching a new product).
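    Illustrative sketch (not part of the patent record): the filter-then-rank flow of steps 118 to 122 can be pictured roughly as below. The abstract does not say how the field suitability value or the attribute score is computed, so this sketch assumes Jaccard similarity over field names, a plain average of per-attribute likelihoods stored in the metadata, and a threshold of 0.5. All names here (Dataset, field_suitability, attribute_score, rank_datasets) and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    fields: set    # field names associated with the dataset
    metadata: dict # ASSUMPTION: attribute name -> likelihood in [0, 1]


def field_suitability(dataset_fields, target_fields):
    """ASSUMPTION: Jaccard similarity stands in for the field suitability value."""
    union = dataset_fields | target_fields
    if not union:
        return 0.0
    return len(dataset_fields & target_fields) / len(union)


def attribute_score(metadata, target_attributes):
    """ASSUMPTION: mean likelihood over the target attributes; missing ones count as 0."""
    if not target_attributes:
        return 0.0
    total = sum(metadata.get(attr, 0.0) for attr in target_attributes)
    return total / len(target_attributes)


def rank_datasets(datasets, target_fields, target_attributes, threshold=0.5):
    """Keep datasets whose suitability exceeds the threshold, then sort by attribute score."""
    candidates = [
        d for d in datasets
        if field_suitability(d.fields, target_fields) > threshold
    ]
    return sorted(
        candidates,
        key=lambda d: attribute_score(d.metadata, target_attributes),
        reverse=True,
    )


# Hypothetical inputs standing in for what steps 112 to 116 would produce.
target_fields = {"customer_id", "region", "purchase_date"}
target_attributes = ["recent", "regional"]

datasets = [
    Dataset("web_logs", {"session_id", "url", "customer_id"},
            {"recent": 1.0, "regional": 0.1}),
    Dataset("crm_export", {"customer_id", "region", "purchase_date"},
            {"recent": 0.4, "regional": 0.9}),
    Dataset("sales_2021", {"customer_id", "region", "purchase_date", "amount"},
            {"recent": 0.9, "regional": 0.6}),
]

ranked = rank_datasets(datasets, target_fields, target_attributes)
for d in ranked:
    print(d.name, round(attribute_score(d.metadata, target_attributes), 2))
```

    In this sketch web_logs shares only one of five distinct field names with the targets, so it falls below the threshold and never reaches the scoring step, however well its metadata matches. Ranking by historic use, which the abstract also mentions, could be added as a further sort key.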
