Invention Grant
- Patent Title: Measuring relevance of datasets to a data science model
- Application No.: US17572668
- Application Date: 2022-01-11
- Publication No.: US11893032B2
- Publication Date: 2024-02-06
- Inventor: Tymoteusz Gedliczka, Szymon Brandys, Piotr Grzywna, Tomasz Kania, Maciej Madej, Krzysztof Pitula
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Edward P. Li
- Main IPC: G06F16/2457
- IPC: G06F16/2457

Abstract:
A computer-implemented method, a computer program product, and a computer system for measuring relevance of datasets to data science models. One or more servers implement steps: extract keywords in each data science model; determine first relative frequencies of the respective keywords in each data science model, for each source group in the data science models; extract keywords in each dataset; determine second relative frequencies of the respective keywords in each dataset, for each source group in the datasets; determine weights of the keywords; calculate first aggregated relevant scores of the respective keywords in each data science model, based on the first relative frequencies and the weights; calculate second aggregated relevant scores of the respective keywords in each dataset, based on the second relative frequencies and the weights. One or more servers calculate similarity between vectors of the first and second aggregated relevant scores, based on a similarity measure between vectors.
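
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation. The abstract does not specify how keywords are extracted, how weights are determined, how scores are aggregated across source groups, or which similarity measure is used. The sketch therefore assumes simple word tokenization, keyword weights supplied by the caller (defaulting to 1.0), aggregation as the weight times the sum of per-group relative frequencies, and cosine similarity. All function names and the source-group names (`code`, `docs`, `columns`) are hypothetical.

```python
import math
import re
from collections import Counter


def relative_frequencies(text):
    """Relative frequency of each word token in one source group's text.

    Assumption: keywords are plain lowercase alphabetic tokens.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    return {kw: n / total for kw, n in Counter(tokens).items()}


def aggregated_scores(source_groups, keyword_weights):
    """Aggregate per-group relative frequencies into one score per keyword.

    Assumption: score(kw) = weight(kw) * sum over groups of rel. frequency.
    Keywords missing from keyword_weights get a weight of 1.0.
    """
    summed = Counter()
    for text in source_groups.values():
        for kw, freq in relative_frequencies(text).items():
            summed[kw] += freq
    return {kw: keyword_weights.get(kw, 1.0) * f for kw, f in summed.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse keyword-score vectors."""
    dot = sum(score * b.get(kw, 0.0) for kw, score in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical inputs: text per source group for a model and a dataset.
model = {"code": "price sales", "docs": "price forecast"}
dataset = {"columns": "price sales region"}
weights = {"price": 2.0}  # hypothetical keyword weights

relevance = cosine_similarity(
    aggregated_scores(model, weights),
    aggregated_scores(dataset, weights),
)
```

Under these assumptions, a dataset whose weighted keyword profile points in the same direction as the model's yields a relevance close to 1, while a dataset sharing no keywords with the model yields 0.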
Public/Granted literature
- US20230222129A1 MEASURING RELEVANCE OF DATASETS TO A DATA SCIENCE MODEL (Public/Granted day: 2023-07-13)