Dataset relevance estimation in storage systems

Invention Grant

US10592147B2 Dataset relevance estimation in storage systems (in force)

Patent Title: Dataset relevance estimation in storage systems
Application No.: US15660434

Application Date: 2017-07-26
Publication No.: US10592147B2

Publication Date: 2020-03-17
Inventor: Giovanni Cherubini, Mark A. Lantz, Taras Lehinevych, Vinodh Venkatesan
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Applicant Address: US NY Armonk
Assignee: International Business Machines Corporation
Current Assignee: International Business Machines Corporation
Current Assignee Address: US NY Armonk
Agent: Anthony M. Pallone
Main IPC: G06F16/30
IPC: G06F16/30; G06F3/06

Abstract:

The invention is notably directed to computer-implemented methods and systems for managing datasets in a storage system. In such systems, it is assumed that a (typically small) subset of datasets are labeled with respect to their relevance, so as to be associated with respective relevance values. Essentially, the present methods determine, for each unlabeled dataset of the datasets, a respective probability distribution over a set of relevance values. From this probability distribution, a corresponding relevance value can be obtained. This probability distribution is computed based on distances (or similarities), in terms of metadata values, between said each unlabeled dataset and the labeled datasets. Based on their associated relevance values, datasets can then be efficiently managed in a storage system.
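The abstract does not state how the probability distribution is computed from the distances, so the following is only a minimal sketch under stated assumptions. It assumes a simple mismatch-fraction distance over metadata fields and an exponential kernel that turns distances into weights; the function names, the `bandwidth` parameter, and the choice of the most probable value as the final estimate are all hypothetical and not taken from the patent.

```python
import math
from collections import defaultdict


def metadata_distance(a, b):
    """Assumed metric: fraction of metadata fields whose values differ."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(a.get(k) != b.get(k) for k in keys) / len(keys)


def relevance_distribution(unlabeled, labeled, bandwidth=0.5):
    """Return {relevance_value: probability} for one unlabeled dataset.

    `labeled` is a list of (metadata_dict, relevance_value) pairs.
    Closer labeled datasets contribute more weight (assumed exponential
    kernel); the weights are normalized so that they sum to 1.
    """
    if not labeled:
        raise ValueError("at least one labeled dataset is required")
    weights = defaultdict(float)
    for metadata, relevance in labeled:
        d = metadata_distance(unlabeled, metadata)
        weights[relevance] += math.exp(-d / bandwidth)
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


def estimate_relevance(distribution):
    """Assumed rule: pick the relevance value with the highest probability."""
    return max(distribution, key=distribution.get)


# Hypothetical example data: two labeled datasets, one unlabeled dataset.
labeled = [
    ({"owner": "alice", "type": "log"}, 1),
    ({"owner": "bob", "type": "image"}, 3),
]
unlabeled = {"owner": "bob", "type": "image"}

dist = relevance_distribution(unlabeled, labeled)
best = estimate_relevance(dist)
```

A storage manager could then use the estimated value to decide, for example, on which storage tier to place the dataset. Other choices (a weighted mean of the relevance values, a different kernel, or a learned similarity) would fit the abstract's description equally well.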

Public/Granted literature

US20190034083A1 DATASET RELEVANCE ESTIMATION IN STORAGE SYSTEMS Public/Granted day: 2019-01-31

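IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F16/00	Information retrieval; database structures; file system structures
G06F16/30	· Unstructured textual data (document management systems: see G06F 16/93)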