Partitioning of data mining training set

Invention Grant

US07756881B2 Partitioning of data mining training set (in force)

Patent Title: Partitioning of data mining training set
Application No.: US11371477

Application Date: 2006-03-09
Publication No.: US07756881B2

Publication Date: 2010-07-13
Inventor: Ioan Bogdan Crivat , Raman S. Iyer , C. James MacLennan
Applicant: Ioan Bogdan Crivat , Raman S. Iyer , C. James MacLennan
Applicant Address: US WA Redmond
Assignee: Microsoft Corporation
Current Assignee: Microsoft Corporation
Current Assignee Address: US WA Redmond
Agency: Workman Nydegger
Main IPC: G06F7/00
IPC: G06F7/00 ; G06F17/30

Abstract:

A system that effectuates fetching a complete set of relational data into a mining services server and subsequently defining desired partitions upon the fetched data is provided. In accordance with the innovation, the data can be locally cached and partitioned therefrom. Accordingly, upon the same mining structure (e.g., cache) that has been partitioned, the novel innovation can build mining models for each partition. In other words, the innovation can employ the concept of mining structure as a data cache while manipulating only partitions of this cache in certain operations. The innovation can be employed in scenarios where a user wants to train a mining model using only data points that satisfy a particular Boolean condition, a user wants to split the training set into multiple partitions (e.g., training/testing) and/or a user wants to perform a data mining procedure known as “N-fold cross validation.”
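The three scenarios named in the abstract can be illustrated with a minimal sketch. It assumes the fetched data is held once in an in-memory list and that each partition is just a list of row indices into that cache. The class `MiningStructure` and its methods `filter_partition`, `split`, and `folds` are hypothetical names chosen for illustration; they are not taken from the patent or from any Microsoft API.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]


class MiningStructure:
    """Hypothetical local cache of fetched rows (illustrative only)."""

    def __init__(self, rows: Sequence[Row]) -> None:
        # Data is fetched once and cached; partitions only reference it by index.
        self.rows: List[Row] = list(rows)

    def _shuffled_indices(self, seed: int) -> List[int]:
        idx = list(range(len(self.rows)))
        random.Random(seed).shuffle(idx)
        return idx

    def filter_partition(self, predicate: Callable[[Row], bool]) -> List[int]:
        """Indices of the rows that satisfy a Boolean condition."""
        return [i for i, row in enumerate(self.rows) if predicate(row)]

    def split(self, test_fraction: float, seed: int = 0) -> Tuple[List[int], List[int]]:
        """Random training/testing split, returned as (train, test) index lists."""
        if not 0.0 <= test_fraction <= 1.0:
            raise ValueError("test_fraction must be between 0 and 1")
        idx = self._shuffled_indices(seed)
        n_test = int(round(len(idx) * test_fraction))
        return idx[n_test:], idx[:n_test]

    def folds(self, n: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
        """N-fold cross validation: each fold serves as the test set exactly once."""
        if not 1 <= n <= len(self.rows):
            raise ValueError("n must be between 1 and the number of cached rows")
        idx = self._shuffled_indices(seed)
        buckets = [idx[k::n] for k in range(n)]
        result = []
        for k in range(n):
            train = [i for j, b in enumerate(buckets) if j != k for i in b]
            result.append((train, buckets[k]))
        return result


# Usage: one cache, several partitions defined on top of it.
cache = MiningStructure([{"age": a} for a in range(20, 30)])
adults_over_25 = cache.filter_partition(lambda r: r["age"] >= 25)
train_idx, test_idx = cache.split(test_fraction=0.3)
cv_folds = cache.folds(5)
```

Keeping partitions as index lists means the cached rows are never copied, so a separate model could be trained per partition against the same cache.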

Public/Granted literature

US20070214135A1 Partitioning of data mining training set (publication date: 2007-09-13)

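IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F7/00	Methods or arrangements for processing data by operating upon the order or content of the data handled (logic circuits: see H03K19/00)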