Invention Grant
- Patent Title: Method and system for clustering identified forms
- Patent Title (中): 聚类识别形式的方法和系统
- Application No.: US12032331
- Application Date: 2008-02-15
- Publication No.: US07996390B2
- Publication Date: 2011-08-09
- Inventor: Juliana Freire, Luciano Barbosa
- Applicant: Juliana Freire, Luciano Barbosa
- Applicant Address: US UT Salt Lake City
- Assignee: The University of Utah Research Foundation
- Current Assignee: The University of Utah Research Foundation
- Current Assignee Address: US UT Salt Lake City
- Agency: Bell & Manning, LLC
- Main IPC: G06F7/00
- IPC: G06F7/00; G06F17/30

Abstract:
A method is provided for organizing a plurality of documents that include forms. An initial set of clusters is defined for the plurality of documents. The initial set of clusters is reclustered based on similarity values calculated in multiple feature spaces. For example, a first feature space may be associated with a content of a document while a second feature space may be associated with a content of a form associated with the document. Each cluster has an associated centroid vector in each feature space that is used to represent the cluster. The similarity between the document and each cluster is calculated in both feature spaces. Each document is assigned to the cluster whose centroid is most similar. The cluster centroids may be recalculated and the process repeated until the cluster assignments become stable.
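The reclustering loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract does not say which similarity measure is used or how the two feature spaces are combined, so the cosine similarity, the weighted sum, and all names (`recluster`, `page_vecs`, `form_vecs`, `w_page`, `w_form`) are assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def recluster(page_vecs, form_vecs, assignments, k,
              w_page=0.5, w_form=0.5, max_iter=100):
    """Reassign documents to clusters until the assignments stop changing.

    page_vecs[i] and form_vecs[i] are document i's vectors in the two
    feature spaces (document content and form content). The weighted sum
    of the two similarities is an assumed combination rule.
    """
    assignments = list(assignments)
    for _ in range(max_iter):
        # One centroid per cluster in each feature space; empty clusters are dropped.
        centroids = {}
        for c in range(k):
            members = [i for i, a in enumerate(assignments) if a == c]
            if members:
                centroids[c] = (
                    centroid([page_vecs[i] for i in members]),
                    centroid([form_vecs[i] for i in members]),
                )
        # Assign each document to the cluster whose centroids are most similar.
        new_assignments = [
            max(centroids,
                key=lambda c: w_page * cosine(p, centroids[c][0])
                            + w_form * cosine(f, centroids[c][1]))
            for p, f in zip(page_vecs, form_vecs)
        ]
        if new_assignments == assignments:  # assignments are stable
            break
        assignments = new_assignments
    return assignments

# Hypothetical toy data: four documents, two clusters, a poor initial clustering.
pages = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
forms = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
result = recluster(pages, forms, assignments=[0, 0, 0, 1], k=2)
```

Equal weights treat both feature spaces as equally informative; a real system would likely tune them or use a different combination rule altogether.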
Public/Granted literature
- US20090210406A1 METHOD AND SYSTEM FOR CLUSTERING IDENTIFIED FORMS Public/Granted day: 2009-08-20