Invention Grant
- Patent Title: Systems and methods for extracting information from structured documents
- Application No.: US10626430
- Application Date: 2003-07-23
- Publication No.: US08090678B1
- Publication Date: 2012-01-03
- Inventor: Oren Glickman, Amir Ashkenazi, Ariel Yaar
- Applicant: Oren Glickman, Amir Ashkenazi, Ariel Yaar
- Applicant Address: US CA Brisbane
- Assignee: Shopping.com
- Current Assignee: Shopping.com
- Current Assignee Address: US CA Brisbane
- Agency: Haynes and Boone, LLP
- Main IPC: G06F7/00
- IPC: G06F7/00; G06F17/30

Abstract:
Systems and methods for extracting information from structured documents are provided. The systems and methods select a centroid document from a group of structured documents, then select a subset of the group in order to form a cluster of documents about the centroid document. The subset is preferably selected based on the relative similarity between each document in the subset and the centroid document. A data element is then marked on the centroid document, and the data element that corresponds to the marked data element is identified on each document in the subset. Finally, data may be extracted from the subset of documents based on the identifying step.
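
The steps in the abstract can be illustrated with a minimal Python sketch. This is not the patented method: the abstract does not state how documents are represented, how similarity is measured, or how the centroid is chosen. The sketch assumes each document is a mapping from an element path to its text, uses Jaccard similarity over the sets of paths, picks as centroid the document with the highest total similarity to the others, and matches the marked element by identical path. The function names, the sample documents, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, List

# Assumed representation: element path -> text content of that element.
Doc = Dict[str, str]


def similarity(a: Doc, b: Doc) -> float:
    """Jaccard similarity over the sets of element paths (an assumed measure)."""
    paths_a, paths_b = set(a), set(b)
    union = paths_a | paths_b
    return len(paths_a & paths_b) / len(union) if union else 1.0


def pick_centroid(docs: List[Doc]) -> int:
    """Index of the document most similar, in total, to all the others."""
    def total(i: int) -> float:
        return sum(similarity(docs[i], d) for j, d in enumerate(docs) if j != i)
    return max(range(len(docs)), key=total)


def form_cluster(docs: List[Doc], centroid: int, threshold: float = 0.5) -> List[int]:
    """Indices of documents whose similarity to the centroid meets the threshold."""
    return [i for i, d in enumerate(docs)
            if i != centroid and similarity(docs[centroid], d) >= threshold]


def extract(docs: List[Doc], cluster: List[int], marked_path: str) -> Dict[int, str]:
    """For each clustered document, find the element at the marked path and read it."""
    return {i: docs[i][marked_path] for i in cluster if marked_path in docs[i]}


# Hypothetical sample data: three product pages and one unrelated page.
docs = [
    {"body/h1": "Camera A", "body/span.price": "$199", "body/p": "Specs A"},
    {"body/h1": "Camera B", "body/span.price": "$249", "body/p": "Specs B"},
    {"body/h1": "Camera C", "body/span.price": "$99"},
    {"body/div.nav": "Home", "body/ul": "Links"},
]

centroid = pick_centroid(docs)
cluster = form_cluster(docs, centroid)
# The price element is "marked" once, on the centroid document only.
prices = extract(docs, cluster, "body/span.price")
```

In this sketch the unrelated page shares no paths with the centroid, so it falls outside the cluster and nothing is extracted from it. A practical system would likely need a more tolerant element-matching rule than exact path equality.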