Invention Grant
- Patent Title: Extracting data
- Patent Title (Chinese): 提取数据 (Extracting data)
- Application No.: US12900133
- Application Date: 2010-10-07
- Publication No.: US08239349B2
- Publication Date: 2012-08-07
- Inventor: Maria G. Castellanos, Miguel Durazo, Umeshwar Dayal
- Applicant: Maria G. Castellanos, Miguel Durazo, Umeshwar Dayal
- Applicant Address: US TX Houston
- Assignee: Hewlett-Packard Development Company, L.P.
- Current Assignee: Hewlett-Packard Development Company, L.P.
- Current Assignee Address: US TX Houston
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
Information can be extracted from unstructured documents using embodiments described herein. An entity recognition may be performed on an unstructured document and found entities may be annotated. Annotating includes inserting tags around the found entities to generate marked entities. A rule is applied to each of the marked entities in the unstructured document to generate a confidence value for every marked entity, wherein the rule comprises a plurality of prefixes for a target entity and a plurality of suffixes for the target entity. A marked entity with the highest confidence value is selected as an extraction target.
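
The flow in the abstract (tag the found entities, score each marked entity against a rule made of prefixes and suffixes, pick the highest-scoring one) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `<ent>` tag name, the regex-based entity recognizer, the fixed-size context window, and the confidence formula (fraction of rule phrases found near the entity) are all hypothetical choices.

```python
import re
from dataclasses import dataclass

# Hypothetical tag format; the abstract only says tags are inserted around entities.
TAG = re.compile(r"<ent>(.*?)</ent>")


@dataclass
class Rule:
    """A rule for one target entity: phrases expected before and after it."""
    prefixes: list[str]
    suffixes: list[str]


def annotate(text: str, pattern: str) -> str:
    """Stand-in entity recognizer: wrap every regex match in <ent>...</ent>."""
    return re.sub(pattern, lambda m: f"<ent>{m.group(0)}</ent>", text)


def score(marked: str, rule: Rule, window: int = 20) -> list[tuple[str, float]]:
    """Return (entity, confidence) for every marked entity.

    Assumed scoring: the fraction of the rule's prefixes found in the text
    just before the entity plus suffixes found in the text just after it.
    """
    total = len(rule.prefixes) + len(rule.suffixes)
    results = []
    for m in TAG.finditer(marked):
        before = marked[max(0, m.start() - window):m.start()].lower()
        after = marked[m.end():m.end() + window].lower()
        hits = sum(p.lower() in before for p in rule.prefixes)
        hits += sum(s.lower() in after for s in rule.suffixes)
        results.append((m.group(1), hits / total if total else 0.0))
    return results


def extract(text: str, pattern: str, rule: Rule):
    """Select the marked entity with the highest confidence, or None."""
    candidates = score(annotate(text, pattern), rule)
    return max(candidates, key=lambda c: c[1]) if candidates else None


# Usage: pick the due date out of a sentence that contains two dates.
text = "Invoice dated 2010-10-07. Payment is due by 2010-11-06 at the latest."
due_date_rule = Rule(prefixes=["due by", "payable by"], suffixes=["at the latest"])
best = extract(text, r"\d{4}-\d{2}-\d{2}", due_date_rule)
print(best)
```

In this sketch both dates are marked as entities, but only the second one is surrounded by phrases from the rule, so it receives the higher confidence and is selected. A real system would likely use a trained entity recognizer and a more careful confidence model than simple phrase counting.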
Public/Granted literature
- US20120089620A1 EXTRACTING DATA (Public/Granted day: 2012-04-12)