Invention Grant
US08719291B2 Information extraction using spatial reasoning on the CSS2 visual box model
Status: In force
- Patent Title: Information extraction using spatial reasoning on the CSS2 visual box model
- Application No.: US12108879
- Application Date: 2008-04-24
- Publication No.: US08719291B2
- Publication Date: 2014-05-06
- Inventor: Wolfgang Gatterbauer, Bernhard Kruepl, Paul Bohunsky, Marcus Herzog
- Applicant: Wolfgang Gatterbauer, Bernhard Kruepl, Paul Bohunsky, Marcus Herzog
- Applicant Address: AT Vienna
- Assignee: Lixto Software GmbH
- Current Assignee: Lixto Software GmbH
- Current Assignee Address: AT Vienna
- Agency: Sughrue Mion, PLLC
- Main IPC: G06F7/00
- IPC: G06F7/00; G06F17/30

Abstract:
A method for extracting tabular information from a web source by: determining a plurality of coordinates for a plurality of visualized element nodes on the web source; determining a subset of the plurality of visualized element nodes based on the plurality of coordinates to obtain a candidate web table, wherein each node in the subset constitutes a logical cell of the candidate web table; determining the textual content corresponding to the subset of visualized element nodes as that content would appear after rendering the web source in a browser; transforming the candidate web table into an explicit representation of the relative spatial relations among the logical cells; and saving the explicit representation in a structured document format.
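
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented method, only a simplified stand-in under stated assumptions: the coordinates and rendered text of each element are taken as already available (for example, from a browser's layout engine), cells are assigned to rows and columns by snapping their top-left corners to a grid within a tolerance, and spanning cells are ignored. The names `Box`, `cluster`, `to_grid`, and `to_xml`, the 2.0-pixel tolerance, and the XML element names are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Box:
    """A rendered element node (hypothetical type): position, size, visible text."""
    x: float
    y: float
    width: float
    height: float
    text: str


def cluster(values, tol=2.0):
    """Return sorted representatives; values within `tol` of one share it."""
    reps = []
    for v in sorted(values):
        if not reps or v - reps[-1] > tol:
            reps.append(v)
    return reps


def to_grid(boxes, tol=2.0):
    """Map each box to a (row, col) logical cell from its top-left corner."""
    xs = cluster([b.x for b in boxes], tol)
    ys = cluster([b.y for b in boxes], tol)
    grid = {}
    for b in boxes:
        # The largest representative not exceeding the coordinate is its group.
        row = bisect_right(ys, b.y) - 1
        col = bisect_right(xs, b.x) - 1
        grid[(row, col)] = b.text
    return grid


def to_xml(grid):
    """Serialize the explicit row/column relations as an XML string."""
    root = ET.Element("table")
    for (row, col), text in sorted(grid.items()):
        cell = ET.SubElement(root, "cell", row=str(row), col=str(col))
        cell.text = text
    return ET.tostring(root, encoding="unicode")


# Four boxes laid out as a 2 x 2 table, with slight rendering jitter.
boxes = [
    Box(10, 10, 80, 20, "Name"),
    Box(100, 10, 80, 20, "Price"),
    Box(10.5, 40, 80, 20, "Widget"),
    Box(100, 40.4, 80, 20, "9.99"),
]
grid = to_grid(boxes)
document = to_xml(grid)
```

Here the spatial relation between cells is made explicit as row and column indices, so a consumer of the XML does not need the original markup or coordinates to know that "9.99" sits below "Price" and to the right of "Widget".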
Public/Granted literature
- US20080294679A1 INFORMATION EXTRACTION USING SPATIAL REASONING ON THE CSS2 VISUAL BOX MODEL, Public/Granted day: 2008-11-27