Invention Grant
US08719291B2 Information extraction using spatial reasoning on the CSS2 visual box model (in force)

Abstract:
A method for extracting tabular information from a web source by: determining coordinates for a plurality of visualized element nodes of the web source; determining, based on those coordinates, a subset of the visualized element nodes that forms a candidate web table, wherein each node in the subset constitutes a logical cell of the candidate web table; determining the textual content of the nodes in the subset as it would appear after the web source is rendered in a browser; transforming the candidate web table into an explicit representation of the relative spatial relations involving at least one of the logical cells; and saving the explicit representation in a structured document format.
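The abstract does not specify how coordinates are grouped into logical cells or what the structured output looks like. The following is a minimal Python sketch of the general idea, assuming the rendered box coordinates and visible text have already been read from a browser. The `Box` type, the 2-pixel alignment tolerance, the full-grid heuristic in `is_candidate_table`, and the XML element and attribute names are all hypothetical illustrations, not the patented method.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical input: a rendered element box (top-left corner, size) and its visible text."""
    x: float
    y: float
    width: float
    height: float
    text: str


def band_index(values, tol):
    """Map each coordinate to a band index; a value within tol of the current band's start joins it."""
    mapping, start, idx = {}, None, -1
    for v in sorted(set(values)):
        if start is None or v - start > tol:
            idx, start = idx + 1, v
        mapping[v] = idx
    return mapping


def to_grid(boxes, tol=2.0):
    """Assign each box a (row, col) logical cell from its aligned top and left coordinates."""
    rows = band_index([b.y for b in boxes], tol)
    cols = band_index([b.x for b in boxes], tol)
    grid = {}
    for b in boxes:
        key = (rows[b.y], cols[b.x])
        if key in grid:
            raise ValueError(f"two boxes map to logical cell {key}")
        grid[key] = b.text.strip()
    return grid


def is_candidate_table(grid):
    """Assumed heuristic: at least 2x2 cells, with every row/column position filled."""
    if not grid:
        return False
    n_rows = 1 + max(r for r, _ in grid)
    n_cols = 1 + max(c for _, c in grid)
    return n_rows >= 2 and n_cols >= 2 and len(grid) == n_rows * n_cols


def to_xml(grid):
    """Serialize cells plus explicit left-of/above relations to their immediate neighbours."""
    table = ET.Element("table")
    for (r, c), text in sorted(grid.items()):
        cell = ET.SubElement(table, "cell", row=str(r), col=str(c))
        cell.text = text
        for dr, dc, rel in ((0, 1, "left-of"), (1, 0, "above")):
            if (r + dr, c + dc) in grid:
                ET.SubElement(cell, "relation", type=rel, target=f"{r + dr},{c + dc}")
    return ET.tostring(table, encoding="unicode")


# Usage: four boxes whose coordinates are slightly misaligned, as rendered layouts often are.
boxes = [
    Box(10, 20, 50, 15, "Name"), Box(70, 20, 50, 15, "Price"),
    Box(10.5, 40, 50, 15, "Apple"), Box(70, 41, 50, 15, "1.20"),
]
grid = to_grid(boxes)
if is_candidate_table(grid):
    print(to_xml(grid))
```

Grouping by a tolerance band rather than by exact equality is a design choice meant to absorb the sub-pixel offsets that rendering can introduce; a real implementation would also need to handle cells that span several rows or columns, which this sketch does not.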