System for determining reliability of extracted data using localized graph analysis
Abstract:
A webpage containing information to be extracted may undergo changes to a layout of elements that present the information. These changes could result in an inability to retrieve the information later. A first graph is determined that represents elements of a first version of a webpage at a first time. An element in the first graph for which information is being acquired is specified. A relevant portion of the first graph is designated that includes the element and immediate neighbors in the first graph. Later, a second version of the webpage is retrieved, and a second graph of that second version is determined. The relevant portion of the first graph is compared to the second graph. If a match is found, the information of interest is extracted from the specified element of the second graph. This allows extraction of information to proceed even if the layout of elements changes.
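The comparison described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation, and it rests on several assumptions: the graph is the element tree of the page, each node is labeled by its tag and `class` attribute, "immediate neighbors" means the parent and the children of the specified element, and a match requires exactly one node in the second graph with the same local signature. The names `Node`, `GraphBuilder`, `build_graph`, `local_signature`, and `find_match` are hypothetical, as are the sample pages.

```python
from collections import Counter
from html.parser import HTMLParser


class Node:
    """One element of the page graph (assumed here to be the element tree)."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = ""

    def label(self):
        # Assumption: tag name plus class attribute is enough to label a node.
        return (self.tag, self.attrs.get("class") or "")


class GraphBuilder(HTMLParser):
    """Builds the tree; assumes well-nested markup with explicit end tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.nodes.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_graph(html):
    """Return the list of element nodes for one version of the page."""
    builder = GraphBuilder()
    builder.feed(html)
    builder.close()
    return builder.nodes


def local_signature(node):
    """Relevant portion: the element, its parent, and its children."""
    children = Counter(child.label() for child in node.children)
    return (node.label(), node.parent.label(), children)


def find_match(signature, graph):
    """Return the unique node with this local signature, or None."""
    matches = [n for n in graph if local_signature(n) == signature]
    return matches[0] if len(matches) == 1 else None


# First version of the page: the price element is the element of interest.
v1 = ('<div class="page"><div class="item">'
      '<span class="name">Widget</span><span class="price">$10</span>'
      '</div></div>')
# Second version: a wrapper and a banner were added, so absolute paths changed.
v2 = ('<div class="wrap"><div class="banner">Sale</div><div class="page">'
      '<div class="item"><span class="name">Widget</span>'
      '<span class="price">$12</span></div></div></div>')

target = next(n for n in build_graph(v1) if n.label() == ("span", "price"))
signature = local_signature(target)
match = find_match(signature, build_graph(v2))
print(match.text if match else "no match")
```

Because only the element and its immediate neighbors are compared, the added wrapper and banner in the second version do not prevent the match. If the neighborhood itself changes, `find_match` returns `None`, which a caller could treat as a signal that the extracted data is not reliable.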