Abstract:
An improved sliding-window, dictionary-based compression method limits the search within the sliding window to the data strings occurring at each of a plurality of predefined discrete match locations, the plurality of predefined discrete match locations comprising a set of non-contiguous data positions within the window of data.
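The restricted search can be illustrated with a minimal LZ77-style sketch. The abstract does not say which positions are used, how matches are encoded, or what the minimum match length is, so `MATCH_OFFSETS`, `MIN_MATCH`, `MAX_MATCH`, and the `("lit", byte)` / `("match", offset, length)` token format below are all hypothetical. The point the sketch illustrates is that the encoder compares the input only against the window positions at those predefined, non-contiguous distances rather than scanning every position in the window.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical token format: a literal byte, or a back-reference (offset, length).
Token = Union[Tuple[str, int], Tuple[str, int, int]]

# Hypothetical predefined, non-contiguous match locations, expressed as
# distances back from the current position. Only these are searched.
MATCH_OFFSETS: Tuple[int, ...] = (1, 2, 4, 8, 16, 64, 256, 1024)
MIN_MATCH = 3    # hypothetical: shorter matches are emitted as literals
MAX_MATCH = 258  # hypothetical cap on a single match length


def compress(data: bytes, offsets: Sequence[int] = MATCH_OFFSETS) -> List[Token]:
    tokens: List[Token] = []
    pos, n = 0, len(data)
    while pos < n:
        best_len, best_off = 0, 0
        # Search only the discrete match locations, not the whole window.
        for off in offsets:
            start = pos - off
            if start < 0:
                continue  # this location lies before the start of the input
            length = 0
            # Overlapping matches are allowed, as in ordinary LZ77.
            while (length < MAX_MATCH and pos + length < n
                   and data[start + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, off
        if best_len >= MIN_MATCH:
            tokens.append(("match", best_off, best_len))
            pos += best_len
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens


def decompress(tokens: List[Token]) -> bytes:
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, off, length = tok
            start = len(out) - off
            # Copy byte by byte so overlapping references expand correctly.
            for i in range(length):
                out.append(out[start + i])
    return bytes(out)
```

For example, `compress(b"ab" * 20)` yields two literals followed by a single `("match", 2, 38)`, because distance 2 is one of the searched locations. By contrast, `b"abcabcabcabc"` compresses to literals only: its repeats sit at distances that are multiples of 3, none of which are in this hypothetical offset set. That is the trade-off of searching fewer positions: the search is cheaper, but some matches a full-window search would find are missed.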
Abstract:
The present invention relates to a methodology for assembling a document from content spanning multiple web pages. Given a starting location, a page-level process analyzes one page at a time to find candidate links. The candidate links are followed recursively, and each linked page is analyzed in the same way. A detailed set of heuristics determines whether or not a link is a candidate link. The candidate pages are then optionally passed to a document-level analyzer, which compares the attributes of each page against those of the other pages and looks for a document-like structure. Using a second detailed set of heuristics, the document-level analyzer determines whether each page should be included in the document.
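The two-stage flow (recursive page-level link discovery, then a document-level filter) can be sketched as follows. The abstract does not disclose either set of heuristics, so both rules here are hypothetical stand-ins. A link is treated as a candidate if it stays on the same host and its anchor text looks like pagination ("Next", "2", "Page 3"). A page is kept if its title, with digits stripped, matches the most common such title among the candidates. Pages are held in an in-memory dictionary (`PAGES`, also hypothetical) instead of being fetched over the network.

```python
import re
from collections import Counter
from typing import Any, Dict, List, Set
from urllib.parse import urlparse

# Hypothetical page-level heuristic: anchor text that looks like pagination.
CANDIDATE_TEXT = re.compile(r"^(next|continue|page\s*\d+|\d+)$", re.IGNORECASE)


def is_candidate_link(anchor_text: str, target: str, start_url: str) -> bool:
    same_host = urlparse(target).netloc == urlparse(start_url).netloc
    return same_host and bool(CANDIDATE_TEXT.match(anchor_text.strip()))


def collect_candidates(start_url: str, pages: Dict[str, Dict[str, Any]]) -> List[str]:
    """Analyze one page at a time and recursively follow candidate links."""
    order: List[str] = []
    visited: Set[str] = set()

    def visit(url: str) -> None:
        if url in visited or url not in pages:
            return
        visited.add(url)
        order.append(url)
        for text, target in pages[url]["links"]:
            if is_candidate_link(text, target, start_url):
                visit(target)

    visit(start_url)
    return order


def normalize_title(title: str) -> str:
    # Hypothetical page attribute: the title with page numbers removed.
    return " ".join(re.sub(r"\d+", "", title).split())


def assemble_document(start_url: str, pages: Dict[str, Dict[str, Any]]) -> List[str]:
    """Document-level pass: keep pages whose attributes match the majority."""
    candidates = collect_candidates(start_url, pages)
    keys = {u: normalize_title(pages[u]["title"]) for u in candidates}
    majority, _ = Counter(keys.values()).most_common(1)[0]
    return [u for u in candidates if keys[u] == majority]


# Hypothetical example site.
A, B, C, D = (f"https://example.com/guide/{i}" for i in (1, 2, 3, 4))
PAGES: Dict[str, Dict[str, Any]] = {
    A: {"title": "Guide (page 1)",
        "links": [("2", B), ("About us", "https://example.com/about"),
                  ("Next", "https://other.org/x")]},
    B: {"title": "Guide (page 2)", "links": [("1", A), ("3", C), ("Next", C)]},
    C: {"title": "Guide (page 3)", "links": [("Next", D)]},
    D: {"title": "Archive", "links": []},
}
```

Here the page-level pass follows the pagination links from `A` to `B`, `C`, and `D`. It ignores "About us" (the anchor text is not pagination-like) and the off-site "Next". The document-level pass then drops `D` because its title does not share the structure of the other three pages, so `assemble_document(A, PAGES)` returns `[A, B, C]`. Splitting the work this way lets the first stage be permissive while the second stage judges each page against the set as a whole.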