Fast identification of text intensive pages from photographs
Abstract:
Determining whether a document is a text page includes partitioning the document into a plurality of cells, scaling each of the cells to a standardized number of pixels to provide a corresponding snippet for each cell, using a classifier to examine the snippets to determine which of the cells are classified as text and which are not, determining a volume of text for the document as a total amount of text equal to the sum of the amounts of text in the cells classified as text, and determining that the document is a text page in response to the total amount exceeding a pre-determined threshold. In response to the total amount being less than the pre-determined threshold, cells not classified as text may be examined further. The classifier may be provided by training a neural net.
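The steps recited in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the grid size, the 32-pixel snippet size, the 0.5 threshold, the per-cell measure of "amount of text" (each text cell counts as an equal fraction of the page), and all function names are assumptions made for illustration. In particular, `toy_classifier` is a hypothetical stand-in for the trained neural net and simply counts dark/light transitions.

```python
import numpy as np

SNIPPET_SIZE = 32  # assumed standardized snippet side, in pixels


def partition(image, rows, cols):
    """Yield (row, col, cell) for a rows x cols grid over a 2-D grayscale image."""
    h, w = image.shape
    ys = np.linspace(0, h, rows + 1).astype(int)
    xs = np.linspace(0, w, cols + 1).astype(int)
    for r in range(rows):
        for c in range(cols):
            yield r, c, image[ys[r]:ys[r + 1], xs[c]:xs[c + 1]]


def to_snippet(cell, size=SNIPPET_SIZE):
    """Scale a cell to size x size pixels using nearest-neighbour sampling."""
    h, w = cell.shape
    yi = np.arange(size) * h // size
    xi = np.arange(size) * w // size
    return cell[np.ix_(yi, xi)]


def toy_classifier(snippet):
    """Hypothetical stand-in for the trained neural net.

    Labels a snippet as text when it contains many horizontal
    dark/light transitions, which is only a crude proxy.
    """
    binary = snippet < 128
    transitions = np.count_nonzero(binary[:, 1:] != binary[:, :-1])
    return transitions / binary.size > 0.1


def is_text_page(image, classifier=toy_classifier, rows=8, cols=6, threshold=0.5):
    """Return (is_text, total_amount, cells_not_classified_as_text)."""
    text_cells = 0
    not_text = []  # candidates for further examination
    for r, c, cell in partition(image, rows, cols):
        if classifier(to_snippet(cell)):
            text_cells += 1
        else:
            not_text.append((r, c))
    # Assumption: each text cell contributes an equal share of the page.
    total = text_cells / (rows * cols)
    return total > threshold, total, not_text
```

The third return value lists the cells that were not classified as text, so a caller could re-examine them when the total falls below the threshold, as the abstract describes. The image must be at least `rows` by `cols` pixels so that no cell is empty.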