Invention Grant
- Patent Title: Fast identification of text intensive pages from photographs
- Application No.: US15272744
- Application Date: 2016-09-22
- Publication No.: US10372981B1
- Publication Date: 2019-08-06
- Inventor: Alexander Pashintsev, Boris Gorbatov, Eugene Livshitz, Vitaly Glazkov
- Applicant: Evernote Corporation
- Applicant Address: US CA Redwood City
- Assignee: EVERNOTE CORPORATION
- Current Assignee: EVERNOTE CORPORATION
- Current Assignee Address: US CA Redwood City
- Agency: Morgan, Lewis & Bockius LLP
- Main IPC: G06K9/00
- IPC: G06K9/00; G06K9/52; G06T7/60; G06T3/40

Abstract:
Determining if a document is a text page includes partitioning the document into a plurality of cells, scaling each of the cells to a standardized number of pixels to provide a corresponding snippet for each of the cells, using a classifier to examine the snippets to determine which of the cells are classified as text and which of the cells are not classified as text, determining a volume of text for the document based on a total amount of text in the document corresponding to a sum of an amount of text in each of the cells classified as text, and determining that the document is a text page in response to the total amount exceeding a pre-determined threshold. In response to the total amount being less than the pre-determined threshold, cells not classified as text may be examined further. The classifier may be provided by training a neural net.
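
The steps named in the abstract (partition into cells, scale each cell to a standardized snippet, classify, sum the text amount, compare to a threshold) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation. The grid size, snippet size, threshold value, and every function name are hypothetical. `toy_classifier` is a simple transition-counting heuristic standing in for the trained neural net, and measuring a cell's "amount of text" as its share of the page area is an assumption. The optional further examination of non-text cells is not shown.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale rows, values in [0, 1]; 0 = dark, 1 = light


def partition(image: Image, rows: int, cols: int) -> List[Image]:
    """Split the image into a rows x cols grid of rectangular cells."""
    h, w = len(image), len(image[0])
    cells = []
    for r in range(rows):
        y0, y1 = r * h // rows, (r + 1) * h // rows
        for c in range(cols):
            x0, x1 = c * w // cols, (c + 1) * w // cols
            cells.append([row[x0:x1] for row in image[y0:y1]])
    return cells


def scale(cell: Image, size: int) -> Image:
    """Nearest-neighbor resample of a cell to a size x size snippet."""
    h, w = len(cell), len(cell[0])
    return [[cell[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def toy_classifier(snippet: Image) -> bool:
    """Hypothetical stand-in for the trained neural net.

    Calls a snippet "text" when at least 20% of horizontally adjacent
    pixel pairs switch between dark and light.
    """
    transitions = sum(
        1
        for row in snippet
        for a, b in zip(row, row[1:])
        if (a < 0.5) != (b < 0.5)
    )
    possible = len(snippet) * (len(snippet[0]) - 1)
    return transitions / possible >= 0.2


def is_text_page(
    image: Image,
    classifier: Callable[[Image], bool] = toy_classifier,
    rows: int = 4,
    cols: int = 4,
    size: int = 8,
    threshold: float = 0.5,  # assumed value, not taken from the patent
) -> bool:
    """Return True when the summed text amount exceeds the threshold.

    Assumption: a text cell contributes its share of the page area.
    """
    page_area = len(image) * len(image[0])
    total_text = 0.0
    for cell in partition(image, rows, cols):
        if classifier(scale(cell, size)):
            total_text += len(cell) * len(cell[0]) / page_area
    return total_text > threshold


# Synthetic 32x32 pages: alternating dark/light columns mimic dense strokes.
striped = [[float(x % 2) for x in range(32)] for _ in range(32)]
blank = [[1.0] * 32 for _ in range(32)]
```

With these synthetic inputs, `is_text_page(striped)` classifies every cell as text and `is_text_page(blank)` classifies none. A real system would replace `toy_classifier` with a trained model that takes the same fixed-size snippets.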