Fast identification of text intensive pages from photographs

Invention Grant

US10372981B1 Fast identification of text intensive pages from photographs (legal status: in force)


Patent Title: Fast identification of text intensive pages from photographs
Application No.: US15272744

Application Date: 2016-09-22
Publication No.: US10372981B1

Publication Date: 2019-08-06
Inventors: Alexander Pashintsev, Boris Gorbatov, Eugene Livshitz, Vitaly Glazkov
Applicant: Evernote Corporation
Applicant Address: US CA Redwood City
Assignee: EVERNOTE CORPORATION
Current Assignee: EVERNOTE CORPORATION
Current Assignee Address: US CA Redwood City
Agency: Morgan, Lewis & Bockius LLP
Main IPC: G06K9/00
IPC: G06K9/00 ; G06K9/52 ; G06T7/60 ; G06T3/40

Abstract:

Determining if a document is a text page includes partitioning the document into a plurality of cells, scaling each of the cells to a standardized number of pixels to provide a corresponding snippet for each of the cells, using a classifier to examine the snippets to determine which of the cells are classified as text and which of the cells are not classified as text, determining a volume of text for the document based on a total amount of text in the document corresponding to a sum of an amount of text in each of the cells classified as text, and determining that the document is a text page in response to the total amount exceeding a pre-determined threshold. In response to the total amount being less than the pre-determined threshold, cells not classified as text may be examined further. The classifier may be provided by training a neural net.
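
The steps in the abstract can be sketched as follows. This is a minimal, hypothetical Python illustration, not Evernote's implementation. The grid size (`rows`, `cols`), the 8x8 `SNIPPET_SIZE`, the 0.5 `threshold`, and the rule that a text cell contributes its full area to the text volume are all assumptions. The patent's classifier is a trained neural net. Here `ink_classifier` is a placeholder heuristic standing in for it, and the 2x2 subdivision used to re-examine non-text cells is also an assumed strategy.

```python
from typing import Callable, List, Tuple

# Grayscale page as rows of pixel values: 0 = black, 255 = white.
Image = List[List[int]]
Cell = Tuple[int, int, int, int]  # (top, left, height, width)

SNIPPET_SIZE = 8  # hypothetical standardized snippet side, in pixels


def partition(height: int, width: int, rows: int, cols: int) -> List[Cell]:
    """Split a height x width area into a rows x cols grid of cells."""
    cells = []
    for i in range(rows):
        for j in range(cols):
            top, bottom = i * height // rows, (i + 1) * height // rows
            left, right = j * width // cols, (j + 1) * width // cols
            cells.append((top, left, bottom - top, right - left))
    return cells


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the sub-image with the given top-left corner and size."""
    return [row[left:left + width] for row in img[top:top + height]]


def scale_to_snippet(cell: Image, size: int = SNIPPET_SIZE) -> Image:
    """Nearest-neighbour resample of a cell to a size x size snippet."""
    h, w = len(cell), len(cell[0])
    return [[cell[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def ink_classifier(snippet: Image) -> bool:
    """Placeholder for the trained neural net (hypothetical heuristic):
    a snippet counts as text if 5%-50% of its pixels are dark."""
    pixels = [p for row in snippet for p in row]
    dark = sum(1 for p in pixels if p < 128) / len(pixels)
    return 0.05 <= dark <= 0.5


def is_text_page(img: Image, rows: int = 4, cols: int = 4,
                 threshold: float = 0.5,
                 classify: Callable[[Image], bool] = ink_classifier,
                 refine: bool = True) -> bool:
    """Return True if the estimated text volume exceeds `threshold`.

    Assumption: a cell classified as text contributes its whole area, so the
    text volume is the fraction of the page covered by text cells.
    """
    height, width = len(img), len(img[0])
    page_area = height * width
    text_area = 0
    non_text: List[Cell] = []

    for top, left, h, w in partition(height, width, rows, cols):
        if h == 0 or w == 0:
            continue  # degenerate cell on a page smaller than the grid
        if classify(scale_to_snippet(crop(img, top, left, h, w))):
            text_area += h * w
        else:
            non_text.append((top, left, h, w))

    if text_area / page_area > threshold:
        return True
    if not refine:
        return False

    # Below threshold: examine non-text cells further. Splitting each one
    # into 2 x 2 sub-cells is an assumed strategy, not taken from the patent.
    for top, left, h, w in non_text:
        if h < 2 or w < 2:
            continue
        for st, sl, sh, sw in partition(h, w, 2, 2):
            sub = crop(img, top + st, left + sl, sh, sw)
            if classify(scale_to_snippet(sub)):
                text_area += sh * sw
    return text_area / page_area > threshold
```

For example, on a 32x32 page whose every fourth row is black, every cell has a dark share of 0.25, so `is_text_page` returns `True`. A blank page returns `False`. Any callable that maps a snippet to `True` or `False`, such as a wrapper around a trained model's prediction, can be passed as `classify`.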

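IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graphical reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)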