Fast identification of text intensive pages from photographs

Invention Grant

US11715316B2 Fast identification of text intensive pages from photographs (Active)

Patent Title: Fast identification of text intensive pages from photographs
Application No.: US17542856

Application Date: 2021-12-06
Publication No.: US11715316B2

Publication Date: 2023-08-01
Inventor: Alexander Pashintsev, Boris Gorbatov, Eugene Livshitz, Vitaly Glazkov
Applicant: EVERNOTE CORPORATION
Applicant Address: US CA Redwood City
Assignee: Evernote Corporation
Current Assignee: Evernote Corporation
Current Assignee Address: US CA San Diego
Agency: Morgan, Lewis & Bockius LLP
Original application of the division: US15272744, 2016-09-22
Main IPC: G06T3/40
IPC: G06T3/40; G06T7/60; G06V30/413; G06V30/414

Abstract:

Methods and systems for training a neural network to distinguish between text documents and image documents are described. A corpus of text and image documents is obtained. A page of a text document is scanned by shifting a text window to a plurality of locations. In accordance with a determination that the text in the window at a respective location meets text line criteria, the text in the window is stored as a respective text snippet. A plurality of image windows are superimposed over at least one page of an image document. In accordance with a determination that the content of a respective image window meets image criteria, content of the image window is stored as a respective image snippet. The respective text snippet and the respective image snippet are provided to a classifier.
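
The snippet-collection steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not define the text line criteria or the image criteria, so `looks_like_text_line` (ink coverage within a band) and `has_image_content` (minimum contrast) are hypothetical stand-ins, as are the window size, the stride, and every function name. Pages are assumed to be grayscale grids with 0 for black and 255 for white.

```python
from typing import Callable, List, Sequence, Tuple

Page = Sequence[Sequence[int]]   # grayscale rows: 0 = black, 255 = white
Snippet = List[List[int]]


def crop(page: Page, top: int, left: int, height: int, width: int) -> Snippet:
    """Copy the window whose top-left corner is at (top, left)."""
    return [list(row[left:left + width]) for row in page[top:top + height]]


def ink_fraction(snippet: Snippet, threshold: int = 128) -> float:
    """Share of pixels darker than the threshold."""
    pixels = [p for row in snippet for p in row]
    return sum(1 for p in pixels if p < threshold) / len(pixels)


def looks_like_text_line(snippet: Snippet) -> bool:
    # Hypothetical text line criterion: some ink, but not a solid block.
    return 0.05 <= ink_fraction(snippet) <= 0.5


def has_image_content(snippet: Snippet) -> bool:
    # Hypothetical image criterion: the window must show some contrast.
    pixels = [p for row in snippet for p in row]
    return max(pixels) - min(pixels) >= 32


def collect_snippets(page: Page, height: int, width: int, stride: int,
                     accept: Callable[[Snippet], bool]) -> List[Snippet]:
    """Shift a window over the page and keep the windows that pass `accept`."""
    snippets = []
    rows, cols = len(page), len(page[0])
    for top in range(0, rows - height + 1, stride):
        for left in range(0, cols - width + 1, stride):
            window = crop(page, top, left, height, width)
            if accept(window):
                snippets.append(window)
    return snippets


def build_training_set(text_pages: Sequence[Page], image_pages: Sequence[Page],
                       height: int = 4, width: int = 8,
                       stride: int = 4) -> List[Tuple[Snippet, int]]:
    """Label text snippets 1 and image snippets 0, ready for a classifier."""
    samples = []
    for page in text_pages:
        for s in collect_snippets(page, height, width, stride, looks_like_text_line):
            samples.append((s, 1))
    for page in image_pages:
        for s in collect_snippets(page, height, width, stride, has_image_content):
            samples.append((s, 0))
    return samples
```

The labeled pairs returned by `build_training_set` would then be passed to whatever classifier is being trained; the abstract names a neural network but gives no architecture, so none is assumed here.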

Published/Granted Documents

US20220270386A1 FAST IDENTIFICATION OF TEXT INTENSIVE PAGES FROM PHOTOGRAPHS Publication/Grant date: 2022-08-25

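IPC Classification:

G	Physics
G06	Computing; Calculating or Counting
G06T	Image data processing or generation, in general
G06T3/00	Geometric image transformations in the plane of the image
G06T3/40	.Scaling of whole images or parts thereof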