Invention Grant
- Patent Title: Detecting the bounds of borderless tables in fixed-format structured documents using machine learning
- Application No.: US16419093
- Application Date: 2019-05-22
- Publication No.: US11113618B2
- Publication Date: 2021-09-07
- Inventor: Ram Bhushan Agrawal, Himanshu Mittal
- Applicant: Adobe Inc.
- Applicant Address: US CA San Jose
- Assignee: Adobe Inc.
- Current Assignee: Adobe Inc.
- Current Assignee Address: US CA San Jose
- Agency: Finch & Maloney PLLC
- Main IPC: G06N7/00
- IPC: G06N7/00; G06F40/177; G06N3/04

Abstract:
Techniques are disclosed for detecting the bounds of borderless open tables in fixed-format structured documents, such as PDF documents, and grouping text lines into predicted borderless tables. The target document comprises a set of text lines each having a respective vertical and horizontal position in the target document. A sorted list of the text lines is generated based upon a vertical and horizontal position of each text line in the target document. For each text line in the sorted list, a respective probability that the text line in the sorted list belongs to a borderless table is then determined. According to one embodiment, the probability may be determined using a classifier that may employ a logistic regression algorithm.
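The pipeline described in the abstract (sort the text lines by position, score each line with a logistic model, group the likely table lines) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the `TextLine` type, the two features (count of wide whitespace gaps and share of numeric tokens), the hand-picked weights, the 0.5 threshold, and the rule that consecutive high-probability lines form one table. A real classifier would learn its weights from labeled documents.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class TextLine:
    text: str
    x: float  # horizontal position of the line's left edge
    y: float  # vertical position, assumed here to be measured from the top of the page


def sort_lines(lines):
    """Order lines top to bottom, then left to right."""
    return sorted(lines, key=lambda ln: (ln.y, ln.x))


def features(line):
    """Hypothetical features: wide-gap count and share of numeric tokens."""
    gaps = len(re.findall(r"\s{2,}", line.text.strip()))
    tokens = line.text.split()
    numeric = sum(t.replace(".", "", 1).isdigit() for t in tokens)
    return [float(gaps), numeric / len(tokens) if tokens else 0.0]


# Illustrative weights only; they are not learned and not from the patent.
WEIGHTS = [1.5, 2.0]
BIAS = -2.0


def table_probability(line):
    """Logistic (sigmoid) score that a line belongs to a borderless table."""
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, features(line)))
    return 1.0 / (1.0 + math.exp(-z))


def group_tables(lines, threshold=0.5):
    """Group runs of consecutive high-probability lines into predicted tables."""
    tables, current = [], []
    for line in sort_lines(lines):
        if table_probability(line) >= threshold:
            current.append(line)
        elif current:
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables


if __name__ == "__main__":
    page = [
        TextLine("North    10   12", 72, 140),
        TextLine("Quarterly results are summarized here.", 72, 100),
        TextLine("Further discussion follows.", 72, 160),
        TextLine("Region   Q1   Q2", 72, 120),
    ]
    for table in group_tables(page):
        print([ln.text for ln in table])
```

In this sketch the two column-aligned lines score above the threshold and are grouped together, while the two prose lines are left out. Grouping by consecutive runs is one simple choice; the abstract does not say how the predicted lines are merged into table bounds.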
Public/Granted literature
- US20190278837A1 DETECTING THE BOUNDS OF BORDERLESS TABLES IN FIXED-FORMAT STRUCTURED DOCUMENTS USING MACHINE LEARNING, Public/Granted day: 2019-09-12
Information query
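IPC Classification:
G | Physics |
G06 | Computing; calculating or counting |
G06N | Computing arrangements based on specific computational models |
G06N7/00 | Computing arrangements based on specific mathematical models |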