Extracting information from tables embedded within documents

Invention Grant

US10706218B2 Extracting information from tables embedded within documents (status: in force)


Patent Title: Extracting information from tables embedded within documents
Application No.: US15594762

Application Date: 2017-05-15
Publication No.: US10706218B2

Publication Date: 2020-07-07
Inventors: David Richard Milward, Himanshu Agrawal, James Robert Walton Cormack, Francisco Nuno Quintiliano Mendonca Carapeto Costa
Applicant: Linguamatics Ltd.
Applicant Address: GB Cambridge
Assignee: Linguamatics Ltd.
Current Assignee: Linguamatics Ltd.
Current Assignee Address: GB Cambridge
Agency: Maldjian Law Group LLC
Agent: John Maldjian
Main IPC: G06F16/00
IPC: G06F16/00; G06F40/14; G06F16/84; G06F40/18; G06F40/154; G06F40/166; G06F40/177


Abstract:

Much valuable information in documents is presented within tables. However, the information within tables is hard to extract automatically with high accuracy due to the wide variety and low quality of typical tables found in electronic documents. Information extraction technology can provide a method of extracting information from heterogeneous tables by recognizing tables, the header cells, and cells that are merged or should be merged, creating a richer representation of table structure and providing a convenient way of linking cells to their row and column headers. Use of this richer representation allows a few extraction patterns to successfully pull out information from a wide variety of differently formatted tables.
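The abstract describes linking each body cell to its row and column headers once the header cells and merged cells are known, so that a few patterns can work across differently formatted tables. The Python sketch below illustrates only that linking idea, under simplifying assumptions that are not from the patent: the number of header rows and header columns is given rather than recognized, the table is a rectangular grid of strings, and an empty header slot is treated as part of a merged cell (continuing the header to its left in header rows, or the header above it in header columns). `LinkedCell`, `fill_merged_headers`, `link_cells`, and `extract` are hypothetical names for illustration, not Linguamatics' implementation or the claimed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedCell:
    """A body cell plus the header text that governs it (hypothetical structure)."""
    value: str
    row_headers: tuple[str, ...]
    col_headers: tuple[str, ...]


def fill_merged_headers(grid: list[list[str]], n_header_rows: int,
                        n_header_cols: int) -> list[list[str]]:
    """Return a copy of a rectangular grid with merged header cells expanded.

    Assumption: an empty slot in a header row continues the header to its left,
    and an empty slot in a header column continues the header above it.
    """
    g = [list(row) for row in grid]  # copy so the caller's table is untouched
    for r in range(n_header_rows):
        # Start one past the first data column so nothing is inherited from the stub area.
        for c in range(n_header_cols + 1, len(g[r])):
            if g[r][c] == "":
                g[r][c] = g[r][c - 1]
    for c in range(n_header_cols):
        # Start one past the first body row so nothing is inherited from a header row.
        for r in range(n_header_rows + 1, len(g)):
            if g[r][c] == "":
                g[r][c] = g[r - 1][c]
    return g


def link_cells(grid: list[list[str]], n_header_rows: int = 1,
               n_header_cols: int = 1) -> list[LinkedCell]:
    """Attach every body cell to its row headers and column headers."""
    g = fill_merged_headers(grid, n_header_rows, n_header_cols)
    linked = []
    for r in range(n_header_rows, len(g)):
        # Non-empty row-header cells on the same row, left to right.
        row_headers = tuple(g[r][c] for c in range(n_header_cols) if g[r][c])
        for c in range(n_header_cols, len(g[r])):
            # Non-empty column-header cells in the same column, top to bottom.
            col_headers = tuple(g[h][c] for h in range(n_header_rows) if g[h][c])
            linked.append(LinkedCell(g[r][c], row_headers, col_headers))
    return linked


def extract(linked: list[LinkedCell], column_keyword: str):
    """A toy extraction pattern: values whose column headers mention a keyword."""
    return [(cell.row_headers, cell.value) for cell in linked
            if any(column_keyword in h for h in cell.col_headers)]


# "Incidence (%)" is a header merged across two columns; the grid stores it once.
table = [
    ["Event",    "Incidence (%)", ""],
    ["",         "Placebo",       "10 mg"],
    ["Headache", "4",             "9"],
    ["Nausea",   "2",             "6"],
]
linked = link_cells(table, n_header_rows=2, n_header_cols=1)
print(extract(linked, "10 mg"))
# → [(('Headache',), '9'), (('Nausea',), '6')]
```

Because each value carries its full header path, the same keyword pattern would still match if "10 mg" sat in a single header row or under a differently merged parent header; this is the kind of reuse the abstract attributes to the richer representation, shown here only in toy form.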

Published application

US20170329749A1 EXTRACTING INFORMATION FROM TABLES EMBEDDED WITHIN DOCUMENTS, published 2017-11-16


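IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models G06N)
G06F16/00	Information retrieval; database structures therefor; file system structures therefor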