Probabilistic language models for identifying sequential reading order of discontinuous text segments

Invention Grant

US11769111B2 Probabilistic language models for identifying sequential reading order of discontinuous text segments (In force)


Patent Title: Probabilistic language models for identifying sequential reading order of discontinuous text segments
Application No.: US16904881

Application Date: 2020-06-18
Publication No.: US11769111B2

Publication Date: 2023-09-26
Inventors: Trung Huu Bui, Hung Hai Bui, Shawn Alan Gaither, Walter Wei-Tuh Chang, Michael Frank Kraley, Pranjal Daga
Applicant: ADOBE INC.
Applicant Address: US CA San Jose
Assignee: Adobe Inc.
Current Assignee: Adobe Inc.
Current Assignee Address: US CA San Jose
Attorney, Agent or Firm: Shook, Hardy & Bacon L.L.P.
Main IPC: G06F17/00
IPC: G06F17/00; G06Q10/10; G06Q10/06; G06F40/10; G06V30/148; G06V30/413; G06F40/103

Abstract:

The present invention is directed towards providing automated workflows for identifying a reading order from text segments extracted from a document. The text segments are ordered based on trained natural language models. In some embodiments, the workflows are enabled to perform a method for identifying a sequence associated with a portable document. The method includes iteratively generating a probabilistic language model, receiving the portable document, and selectively extracting features (such as, but not limited to, text segments) from the document. The method may generate pairs of features (feature pairs) from the extracted features. The method may further generate a score for each of the pairs based on the probabilistic language model and determine an order of the features based on the scores. The method may provide the extracted features in the determined order.
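The pipeline in the abstract (pair the extracted segments, score each pair with a probabilistic language model, then order the segments by those scores) can be illustrated with a minimal sketch. This is not the patented implementation; it rests on stated assumptions: a word bigram model with add-one smoothing stands in for the trained language model, a pair's score is the log-probability of the word pair spanning the boundary between two segments, and the order is chosen by exhaustive search over permutations. The names `train_bigram_model`, `pair_score`, and `reading_order` are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def train_bigram_model(corpus):
    """Hypothetical stand-in for the language model: count unigrams and bigrams
    over lowercased, whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def pair_score(model, left, right):
    """Add-one smoothed log-probability that `right` directly follows `left`,
    judged only by the last word of `left` and the first word of `right`."""
    unigrams, bigrams = model
    last = left.lower().split()[-1]
    first = right.lower().split()[0]
    vocab_size = len(unigrams) + 1  # +1 reserves probability mass for unseen words
    return math.log((bigrams[(last, first)] + 1) / (unigrams[last] + vocab_size))


def reading_order(model, segments):
    """Score every ordered pair of segments, then return the permutation whose
    adjacent pairs have the highest total score (exhaustive; small inputs only)."""
    scores = {
        (i, j): pair_score(model, segments[i], segments[j])
        for i in range(len(segments))
        for j in range(len(segments))
        if i != j
    }
    best = max(
        permutations(range(len(segments))),
        key=lambda order: sum(scores[(a, b)] for a, b in zip(order, order[1:])),
    )
    return [segments[i] for i in best]


model = train_bigram_model(["the quick brown fox jumps over the lazy dog"])
print(reading_order(model, ["over the lazy dog", "the quick brown", "fox jumps"]))
# ['the quick brown', 'fox jumps', 'over the lazy dog']
```

Exhaustive search grows factorially with the number of segments, so a realistic page with many segments would need a greedy or beam-search ordering over the same pairwise scores instead.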

Published/Granted literature

US20200320329A1 PROBABILISTIC LANGUAGE MODELS FOR IDENTIFYING SEQUENTIAL READING ORDER OF DISCONTINUOUS TEXT SEGMENTS, publication date: 2020-10-08

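IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models G06N)
G06F17/00	Digital computing or data processing equipment or methods, specially adapted for specific functions (information retrieval, database structures or file system structures G06F 16/00)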