Sectionizing documents based on visual and language models

Invention Grant

US11321956B1 Sectionizing documents based on visual and language models (Active)

Patent Title: Sectionizing documents based on visual and language models
Application No.: US16702394

Application Date: 2019-12-03
Publication No.: US11321956B1

Publication Date: 2022-05-03
Inventor: Kunling Geng
Applicant: Ciitizen, LLC
Applicant Address: US CA San Francisco
Assignee: Ciitizen, LLC
Current Assignee: Ciitizen, LLC
Current Assignee Address: US CA San Francisco
Agency: Sterne, Kessler, Goldstein & Fox P.L.L.C.
Main IPC: G06F17/00
IPC: G06F17/00 ; G06V30/416 ; G06F40/258 ; G06F40/30 ; G06V30/414 ; G06V30/10

Abstract:

Some embodiments provide a program that receives a request to sectionize a document, uses a visual model to identify a set of candidate section headers in the document, and uses a language model to determine a type of section header for at least one candidate section header in the set of candidate section headers in the document. Some embodiments provide a program that receives a request to anonymize data in a document, uses a visual model to identify a set of candidate confidential sections in the document that are each predicted to include a collection of confidential data, uses a language model to identify terms in each candidate confidential section that are determined to be confidential data, analyzes the document to identify a set of terms in the document based on the identified terms in the set of candidate confidential sections, and redacts the set of terms in the document.
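
The two flows described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say how the visual or language models are built, so both are passed in as plain callables. The names `Region`, `sectionize`, `anonymize`, `toy_visual`, and `toy_tagger` are hypothetical and chosen only for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Region:
    """A piece of text flagged by the (assumed) visual model."""
    text: str
    page: int


def sectionize(
    pages: List[str],
    visual_model: Callable[[List[str]], List[Region]],
    language_model: Callable[[str], str],
) -> List[Tuple[Region, str]]:
    """Visual model proposes candidate headers; language model assigns each a type."""
    return [(c, language_model(c.text)) for c in visual_model(pages)]


def anonymize(
    pages: List[str],
    visual_model: Callable[[List[str]], List[Region]],
    term_tagger: Callable[[str], Set[str]],
    mask: str = "[REDACTED]",
) -> List[str]:
    """Collect confidential terms from candidate sections, then redact them everywhere."""
    terms: Set[str] = set()
    for region in visual_model(pages):
        terms |= term_tagger(region.text)
    terms.discard("")  # an empty term would match at every position
    if not terms:
        return list(pages)
    # Longest terms first so a long term is preferred over one of its substrings.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    return [pattern.sub(mask, page) for page in pages]


# Toy stand-ins (assumptions for the demo, not real models).
def toy_visual(pages: List[str]) -> List[Region]:
    """Pretend the visual model flags any line that starts with 'Patient Name:'."""
    return [
        Region(line, i)
        for i, page in enumerate(pages)
        for line in page.splitlines()
        if line.startswith("Patient Name:")
    ]


def toy_tagger(text: str) -> Set[str]:
    """Pretend the language model marks the value after the colon as confidential."""
    return {text.split(":", 1)[1].strip()}


if __name__ == "__main__":
    doc = [
        "Patient Name: Jane Doe\nHistory: Jane Doe reports a cough.",
        "Follow-up for Jane Doe.",
    ]
    for page in anonymize(doc, toy_visual, toy_tagger):
        print(page)
```

Note that the terms are found only inside the candidate sections but are redacted across the whole document, which mirrors the last two steps of the abstract. The plain substring matching used here is a simplification; a real system would likely need token-aware matching to avoid masking parts of unrelated words.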

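IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models, see G06N)
G06F17/00	Digital computing or data processing equipment or methods, specially adapted for specific functions (information retrieval, database structures or file system structures therefor, see G06F 16/00)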