Invention Grant
- Patent Title: Sectionizing documents based on visual and language models
- Application No.: US16702394
- Application Date: 2019-12-03
- Publication No.: US11321956B1
- Publication Date: 2022-05-03
- Inventor: Kunling Geng
- Applicant: Ciitizen, LLC
- Applicant Address: US CA San Francisco
- Assignee: Ciitizen, LLC
- Current Assignee: Ciitizen, LLC
- Current Assignee Address: US CA San Francisco
- Agency: Sterne, Kessler, Goldstein & Fox P.L.L.C.
- Main IPC: G06F17/00
- IPC: G06F17/00 ; G06V30/416 ; G06F40/258 ; G06F40/30 ; G06V30/414 ; G06V30/10

Abstract:
Some embodiments provide a program that receives a request to sectionize a document, uses a visual model to identify a set of candidate section headers in the document, and uses a language model to determine a type of section header for at least one candidate section header in the set of candidate section headers in the document. Some embodiments provide a program that receives a request to anonymize data in a document, uses a visual model to identify a set of candidate confidential sections in the document that are each predicted to include a collection of confidential data, uses a language model to identify terms in each candidate confidential section that are determined to be confidential data, analyzes the document to identify a set of terms in the document based on the identified terms in the set of candidate confidential sections, and redacts the set of terms in the document.
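The abstract describes two staged pipelines. Sectionizing works as follows: a visual model proposes candidate section headers, and a language model assigns each one a type. Anonymization works as follows: a visual model proposes candidate confidential sections, a language model extracts confidential terms from them, the whole document is searched for those terms, and every match is redacted. The sketch below illustrates only that control flow, under loud assumptions. The "visual model" and "language model" are replaced by hypothetical toy functions (`header_visual`, `header_typer`, `confidential_visual`, `name_finder`) that operate on plain text lines rather than on a rendered document. None of these names, nor the `[REDACTED]` marker, come from the patent, and this is not the patented implementation.

```python
import re
from typing import Callable, Dict, List, Set

# Hypothetical stand-in types. In the abstract these are trained visual and
# language models; here they are plain callables over a list of text lines.
VisualModel = Callable[[List[str]], List[int]]   # lines -> indices of candidate lines
HeaderTyper = Callable[[str], str]               # header text -> section-header type
TermFinder = Callable[[str], Set[str]]           # section text -> confidential terms


def sectionize(lines: List[str], visual: VisualModel, typer: HeaderTyper) -> Dict[int, str]:
    """Visual model proposes candidate headers; language model assigns each a type."""
    return {i: typer(lines[i]) for i in visual(lines)}


def anonymize(lines: List[str], visual: VisualModel, finder: TermFinder) -> List[str]:
    """Find confidential terms in candidate sections, then redact them everywhere."""
    # Step 1: visual model proposes candidate confidential sections.
    candidates = visual(lines)
    # Step 2: language model picks out confidential terms inside those sections only.
    terms: Set[str] = set()
    for i in candidates:
        terms |= {t for t in finder(lines[i]) if t}  # drop empty strings
    if not terms:
        return list(lines)
    # Step 3: search the whole document for those terms (longest first, so a
    # longer term wins over a shorter term that is its prefix).
    pattern = re.compile("|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True)))
    # Step 4: redact every occurrence, including ones outside the candidate sections.
    return [pattern.sub("[REDACTED]", line) for line in lines]


# --- Toy stand-ins (hypothetical; not the patent's models) ---
def header_visual(lines: List[str]) -> List[int]:
    # Proxy for a layout cue such as bold or enlarged text: all-caps lines.
    return [i for i, line in enumerate(lines) if line.isupper()]


def header_typer(text: str) -> str:
    # Keyword lookup standing in for a language-model classifier.
    if "PATIENT" in text:
        return "demographics"
    if "MEDICATION" in text:
        return "medications"
    return "other"


def confidential_visual(lines: List[str]) -> List[int]:
    # Proxy for a form-like "label: value" region.
    return [i for i, line in enumerate(lines) if ":" in line]


def name_finder(text: str) -> Set[str]:
    # Treat the value after the label as the confidential term.
    return {text.split(":", 1)[1].strip()}


doc = [
    "PATIENT INFORMATION",
    "Name: Jane Doe",
    "MEDICATIONS",
    "Jane Doe takes aspirin daily.",
]
print(sectionize(doc, header_visual, header_typer))
# → {0: 'demographics', 2: 'medications'}
print(anonymize(doc, confidential_visual, name_finder))
# → ['PATIENT INFORMATION', 'Name: [REDACTED]', 'MEDICATIONS', '[REDACTED] takes aspirin daily.']
```

Step 3 reflects the distinction the abstract draws: confidential terms are identified only within the candidate sections, but the redaction is applied to the whole document. That is why "Jane Doe" is also removed from the unlabelled sentence in the last line.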