-
Publication No.: US11769072B2
Publication date: 2023-09-26
Application No.: US15231294
Application date: 2016-08-08
Applicant: Adobe Inc.
Inventor: Michael Kraley
IPC: G06N20/00 , G06N5/02 , G06V30/414 , G06F18/2413
CPC classification number: G06N20/00 , G06F18/24133 , G06N5/02 , G06V30/414
Abstract: The structure of an untagged document can be derived using a predictive model that is trained in a supervised learning framework based on a corpus of tagged training documents. Analyzing the training documents results in a plurality of document part feature vectors, each of which correlates a category defining a document part (for example, “title” or “body paragraph”) with one or more feature-value pairs (for example, “font=Arial” or “alignment=centered”). Any suitable machine learning algorithm can be used to train the predictive model based on the document part feature vectors extracted from the training documents. Once the predictive model has been trained, it can receive feature-value pairs corresponding to a portion of an untagged document and make predictions with respect to how that document part should be categorized. The predictive model can therefore generate tag metadata that defines a structure of the untagged document in an automated fashion.
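The abstract states that any suitable machine learning algorithm can be trained on the document part feature vectors. The sketch below is illustrative only and assumes one such algorithm: a naive Bayes classifier over (feature, value) pairs with Laplace smoothing. The patent does not prescribe this choice. The names DocumentPartClassifier, FeatureVector, and tag_document are hypothetical.

```python
import math
from collections import Counter, defaultdict

# A document part feature vector: a category plus feature-value pairs,
# e.g. ("title", {"font": "Arial", "alignment": "centered"}).
FeatureVector = tuple[str, dict[str, str]]


class DocumentPartClassifier:
    """Hypothetical naive Bayes over feature-value pairs (an assumed algorithm choice)."""

    def __init__(self) -> None:
        self.category_counts: Counter = Counter()
        self.pair_counts: defaultdict = defaultdict(Counter)
        self.vocabulary: set = set()

    def train(self, vectors: list[FeatureVector]) -> None:
        # Count categories and how often each (feature, value) pair occurs per category.
        for category, features in vectors:
            self.category_counts[category] += 1
            for pair in features.items():
                self.pair_counts[category][pair] += 1
                self.vocabulary.add(pair)

    def predict(self, features: dict[str, str]) -> str:
        # Return the category with the highest Laplace-smoothed log posterior.
        total = sum(self.category_counts.values())
        best_category, best_score = "", -math.inf
        for category, count in self.category_counts.items():
            score = math.log(count / total)
            denom = sum(self.pair_counts[category].values()) + len(self.vocabulary)
            for pair in features.items():
                score += math.log((self.pair_counts[category][pair] + 1) / denom)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


def tag_document(model: DocumentPartClassifier, parts: list[dict[str, str]]) -> list[str]:
    """Hypothetical tag metadata: one predicted category per untagged document part."""
    return [model.predict(features) for features in parts]
```

Any classifier that maps feature-value pairs to a category could replace the naive Bayes model here. It is used only because it stays short and needs no external libraries.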
-
Publication No.: US11689507B2
Publication date: 2023-06-27
Application No.: US16695636
Application date: 2019-11-26
Applicant: Adobe Inc.
Inventor: Nikolaos Barmpalios , Ruchi Rajiv Deshpande , Randy Lee Swineford , Nargol Rezvani , Andrew Marc Greene , Shawn Alan Gaither , Michael Kraley
IPC: H04L9/40 , G06Q30/0202 , G06N5/04 , G06N20/00
CPC classification number: H04L63/04 , G06N5/04 , G06N20/00 , G06Q30/0202
Abstract: Systems and techniques for privacy preserving document analysis are described that derive insights pertaining to a digital document without communication of the content of the digital document. To do so, the privacy preserving document analysis techniques described herein capture visual or contextual features of the digital document and create a stamp representation that represents these features without including the content of the digital document. The stamp representation is projected into a stamp embedding space based on a stamp encoding model generated through machine learning techniques capturing feature patterns and interactions in the stamp representations. The stamp encoding model exploits these feature interactions to define similarity of source documents based on location within the stamp embedding space. Accordingly, the techniques described herein can determine a similarity of documents without having access to the documents themselves.
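The sketch below illustrates the general idea under stated assumptions. It is not the patented method. A "stamp" is approximated as a coarse occupancy grid built from layout element boxes only, so no text content is included. The learned stamp encoding model is replaced by a fixed random linear projection, and similarity is measured as cosine similarity in the embedding space. GRID, EMBED_DIM, make_stamp, embed, and cosine_similarity are hypothetical names.

```python
import math
import random

# Hypothetical layout element: (x0, y0, x1, y1) in page-relative units [0, 1].
# Boxes carry geometry only, never text content.
Box = tuple[float, float, float, float]

GRID = 8        # assumed stamp resolution (GRID x GRID cells)
EMBED_DIM = 16  # assumed embedding dimensionality


def make_stamp(boxes: list[Box]) -> list[float]:
    """Rasterize element boxes into a flat occupancy vector (layout only, no content)."""
    stamp = [0.0] * (GRID * GRID)
    for x0, y0, x1, y1 in boxes:
        for row in range(GRID):
            for col in range(GRID):
                cx, cy = (col + 0.5) / GRID, (row + 0.5) / GRID  # cell center
                if x0 <= cx <= x1 and y0 <= cy <= y1:
                    stamp[row * GRID + col] = 1.0
    return stamp


# Stand-in for a learned stamp encoding model: a fixed, seeded random projection.
_rng = random.Random(0)
PROJECTION = [[_rng.gauss(0.0, 1.0) for _ in range(GRID * GRID)] for _ in range(EMBED_DIM)]


def embed(stamp: list[float]) -> list[float]:
    """Project a stamp into the embedding space."""
    return [sum(w * s for w, s in zip(row, stamp)) for row in PROJECTION]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two documents, judged only by their embedded stamps."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

A trained encoder would learn which layout feature interactions matter for similarity. The random projection only keeps the example self-contained.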
-
Publication No.: US20230336532A1
Publication date: 2023-10-19
Application No.: US18317338
Application date: 2023-05-15
Applicant: Adobe Inc.
Inventor: Nikolaos Barmpalios , Ruchi Rajiv Deshpande , Randy Lee Swineford , Nargol Rezvani , Andrew Marc Greene , Shawn Alan Gaither , Michael Kraley
IPC: H04L9/40 , G06Q30/0202 , G06N5/04 , G06N20/00
CPC classification number: H04L63/04 , G06Q30/0202 , G06N5/04 , G06N20/00
Abstract: Systems and techniques for privacy preserving document analysis are described that derive insights pertaining to a digital document without communication of the content of the digital document. To do so, the privacy preserving document analysis techniques described herein capture visual or contextual features of the digital document and create a stamp representation that represents these features without including the content of the digital document. The stamp representation is projected into a stamp embedding space based on a stamp encoding model generated through machine learning techniques capturing feature patterns and interactions in the stamp representations. The stamp encoding model exploits these feature interactions to define similarity of source documents based on location within the stamp embedding space. Accordingly, the techniques described herein can determine a similarity of documents without having access to the documents themselves.
-
Publication No.: US11238312B2
Publication date: 2022-02-01
Application No.: US16690695
Application date: 2019-11-21
Applicant: Adobe Inc.
Inventor: Verena Kaynig-Fittkau , Sruthi Madapoosi Ravi , Richard Cohn , Nikolaos Barmpalios , Michael Kraley , Kanchana Sethu
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for generating diverse and realistic synthetic documents using deep learning. In particular, the disclosed systems can utilize a trained neural network to generate realistic image layouts comprising page elements that comply with layout parameters. The disclosed systems can also generate synthetic content corresponding to the page elements within the image layouts. The disclosed systems insert the synthetic content into the corresponding page elements of documents based on the image layouts to generate synthetic documents.
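The sketch below shows the shape of the pipeline: generate a layout that respects layout parameters, generate content for each page element, then insert that content. Both generators are simple placeholders chosen as assumptions. The trained neural network is replaced by a seeded stacking heuristic, and content is random filler words. LayoutParameters, PageElement, WORDS, and all function names are hypothetical.

```python
import random
from dataclasses import dataclass

WORDS = ["document", "layout", "synthetic", "page", "model", "content", "element", "text"]


@dataclass
class LayoutParameters:
    # Hypothetical constraints, in PDF points (US Letter page by default).
    page_width: float = 612.0
    page_height: float = 792.0
    margin: float = 36.0
    num_elements: int = 4
    element_types: tuple[str, ...] = ("heading", "paragraph", "figure")


@dataclass
class PageElement:
    kind: str
    x: float
    y: float
    width: float
    height: float
    content: str = ""


def generate_layout(params: LayoutParameters, rng: random.Random) -> list[PageElement]:
    """Stand-in for the trained layout network: one element per vertical slot inside the margins."""
    usable_width = params.page_width - 2 * params.margin
    slot_height = (params.page_height - 2 * params.margin) / params.num_elements
    layout = []
    for i in range(params.num_elements):
        kind = rng.choice(params.element_types)
        height = slot_height * rng.uniform(0.5, 1.0)  # never exceeds its slot
        top = params.margin + i * slot_height
        layout.append(PageElement(kind, params.margin, top, usable_width, height))
    return layout


def synthesize_content(element: PageElement, rng: random.Random) -> str:
    """Placeholder content generator keyed on the element type."""
    if element.kind == "figure":
        return "[figure]"
    word_count = 4 if element.kind == "heading" else 30
    return " ".join(rng.choice(WORDS) for _ in range(word_count))


def generate_synthetic_document(params: LayoutParameters, seed: int = 0) -> list[PageElement]:
    rng = random.Random(seed)
    layout = generate_layout(params, rng)
    for element in layout:
        # Insert synthetic content into its corresponding page element.
        element.content = synthesize_content(element, rng)
    return layout
```

Passing an explicit seeded `random.Random` makes each synthetic document reproducible. This is convenient when such documents are used as training data.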
-
Publication No.: US20210160221A1
Publication date: 2021-05-27
Application No.: US16695636
Application date: 2019-11-26
Applicant: Adobe Inc.
Inventor: Nikolaos Barmpalios , Ruchi Rajiv Deshpande , Randy Lee Swineford , Nargol Rezvani , Andrew Marc Greene , Shawn Alan Gaither , Michael Kraley
Abstract: Systems and techniques for privacy preserving document analysis are described that derive insights pertaining to a digital document without communication of the content of the digital document. To do so, the privacy preserving document analysis techniques described herein capture visual or contextual features of the digital document and create a stamp representation that represents these features without including the content of the digital document. The stamp representation is projected into a stamp embedding space based on a stamp encoding model generated through machine learning techniques capturing feature patterns and interactions in the stamp representations. The stamp encoding model exploits these feature interactions to define similarity of source documents based on location within the stamp embedding space. Accordingly, the techniques described herein can determine a similarity of documents without having access to the documents themselves.
-
Publication No.: US10372821B2
Publication date: 2019-08-06
Application No.: US15462684
Application date: 2017-03-17
Applicant: Adobe Inc.
Inventor: Walter Chang , Trung Bui , Pranjal Daga , Michael Kraley , Hung Bui
Abstract: Certain embodiments identify a correct structured reading-order sequence of text segments extracted from a file. A probabilistic language model is generated from a large text corpus to comprise observed word sequence patterns for a given language. The language model measures whether splicing together a first text segment with another continuation text segment results in a phrase that is more likely than a phrase resulting from splicing together the first text segment with other continuation text segments. Sets of text segments, which include a first set with a first text segment and a first continuation text segment as well as a second set with the first text segment and a second continuation text segment, are provided to the probabilistic model. A score indicative of a likelihood of the set providing a correct structured reading-order sequence is obtained for each set of text segments.
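The sketch below shows how a language model can score candidate splices of text segments. It assumes a simple add-one-smoothed bigram model trained on a small corpus, which stands in for the patent's probabilistic language model. Each set (first segment plus one continuation) receives a score, and the highest-scoring continuation is taken as the likely reading order. BigramModel, score_sets, and best_continuation are hypothetical names, and the corpus is assumed to be non-empty.

```python
import math
from collections import Counter


class BigramModel:
    """Add-one-smoothed bigram model; a stand-in for a probabilistic language model."""

    def __init__(self, corpus: list[str]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sentence in corpus:
            tokens = sentence.lower().split()
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def avg_log_prob(self, tokens: list[str]) -> float:
        # Average log-probability per bigram, so phrases of different lengths are comparable.
        pairs = list(zip(tokens, tokens[1:]))
        if not pairs:
            return 0.0
        total = 0.0
        for prev, word in pairs:
            total += math.log(
                (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab_size)
            )
        return total / len(pairs)


def score_sets(model: BigramModel, first: str, continuations: list[str]) -> dict[str, float]:
    """Score each (first segment, continuation) set by the likelihood of the spliced phrase."""
    return {
        cont: model.avg_log_prob(f"{first} {cont}".lower().split())
        for cont in continuations
    }


def best_continuation(model: BigramModel, first: str, continuations: list[str]) -> str:
    scores = score_sets(model, first, continuations)
    return max(scores, key=scores.get)
```

For example, with the corpus `["the quick brown fox jumps over the lazy dog"]` and first segment `"the quick brown"`, the continuation `"fox jumps over"` scores higher than `"lazy dog"`. Averaging per bigram is a design choice. With a plain sum of log-probabilities, the shorter spliced phrase tends to win simply because it has fewer negative terms.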
-
Publication No.: US12267305B2
Publication date: 2025-04-01
Application No.: US18317338
Application date: 2023-05-15
Applicant: Adobe Inc.
Inventor: Nikolaos Barmpalios , Ruchi Rajiv Deshpande , Randy Lee Swineford , Nargol Rezvani , Andrew Marc Greene , Shawn Alan Gaither , Michael Kraley
IPC: H04L9/40 , G06N5/04 , G06N20/00 , G06Q30/0202
Abstract: Systems and techniques for privacy preserving document analysis are described that derive insights pertaining to a digital document without communication of the content of the digital document. To do so, the privacy preserving document analysis techniques described herein capture visual or contextual features of the digital document and create a stamp representation that represents these features without including the content of the digital document. The stamp representation is projected into a stamp embedding space based on a stamp encoding model generated through machine learning techniques capturing feature patterns and interactions in the stamp representations. The stamp encoding model exploits these feature interactions to define similarity of source documents based on location within the stamp embedding space. Accordingly, the techniques described herein can determine a similarity of documents without having access to the documents themselves.
-
Publication No.: US20210158093A1
Publication date: 2021-05-27
Application No.: US16690695
Application date: 2019-11-21
Applicant: Adobe Inc.
Inventor: Verena Kaynig-Fittkau , Sruthi Madapoosi Ravi , Richard Cohn , Nikolaos Barmpalios , Michael Kraley , Kanchana Sethu
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for generating diverse and realistic synthetic documents using deep learning. In particular, the disclosed systems can utilize a trained neural network to generate realistic image layouts comprising page elements that comply with layout parameters. The disclosed systems can also generate synthetic content corresponding to the page elements within the image layouts. The disclosed systems insert the synthetic content into the corresponding page elements of documents based on the image layouts to generate synthetic documents.