-
Publication (Announcement) No.: US20180144042A1
Publication (Announcement) Date: 2018-05-24
Application No.: US15360939
Filing Date: 2016-11-23
Applicant: Google Inc.
Inventor: Ying Sheng, Yifeng Lu, Jing Xie, Jie Yang, Luis Garcia Pueyo, Jinan Lou, James Wendt
CPC classification number: G06F16/285, G06F16/93, G06F17/243, G06F17/248, G06N20/00, G06N20/20, G06Q10/10
Abstract: Techniques are described herein for automatically generating data extraction templates for structured documents (e.g., B2C emails, invoices, bills, invitations, etc.), and for assigning classifications to those data extraction templates to streamline data extraction from subsequent structured documents. In various implementations, a data extraction template generated from a cluster of structured documents that share fixed content may be identified. Features of the cluster of structured documents may be applied as input to extraction machine learning model(s) trained to provide location(s) of transient field(s) in structured documents, to determine location(s) of transient field(s) in the cluster of structured documents. An association between the data extraction template and the determined transient field location(s) may be stored. Based on the association, data point(s) may be extracted from a given structured document of a user that shares fixed content with the cluster of structured documents. The extracted data point(s) may be surfaced to the user.
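The template idea in this abstract can be illustrated with a small sketch. It assumes documents are already split into aligned lines. It also swaps in a simple positional diff in place of the trained extraction model the abstract describes: any line identical across the whole cluster is treated as fixed content, and every line that varies is treated as a transient field location. All names (`Template`, `build_template`, `matches`, `extract`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Template:
    # Fixed content per line position; None marks a transient (varying) position.
    fixed_lines: tuple[Optional[str], ...]
    # Line indices treated as transient fields (stand-in for the ML model's output).
    transient_positions: tuple[int, ...]


def build_template(cluster: list[list[str]]) -> Template:
    """Derive a template from a cluster of line-aligned documents (an assumption)."""
    length = len(cluster[0])
    if any(len(doc) != length for doc in cluster):
        raise ValueError("this sketch assumes every document has the same line count")
    fixed: list[Optional[str]] = []
    transient: list[int] = []
    for i in range(length):
        if len({doc[i] for doc in cluster}) == 1:
            fixed.append(cluster[0][i])  # identical everywhere: fixed content
        else:
            fixed.append(None)           # varies across the cluster: transient field
            transient.append(i)
    return Template(tuple(fixed), tuple(transient))


def matches(template: Template, doc: list[str]) -> bool:
    """True if doc shares the template's fixed content at every fixed position."""
    return len(doc) == len(template.fixed_lines) and all(
        f is None or f == line for f, line in zip(template.fixed_lines, doc)
    )


def extract(template: Template, doc: list[str]) -> dict[int, str]:
    """Pull the data points at the template's transient positions."""
    if not matches(template, doc):
        raise ValueError("document does not share the template's fixed content")
    return {i: doc[i] for i in template.transient_positions}


cluster = [
    ["Your order has shipped", "Order #1001", "Arrives Monday"],
    ["Your order has shipped", "Order #2002", "Arrives Friday"],
]
tpl = build_template(cluster)
print(extract(tpl, ["Your order has shipped", "Order #3003", "Arrives Tuesday"]))
# {1: 'Order #3003', 2: 'Arrives Tuesday'}
```

The positional diff only works when documents line up exactly; handling reordered or shifted content is one reason a learned extraction model would be used instead.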
-
Publication (Announcement) No.: US10657158B2
Publication (Announcement) Date: 2020-05-19
Application No.: US15360939
Filing Date: 2016-11-23
Applicant: Google Inc.
Inventor: Ying Sheng, Yifeng Lu, Jing Xie, Jie Yang, Luis Garcia Pueyo, Jinan Lou, James Wendt
IPC: G06F16/00, G06F16/28, G06N20/00, G06F16/93, G06Q10/10, G06N20/20, G06F40/174, G06F40/186
Abstract: Techniques are described herein for automatically generating data extraction templates for structured documents (e.g., B2C emails, invoices, bills, invitations, etc.), and for assigning classifications to those data extraction templates to streamline data extraction from subsequent structured documents. In various implementations, a data extraction template generated from a cluster of structured documents that share fixed content may be identified. Features of the cluster of structured documents may be applied as input to extraction machine learning model(s) trained to provide location(s) of transient field(s) in structured documents, to determine location(s) of transient field(s) in the cluster of structured documents. An association between the data extraction template and the determined transient field location(s) may be stored. Based on the association, data point(s) may be extracted from a given structured document of a user that shares fixed content with the cluster of structured documents. The extracted data point(s) may be surfaced to the user.
-
Publication (Announcement) No.: US10387559B1
Publication (Announcement) Date: 2019-08-20
Application No.: US15359101
Filing Date: 2016-11-22
Applicant: Google Inc.
Inventor: James Wendt, Jie Yang, Ying Sheng, Jing Xie, Luis Garcia Pueyo
IPC: G06F16/35, G06F17/24, G06F16/9535, G06F17/27, G06F16/93
Abstract: Methods and apparatus are described herein for creating associations between user interests and electronic document templates generated from B2C electronic documents. Once these associations are created, interest(s) of a user (e.g., a user profile) may be determined automatically based on B2C electronic documents addressed to the user. In various implementations, an electronic document addressed to a user may be identified. A particular electronic document template that corresponds to the electronic document addressed to the user may be selected from a plurality of electronic document templates. The selecting may be based on attribute(s) shared between the electronic document addressed to the user and the selected electronic document template. The particular electronic template may be generated from a plurality of electronic documents that share fixed content. Interest(s) associated with the particular electronic document template may be identified, and association(s) between the user and the identified interest(s) may be stored.
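The template-selection and interest-association steps in this abstract can be sketched as follows. This is an assumption-laden illustration, not the patented method. Documents and templates are represented as sets of hypothetical string attributes (for example a sender domain or subject token). The template sharing the most attributes with the document is selected, and its interests are then recorded against the user. All class and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocTemplate:
    name: str
    attributes: frozenset[str]  # hypothetical attributes, e.g. sender domain, subject tokens
    interests: frozenset[str]   # interests previously associated with this template


def select_template(doc_attributes: set[str], templates: list[DocTemplate]) -> Optional[DocTemplate]:
    """Pick the template sharing the most attributes; ties keep the earlier template."""
    best: Optional[DocTemplate] = None
    best_score = 0
    for tpl in templates:
        score = len(doc_attributes & tpl.attributes)
        if score > best_score:
            best, best_score = tpl, score
    return best  # None when no template shares any attribute


class InterestStore:
    """Stores associations between users and interests inferred from templates."""

    def __init__(self) -> None:
        self._by_user: defaultdict[str, set[str]] = defaultdict(set)

    def record(self, user: str, doc_attributes: set[str], templates: list[DocTemplate]) -> Optional[DocTemplate]:
        tpl = select_template(doc_attributes, templates)
        if tpl is not None:
            self._by_user[user] |= tpl.interests
        return tpl

    def interests(self, user: str) -> set[str]:
        return set(self._by_user.get(user, set()))


templates = [
    DocTemplate("airline_itinerary", frozenset({"from:airline.example", "subject:itinerary"}), frozenset({"travel"})),
    DocTemplate("recipe_newsletter", frozenset({"from:recipes.example", "subject:weekly"}), frozenset({"cooking"})),
]
store = InterestStore()
store.record("alice", {"from:airline.example", "subject:itinerary", "subject:confirmation"}, templates)
print(store.interests("alice"))  # {'travel'}
```

Counting shared attributes is the simplest possible matching score; a real system could weight attributes or require a minimum overlap before accepting a match.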
-