Patent search ap:("Google Inc.") AND inv:"James Wendt" Page 1

1.

Patent application
TEMPLATE-BASED STRUCTURED DOCUMENT CLASSIFICATION AND EXTRACTION (under examination, published)

Publication number: US20180144042A1

Publication date: 2018-05-24

Application number: US15360939

Application date: 2016-11-23

Applicant: Google Inc.

Inventor: Ying Sheng , Yifeng Lu , Jing Xie , Jie Yang , Luis Garcia Pueyo , Jinan Lou , James Wendt

IPC: G06F17/30 , G06F17/24 , G06N99/00

CPC classification number: G06F16/285 , G06F16/93 , G06F17/243 , G06F17/248 , G06N20/00 , G06N20/20 , G06Q10/10

Abstract: Techniques are described herein for automatically generating data extraction templates for structured documents (e.g., B2C emails, invoices, bills, invitations, etc.), and for assigning classifications to those data extraction templates to streamline data extraction from subsequent structured documents. In various implementations, a data extraction template generated from a cluster of structured documents that share fixed content may be identified. Features of the cluster of structured documents may be applied as input to extraction machine learning model(s) trained to provide location(s) of transient field(s) in structured documents, to determine location(s) of transient field(s) in the cluster of structured documents. An association between the data extraction template and the determined transient field location(s) may be stored. Based on the association, data point(s) may be extracted from a given structured document of a user that shares fixed content with the cluster of structured documents. The extracted data point(s) may be surfaced to the user.
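The flow in this abstract (build a template from a cluster, locate transient fields, store the association, extract from a later document) can be illustrated with a minimal sketch. Everything named here is hypothetical: documents are assumed to be plain dicts mapping a path to its text, and `locate_transient_fields` is a simple heuristic standing in for the trained extraction model, which the abstract does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionTemplate:
    fixed_content: dict                                  # path -> text identical across the cluster
    field_locations: dict = field(default_factory=dict)  # field name -> path (the stored association)


def build_template(cluster):
    """Build a template from documents given as {path: text} dicts (assumed representation)."""
    shared_paths = set.intersection(*(set(doc) for doc in cluster))
    fixed = {
        path: cluster[0][path]
        for path in shared_paths
        if len({doc[path] for doc in cluster}) == 1
    }
    return ExtractionTemplate(fixed_content=fixed)


def locate_transient_fields(cluster):
    """Heuristic stand-in for the trained model: shared paths whose text varies."""
    shared_paths = set.intersection(*(set(doc) for doc in cluster))
    return {
        path.rsplit("/", 1)[-1]: path
        for path in sorted(shared_paths)
        if len({doc[path] for doc in cluster}) > 1
    }


def extract(template, document):
    """Return data points if the document shares the template's fixed content, else None."""
    for path, text in template.fixed_content.items():
        if document.get(path) != text:
            return None
    return {
        name: document[path]
        for name, path in template.field_locations.items()
        if path in document
    }


cluster = [
    {"body/h1": "Your receipt", "body/table/total": "$12.00", "body/table/order_id": "A100"},
    {"body/h1": "Your receipt", "body/table/total": "$7.50", "body/table/order_id": "A101"},
]
template = build_template(cluster)
template.field_locations = locate_transient_fields(cluster)

new_email = {"body/h1": "Your receipt", "body/table/total": "$3.25", "body/table/order_id": "A102"}
extracted = extract(template, new_email)
print(extracted)
```

A document whose fixed content differs from the template yields `None`, so no data points are extracted from it.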

2.

Patent application
CLASSIFYING DOCUMENTS BY CLUSTER (under examination, published)

Publication number: US20160314184A1

Publication date: 2016-10-27

Application number: US14697342

Application date: 2015-04-27

Applicant: Google Inc.

Inventor: Mike Bendersky , Jie Yang , Amitabh Saikia , Marc-Allen Cartright , Sujith Ravi , Balint Miklos , Ivo Krka , Vanja Josifovski , James Wendt , Luis Garcia Pueyo

IPC: G06F17/30

CPC classification number: G06F16/35 , G06Q10/107

Abstract: Methods, apparatus, systems, and computer-readable media are provided for classifying, or “labeling,” documents such as emails en masse based on association with a cluster/template. In various implementations, a corpus of documents may be grouped into a plurality of disjoint clusters of documents based on one or more shared content attributes. A classification distribution associated with a first cluster of the plurality of clusters may be determined based on classifications assigned to individual documents of the first cluster. A classification distribution associated with a second cluster of the plurality of clusters may then be determined based at least in part on the classification distribution associated with the first cluster and a relationship between the first and second clusters.
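As a rough illustration of the two steps in this abstract, the sketch below computes a classification distribution for a first cluster from its per-document labels, then derives a distribution for a second cluster from it. The abstract does not say how the relationship between clusters is measured or applied, so the `similarity` weight in [0, 1] and the linear blend with a uniform fallback are assumptions made only for this example.

```python
from collections import Counter


def classification_distribution(labels):
    """Normalize per-document labels of one cluster into a distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def propagate(first_dist, similarity, own_dist=None):
    """Blend the first cluster's distribution into a second cluster's.

    similarity: assumed relationship strength in [0, 1] (hypothetical measure).
    own_dist:   the second cluster's own distribution, if any; otherwise a
                uniform distribution over the first cluster's labels is used.
    """
    if own_dist is None:
        own_dist = {label: 1 / len(first_dist) for label in first_dist}
    labels = set(first_dist) | set(own_dist)
    return {
        label: similarity * first_dist.get(label, 0.0)
        + (1 - similarity) * own_dist.get(label, 0.0)
        for label in labels
    }


first = classification_distribution(["travel", "travel", "travel", "receipt"])
second = propagate(first, similarity=0.8)
print(first, second)
```

With a similarity of 0.8, the second cluster mostly inherits the first cluster's distribution; a similarity of 0 leaves it at its own (or uniform) distribution.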


3.

Granted patent
Template-based structured document classification and extraction (in force)

Publication number: US10657158B2

Publication date: 2020-05-19

Application number: US15360939

Application date: 2016-11-23

Applicant: Google Inc.

Inventor: Ying Sheng , Yifeng Lu , Jing Xie , Jie Yang , Luis Garcia Pueyo , Jinan Lou , James Wendt

IPC: G06F16/00 , G06F16/28 , G06N20/00 , G06F16/93 , G06Q10/10 , G06N20/20 , G06F40/174 , G06F40/186

Abstract: Techniques are described herein for automatically generating data extraction templates for structured documents (e.g., B2C emails, invoices, bills, invitations, etc.), and for assigning classifications to those data extraction templates to streamline data extraction from subsequent structured documents. In various implementations, a data extraction template generated from a cluster of structured documents that share fixed content may be identified. Features of the cluster of structured documents may be applied as input to extraction machine learning model(s) trained to provide location(s) of transient field(s) in structured documents, to determine location(s) of transient field(s) in the cluster of structured documents. An association between the data extraction template and the determined transient field location(s) may be stored. Based on the association, data point(s) may be extracted from a given structured document of a user that shares fixed content with the cluster of structured documents. The extracted data point(s) may be surfaced to the user.

4.

Granted patent
Template-based identification of user interest (in force)

Publication number: US10387559B1

Publication date: 2019-08-20

Application number: US15359101

Application date: 2016-11-22

Applicant: Google Inc.

Inventor: James Wendt , Jie Yang , Ying Sheng , Jing Xie , Luis Garcia Pueyo

IPC: G06F16/35 , G06F17/24 , G06F16/9535 , G06F17/27 , G06F16/93

Abstract: Methods and apparatus are described herein for creating associations between user interests and electronic document templates generated from B2C electronic documents. Once these associations are created, interest(s) of a user (e.g., a user profile) may be determined automatically based on B2C electronic documents addressed to the user. In various implementations, an electronic document addressed to a user may be identified. A particular electronic document template that corresponds to the electronic document addressed to the user may be selected from a plurality of electronic document templates. The selecting may be based on attribute(s) shared between the electronic document addressed to the user and the selected electronic document template. The particular electronic template may be generated from a plurality of electronic documents that share fixed content. Interest(s) associated with the particular electronic document template may be identified, and association(s) between the user and the identified interest(s) may be stored.
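The lookup described in this abstract (select a template by shared attributes, read its associated interests, store the association with the user) might be sketched as follows. The attribute strings, template names, and the "largest attribute overlap" selection rule are all hypothetical; the abstract only says selection is based on shared attributes.

```python
from collections import defaultdict

# Hypothetical templates, each described by a set of attributes.
templates = {
    "airline_itinerary": {"sender:air.example.com", "subject:your itinerary"},
    "concert_tickets": {"sender:tickets.example.com", "subject:your tickets"},
}

# Hypothetical, previously created associations between templates and interests.
template_interests = {
    "airline_itinerary": {"travel"},
    "concert_tickets": {"live music"},
}

user_interests = defaultdict(set)  # user -> set of stored interests


def select_template(document_attrs, candidates):
    """Pick the template sharing the most attributes with the document; None if no overlap."""
    best_id, best_overlap = None, 0
    for template_id, attrs in candidates.items():
        overlap = len(document_attrs & attrs)
        if overlap > best_overlap:
            best_id, best_overlap = template_id, overlap
    return best_id


def record_interests(user, document_attrs):
    """Store interests of the matching template for the user; return the matched template id."""
    template_id = select_template(document_attrs, templates)
    if template_id is not None:
        user_interests[user] |= template_interests.get(template_id, set())
    return template_id


matched = record_interests(
    "user@example.com",
    {"sender:tickets.example.com", "subject:your tickets", "lang:en"},
)
print(matched, dict(user_interests))
```

A document that shares no attributes with any template matches nothing, and no interests are stored for that user.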
