Patent search ap:("Google Inc.") AND inv:"Jie Yang" Page 2

11.

Granted invention patent
Generating and applying data extraction templates [In force]

Publication (announcement) number: US09563689B1

Publication (announcement) date: 2017-02-07

Application number: US14470510

Application date: 2014-08-27

Applicant: Google Inc.

Inventor: Luis Garcia Pueyo, Vanja Josifovski, Amitabh Saikia, Jie Yang, Mike Bendersky, Srinidhi Viswanatha, Marc-Allen Cartright

IPC: G06F17/30

CPC classification number: G06F17/30705

Abstract: Methods, apparatus, and computer-readable media are provided for generating and applying data extraction templates. In various implementations, a corpus of structured communications such as emails may be grouped into clusters based on one or more similarities between the structured communications. A set of structural paths may be identified from structured communications of a particular cluster. One or more structural paths of the set may be classified as transient wherein a count of occurrences of one or more associated segments of text across the particular cluster satisfies a criterion. One or more transient paths may be assigned a semantic data type and/or a confidentiality designation based on various signals. A data extraction template may be generated to extract, from subsequent structured communications, segments of text associated with transient (and in some cases, non-confidential) structural paths.
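
The transient-path step in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract only says that a count of text occurrences across the cluster "satisfies a criterion", so the criterion used here (the most common text at a path appears in at most half of the cluster's documents) is an assumption, as are the names `classify_paths`, `build_template`, `apply_template`, and the `max_fixed_fraction` parameter. Each document is assumed to be already reduced to a mapping from structural path (for example, an XPath) to its text segment.

```python
from collections import Counter, defaultdict


def classify_paths(cluster, max_fixed_fraction=0.5):
    """Label each structural path in a cluster as 'fixed' or 'transient'.

    cluster: list of dicts mapping a structural path to its text segment.
    max_fixed_fraction: hypothetical threshold; a path is 'fixed' when its
    most common text appears in more than this share of the documents.
    """
    texts_by_path = defaultdict(Counter)
    for doc in cluster:
        for path, text in doc.items():
            texts_by_path[path][text] += 1

    labels = {}
    for path, counts in texts_by_path.items():
        most_common_count = counts.most_common(1)[0][1]
        share = most_common_count / len(cluster)
        labels[path] = "fixed" if share > max_fixed_fraction else "transient"
    return labels


def build_template(labels):
    """A template here is just the sorted list of transient paths."""
    return sorted(path for path, label in labels.items() if label == "transient")


def apply_template(template, doc):
    """Extract the text at each transient path from a later document."""
    return {path: doc[path] for path in template if path in doc}


if __name__ == "__main__":
    cluster = [
        {"/html/body/h1": "Your order has shipped", "/html/body/p[1]": "Order #1001"},
        {"/html/body/h1": "Your order has shipped", "/html/body/p[1]": "Order #1002"},
        {"/html/body/h1": "Your order has shipped", "/html/body/p[1]": "Order #1003"},
    ]
    template = build_template(classify_paths(cluster))
    new_doc = {"/html/body/h1": "Your order has shipped", "/html/body/p[1]": "Order #2000"}
    print(apply_template(template, new_doc))
```

In this toy cluster the heading text repeats in every document and is labeled fixed, while the order number differs each time and is labeled transient, so only the order number is extracted from the new document. The clustering, semantic-type, and confidentiality steps named in the abstract are not modeled.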

12.

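Granted invention patent
Template-based structured document classification and extraction [In force]

Publication (announcement) number: US10657158B2

Publication (announcement) date: 2020-05-19

Application number: US15360939

Application date: 2016-11-23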

Applicant: Google Inc.

Inventor: Ying Sheng, Yifeng Lu, Jing Xie, Jie Yang, Luis Garcia Pueyo, Jinan Lou, James Wendt

IPC: G06F16/00, G06F16/28, G06N20/00, G06F16/93, G06Q10/10, G06N20/20, G06F40/174, G06F40/186

Abstract: Techniques are described herein for automatically generating data extraction templates for structured documents (e.g., B2C emails, invoices, bills, invitations, etc.), and for assigning classifications to those data extraction templates to streamline data extraction from subsequent structured documents. In various implementations, a data extraction template generated from a cluster of structured documents that share fixed content may be identified. Features of the cluster of structured documents may be applied as input to extraction machine learning model(s) trained to provide location(s) of transient field(s) in structured documents, to determine location(s) of transient field(s) in the cluster of structured documents. An association between the data extraction template and the determined transient field location(s) may be stored. Based on the association, data point(s) may be extracted from a given structured document of a user that shares fixed content with the cluster of structured documents. The extracted data point(s) may be surfaced to the user.

13.

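Granted invention patent
Template-based identification of user interest [In force]

Publication (announcement) number: US10387559B1

Publication (announcement) date: 2019-08-20

Application number: US15359101

Application date: 2016-11-22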

Applicant: Google Inc.

Inventor: James Wendt, Jie Yang, Ying Sheng, Jing Xie, Luis Garcia Pueyo

IPC: G06F16/35, G06F17/24, G06F16/9535, G06F17/27, G06F16/93

Abstract: Methods and apparatus are described herein for creating associations between user interests and electronic document templates generated from B2C electronic documents. Once these associations are created, interest(s) of a user (e.g., a user profile) may be determined automatically based on B2C electronic documents addressed to the user. In various implementations, an electronic document addressed to a user may be identified. A particular electronic document template that corresponds to the electronic document addressed to the user may be selected from a plurality of electronic document templates. The selecting may be based on attribute(s) shared between the electronic document addressed to the user and the selected electronic document template. The particular electronic template may be generated from a plurality of electronic documents that share fixed content. Interest(s) associated with the particular electronic document template may be identified, and association(s) between the user and the identified interest(s) may be stored.

14.

Invention patent application
IDENTIFYING PHISHING COMMUNICATIONS USING TEMPLATES [In force]

Publication (announcement) number: US20160337401A1

Publication (announcement) date: 2016-11-17

Application number: US14711407

Application date: 2015-05-13

Applicant: Google Inc.

Inventor: Mike Bendersky, Luis Garcia Pueyo, Kashyap Ramesh Puranik, Amitabh Saikia, Jie Yang, Marc-Allen Cartright

IPC: H04L29/06

CPC classification number: H04L63/1483, H04L63/0254, H04L63/1425, H04L63/20

Abstract: Methods, apparatus, systems, and computer-readable media are provided for determining whether communications are attempts at phishing. In various implementations, a potentially-deceptive communication may be matched to one or more templates of a plurality of templates. Each template may represent content shared among a cluster of communications sent by a trustworthy entity. In various implementations, it may be determined that an address associated with the communication is not affiliated with one or more trustworthy entities associated with the one or more matched templates. In various implementations, the communication may be classified as a phishing attempt based on the determining.
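
The three steps in this abstract (match a communication to templates, check whether the sender address is affiliated with the entities behind those templates, then classify) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the claimed implementation: the `Template` structure, matching by phrase overlap, the `min_overlap` threshold, and comparing only the sender's domain are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical template: content shared by a trustworthy entity's messages."""
    entity: str
    fixed_phrases: frozenset
    trusted_domains: frozenset


def matches(template, body, min_overlap=0.8):
    """Assumed matching rule: enough of the template's phrases occur in the body."""
    if not template.fixed_phrases:
        return False
    text = body.lower()
    hits = sum(1 for phrase in template.fixed_phrases if phrase.lower() in text)
    return hits / len(template.fixed_phrases) >= min_overlap


def sender_domain(address):
    """Return the lowercased part of the address after the last '@'."""
    return address.rsplit("@", 1)[-1].lower()


def is_phishing(sender, body, templates):
    """Flag a message that looks like a known template but comes from an
    address not affiliated with any entity behind the matched templates."""
    matched = [t for t in templates if matches(t, body)]
    if not matched:
        return False  # resembles no known template; this check does not apply
    domain = sender_domain(sender)
    return all(domain not in t.trusted_domains for t in matched)


if __name__ == "__main__":
    bank = Template(
        entity="Example Bank",
        fixed_phrases=frozenset({"your statement is ready", "log in to view your account"}),
        trusted_domains=frozenset({"examplebank.example"}),
    )
    body = "Hello, your statement is ready. Log in to view your account."
    print(is_phishing("alerts@examplebank.example", body, [bank]))
    print(is_phishing("alerts@examp1ebank.example", body, [bank]))
```

The first call comes from a trusted domain and is not flagged; the second uses a look-alike domain with the same body text and is flagged. A message that matches no template is left unclassified by this check, which reflects the abstract's focus on communications that imitate known templates.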

15.

Invention patent application
INFORMATION REDACTION FROM DOCUMENT DATA [In force]

Publication (announcement) number: US20160110352A1

Publication (announcement) date: 2016-04-21

Application number: US14520018

Application date: 2014-10-21

Applicant: Google Inc.

Inventor: Mike Bendersky, Vanja Josifovski, Amitabh Saikia, Marc-Allen Cartright, Jie Yang, Luis Garcia Pueyo, MyLinh Yang

IPC: G06F17/30

CPC classification number: G06F17/30011, G06F21/6218, G06F21/6227, G06F21/6254, G06F21/64, G06Q10/101, G06Q30/0254

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for redacting data from a document collection generated for a set of documents that include personal information. The redaction of the data is based in part on a comparison of the document collection to a set of personal documents of users for which the users have provided explicit approval to use in the processing of the document collection.

16.

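Granted invention patent
Information redaction from document data [In force]

Publication (announcement) number: US09734148B2

Publication (announcement) date: 2017-08-15

Application number: US14520018

Application date: 2014-10-21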

Applicant: Google Inc.

Inventor: Mike Bendersky, Vanja Josifovski, Amitabh Saikia, Marc-Allen Cartright, Jie Yang, Luis Garcia Pueyo, MyLinh Yang

IPC: G06F17/30, G06F21/62, G06F21/64, G06Q10/10, G06Q30/02

CPC classification number: G06F17/30011, G06F21/6218, G06F21/6227, G06F21/6254, G06F21/64, G06Q10/101, G06Q30/0254

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for redacting data from a document collection generated for a set of documents that include personal information. The redaction of the data is based in part on a comparison of the document collection to a set of personal documents of users for which the users have provided explicit approval to use in the processing of the document collection.

17.

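Invention patent application
IDENTIFYING PHISHING COMMUNICATIONS USING TEMPLATES [In force]

Publication (announcement) number: US20170149824A1

Publication (announcement) date: 2017-05-25

Application number: US15416632

Application date: 2017-01-26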

Applicant: Google Inc.

Inventor: Mike Bendersky, Luis Garcia Pueyo, Kashyap Ramesh Puranik, Amitabh Saikia, Jie Yang, Marc-Allen Cartright

IPC: H04L29/06

CPC classification number: H04L63/1483, H04L63/0254, H04L63/1425, H04L63/20

Abstract: Methods, apparatus, systems, and computer-readable media are provided for determining whether communications are attempts at phishing. In various implementations, a potentially-deceptive communication may be matched to one or more templates of a plurality of templates. Each template may represent content shared among a cluster of communications sent by a legitimate entity. In various implementations, it may be determined that an address associated with the communication is not affiliated with one or more legitimate entities associated with the one or more matched templates. In various implementations, the communication may be classified as a phishing attempt based on the determining.
