Patent search ap:("Google Inc.") AND inv:"Vanja Josifovski" Page 1

1.

Granted invention patent
Information redaction from document data (status: active)

Publication (announcement) number: US09734148B2

Publication (announcement) date: 2017-08-15

Application number: US14520018

Application date: 2014-10-21

Applicant: Google Inc.

Inventor: Mike Bendersky, Vanja Josifovski, Amitabh Saikia, Marc-Allen Cartright, Jie Yang, Luis Garcia Pueyo, MyLinh Yang

IPC: G06F17/30, G06F21/62, G06F21/64, G06Q10/10, G06Q30/02

CPC classification number: G06F17/30011, G06F21/6218, G06F21/6227, G06F21/6254, G06F21/64, G06Q10/101, G06Q30/0254

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for redacting data from a document collection generated for a set of documents that include personal information. The redaction of the data is based in part on a comparison of the document collection to a set of personal documents of users for which the users have provided explicit approval to use in the processing of the document collection.

2.

Granted invention patent
Generating and applying event data extraction templates (status: active)

Publication (announcement) number: US10360537B1

Publication (announcement) date: 2019-07-23

Application number: US15484933

Application date: 2017-04-11

Applicant: Google Inc.

Inventor: Mike Bendersky, Maureen Heymans, Jinan Lou, Jie Yang, MyLinh Yang, Amitabh Saikia, Marc-Allen Cartright, Vanja Josifovski, Hui Tan, Luis Garcia Pueyo

IPC: G06F17/30, G06Q10/10, G06F16/248, G06F16/9535, H04W4/029

Abstract: Techniques are described herein for generating and applying event data extraction templates. In various implementations, a data extraction template may be applied to structured communications to extract, from each structured communication, event data associated with a transient markup language path indicated in the data extraction template. The data extraction template may include an event-related semantic data type assigned to the transient markup language path and a strength of association between the transient structural path and the event-related semantic data type. Feedback may be obtained concerning event data extracted from one or more of the structured communications. Based on the feedback, the strength of association between the transient markup language path and the event-related semantic data type may be altered. The data extraction template may then be applied to a subsequent structured communication to extract new event data from the structured communication based on the altered strength of association.

3.

Granted invention patent
Generating and applying event data extraction templates (status: active)

Publication (announcement) number: US09652530B1

Publication (announcement) date: 2017-05-16

Application number: US14470416

Application date: 2014-08-27

Applicant: Google Inc.

Inventor: Mike Bendersky, Maureen Heymans, Jinan Lou, Jie Yang, MyLinh Yang, Amitabh Saikia, Marc-Allen Cartright, Vanja Josifovski, Hui Tan, Luis Garcia Pueyo

IPC: G06F17/30

CPC classification number: G06F17/30705, G06F17/30923

Abstract: Methods and apparatus are described herein for generating and applying event data extraction templates. In various implementations, a set of structural paths may be identified from a corpus of communications. A first structural path of the set of structural paths, associated with a first segment of text, may be classified as transient in response to a determination that a frequency of occurrences of the first segment of text across the corpus satisfies a criterion. Event heuristics may be applied to the communications of the corpus. A determination may be made, based on the applying, that the communications of the corpus are event-related. An event data type may be assigned to the transient structural path based on the applying. An event data extraction template may be generated to extract, from one or more subsequent communications, one or more event-related segments of text associated with the transient structural path.
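
The transient-path step in the abstract above can be illustrated with a minimal sketch. It is not the patented method: the abstract does not state the frequency criterion, so the sketch assumes a path is "fixed" when a single text segment appears at that path in more than half of the communications, and "transient" otherwise. The function name `classify_paths`, the `max_fixed_fraction` parameter, and the sample corpus are all hypothetical.

```python
from collections import Counter, defaultdict


def classify_paths(corpus, max_fixed_fraction=0.5):
    """Label each structural path as 'fixed' or 'transient'.

    corpus: list of dicts, one per communication, mapping a structural
    path (for example an XPath-like string) to the text found there.
    The threshold is an assumed criterion, not one taken from the patent.
    """
    texts_by_path = defaultdict(Counter)
    for communication in corpus:
        for path, text in communication.items():
            texts_by_path[path][text] += 1

    labels = {}
    for path, counts in texts_by_path.items():
        # Share of communications that carry the single most common text.
        top_share = counts.most_common(1)[0][1] / len(corpus)
        labels[path] = "fixed" if top_share > max_fixed_fraction else "transient"
    return labels


# Hypothetical corpus of three event-related emails sharing one layout.
corpus = [
    {"/html/body/h1": "Your event ticket", "/html/body/p[1]": "Concert, 12 May"},
    {"/html/body/h1": "Your event ticket", "/html/body/p[1]": "Theatre, 3 June"},
    {"/html/body/h1": "Your event ticket", "/html/body/p[1]": "Opera, 9 July"},
]
labels = classify_paths(corpus)
```

In this sketch the heading path repeats the same text in every message and is labeled fixed, while the paragraph path varies per message and is labeled transient, which is the kind of path an extraction template would target.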

4.

Invention patent application
CLASSIFYING DOCUMENTS BY CLUSTER (status: under examination, published)

Publication (announcement) number: US20160314184A1

Publication (announcement) date: 2016-10-27

Application number: US14697342

Application date: 2015-04-27

Applicant: Google Inc.

Inventor: Mike Bendersky, Jie Yang, Amitabh Saikia, Marc-Allen Cartright, Sujith Ravi, Balint Miklos, Ivo Krka, Vanja Josifovski, James Wendt, Luis Garcia Pueyo

IPC: G06F17/30

CPC classification number: G06F16/35, G06Q10/107

Abstract: Methods, apparatus, systems, and computer-readable media are provided for classifying, or “labeling,” documents such as emails en masse based on association with a cluster/template. In various implementations, a corpus of documents may be grouped into a plurality of disjoint clusters of documents based on one or more shared content attributes. A classification distribution associated with a first cluster of the plurality of clusters may be determined based on classifications assigned to individual documents of the first cluster. A classification distribution associated with a second cluster of the plurality of clusters may then be determined based at least in part on the classification distribution associated with the first cluster and a relationship between the first and second clusters.
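
As a rough illustration of the two steps in the abstract above, the sketch below first builds a classification distribution for one cluster from its per-document labels, then derives a second cluster's distribution from the first. The abstract does not say how the relationship between clusters is used, so the linear blend weighted by a similarity score in [0, 1] is an assumption, as are the names `label_distribution` and `propagate` and the sample labels.

```python
from collections import Counter


def label_distribution(labels):
    """Turn per-document labels of one cluster into a probability distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def propagate(neighbor_dist, own_dist, similarity):
    """Blend a cluster's own distribution with a related cluster's.

    similarity is an assumed relationship strength in [0, 1]:
    0 keeps own_dist unchanged, 1 copies neighbor_dist.
    """
    labels = set(neighbor_dist) | set(own_dist)
    return {
        label: similarity * neighbor_dist.get(label, 0.0)
        + (1.0 - similarity) * own_dist.get(label, 0.0)
        for label in labels
    }


# Hypothetical labels assigned to individual documents of the first cluster.
first = label_distribution(["travel", "travel", "travel", "receipt"])
# Hypothetical starting distribution for the second cluster.
second_prior = {"travel": 0.5, "receipt": 0.5}
second = propagate(first, second_prior, similarity=0.5)
```

A linear blend keeps the result a valid distribution whenever both inputs are, which is why it was chosen for the sketch; a real system could use any other propagation rule.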

5.

Invention patent application
INFORMATION REDACTION FROM DOCUMENT DATA (status: active)

Publication (announcement) number: US20160110352A1

Publication (announcement) date: 2016-04-21

Application number: US14520018

Application date: 2014-10-21

Applicant: Google Inc.

Inventor: Mike Bendersky, Vanja Josifovski, Amitabh Saikia, Marc-Allen Cartright, Jie Yang, Luis Garcia Pueyo, MyLinh Yang

IPC: G06F17/30

CPC classification number: G06F17/30011, G06F21/6218, G06F21/6227, G06F21/6254, G06F21/64, G06Q10/101, G06Q30/0254

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for redacting data from a document collection generated for a set of documents that include personal information. The redaction of the data is based in part on a comparison of the document collection to a set of personal documents of users for which the users have provided explicit approval to use in the processing of the document collection.

6.

Granted invention patent
Generating and applying data extraction templates (status: active)

Publication (announcement) number: US10216838B1

Publication (announcement) date: 2019-02-26

Application number: US15394610

Application date: 2016-12-29

Applicant: Google Inc.

Inventor: Luis Garcia Pueyo, Vanja Josifovski, Amitabh Saikia, Jie Yang, Mike Bendersky, Srinidhi Viswanatha, Marc-Allen Cartright

IPC: G06F17/30, G06F21/62, G06F17/27

Abstract: Methods, apparatus, and computer-readable media are provided for generating and applying data extraction templates. In various implementations, a corpus of structured communications such as emails may be grouped into clusters based on one or more similarities between the structured communications. A set of structural paths may be identified from structured communications of a particular cluster. One or more structural paths of the set may be classified as transient wherein a count of occurrences of one or more associated segments of text across the particular cluster satisfies a criterion. One or more transient paths may be assigned a semantic data type and/or a confidentiality designation based on various signals. A data extraction template may be generated to extract, from subsequent structured communications, segments of text associated with transient (and in some cases, non-confidential) structural paths.

7.

Granted invention patent
Selecting pattern matching segments for electronic communication clustering (status: active)

Publication (announcement) number: US10216837B1

Publication (announcement) date: 2019-02-26

Application number: US14584905

Application date: 2014-12-29

Applicant: Google Inc.

Inventor: Amitabh Saikia, Marc-Allen Cartright, Luis Garcia Pueyo, Vanja Josifovski, Jie Yang, Mike Bendersky, MyLinh Yang

IPC: G06F17/30, G06F17/00, G06N5/00, G06Q10/10

Abstract: Methods, apparatus, systems, and computer-readable media are provided for selecting pattern matching segments suitable for electronic communication clustering. A set of pattern matching segments may be identified that match at least one of a corpus of electronic communication addresses. A measure of coverage of each of the set of pattern matching segments across the corpus of electronic communication addresses may be determined. A score associated with each pattern matching segment may be determined based on the measure of coverage and one or more measures of flexibility associated with each of the set of pattern matching segments. One or more of the pattern matching segments may be selected based on the determined scores. A corpus of electronic communications may then be grouped into a plurality of clusters based on a comparison of the one or more selected pattern matching segments to electronic communication addresses associated with the corpus of electronic communications.
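
The coverage-and-flexibility scoring in the abstract above can be sketched as follows. Everything specific here is an assumption for illustration: candidate segments are modeled as regular expressions, coverage is the fraction of addresses a pattern fully matches, flexibility is crudely measured as the number of `.*` wildcards, and the score subtracts a fixed penalty per wildcard. The abstract does not define any of these measures, and `score_patterns` is a hypothetical name.

```python
import re


def score_patterns(patterns, addresses, flexibility_penalty=0.1):
    """Score each candidate pattern by coverage minus a flexibility penalty."""
    scores = {}
    for pattern in patterns:
        matched = sum(1 for address in addresses if re.fullmatch(pattern, address))
        coverage = matched / len(addresses)
        # Assumed flexibility measure: count of ".*" wildcards in the pattern.
        flexibility = pattern.count(".*")
        scores[pattern] = coverage - flexibility_penalty * flexibility
    return scores


# Hypothetical sender addresses and candidate pattern matching segments.
addresses = [
    "orders@shop.example",
    "returns@shop.example",
    "news@blog.example",
    "alerts@blog.example",
]
patterns = [
    r".*@shop\.example",
    r".*@.*\.example",
    r"orders@shop\.example",
]
scores = score_patterns(patterns, addresses)
best = max(scores, key=scores.get)
```

With a small penalty the broadest pattern wins on coverage; raising `flexibility_penalty` shifts the selection toward narrower patterns, which in turn would produce more, smaller clusters.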

8.

Granted invention patent
Generating and applying a trained structured machine learning model for determining a semantic label for content of a transient segment of a communication (status: active)

Publication (announcement) number: US10540610B1

Publication (announcement) date: 2020-01-21

Application number: US15139807

Application date: 2016-04-27

Applicant: Google Inc.

Inventor: Jie Yang, Amr Ahmed, Luis Garcia Pueyo, Mike Bendersky, Amitabh Saikia, Marc-Allen Cartright, Marc Alexander Najork, MyLinh Yang, Hui Tan, Weinan Zhang, Vanja Josifovski, Alexander J. Smola

IPC: G06N7/00, G06N20/00, H04L12/58

Abstract: Methods, apparatus, and computer-readable media are provided for analyzing a cluster of communications, such as B2C emails, to generate a template for the cluster that defines transient segments and fixed segments of the cluster of communications. More particularly, methods, apparatus, and computer-readable media are provided for generating and/or applying a trained structured machine learning model for a generated template that can be used to determine, for one or more transient segments of subsequent communications, a corresponding probability that a given semantic label is the correct semantic label for extracted content of the transient segment(s).

9.

Invention patent application
AUTOMATIC GENERATION OF TEMPLATES FOR PARSING ELECTRONIC DOCUMENTS (status: under examination, published)

Publication (announcement) number: US20170308517A1

Publication (announcement) date: 2017-10-26

Application number: US14024147

Application date: 2013-09-11

Applicant: Google Inc.

Inventor: Vanja Josifovski, Srinidhi Viswanatha

IPC: G06F17/24

CPC classification number: G06Q10/10, G06F16/313

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving a plurality of electronic documents, each electronic document being associated with an identifier that is associated with a source of the electronic document, grouping electronic documents of the plurality of electronic documents into a plurality of base sub-groups based on respective sources, for each base sub-group of the plurality of base sub-groups, automatically processing electronic documents to provide one or more templates, each template mapping content to one or more markers, and storing the one or more templates in memory, each template being accessible by one or more parsers to parse content from subsequently received electronic documents.

10.

Granted invention patent
Generating and applying data extraction templates (status: active)

Publication (announcement) number: US09785705B1

Publication (announcement) date: 2017-10-10

Application number: US14516122

Application date: 2014-10-16

Applicant: Google Inc.

Inventor: Marc-Allen Cartright, Luis Garcia Pueyo, Vanja Josifovski, Amitabh Saikia, Jie Yang, Mike Bendersky, MyLinh Yang

IPC: G06F17/30

CPC classification number: G06F17/30705

Abstract: Methods, apparatus, systems, and computer-readable media are provided for generating and applying data extraction templates. In various implementations, a corpus of plain text communications such as emails may be grouped into clusters based on one or more similarities between the plain text communications. One or more segments of communications of a particular cluster may be classified as transient based on textual pattern matching. One or more other segments of the communications of the particular cluster may be classified as transient based on various criteria. One or more transient segments may be assigned a generic and/or specific semantic data type and/or a confidentiality designation based on various signals. A data extraction template may be generated to extract, from subsequent plain text communications, content associated with transient (and in some cases, non-confidential) segments.
