-
Publication number: US09785705B1
Publication date: 2017-10-10
Application number: US14516122
Application date: 2014-10-16
Applicant: Google Inc.
Inventor: Marc-Allen Cartright, Luis Garcia Pueyo, Vanja Josifovski, Amitabh Saikia, Jie Yang, Mike Bendersky, MyLinh Yang
IPC: G06F17/30
CPC classification number: G06F17/30705
Abstract: Methods, apparatus, systems, and computer-readable media are provided for generating and applying data extraction templates. In various implementations, a corpus of plain text communications such as emails may be grouped into clusters based on one or more similarities between the plain text communications. One or more segments of communications of a particular cluster may be classified as transient based on textual pattern matching. One or more other segments of the communications of the particular cluster may be classified as transient based on various criteria. One or more transient segments may be assigned a generic and/or specific semantic data type and/or a confidentiality designation based on various signals. A data extraction template may be generated to extract, from subsequent plain text communications, content associated with transient (and in some cases, non-confidential) segments.
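As a rough illustration of classifying segments as transient by textual pattern matching and assigning them a semantic data type, the following is a minimal sketch. The regular expressions, the `TRANSIENT_PATTERNS` table, and the `extract_transient` function are hypothetical and do not come from the patent; they only show the general idea on a single line of plain text.

```python
import re

# Hypothetical patterns: each maps an assumed semantic data type to a regex
# that recognizes text likely to change from one communication to the next.
TRANSIENT_PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "amount": re.compile(r"\$\d+(?:\.\d{2})?"),
}


def extract_transient(line):
    """Return (data_type, text) pairs for segments that match a pattern."""
    found = []
    for data_type, pattern in TRANSIENT_PATTERNS.items():
        for match in pattern.finditer(line):
            found.append((data_type, match.group()))
    return found


segments = extract_transient("Your payment of $42.50 is due on 2014-10-16.")
```

Here the matched segments are grouped by pattern, so all dates are listed before all amounts regardless of their position in the line.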
-
Publication number: US09563689B1
Publication date: 2017-02-07
Application number: US14470510
Application date: 2014-08-27
Applicant: Google Inc.
Inventor: Luis Garcia Pueyo, Vanja Josifovski, Amitabh Saikia, Jie Yang, Mike Bendersky, Srinidhi Viswanatha, Marc-Allen Cartright
IPC: G06F17/30
CPC classification number: G06F17/30705
Abstract: Methods, apparatus, and computer-readable media are provided for generating and applying data extraction templates. In various implementations, a corpus of structured communications such as emails may be grouped into clusters based on one or more similarities between the structured communications. A set of structural paths may be identified from structured communications of a particular cluster. One or more structural paths of the set may be classified as transient wherein a count of occurrences of one or more associated segments of text across the particular cluster satisfies a criterion. One or more transient paths may be assigned a semantic data type and/or a confidentiality designation based on various signals. A data extraction template may be generated to extract, from subsequent structured communications, segments of text associated with transient (and in some cases, non-confidential) structural paths.
-
Publication number: US10216838B1
Publication date: 2019-02-26
Application number: US15394610
Application date: 2016-12-29
Applicant: Google Inc.
Inventor: Luis Garcia Pueyo, Vanja Josifovski, Amitabh Saikia, Jie Yang, Mike Bendersky, Srinidhi Viswanatha, Marc-Allen Cartright
Abstract: Methods, apparatus, and computer-readable media are provided for generating and applying data extraction templates. In various implementations, a corpus of structured communications such as emails may be grouped into clusters based on one or more similarities between the structured communications. A set of structural paths may be identified from structured communications of a particular cluster. One or more structural paths of the set may be classified as transient wherein a count of occurrences of one or more associated segments of text across the particular cluster satisfies a criterion. One or more transient paths may be assigned a semantic data type and/or a confidentiality designation based on various signals. A data extraction template may be generated to extract, from subsequent structured communications, segments of text associated with transient (and in some cases, non-confidential) structural paths.
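The count-based classification of structural paths can be sketched as follows. This is only an illustration under stated assumptions: each communication is represented as a dictionary from a structural path (for example an XPath-like string) to its text, and the criterion is that the most frequent text at a path appears in at most `max_share` of the cluster. The function names and the threshold are hypothetical; the abstract does not specify the criterion.

```python
from collections import Counter, defaultdict


def classify_paths(cluster, max_share=0.5):
    """Label each structural path in a cluster as "transient" or "fixed".

    cluster: list of dicts mapping structural path -> text segment.
    Assumed criterion: a path is transient when its most common text
    occurs in at most max_share of the communications.
    """
    texts_by_path = defaultdict(Counter)
    for message in cluster:
        for path, text in message.items():
            texts_by_path[path][text] += 1
    labels = {}
    for path, counts in texts_by_path.items():
        top_count = counts.most_common(1)[0][1]
        share = top_count / len(cluster)
        labels[path] = "fixed" if share > max_share else "transient"
    return labels


def build_template(labels):
    """A template here is simply the sorted list of transient paths."""
    return sorted(path for path, label in labels.items() if label == "transient")


def apply_template(template, message):
    """Extract the text found at each transient path of a new communication."""
    return {path: message[path] for path in template if path in message}


cluster = [
    {"/html/body/h1": "Your order has shipped", "/html/body/p[1]": "Order #111"},
    {"/html/body/h1": "Your order has shipped", "/html/body/p[1]": "Order #222"},
    {"/html/body/h1": "Your order has shipped", "/html/body/p[1]": "Order #333"},
]
template = build_template(classify_paths(cluster))
```

Applying `template` to a later communication from the same sender would return only the order line, because the heading is identical across the cluster and is therefore treated as fixed.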
-
Publication number: US10216837B1
Publication date: 2019-02-26
Application number: US14584905
Application date: 2014-12-29
Applicant: Google Inc.
Inventor: Amitabh Saikia, Marc-Allen Cartright, Luis Garcia Pueyo, Vanja Josifovski, Jie Yang, Mike Bendersky, MyLinh Yang
Abstract: Methods, apparatus, systems, and computer-readable media are provided for selecting pattern matching segments suitable for electronic communication clustering. A set of pattern matching segments may be identified that match at least one of a corpus of electronic communication addresses. A measure of coverage of each of the set of pattern matching segments across the corpus of electronic communication addresses may be determined. A score associated with each pattern matching segment may be determined based on the measure of coverage and one or more measures of flexibility associated with each of the set of pattern matching segments. One or more of the pattern matching segments may be selected based on the determined scores. A corpus of electronic communications may then be grouped into a plurality of clusters based on a comparison of the one or more selected pattern matching segments to electronic communication addresses associated with the corpus of electronic communications.
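To make the trade-off between coverage and flexibility concrete, here is a minimal sketch. It assumes the pattern matching segments are regular expressions over sender addresses, that flexibility can be approximated by counting wildcard tokens, and that the score is coverage minus a linear penalty. The scoring formula, the `penalty` parameter, and all function names are hypothetical and are not taken from the patent.

```python
import re


def coverage(pattern, addresses):
    """Fraction of addresses fully matched by the pattern."""
    compiled = re.compile(pattern)
    matched = sum(1 for address in addresses if compiled.fullmatch(address))
    return matched / len(addresses)


def flexibility(pattern):
    """Assumed proxy for flexibility: the number of wildcard tokens."""
    return pattern.count(".*") + pattern.count(".+")


def score(pattern, addresses, penalty=0.3):
    """Reward coverage, penalize overly permissive patterns."""
    return coverage(pattern, addresses) - penalty * flexibility(pattern)


def select_patterns(patterns, addresses, top_k=1):
    """Return the top_k patterns by score, best first."""
    ranked = sorted(patterns, key=lambda p: score(p, addresses), reverse=True)
    return ranked[:top_k]


addresses = [
    "noreply@shop.example",
    "orders@shop.example",
    "billing@shop.example",
    "news@other.example",
]
patterns = [r".+@shop\.example", r".+@.+", r"noreply@shop\.example"]
best = select_patterns(patterns, addresses)
```

With this data the catch-all pattern covers every address but is penalized for its two wildcards, and the exact-address pattern covers too little, so the domain-level pattern is selected. Communications could then be clustered by which selected pattern their sender address matches.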
-
Publication number: US20180113866A1
Publication date: 2018-04-26
Application number: US15332839
Application date: 2016-10-24
Applicant: Google Inc.
Inventor: Mike Bendersky, Marc Alexander Najork, Donald Metzler, Xuanhui Wang
IPC: G06F17/30
CPC classification number: G06F16/24578, G06F16/248, G06F16/252, G06F16/337
Abstract: Methods and apparatus related to using document feature(s) of a document that is responsive to a query, and optionally query feature(s) of the query, to determine a presentation characteristic for presenting a search result that corresponds to the document. In some implementations, measures associated with the document feature(s) and/or query feature(s) may be used to determine the presentation characteristic. The measures may be based on past interactions, by corresponding users, with other documents that share one or more of the document features with the document, where a plurality of the other documents are different from the document (and optionally each different from one another). In some implementations, the document and/or the other documents include, or are restricted to, documents that are access restricted.
-
Publication number: US09953185B2
Publication date: 2018-04-24
Application number: US14950052
Application date: 2015-11-24
Applicant: Google Inc.
Inventor: Mike Bendersky, Donald Metzler, Marc Alexander Najork, Dor Naveh, Vlad Panait, Xuanhui Wang
CPC classification number: G06F21/6227, G06F17/30477, G06F17/30522, G06F17/3053, G06F17/3064, G06F17/30867, G06F17/3097, G06F21/6245
Abstract: In various implementations, a plurality of non-private n-grams that satisfy a privacy criterion may be identified within a search log of private search queries and corresponding post-search activity. A plurality of query patterns may be generated based on the plurality of non-private n-grams. Aggregate search activity statistics associated with each of the plurality of query patterns may be determined from the search log. Aggregate search activity statistics associated with each query pattern may be indicative of search activity associated with a plurality of private search queries in the search log that match the query pattern. In response to a determination that aggregate search activity statistics for a given query pattern satisfy a performance criterion, a methodology for generating data that is presented in response to search queries that match the given query pattern may be altered based on aggregate search activity statistics associated with the given query pattern.
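The pipeline described here can be sketched in a few lines. The sketch makes several assumptions that the abstract does not state: the search log is a list of `(user, query, clicked)` tuples, only unigrams are considered, the privacy criterion is that a token was used by at least `min_users` distinct users, and the aggregate statistic is the click-through rate. All names are hypothetical.

```python
from collections import defaultdict


def non_private_ngrams(log, min_users=2):
    """Unigrams used by at least min_users distinct users (assumed criterion)."""
    users_by_token = defaultdict(set)
    for user, query, _clicked in log:
        for token in query.split():
            users_by_token[token].add(user)
    return {token for token, users in users_by_token.items() if len(users) >= min_users}


def to_pattern(query, public_tokens):
    """Keep non-private tokens and replace every other token with a wildcard."""
    return " ".join(t if t in public_tokens else "*" for t in query.split())


def pattern_click_rates(log, public_tokens):
    """Aggregate click-through rate over all queries that match each pattern."""
    totals = defaultdict(lambda: [0, 0])  # pattern -> [query count, click count]
    for _user, query, clicked in log:
        entry = totals[to_pattern(query, public_tokens)]
        entry[0] += 1
        entry[1] += int(clicked)
    return {pattern: clicks / count for pattern, (count, clicks) in totals.items()}


log = [
    ("u1", "invoice from alice", True),
    ("u2", "invoice from bob", False),
    ("u3", "invoice from carol", True),
]
public_tokens = non_private_ngrams(log)
rates = pattern_click_rates(log, public_tokens)
```

A pattern whose aggregate rate falls below some performance threshold could then be flagged so that results for matching queries are generated differently, without any individual private query being inspected.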
-
Publication number: US20170293696A1
Publication date: 2017-10-12
Application number: US15095517
Application date: 2016-04-11
Applicant: Google Inc.
Inventor: Mike Bendersky, Vijay Garg, Sujith Ravi, Cheng Li
CPC classification number: G06F16/9024, G06F16/24578, G06F16/951, G06N5/022, G06N20/00, G06Q10/1095
Abstract: A computing device may generate a graph that includes a plurality of nodes, wherein the plurality of nodes includes a plurality of entity nodes representing a plurality of entities and a plurality of feature nodes representing a plurality of features, and wherein each of the plurality of entity nodes is connected in the graph to one or more of the plurality of feature nodes. The computing device may perform label propagation to associate a distribution of labels with each of the plurality of nodes. The computing device may be configured to receive an indication of at least one of a feature of interest or an entity of interest. The computing device may further be configured to output an indication of one or more related entities that are related to the feature of interest or the entity of interest.
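A heavily simplified sketch of label propagation on an entity-feature graph is shown below. It departs from the abstract in one important way: instead of a full distribution of labels per node, it propagates a single label (relatedness to one seed node of interest) whose value is clamped to 1.0 at the seed. The update rule (average of neighbor scores), the iteration count, and all names are assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(entity_features):
    """Bipartite graph: every entity node links to each of its feature nodes."""
    graph = defaultdict(set)
    for entity, features in entity_features.items():
        for feature in features:
            graph[("entity", entity)].add(("feature", feature))
            graph[("feature", feature)].add(("entity", entity))
    return graph


def propagate(graph, seed, iterations=10):
    """Repeatedly set each node's score to the mean of its neighbors' scores."""
    score = {node: 0.0 for node in graph}
    score[seed] = 1.0
    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            if node == seed:
                updated[node] = 1.0  # the seed label stays clamped
            else:
                updated[node] = sum(score[n] for n in neighbors) / len(neighbors)
        score = updated
    return score


def related_entities(entity_features, seed):
    """Entities that received any label mass from the seed, strongest first."""
    score = propagate(build_graph(entity_features), seed)
    ranked = [
        (node[1], value)
        for node, value in score.items()
        if node[0] == "entity" and node != seed and value > 0
    ]
    return [name for name, _ in sorted(ranked, key=lambda item: -item[1])]


entity_features = {
    "alice": {"python", "search"},
    "bob": {"python"},
    "carol": {"cooking"},
}
related = related_entities(entity_features, ("feature", "python"))
```

Entities in a disconnected part of the graph never receive any mass from the seed and are omitted from the result.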
-
Publication number: US09756073B2
Publication date: 2017-09-05
Application number: US15416632
Application date: 2017-01-26
Applicant: Google Inc.
Inventor: Mike Bendersky, Luis Garcia Pueyo, Kashyap Ramesh Puranik, Amitabh Saikia, Jie Yang, Marc-Allen Cartright
CPC classification number: H04L63/1483, H04L63/0254, H04L63/1425, H04L63/20
Abstract: Methods, apparatus, systems, and computer-readable media are provided for determining whether communications are attempts at phishing. In various implementations, a potentially-deceptive communication may be matched to one or more templates of a plurality of templates. Each template may represent content shared among a cluster of communications sent by a legitimate entity. In various implementations, it may be determined that an address associated with the communication is not affiliated with one or more legitimate entities associated with the one or more matched templates. In various implementations, the communication may be classified as a phishing attempt based on the determining.
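The template-based check can be sketched as follows. The sketch assumes each template is a dictionary holding the fixed phrases shared by a cluster and the set of sender domains affiliated with the legitimate entity, and that a communication matches a template when it contains at least `min_share` of those phrases. The data layout, the threshold, and the function names are hypothetical.

```python
def matches(template, body, min_share=0.8):
    """True when the body contains enough of the template's fixed phrases."""
    phrases = template["fixed_text"]
    hits = sum(1 for phrase in phrases if phrase in body)
    return hits / len(phrases) >= min_share


def is_phishing(sender, body, templates):
    """Flag a message that looks like a known template but comes from a
    domain not affiliated with any matched template's legitimate entity."""
    domain = sender.rsplit("@", 1)[-1].lower()
    matched = [t for t in templates if matches(t, body)]
    return bool(matched) and all(domain not in t["domains"] for t in matched)


templates = [
    {
        "fixed_text": ["Your statement is ready", "Log in to view"],
        "domains": {"bank.example"},
    }
]
body = "Hello, Your statement is ready. Log in to view it now."
suspicious = is_phishing("alerts@bank-secure.example", body, templates)
```

A message that matches no template is not classified as phishing by this check, since there is no legitimate entity it could be impersonating.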
-
Publication number: US09596265B2
Publication date: 2017-03-14
Application number: US14711407
Application date: 2015-05-13
Applicant: Google Inc.
Inventor: Mike Bendersky, Luis Garcia Pueyo, Kashyap Ramesh Puranik, Amitabh Saikia, Jie Yang, Marc-Allen Cartright
IPC: H04L29/06
CPC classification number: H04L63/1483, H04L63/0254, H04L63/1425, H04L63/20
Abstract: Methods, apparatus, systems, and computer-readable media are provided for determining whether communications are attempts at phishing. In various implementations, a potentially-deceptive communication may be matched to one or more templates of a plurality of templates. Each template may represent content shared among a cluster of communications sent by a trustworthy entity. In various implementations, it may be determined that an address associated with the communication is not affiliated with one or more trustworthy entities associated with the one or more matched templates. In various implementations, the communication may be classified as a phishing attempt based on the determining.