-
Publication No.: US20160027439A1
Publication date: 2016-01-28
Application No.: US14340833
Filing date: 2014-07-25
Applicant: Google Inc.
Inventor: Matthew Sharifi
CPC classification number: G10L15/22 , G06F3/04842 , G06F3/167 , G10L15/063 , G10L15/08 , G10L15/18 , G10L15/265 , G10L15/30 , G10L2015/0631 , G10L2015/0638 , G10L2015/088 , G10L2015/223
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, for each of multiple words or sub-words, audio data corresponding to multiple users speaking the word or sub-word; training, for each of the multiple words or sub-words, a pre-computed hotword model for the word or sub-word based on the audio data for the word or sub-word; receiving a candidate hotword from a computing device; identifying one or more pre-computed hotword models that correspond to the candidate hotword; and providing the identified, pre-computed hotword models to the computing device.
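The lookup flow in this abstract can be illustrated with a minimal sketch. The names `train_models` and `models_for_hotword` are hypothetical, and "training" is replaced by a stand-in that only records how many audio samples back each word or sub-word; the patent abstract does not specify the model type.

```python
def train_models(audio_by_unit):
    """Stand-in for training: one pre-computed 'model' per word or sub-word.

    audio_by_unit maps a word or sub-word to a list of audio samples
    from multiple users. A real system would fit an acoustic model here.
    """
    return {
        unit: {"unit": unit, "num_samples": len(samples)}
        for unit, samples in audio_by_unit.items()
    }


def models_for_hotword(candidate, models):
    """Return the pre-computed models whose unit appears in the candidate hotword."""
    found = []
    for word in candidate.lower().split():
        if word in models:
            found.append(models[word])
    return found


# Toy data: byte strings stand in for recorded audio.
audio = {"ok": [b"a1", b"a2", b"a3"], "computer": [b"b1", b"b2"]}
models = train_models(audio)
result = models_for_hotword("OK Computer", models)
```

Here the server side would return `result` to the requesting device; words without a pre-computed model are simply skipped in this sketch.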
-
Publication No.: US09224385B1
Publication date: 2015-12-29
Application No.: US13919170
Filing date: 2013-06-17
Applicant: Google Inc.
Inventor: Matthew Sharifi , Ben Shahshahani , Dominik Roblek
CPC classification number: G10L25/51 , G10H2210/046 , G10H2220/011 , G10H2240/141 , G10L15/26 , G10L21/10
Abstract: Methods, systems, and computer programs are presented for unified recognition of speech and music. One method includes an operation for starting an audio recognition mode by a computing device while receiving an audio stream. Segments of the audio stream are analyzed as the audio stream is received, where the analysis includes simultaneous checking for speech and music. Further, the method includes an operation for determining a first confidence score for speech and a second confidence score for music. As the audio stream is received, additional segments are analyzed until the end of the audio stream or until the first and second confidence scores indicate that the audio stream has been identified as speech or music. Further, results are presented on a display based on the identification of the audio stream, including text entered if the audio stream was speech or song information if the audio stream was music.
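The segment-by-segment loop described in the abstract for US09224385B1 can be sketched as follows. This is an illustration under stated assumptions, not the patented method: `score_segment` is a hypothetical per-segment classifier returning a (speech, music) pair in [0, 1], the confidence scores are simple running averages, and the 0.8 threshold is arbitrary.

```python
def classify_stream(segments, score_segment, threshold=0.8):
    """Analyze segments as they arrive until speech or music is identified.

    Returns (label, segments_consumed), where label is "speech", "music",
    or "unknown" if the stream ends before either score is decisive.
    """
    speech_conf = 0.0
    music_conf = 0.0
    count = 0
    for segment in segments:
        count += 1
        speech, music = score_segment(segment)
        # Running averages serve as the first and second confidence scores.
        speech_conf += (speech - speech_conf) / count
        music_conf += (music - music_conf) / count
        if speech_conf >= threshold and speech_conf > music_conf:
            return "speech", count
        if music_conf >= threshold and music_conf > speech_conf:
            return "music", count
    return "unknown", count


# Toy usage: each segment already carries its (speech, music) scores.
identity = lambda seg: seg
speech_result = classify_stream([(0.9, 0.1), (0.9, 0.1)], identity)
music_result = classify_stream([(0.2, 0.6), (0.1, 0.95), (0.1, 0.95)], identity)
```

Both checks run on every segment, which mirrors the simultaneous checking for speech and music; the early return stops consuming the stream once one score is decisive.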
-
113.
Publication No.: US09208154B1
Publication date: 2015-12-08
Application No.: US14458387
Filing date: 2014-08-13
Applicant: Google Inc.
Inventor: Matthew Sharifi , Gheorghe Postelnicu
CPC classification number: G06F17/3002 , G06F17/30784 , G06F17/30864 , H04N21/2187 , H04N21/23418
Abstract: Down scoring overcrowded bands via IDF weighting scores provides a soft way to reduce the effect of common bands from Locality Sensitive Hashing (LSH) processes. An index component indexes live video references of a live streaming infrastructure pathway process in a reference index. A scoring component scores a set of bands with a set of inverse document frequency (IDF) weighting scores in the reference index. A high score is generated for bands that are featured in a small number of references and a low score is generated for bands featured in a high number of references.
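The IDF down-scoring idea in the abstract for US09208154B1 can be sketched in a few lines. The function names and the plain `log(N / df)` formula are assumptions for illustration; the abstract does not give the exact weighting.

```python
import math
from collections import defaultdict


def build_band_index(references):
    """Map each LSH band to the set of reference ids that contain it."""
    index = defaultdict(set)
    for ref_id, bands in references.items():
        for band in bands:
            index[band].add(ref_id)
    return index


def idf_weights(index, num_references):
    """Bands seen in few references get a high weight, crowded bands a low one."""
    return {band: math.log(num_references / len(refs)) for band, refs in index.items()}


def score_query(query_bands, index, weights):
    """Sum the IDF weight of every band a query shares with each reference."""
    scores = defaultdict(float)
    for band in set(query_bands):
        for ref_id in index.get(band, ()):
            scores[ref_id] += weights[band]
    return dict(scores)


# Toy index: band "b1" is overcrowded (it appears in every reference).
references = {"a": ["b1", "b2"], "b": ["b1", "b3"], "c": ["b1", "b4"]}
index = build_band_index(references)
weights = idf_weights(index, len(references))
scores = score_query(["b1", "b2"], index, weights)
```

In this toy index the overcrowded band contributes nothing to any score, so only the rare band `b2` separates reference `a` from the others. That is the "soft" effect the abstract describes: common bands are down-weighted rather than discarded.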
-
114.
Publication No.: US09055376B1
Publication date: 2015-06-09
Application No.: US13791131
Filing date: 2013-03-08
Applicant: Google Inc.
Inventor: Gheorghe Postelnicu , Aviv Reznik , Matthew Sharifi
IPC: H04R29/00
CPC classification number: H04R3/00 , G06F17/30743 , H04R2430/03
Abstract: Systems and methods are provided herein relating to audio classification. Genres of music can be identified by detecting unique spectral features inherent to those genres. One example genre detected is techno music. Two dimensional discrete cosine transforms can be generated for consecutive windows of the spectrogram or chromagram. A max value of the energy of portions of the two dimensional discrete cosine transforms can be determined. The max value can be normalized and aggregated with max values related to neighboring windows. If the aggregate scores meet a genre threshold, the audio sample, or portions thereof, can be associated with a genre of music.
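A rough sketch of the pipeline in the abstract for US09055376B1 follows. Several details are assumptions, since the abstract leaves them open: the "portion" examined is taken to be all non-DC coefficients, normalization divides by the total non-DC energy, aggregation is a mean over neighboring windows, and the threshold value is arbitrary. The DCT is an unnormalized DCT-II written in pure Python.

```python
import math


def dct2(block):
    """Unnormalized two-dimensional DCT-II of a small rectangular block."""
    rows, cols = len(block), len(block[0])
    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for x in range(rows):
                for y in range(cols):
                    total += (block[x][y]
                              * math.cos(math.pi * (x + 0.5) * u / rows)
                              * math.cos(math.pi * (y + 0.5) * v / cols))
            out[u][v] = total
    return out


def window_score(block, eps=1e-12):
    """Max non-DC coefficient energy, normalized by the total non-DC energy."""
    coeffs = dct2(block)
    energies = [c * c
                for u, row in enumerate(coeffs)
                for v, c in enumerate(row)
                if (u, v) != (0, 0)]
    total = sum(energies)
    # Treat near-zero energy (floating-point noise on flat input) as no signal.
    return max(energies) / total if total > eps else 0.0


def detect_genre(windows, threshold=0.5, neighborhood=3):
    """Flag each window whose score, averaged with its neighbors, meets the threshold."""
    scores = [window_score(w) for w in windows]
    half = neighborhood // 2
    flags = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        flags.append(sum(scores[lo:hi]) / (hi - lo) >= threshold)
    return flags


# Toy windows: a strictly periodic pattern versus a flat one.
periodic = [[1.0 if y % 2 == 0 else 0.0 for y in range(4)] for _ in range(4)]
flat = [[1.0] * 4 for _ in range(4)]
flags = detect_genre([periodic, periodic, flat])
```

The periodic toy window concentrates its energy in a single coefficient and scores high, while the flat window scores zero; real spectrogram or chromagram windows would be larger and noisier.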
-
Publication No.: US20150127342A1
Publication date: 2015-05-07
Application No.: US14523198
Filing date: 2014-10-24
Applicant: Google Inc.
Inventor: Matthew Sharifi , Ignacio Lopez Moreno , Ludwig Schmidt
CPC classification number: G10L17/02 , G10L17/005 , G10L17/08 , G10L17/18 , G10L25/51
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing speaker identification. In some implementations, an utterance vector that is derived from an utterance is obtained. Hash values are determined for the utterance vector according to multiple different hash functions. A set of speaker vectors from a plurality of hash tables is determined using the hash values, where each speaker vector was derived from one or more utterances of a respective speaker. The speaker vectors in the set are compared with the utterance vector. A speaker vector is selected based on comparing the speaker vectors in the set with the utterance vector.
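The lookup described in the abstract for US20150127342A1 can be sketched with random-hyperplane locality sensitive hashing. The hash family, the cosine comparison, and all function names here are assumptions chosen for illustration; the abstract only says that multiple different hash functions and hash tables are used.

```python
import math
import random


def make_hash_functions(num_tables, bits_per_hash, dim, seed=0):
    """One set of random hyperplanes per hash table."""
    rng = random.Random(seed)
    return [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits_per_hash)]
            for _ in range(num_tables)]


def hash_vector(vec, hyperplanes):
    """Sign pattern of the vector against each hyperplane."""
    return tuple(1 if sum(v * h for v, h in zip(vec, plane)) >= 0 else 0
                 for plane in hyperplanes)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_tables(speakers, hash_functions):
    """Insert every speaker vector into every hash table."""
    tables = [dict() for _ in hash_functions]
    for name, vec in speakers.items():
        for table, planes in zip(tables, hash_functions):
            table.setdefault(hash_vector(vec, planes), set()).add(name)
    return tables


def identify(utterance, speakers, tables, hash_functions):
    """Collect candidates from matching buckets, then pick the closest one."""
    candidates = set()
    for table, planes in zip(tables, hash_functions):
        candidates |= table.get(hash_vector(utterance, planes), set())
    if not candidates:
        return None
    return max(candidates, key=lambda name: cosine(utterance, speakers[name]))


speakers = {"alice": [1.0, 0.0, 0.2], "bob": [-0.5, 1.0, 0.0], "carol": [0.0, -1.0, 1.0]}
hash_functions = make_hash_functions(num_tables=4, bits_per_hash=3, dim=3)
tables = build_tables(speakers, hash_functions)
match = identify([1.0, 0.0, 0.2], speakers, tables, hash_functions)
```

Only the speaker vectors that share a bucket with the utterance vector are compared directly, which is what keeps the comparison step small when the number of enrolled speakers is large.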
-
Publication No.: US09002835B2
Publication date: 2015-04-07
Application No.: US14047708
Filing date: 2013-10-07
Applicant: Google Inc.
Inventor: Matthew Sharifi
CPC classification number: G06F17/30041 , G06F17/30026 , G06F17/30029 , G06F17/30035 , G06F17/30044 , G06F17/30401 , G06F17/30424 , G06F17/30477 , G06F17/3053 , G06F17/30746 , G06F17/30787 , G06F17/30867 , G06F17/30876 , G06Q30/02 , G06Q30/0631 , G10L25/54
Abstract: Methods, systems, and apparatus for receiving a natural language query of a user, and environmental data, identifying a media item based on the environmental data, determining an entity type based on the natural language query, selecting an entity associated with the media item that matches the entity type, selecting, from a media consumption database that identifies media items that have been indicated as consumed by the user, one or more media items that have been indicated as consumed by the user and that are associated with the selected entity, and providing a response to the query based on selecting the one or more media items that have been indicated as consumed by the user and that are associated with the selected entity.
-
117.
Publication No.: US08918382B1
Publication date: 2014-12-23
Application No.: US13889681
Filing date: 2013-05-08
Applicant: Google Inc.
Inventor: Matthew Sharifi , Gheorghe Postelnicu
IPC: G06F17/30
CPC classification number: G06F17/30722 , G06F17/30038 , G06F17/30371 , G06F17/30817
Abstract: This disclosure relates to learning common spelling errors of metadata terms associated with content through content matching, such as content matching using fingerprints.
-
118.
Publication No.: US08838609B1
Publication date: 2014-09-16
Application No.: US13648511
Filing date: 2012-10-10
Applicant: Google Inc.
Inventor: Matthew Sharifi , Gheorghe Postelnicu
CPC classification number: G06F17/3002 , G06F17/30784 , G06F17/30864 , H04N21/2187 , H04N21/23418
Abstract: Down scoring overcrowded bands via IDF weighting scores provides a soft way to reduce the effect of common bands from Locality Sensitive Hashing (LSH) processes. An index component indexes live video references of a live streaming infrastructure pathway process in a reference index. A scoring component scores a set of bands with a set of inverse document frequency (IDF) weighting scores in the reference index. A high score is generated for bands that are featured in a small number of references and a low score is generated for bands featured in a high number of references.
-
Publication No.: US20140074466A1
Publication date: 2014-03-13
Application No.: US13626439
Filing date: 2012-09-25
Applicant: GOOGLE INC.
Inventor: Matthew Sharifi , Gheorghe Postelnicu
IPC: G10L15/26
CPC classification number: G10L15/22 , G06F16/3329 , G06F16/3344 , G06F16/433 , G06F16/686 , G10L15/08 , G10L15/1815 , G10L15/24 , G10L15/30 , G10L2015/088 , G10L2015/223 , G10L2015/225
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data encoding an utterance and environmental data, obtaining a transcription of the utterance, identifying an entity using the environmental data, submitting a query to a natural language query processing engine, wherein the query includes at least a portion of the transcription and data that identifies the entity, and obtaining one or more results of the query.
-
Publication No.: US10311249B2
Publication date: 2019-06-04
Application No.: US15476392
Filing date: 2017-03-31
Applicant: Google Inc.
Inventor: Matthew Sharifi , Jakob Nicolaus Foerster
Abstract: A method includes determining, based at least in part on a type of information to be displayed at a display device associated with a computing device, a privacy level for the information to be displayed; and determining whether the privacy level satisfies a threshold privacy level. The method also includes, responsive to determining that the privacy level satisfies the threshold privacy level, determining whether an individual not associated with a currently active user account of the computing device is proximate to the display device. The method also includes determining an estimated speed of the individual not associated with the currently active user account relative to the display device. The method further includes determining, whether the estimated speed satisfies a threshold speed, and responsive to determining that the estimated speed satisfies the threshold speed, outputting the information such that at least a first portion of the information is obscured.