-
Publication No.: US20150294670A1
Publication date: 2015-10-15
Application No.: US14612830
Filing date: 2015-02-03
Applicant: Google Inc.
Inventor: Dominik Roblek, Matthew Sharifi, Raziel Alvarez
IPC: G10L17/18
CPC classification number: G10L17/18, G10L17/005
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speaker verification. The methods, systems, and apparatus include actions of inputting speech data that corresponds to a particular utterance to a first neural network and determining an evaluation vector based on output at a hidden layer of the first neural network. Additional actions include obtaining a reference vector that corresponds to a past utterance of a particular speaker. Further actions include inputting the evaluation vector and the reference vector to a second neural network that is trained on a set of labeled pairs of feature vectors to identify whether speakers associated with the labeled pairs of feature vectors are the same speaker. More actions include determining, based on an output of the second neural network, whether the particular utterance was likely spoken by the particular speaker.
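The two-network verification flow described in this abstract can be illustrated with a minimal pure-Python sketch. It assumes single hidden layers, ReLU activations, a sigmoid "same speaker" score, concatenation of the evaluation and reference vectors as the pair input, and a 0.5 decision threshold; none of these details come from the abstract. The weights are random placeholders: training the first network, and training the second network on labeled pairs of feature vectors, are both omitted, so the score only demonstrates the data flow. The names `FirstNetwork`, `SecondNetwork` and `verify` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, biases)


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    # ASSUMPTION: random, untrained weights stand in for trained parameters.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Vector, layer: Layer, activation) -> Vector:
    """Fully connected layer: activation(W x + b), one output per weight row."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(v: float) -> float:
    return max(0.0, v)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


class FirstNetwork:
    """Hypothetical speaker network; only its hidden-layer output is used."""

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random):
        self.hidden = random_layer(n_in, n_hidden, rng)

    def evaluation_vector(self, speech_features: Vector) -> Vector:
        # The hidden-layer activations serve as the utterance's vector.
        return dense(speech_features, self.hidden, relu)


class SecondNetwork:
    """Hypothetical pair classifier scoring 'same speaker' in (0, 1)."""

    def __init__(self, n_vec: int, n_hidden: int, rng: random.Random):
        self.hidden = random_layer(2 * n_vec, n_hidden, rng)
        self.output = random_layer(n_hidden, 1, rng)

    def same_speaker_score(self, evaluation: Vector, reference: Vector) -> float:
        # ASSUMPTION: the pair is presented by concatenating the two vectors.
        h = dense(evaluation + reference, self.hidden, relu)
        return dense(h, self.output, sigmoid)[0]


def verify(first: FirstNetwork, second: SecondNetwork,
           speech_features: Vector, reference_vector: Vector,
           threshold: float = 0.5) -> Tuple[bool, float]:
    """Return (likely spoken by the reference speaker?, score)."""
    evaluation = first.evaluation_vector(speech_features)
    score = second.same_speaker_score(evaluation, reference_vector)
    return score >= threshold, score


if __name__ == "__main__":
    rng = random.Random(0)
    first = FirstNetwork(n_in=40, n_hidden=16, rng=rng)
    second = SecondNetwork(n_vec=16, n_hidden=8, rng=rng)
    past_utterance = [rng.random() for _ in range(40)]  # stand-in features
    new_utterance = [rng.random() for _ in range(40)]
    reference = first.evaluation_vector(past_utterance)
    print(verify(first, second, new_utterance, reference))
```

One plausible design, not stated in the abstract, is to compute the reference vector once from an enrollment utterance and store it, so that only the evaluation vector has to be computed at verification time.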
-
Publication No.: US09953216B2
Publication date: 2018-04-24
Application No.: US14596168
Filing date: 2015-01-13
Applicant: Google Inc.
Inventor: Raziel Alvarez
CPC classification number: G06K9/00355, G06F3/017, G06K9/2081
Abstract: Systems, methods, and computer-readable media are provided for performing actions in response to gestures made by a user in captured images. In accordance with one implementation, a computer-implemented system is provided that includes an image capture device that captures at least one image, a memory device that stores instructions, and at least one processor that executes the instructions stored in the memory device. In some implementations, the processor receives, from the image capture device, at least one image including a gesture made by a user and analyzes the at least one image to identify the gesture made by the user. In some implementations, the processor also determines, based on the identified gesture, one or more actions to perform on the at least one image.
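The receive, identify and act steps in this abstract can be sketched as a lookup from a recognized gesture to one or more actions applied to the captured image. Gesture recognition itself is out of scope here: the recognizer is a caller-supplied callable, and the gesture names, the `GESTURE_ACTIONS` table and the two actions (`rotate_clockwise`, `mark_favorite`) are hypothetical examples, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CapturedImage:
    """Minimal stand-in for an image received from the capture device."""
    pixels: List[List[int]]
    tags: List[str] = field(default_factory=list)


ImageAction = Callable[[CapturedImage], None]


def rotate_clockwise(image: CapturedImage) -> None:
    # Rotate the pixel grid 90 degrees clockwise.
    image.pixels = [list(row) for row in zip(*image.pixels[::-1])]


def mark_favorite(image: CapturedImage) -> None:
    if "favorite" not in image.tags:
        image.tags.append("favorite")


# HYPOTHETICAL gesture vocabulary: each gesture maps to one or more actions.
GESTURE_ACTIONS: Dict[str, List[ImageAction]] = {
    "thumbs_up": [mark_favorite],
    "circle": [rotate_clockwise],
    "ok_sign": [rotate_clockwise, mark_favorite],
}


def handle_captured_image(
    image: CapturedImage,
    recognize_gesture: Callable[[CapturedImage], str],
    action_table: Dict[str, List[ImageAction]] = GESTURE_ACTIONS,
) -> List[str]:
    """Identify the gesture in the image, then apply the mapped actions to it."""
    gesture = recognize_gesture(image)       # placeholder for a real recognizer
    actions = action_table.get(gesture, [])  # unknown gestures trigger nothing
    for action in actions:
        action(image)
    return [action.__name__ for action in actions]
```

For example, with a recognizer stub that always reports `"ok_sign"`, the image `[[1, 2], [3, 4]]` is rotated to `[[3, 1], [4, 2]]` and tagged `"favorite"`. Keeping the mapping in a table separates gesture recognition from the actions it triggers.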
-
Publication No.: US20160099007A1
Publication date: 2016-04-07
Application No.: US14727741
Filing date: 2015-06-01
Applicant: Google Inc.
Inventor: Raziel Alvarez, Preetum Nakkiran
IPC: G10L21/034
CPC classification number: G10L21/034, G10L25/78, H03G3/3005
Abstract: This specification describes, among other things, a computer-implemented method. The method can include receiving a stream of audio data at a computing device. The stream of audio data can be segmented into a plurality of audio segments. Respective intensity levels are determined for each of the plurality of audio segments. For each of the plurality of audio segments and based on the respective intensity levels, a determination can be made as to whether the audio segment includes a speech signal. Selective gain control can be performed on the stream of audio data by automatically adjusting a gain of particular ones of the plurality of audio segments that are determined to include a speech signal.
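The selective gain control steps in this abstract (segment the stream, measure each segment's intensity, decide whether it contains speech, adjust gain only on speech segments) can be sketched as follows. The sketch assumes float samples in [-1.0, 1.0], fixed-length segments, RMS level in dBFS as the intensity measure, and a fixed intensity threshold as the speech decision; the threshold, target level and gain cap are made-up parameters, and a real system would likely use a more robust speech detector than a single energy threshold.

```python
import math
from typing import List, Sequence, Tuple


def split_segments(samples: Sequence[float], segment_len: int) -> List[List[float]]:
    """Cut the stream into consecutive fixed-length segments (last may be shorter)."""
    return [list(samples[i:i + segment_len]) for i in range(0, len(samples), segment_len)]


def level_dbfs(segment: Sequence[float]) -> float:
    """RMS intensity in dB relative to full scale (1.0); -inf for silence."""
    if not segment:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in segment) / len(segment))
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")


def selective_gain(samples: Sequence[float], segment_len: int = 160,
                   speech_threshold_db: float = -40.0, target_db: float = -20.0,
                   max_gain_db: float = 20.0) -> Tuple[List[float], List[bool]]:
    """Adjust gain only on segments judged to contain speech."""
    output: List[float] = []
    is_speech: List[bool] = []
    for segment in split_segments(samples, segment_len):
        level = level_dbfs(segment)
        # ASSUMPTION: a plain intensity threshold stands in for the speech decision.
        speech = level >= speech_threshold_db
        if speech:
            # Move the segment toward target_db, boosting by at most max_gain_db;
            # segments louder than target_db are attenuated instead.
            gain = 10.0 ** (min(target_db - level, max_gain_db) / 20.0)
            segment = [max(-1.0, min(1.0, s * gain)) for s in segment]
        output.extend(segment)
        is_speech.append(speech)
    return output, is_speech
```

At a 16 kHz sample rate, the default 160-sample segment is 10 ms. Segments judged not to contain speech pass through unchanged, which is presumably one motivation for making the gain selective: background noise is not boosted along with the speech.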
-