Time-frequency convolutional neural network with bottleneck architecture for query-by-example processing

    Publication Number: US10777188B2

    Publication Date: 2020-09-15

    Application Number: US16191296

    Application Date: 2018-11-14

    Abstract: A computing system determines whether a reference audio signal contains a query. A time-frequency convolutional neural network (TFCNN) comprises a time convolutional layer, a frequency convolutional layer, and a series of additional layers, which include a bottleneck layer. A computation engine of the computing system applies the TFCNN to samples of a query utterance at least through the bottleneck layer. A query feature vector comprises output values of the bottleneck layer generated when the computation engine applies the TFCNN to the samples of the query utterance. The computation engine also applies the TFCNN to samples of the reference audio signal at least through the bottleneck layer. A reference feature vector comprises output values of the bottleneck layer generated when the computation engine applies the TFCNN to the samples of the reference audio signal. The computation engine determines at least one detection score based on the query feature vector and the reference feature vector.
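
    The abstract describes a network with separate time and frequency convolutions feeding a narrow bottleneck layer whose activations serve as the feature vector. The sketch below is an illustrative reconstruction only, not the patented implementation: the layer sizes, kernel shapes, pooling, training targets, and the log-mel front end are all assumptions.

```python
# Hypothetical TFCNN with a bottleneck layer; the bottleneck activations are
# taken as the query/reference feature vector. All dimensions are assumed.
import torch
import torch.nn as nn

class TFCNNBottleneck(nn.Module):
    def __init__(self, bottleneck_dim=60, num_targets=1000):
        super().__init__()
        # Convolution along the time axis (kernel spans frames, not mel bands).
        self.time_conv = nn.Conv2d(1, 32, kernel_size=(1, 9), padding=(0, 4))
        # Convolution along the frequency axis (kernel spans mel bands).
        self.freq_conv = nn.Conv2d(1, 32, kernel_size=(9, 1), padding=(4, 0))
        self.pool = nn.AdaptiveAvgPool2d((8, 8))
        self.hidden = nn.Linear(2 * 32 * 8 * 8, 512)
        # Narrow bottleneck layer; its output values form the feature vector.
        self.bottleneck = nn.Linear(512, bottleneck_dim)
        # Classification head used only during training (targets are assumed).
        self.head = nn.Linear(bottleneck_dim, num_targets)

    def extract_features(self, spec):
        # spec: (batch, 1, n_mels, n_frames) log-mel spectrogram.
        t = torch.relu(self.time_conv(spec))
        f = torch.relu(self.freq_conv(spec))
        x = torch.cat([self.pool(t), self.pool(f)], dim=1).flatten(1)
        x = torch.relu(self.hidden(x))
        return self.bottleneck(x)  # applied "at least through the bottleneck layer"

    def forward(self, spec):
        return self.head(torch.relu(self.extract_features(spec)))
```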

    TIME-FREQUENCY CONVOLUTIONAL NEURAL NETWORK WITH BOTTLENECK ARCHITECTURE FOR QUERY-BY-EXAMPLE PROCESSING

    Publication Number: US20200152179A1

    Publication Date: 2020-05-14

    Application Number: US16191296

    Application Date: 2018-11-14

    Abstract: A computing system determines whether a reference audio signal contains a query. A time-frequency convolutional neural network (TFCNN) comprises a time convolutional layer, a frequency convolutional layer, and a series of additional layers, which include a bottleneck layer. A computation engine of the computing system applies the TFCNN to samples of a query utterance at least through the bottleneck layer. A query feature vector comprises output values of the bottleneck layer generated when the computation engine applies the TFCNN to the samples of the query utterance. The computation engine also applies the TFCNN to samples of the reference audio signal at least through the bottleneck layer. A reference feature vector comprises output values of the bottleneck layer generated when the computation engine applies the TFCNN to the samples of the reference audio signal. The computation engine determines at least one detection score based on the query feature vector and the reference feature vector.
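
    Complementing the architecture sketch above, the following shows one plausible way to turn the bottleneck feature vectors into the detection score the abstract's final step requires. Cosine-similarity scoring over sliding reference windows and the detection threshold are assumptions for illustration, not the claimed scoring method.

```python
# Illustrative detection scoring over bottleneck feature vectors.
import numpy as np

def detection_scores(query_vec, reference_vecs):
    """query_vec: (D,) bottleneck features for the query utterance.
    reference_vecs: (T, D) bottleneck features for windows of the reference audio."""
    q = query_vec / (np.linalg.norm(query_vec) + 1e-8)
    r = reference_vecs / (np.linalg.norm(reference_vecs, axis=1, keepdims=True) + 1e-8)
    return r @ q  # one cosine similarity score per reference window

def contains_query(query_vec, reference_vecs, threshold=0.85):
    # Declare a detection if any reference window exceeds the assumed threshold.
    scores = detection_scores(query_vec, reference_vecs)
    return bool(scores.max() > threshold), float(scores.max())
```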

    SPEECH MODIFICATION USING ACCENT EMBEDDINGS
    Invention Publication

    Publication Number: US20240304175A1

    Publication Date: 2024-09-12

    Application Number: US18599018

    Application Date: 2024-03-07

    CPC Classification Number: G10L 13/047; G10L 15/16

    Abstract: Techniques are described for a machine learning system configured to obtain a dataset of a plurality of sample speech clips; generate a plurality of sequence embeddings; initialize a plurality of speaker embeddings and a plurality of accent embeddings; update the plurality of speaker embeddings; update the plurality of accent embeddings; generate a plurality of augmented embeddings based on the plurality of sequence embeddings, the plurality of speaker embeddings, and the plurality of accent embeddings; and generate a plurality of synthetic speech clips based on the plurality of augmented embeddings. The machine learning system may further be configured to obtain an audio waveform; decompose the audio waveform into first magnitude spectral slices and an original phase; process the first magnitude spectral slices to map the first magnitude spectral slices to second magnitude spectral slices; and generate a modified audio waveform in part by combining the second magnitude spectral slices and the original phase.
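
    The second half of the abstract describes decomposing a waveform into magnitude spectral slices and an original phase, mapping the magnitudes, and recombining them with the original phase. A minimal sketch of that pathway appears below, assuming an STFT front end via librosa; the `accent_mapper` callable is a hypothetical stand-in for the learned model conditioned on speaker and accent embeddings.

```python
# Sketch: magnitude/phase decomposition and resynthesis with the original phase.
import numpy as np
import librosa

def modify_speech(waveform, accent_mapper, n_fft=1024, hop_length=256):
    # Decompose the waveform into magnitude spectral slices and the original phase.
    stft = librosa.stft(waveform, n_fft=n_fft, hop_length=hop_length)
    magnitude, phase = np.abs(stft), np.angle(stft)

    # Map the first (source) magnitude slices to second (modified) magnitude slices.
    # `accent_mapper` is hypothetical; the described system uses a learned model.
    modified_magnitude = accent_mapper(magnitude)

    # Recombine the modified magnitudes with the original phase and invert.
    modified_stft = modified_magnitude * np.exp(1j * phase)
    return librosa.istft(modified_stft, hop_length=hop_length, length=len(waveform))

# Usage with an identity mapper (file name assumed):
# y, sr = librosa.load("speech.wav", sr=16000)
# y_mod = modify_speech(y, accent_mapper=lambda m: m)
```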
