Z-vectors: speaker embeddings from raw audio using sincnet, extended CNN architecture and in-network augmentation techniques

Invention Grant

US11715460B2 Z-vectors: speaker embeddings from raw audio using sincnet, extended CNN architecture and in-network augmentation techniques (Status: Granted)

Patent Title: Z-vectors: speaker embeddings from raw audio using sincnet, extended CNN architecture and in-network augmentation techniques
Application No.: US17066210

Application Date: 2020-10-08
Publication No.: US11715460B2

Publication Date: 2023-08-01
Inventor: Elie Khoury, Ganesh Sivaraman, Tianxiang Chen, Amruta Vidwans
Applicant: PINDROP SECURITY, INC.
Applicant Address: US GA Atlanta
Assignee: PINDROP SECURITY, INC.
Current Assignee: PINDROP SECURITY, INC.
Current Assignee Address: US GA Atlanta
Agency: Foley & Lardner LLP
Main IPC: G10L15/16
IPC: G10L15/16; G10L25/51; G10L17/04; G10L15/06

Abstract:

Described herein are systems and methods for improved audio analysis using a computer-executed neural network having one or more in-network data augmentation layers. The systems described herein help ease or avoid unwanted strain on computing resources by employing the data augmentation techniques within the layers of the neural network. The in-network data augmentation layers will produce various types of simulated audio data when the computer applies the neural network on an inputted audio signal during a training phase, enrollment phase, and/or testing phase. Subsequent layers of the neural network (e.g., convolutional layer, pooling layer, data augmentation layer) ingest the simulated audio data and the inputted audio signal and perform various operations.
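
To make the idea of an in-network data augmentation layer concrete, here is a minimal sketch in Python with NumPy. It is not the patented method: the class name `NoiseAugmentationLayer`, the `snr_db` and `augment` parameters, and the choice of additive Gaussian noise are all assumptions made for illustration. The sketch only shows the general pattern the abstract describes, in which a layer generates simulated audio on the fly and hands both the simulated audio and the inputted signal to the layers that follow.

```python
import numpy as np


class NoiseAugmentationLayer:
    """Hypothetical in-network augmentation layer (illustrative only).

    Adds white Gaussian noise at a fixed signal-to-noise ratio and returns
    the original signal stacked with the simulated one.
    """

    def __init__(self, snr_db=10.0, seed=0):
        self.snr_db = snr_db  # assumed target SNR in decibels
        self.rng = np.random.default_rng(seed)

    def __call__(self, audio, augment=True):
        audio = np.asarray(audio, dtype=np.float64)
        if not augment:
            # Pass the input through unchanged, as a batch of one.
            return audio[np.newaxis, :]

        noise = self.rng.standard_normal(audio.shape)
        signal_power = np.mean(audio ** 2)
        noise_power = np.mean(noise ** 2)

        # Scale the noise so that signal_power / scaled_noise_power
        # equals 10 ** (snr_db / 10).
        scale = np.sqrt(signal_power / (noise_power * 10 ** (self.snr_db / 10)))
        simulated = audio + scale * noise

        # Row 0: inputted audio signal; row 1: simulated audio data.
        return np.stack([audio, simulated])


# Usage: one second of a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440 * t)
layer = NoiseAugmentationLayer(snr_db=10.0, seed=0)
batch = layer(tone)  # shape (2, 16000), ready for subsequent layers
```

Because the simulated copy is produced inside the forward pass, no augmented audio files need to be generated and stored ahead of time, which is the resource saving the abstract points to. The `augment` flag is one assumed way to switch the layer on or off for a given phase (training, enrollment, or testing).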

Published/Granted literature

US20210110813A1 Z-VECTORS: SPEAKER EMBEDDINGS FROM RAW AUDIO USING SINCNET, EXTENDED CNN ARCHITECTURE AND IN-NETWORK AUGMENTATION TECHNIQUES
Publication/Grant date: 2021-04-15

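IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	.Speech classification or search
G10L15/16	..using artificial neural networks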