Time-varying and nonlinear audio processing using deep neural networks

Granted invention patent

US12334043B2 Time-varying and nonlinear audio processing using deep neural networks (legal status: in force)

Patent Title: Time-varying and nonlinear audio processing using deep neural networks
Application No.: US17924701

Application Date: 2020-05-12
Publication No.: US12334043B2

Publication Date: 2025-06-17
Inventor: Marco Antonio Martinez Ramirez , Joshua Daniel Reiss , Emmanouil Benetos
Applicant: WAVESHAPER TECHNOLOGIES INC.
Applicant Address: CA Montreal
Assignee: WAVESHAPER TECHNOLOGIES INC.
Current Assignee: WAVESHAPER TECHNOLOGIES INC.
Current Assignee Address: CA Montreal
Agency: FASKEN MARTINEAU DuMOULIN LLP
Agent: Johann Gest; Dennis Haszko
International Application: PCT/GB2020/051150 WO 20200512
International Announcement: WO2021/229197 WO 20211118
Main IPC: G10H1/00
IPC: G10H1/00 ; G06N3/0442 ; G06N3/045 ; G06N3/0499 ; G06N3/08 ; G10H1/16

Abstract:

A computer-implemented method of processing audio data, the method comprising receiving input audio data (x) comprising a time-series of amplitude values; transforming the input audio data (x) into an input frequency band decomposition (X1) of the input audio data (x); transforming the input frequency band decomposition (X1) into a first latent representation (Z); processing the first latent representation (Z) by a first deep neural network to obtain a second latent representation (Ẑ, Ẑ1); transforming the second latent representation (Ẑ, Ẑ1) to obtain a discrete approximation (X̂3); element-wise multiplying the discrete approximation (X̂3) and a residual feature map (R, X̂5) to obtain a modified feature map, wherein the residual feature map (R, X̂5) is derived from the input frequency band decomposition (X1); processing a pre-shaped frequency band decomposition by a waveshaping unit to obtain a waveshaped frequency band decomposition (X̂1, X̂1.2), wherein the pre-shaped frequency band decomposition is derived from the input frequency band decomposition (X1), wherein the waveshaping unit comprises a second deep neural network; summing the waveshaped frequency band decomposition (X̂1, X̂1.2) and a modified frequency band decomposition (X̂2, X̂1.1) to obtain a summation output (X̂0), wherein the modified frequency band decomposition (X̂2, X̂1.1) is derived from the modified feature map; and transforming the summation output (X̂0) to obtain target audio data (ŷ).
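The dataflow in the abstract (band decomposition, latent path through a first DNN, multiplication with a residual map, a parallel waveshaping branch, summation, and inverse transform) can be traced with a toy NumPy sketch. Everything concrete here is an assumption, not the patented implementation: an FIR filterbank stands in for the frequency band decomposition and the final transform, absolute value plus max-pooling forms the latent Z, a random two-layer tanh MLP stands in for the first deep neural network, nearest-neighbour upsampling produces the discrete approximation X̂3, the residual map R and the pre-shaped input are both taken to be X1 itself, and a one-hidden-layer tanh network applied element-wise stands in for the waveshaping unit. All weights are random and untrained, and all names (`band_decompose`, `first_dnn`, `waveshaper`, `process`, the size constants) are hypothetical.

```python
import numpy as np

# Hypothetical sizes and random, untrained weights: placeholders, NOT values from the patent.
N_BANDS, KERNEL, POOL, HIDDEN = 8, 64, 16, 32
rng = np.random.default_rng(0)
W_ANALYSIS = rng.normal(scale=0.1, size=(N_BANDS, KERNEL))  # assumed front-end filterbank
W_SYNTH = rng.normal(scale=0.1, size=(N_BANDS, KERNEL))     # assumed back-end filterbank
DNN1_IN = rng.normal(scale=0.3, size=(HIDDEN, N_BANDS))     # stand-in "first DNN", layer 1
DNN1_OUT = rng.normal(scale=0.3, size=(N_BANDS, HIDDEN))    # stand-in "first DNN", layer 2
WS_A = rng.normal(scale=0.5, size=HIDDEN)                   # stand-in waveshaper, output weights
WS_B = rng.normal(scale=2.0, size=HIDDEN)                   # stand-in waveshaper, input weights


def band_decompose(x):
    """x (T,) -> X1 (N_BANDS, T): one 'same'-length FIR filter per band (assumed transform)."""
    return np.stack([np.convolve(x, w, mode="same") for w in W_ANALYSIS])


def first_dnn(Z):
    """Z (N_BANDS, L) -> Z_hat (N_BANDS, L): a 2-layer tanh MLP applied to each latent frame."""
    return DNN1_OUT @ np.tanh(DNN1_IN @ Z)


def waveshaper(v):
    """Element-wise f(v) = sum_k a_k * tanh(b_k * v), a 1-hidden-layer scalar network."""
    return np.tanh(v[..., None] * WS_B) @ WS_A


def process(x):
    T = len(x)
    assert T % POOL == 0 and T >= KERNEL, "toy sketch assumes T divisible by POOL and T >= KERNEL"
    X1 = band_decompose(x)                                        # input band decomposition
    Z = np.abs(X1).reshape(N_BANDS, T // POOL, POOL).max(axis=2)  # first latent: |.| + max-pool
    Z_hat = first_dnn(Z)                                          # second latent representation
    X3_hat = np.repeat(Z_hat, POOL, axis=1)                       # discrete approximation, T samples
    R = X1                                                        # residual map (assumed equal to X1)
    X2_hat = X3_hat * R                                           # modified feature map
    X1_hat = waveshaper(X1)                                       # waveshaped band decomposition
    X0_hat = X1_hat + X2_hat                                      # summation output
    # Inverse transform (assumed): filter each band and sum the bands back to one signal.
    return np.sum([np.convolve(b, w, mode="same") for b, w in zip(X0_hat, W_SYNTH)], axis=0)
```

Usage: `y = process(np.sin(2 * np.pi * 440 * np.arange(1024) / 44100))` returns an array of shape `(1024,)`. Because the stand-in networks have no bias terms, silence maps to silence; a trained model would generally learn biases and filterbank weights from paired input and target audio.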

Published/granted documents

US20230197043A1 Time-varying and nonlinear audio processing using deep neural networks (published 2023-06-22)

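IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10H	Electrophonic musical instruments; instruments in which the tones are generated by electromechanical means or electronic generators, or in which the tones are synthesised from a data store
G10H1/00	Details of electrophonic musical instruments (keyboards applicable also to other musical instruments G10B, G10C; arrangements for producing a reverberation or echo sound G10K15/08)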