Invention Grant
- Patent Title: Time-varying and nonlinear audio processing using deep neural networks
- Application No.: US17924701
- Application Date: 2020-05-12
- Publication No.: US12334043B2
- Publication Date: 2025-06-17
- Inventor: Marco Antonio Martinez Ramirez, Joshua Daniel Reiss, Emmanouil Benetos
- Applicant: WAVESHAPER TECHNOLOGIES INC.
- Applicant Address: CA Montreal
- Assignee: WAVESHAPER TECHNOLOGIES INC.
- Current Assignee: WAVESHAPER TECHNOLOGIES INC.
- Current Assignee Address: CA Montreal
- Agency: FASKEN MARTINEAU DuMOULIN LLP
- Agent: Johann Gest; Dennis Haszko
- International Application: PCT/GB2020/051150 WO 20200512
- International Announcement: WO2021/229197 WO 20211118
- Main IPC: G10H1/00
- IPC: G10H1/00 ; G06N3/0442 ; G06N3/045 ; G06N3/0499 ; G06N3/08 ; G10H1/16

Abstract:
A computer-implemented method of processing audio data, the method comprising receiving input audio data (x) comprising a time-series of amplitude values; transforming the input audio data (x) into an input frequency band decomposition (X1) of the input audio data (x); transforming the input frequency band decomposition (X1) into a first latent representation (Z); processing the first latent representation (Z) by a first deep neural network to obtain a second latent representation (Ẑ, Ẑ1); transforming the second latent representation (Ẑ, Ẑ1) to obtain a discrete approximation (X̂3); element-wise multiplying the discrete approximation (X̂3) and a residual feature map (R, X̂5) to obtain a modified feature map, wherein the residual feature map (R, X̂5) is derived from the input frequency band decomposition (X1); processing a pre-shaped frequency band decomposition by a waveshaping unit to obtain a waveshaped frequency band decomposition (X̂1, X̂1.2), wherein the pre-shaped frequency band decomposition is derived from the input frequency band decomposition (X1), wherein the waveshaping unit comprises a second deep neural network; summing the waveshaped frequency band decomposition (X̂1, X̂1.2) and a modified frequency band decomposition (X̂2, X̂1.1) to obtain a summation output (X̂0), wherein the modified frequency band decomposition (X̂2, X̂1.1) is derived from the modified feature map; and transforming the summation output (X̂0) to obtain target audio data (ŷ).
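The data flow recited in the abstract can be traced with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the frame-wise random analysis matrix standing in for the frequency band decomposition, the single random tanh layers standing in for the trained deep neural networks, the choice of X1 itself as both the residual feature map and the pre-shaped decomposition, and the hypothetical names `toy_net` and `process_audio`. Variable names follow the reference symbols in the abstract (for example `X3_hat` for the discrete approximation).

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_net(n_in, n_out):
    """Hypothetical stand-in for a trained deep network: one random tanh layer."""
    w = rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)
    return lambda a: np.tanh(a @ w)

def process_audio(x, frame=64, n_bands=16, n_latent=8):
    """Trace the abstract's steps; len(x) must be a multiple of `frame`."""
    # Assumed analysis transform: a random matrix applied to each frame.
    analysis = rng.standard_normal((frame, n_bands)) / np.sqrt(frame)

    frames = x.reshape(-1, frame)            # time series of amplitude values
    X1 = frames @ analysis                   # input frequency band decomposition
    R = X1                                   # residual feature map derived from X1 (assumed identity)

    Z = toy_net(n_bands, n_latent)(np.abs(X1))    # first latent representation
    Z_hat = toy_net(n_latent, n_latent)(Z)        # first deep neural network
    X3_hat = toy_net(n_latent, n_bands)(Z_hat)    # discrete approximation

    modified_map = X3_hat * R                # element-wise multiplication
    X2_hat = modified_map                    # modified decomposition (assumed identity)

    # Waveshaping unit (second deep neural network) on the pre-shaped
    # decomposition, assumed here to be X1 unchanged.
    X1_hat = toy_net(n_bands, n_bands)(X1)

    X0_hat = X1_hat + X2_hat                 # summation output
    y_hat = (X0_hat @ analysis.T).reshape(-1)  # back to a time series
    return y_hat

# Usage: a short 440 Hz sine at an assumed 44.1 kHz sample rate.
x = np.sin(2 * np.pi * 440 * np.arange(256) / 44100)
y_hat = process_audio(x)
```

With untrained random weights the output carries no audio meaning; the sketch only shows how the two branches (the multiplicative latent path and the waveshaping path) are combined before the final synthesis transform.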
Public/Granted literature
- US20230197043A1 TIME-VARYING AND NONLINEAR AUDIO PROCESSING USING DEEP NEURAL NETWORKS Public/Granted day: 2023-06-22