Robust speaker localization in presence of strong noise interference systems and methods

    Publication No.: US11264017B2

    Publication date: 2022-03-01

    Application No.: US16900790

    Filing date: 2020-06-12

    Abstract: Systems and methods include a plurality of audio input components configured to generate a plurality of audio input signals, and a logic device configured to receive the plurality of audio input signals, determine whether the plurality of audio signals comprise target audio associated with an audio source, estimate a relative location of the audio source with respect to the plurality of audio input components based on the plurality of audio signals and a determination of whether the plurality of audio signals comprise the target audio, and process the plurality of audio signals to generate an audio output signal by enhancing the target audio based on the estimated relative location. The logic device is further configured to use a relative transfer-based covariance to construct a directional covariance matrix aligned across frequency bands and to find a direction that minimizes beam power subject to distortionless criteria.
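
The constrained minimization named at the end of this abstract can be illustrated with the classical closed-form solution for minimum beam power under a distortionless constraint. This is a minimal sketch, not the patented method: the relative-transfer-based covariance construction and the rule for picking the final direction are not reproduced, and the linear array geometry, the function names, and the speed of sound are assumptions made for illustration.

```python
import numpy as np

def steering_vector(mic_x, angle_rad, freq_hz, c=343.0):
    """Far-field steering vector for microphones on the x axis (assumed geometry)."""
    delays = np.asarray(mic_x, dtype=float) * np.cos(angle_rad) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def distortionless_min_power_weights(R, d, diag_load=0.0):
    """Weights w minimizing w^H R w subject to w^H d = 1 (classical MVDR solution)."""
    n = R.shape[0]
    # Optional diagonal loading, scaled by the average channel power.
    R = R + diag_load * (np.trace(R).real / n) * np.eye(n)
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)

def beam_power(R, w):
    """Output power w^H R w of beamformer w for covariance R."""
    return float(np.real(w.conj() @ R @ w))

def scan_beam_power(R, mic_x, angles_rad, freq_hz):
    """Minimum distortionless beam power for each candidate direction."""
    powers = []
    for angle in angles_rad:
        d = steering_vector(mic_x, angle, freq_hz)
        w = distortionless_min_power_weights(R, d)
        powers.append(beam_power(R, w))
    return np.array(powers)
```

`scan_beam_power` only returns one power value per candidate direction; how those values are turned into a location estimate depends on which covariance is supplied and is left to the caller.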

    MULTIPLE-SOURCE TRACKING AND VOICE ACTIVITY DETECTIONS FOR PLANAR MICROPHONE ARRAYS

    Publication No.: US20210219053A1

    Publication date: 2021-07-15

    Application No.: US16740297

    Filing date: 2020-01-10

    Abstract: Embodiments described herein provide a combined multi-source time difference of arrival (TDOA) tracking and voice activity detection (VAD) mechanism that is applicable for generic array geometries, e.g., a microphone array that lies on a plane. The combined multi-source TDOA tracking and VAD mechanism scans the azimuth and elevation angles of the microphone array in microphone pairs, based on which a planar locus of physically admissible TDOAs can be formed in the multi-dimensional TDOA space of multiple microphone pairs. In this way, the multi-dimensional TDOA tracking reduces the number of calculations that are usually involved in traditional TDOA by performing the TDOA search for each dimension separately.
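
The planar locus mentioned in this abstract follows from far-field geometry: when every microphone lies in one plane, each pair's TDOA depends only on the in-plane projection of the source direction, so the TDOA vectors of all pairs span a two-dimensional subspace. The sketch below shows that property only. It assumes a far-field source and a speed of sound of 343 m/s, the function name and argument layout are hypothetical, and the tracking and VAD stages are not covered.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def admissible_tdoas(mic_xy, pairs, azimuths, elevations, c=SPEED_OF_SOUND):
    """TDOAs (seconds) for every scanned (azimuth, elevation) and microphone pair.

    mic_xy: (M, 2) microphone coordinates in the array plane, in meters.
    pairs:  list of (i, j) microphone index pairs.
    Returns an array of shape (len(azimuths), len(elevations), len(pairs)).
    """
    mic_xy = np.asarray(mic_xy, dtype=float)
    baselines = np.array([mic_xy[i] - mic_xy[j] for i, j in pairs])  # (P, 2)
    az, el = np.meshgrid(azimuths, elevations, indexing="ij")
    # Projection of the unit direction vector onto the array plane.
    u_xy = np.stack([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az)], axis=-1)
    return u_xy @ baselines.T / c
```

Because each admissible TDOA vector is a linear image of the two-component projection `u_xy`, stacking the scanned vectors gives a matrix of rank at most 2, whatever the number of pairs.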

    Connectionist temporal classification using segmented labeled sequence data

    Publication No.: US10762427B2

    Publication date: 2020-09-01

    Application No.: US15909930

    Filing date: 2018-03-01

    Abstract: Classification training systems and methods include a neural network for classification of input data, a training dataset providing segmented labeled training data, and a classification training module operable to train the neural network using the training data. A forward pass processing module is operable to generate neural network outputs for the training data using weights and biases for the neural network, and a backward pass processing module is operable to update the weights and biases in a backward pass, including obtaining Region of Target (ROT) information from the training data, generating a forward-backward masking based on the ROT information, the forward-backward masking placing at least one restriction on a neural network output path, computing modified forward and backward variables based on the neural network outputs and the forward-backward masking, and updating the weights and biases.
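
One way to picture a mask that restricts the output path is to multiply the standard CTC forward variable by a 0/1 mask over (frame, state) cells. The sketch below does this for the forward recursion only. The mask layout, the `rot_mask` helper, and the rule that a label may only be emitted inside its ROT are assumptions made for illustration, not details taken from the patent; the backward variable and the weight update are omitted.

```python
import numpy as np

def ctc_forward(probs, labels, blank=0, mask=None):
    """Total probability of all allowed CTC alignments.

    probs:  (T, K) per-frame class probabilities.
    labels: target label sequence without blanks.
    mask:   optional (T, 2 * len(labels) + 1) array of 0/1; a zero forbids
            a path from occupying that extended state at that frame.
    """
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    T, S = probs.shape[0], len(ext)
    if mask is None:
        mask = np.ones((T, S))
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]] * mask[0, 0]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]] * mask[0, 1]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]
            if s >= 1:
                a += alpha[t - 1, s - 1]
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]] * mask[t, s]
    total = alpha[T - 1, S - 1]
    if S > 1:
        total += alpha[T - 1, S - 2]
    return float(total)

def rot_mask(num_frames, rots):
    """Hypothetical mask: label k may be emitted only in frames [start, end)."""
    mask = np.ones((num_frames, 2 * len(rots) + 1))
    for k, (start, end) in enumerate(rots):
        mask[:, 2 * k + 1] = 0.0
        mask[start:end, 2 * k + 1] = 1.0
    return mask
```

With two frames, two classes, uniform probabilities, and the single target label 1, the unmasked total covers three alignments; restricting the label to the second frame leaves only the alignment (blank, 1).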

    ADAPTIVE SPATIAL VAD AND TIME-FREQUENCY MASK ESTIMATION FOR HIGHLY NON-STATIONARY NOISE SOURCES

    Publication No.: US20200219530A1

    Publication date: 2020-07-09

    Application No.: US16735575

    Filing date: 2020-01-06

    Abstract: Systems and methods include a first voice activity detector operable to detect speech in a frame of a multichannel audio input signal and output a speech determination, a constrained minimum variance adaptive filter operable to receive the multichannel audio input signal and the speech determination and minimize a signal variance at the output of the filter, thereby producing an equalized target speech signal, a mask estimator operable to receive the equalized target speech signal and the speech determination and generate a spectral-temporal mask to discriminate a target speech from noise and interference speech, and a second voice activity detector operable to detect voice in a frame of the speech-discriminated signal. An audio input sensor array includes a plurality of microphones, each microphone generating a channel of the multichannel audio input signal. A sub-band analysis module is operable to decompose each of the channels into a plurality of frequency sub-bands.
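
The sub-band analysis step at the end of this abstract can be sketched as a windowed short-time Fourier transform applied to each channel. The abstract does not name the filter bank, so the Hann window, the frame length, and the hop size below are assumptions, and `subband_analysis` is a hypothetical name.

```python
import numpy as np

def subband_analysis(x, frame_len=512, hop=256):
    """Decompose each channel into frequency sub-bands, frame by frame.

    x: (channels, samples) real-valued multichannel signal.
    Returns a complex array of shape (channels, frames, frame_len // 2 + 1).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    window = np.hanning(frame_len)
    n_frames = 1 + (x.shape[1] - frame_len) // hop
    # Sample indices of every frame: one row per frame.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[:, idx] * window  # (channels, frames, frame_len)
    return np.fft.rfft(frames, axis=-1)
```

Downstream stages such as the adaptive filter and the mask estimator would then operate per sub-band on the returned array.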

    Audio enhancement through supervised latent variable representation of target speech and noise

    Publication No.: US11763832B2

    Publication date: 2023-09-19

    Application No.: US16865111

    Filing date: 2020-05-01

    CPC classification number: G10L21/0264 G06N3/08 G10L21/0216

    Abstract: Systems and methods for generating an enhanced audio signal comprise a trained neural network configured to receive an input audio signal and generate an enhanced target signal, the trained neural network comprising a pre-processing neural network configured to receive a segment of the input audio signal and output an audio classification, the pre-processing neural network including at least one hidden layer comprising an embedding vector, and a noise reduction neural network configured to receive the segment of the input audio signal and the embedding vector, and generate the enhanced target signal. The pre-processing neural network may comprise a target signal pre-processing neural network configured to output a target signal classification and comprising at least one hidden layer comprising a target embedding vector. The pre-processing neural network may comprise a noise pre-processing neural network configured to output a noise classification and comprising at least one hidden layer comprising a noise embedding vector.
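
The data flow in this abstract (a classifier whose hidden layer serves as an embedding, and a noise reduction network that receives both the segment and that embedding) can be sketched with two small untrained networks. Everything here is an illustrative assumption: the layer sizes, the random weights, the class names, and the choice of a per-feature gain applied to a magnitude-like input. It shows the wiring only, not a trained or patented model.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """Randomly initialized weights and zero biases (untrained, for shape only)."""
    return rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out)

class PreProcessingNet:
    """Classifier whose hidden activation is exposed as the embedding vector."""
    def __init__(self, n_features, n_embed, n_classes):
        self.w1, self.b1 = dense(n_features, n_embed)
        self.w2, self.b2 = dense(n_embed, n_classes)

    def __call__(self, segment):
        embedding = np.tanh(segment @ self.w1 + self.b1)
        logits = embedding @ self.w2 + self.b2
        e = np.exp(logits - logits.max())  # numerically stable softmax
        return e / e.sum(), embedding

class NoiseReductionNet:
    """Predicts a gain in (0, 1) per feature from the segment and the embedding."""
    def __init__(self, n_features, n_embed, n_hidden):
        self.w1, self.b1 = dense(n_features + n_embed, n_hidden)
        self.w2, self.b2 = dense(n_hidden, n_features)

    def __call__(self, segment, embedding):
        h = np.tanh(np.concatenate([segment, embedding]) @ self.w1 + self.b1)
        gain = 1.0 / (1.0 + np.exp(-(h @ self.w2 + self.b2)))
        return gain * segment
```

A call chain such as `probs, emb = pre(segment)` followed by `enhanced = nr(segment, emb)` mirrors the order described in the abstract: classify first, then condition the enhancement on the embedding.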

    MULTIPLE-SOURCE TRACKING AND VOICE ACTIVITY DETECTIONS FOR PLANAR MICROPHONE ARRAYS

    Publication No.: US20210314701A1

    Publication date: 2021-10-07

    Application No.: US17349589

    Filing date: 2021-06-16

    Abstract: Embodiments described herein provide a combined multi-source time difference of arrival (TDOA) tracking and voice activity detection (VAD) mechanism that is applicable for generic array geometries, e.g., a microphone array that lies on a plane. The combined multi-source TDOA tracking and VAD mechanism scans the azimuth and elevation angles of the microphone array in microphone pairs, based on which a planar locus of physically admissible TDOAs can be formed in the multi-dimensional TDOA space of multiple microphone pairs. In this way, the multi-dimensional TDOA tracking reduces the number of calculations that are usually involved in traditional TDOA by performing the TDOA search for each dimension separately.

    Efficient connectionist temporal classification for binary classification

    Publication No.: US10762417B2

    Publication date: 2020-09-01

    Application No.: US15894872

    Filing date: 2018-02-12

    Abstract: A classification system and method for training a neural network includes receiving a stream of segmented, labeled training data having a sequence of frames, computing a stream of input features data for the sequence of frames, and generating neural network outputs for the sequence of frames in a forward pass through the training data and in accordance with weights and biases. The weights and biases are updated in a backward pass through the training data, including determining Region of Target (ROT) information from the segmented, labeled training data, computing modified forward and backward variables based on the neural network outputs and the ROT information, deriving a signal error for each frame within the sequence of frames based on the modified forward and backward variables, and updating the weights and biases based on the derived signal error. An adaptive learning module is provided to improve a convergence rate of the neural network.
