DISTRIBUTED MULTI-DEVICE AUDIO CAPTURE IN A SHARED ACOUSTIC ENVIRONMENT

    Publication No.: US20240357285A1

    Publication date: 2024-10-24

    Application No.: US18757676

    Filing date: 2024-06-28

    CPC classification number: H04R3/005 H04R29/005 H04M3/568

    Abstract: Techniques are provided herein for auto-muting procedures that result in efficient, high-quality audio capture in a multi-device environment. In particular, when multiple computing devices are in a shared meeting room, the microphone with the highest-rated audio input is selected as the teleconference audio input from the shared environment. Each computing device connected to the teleconference from the meeting room determines a score for its microphone signal. The score is shared with the other devices in the room, and the microphone signal with the highest score is transmitted to the conference. In host-based systems, a host device receives and reviews the scores and determines which microphones to auto-mute. In distributed systems, each computing device transmits its score to the other devices, receives their scores, and determines for itself whether to auto-mute.
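
    A minimal sketch of the two selection variants described above, assuming each device already has a numeric quality score. How the score is computed is not specified here, so `score` is a hypothetical stand-in (for example, an SNR estimate), and the names `Device`, `host_select`, and `should_auto_mute` are illustrative rather than taken from the patent. Ties are broken by device ID so that every device reaches the same decision.

```python
from dataclasses import dataclass


@dataclass
class Device:
    device_id: str
    score: float  # hypothetical audio-quality score (higher is better)


def _rank(score, device_id):
    # Shared ordering: highest score wins; ties go to the larger device ID,
    # so every device that sees the same scores picks the same winner.
    return (score, device_id)


def host_select(devices):
    """Host-based variant: the host reviews every score and returns the
    IDs of the devices it should auto-mute."""
    best = max(devices, key=lambda d: _rank(d.score, d.device_id))
    return {d.device_id for d in devices if d.device_id != best.device_id}


def should_auto_mute(own_id, own_score, peer_scores):
    """Distributed variant: a device combines its own score with the scores
    received from its peers and decides locally whether to mute."""
    scores = dict(peer_scores)
    scores[own_id] = own_score
    winner = max(scores, key=lambda i: _rank(scores[i], i))
    return winner != own_id
```

    For example, with scores of 0.62, 0.81, and 0.40 for `laptop-a`, `laptop-b`, and `tablet`, the host variant mutes `laptop-a` and `tablet`, and in the distributed variant only `laptop-b` decides to stay unmuted. The shared, deterministic tie-break matters in the distributed case: without it, two devices with equal scores could both stay live or both mute.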

    Real-time dynamic noise reduction using convolutional networks

    Publication No.: US12062369B2

    Publication date: 2024-08-13

    Application No.: US17033605

    Filing date: 2020-09-25

    CPC classification number: G10L15/20 G10L15/16

    Abstract: A system, method, and computer-readable medium for dynamic noise reduction in a voice call. The system includes an encoder having a short-time Fourier transform module that determines a magnitude spectrum and a phase spectrum of an input audio signal containing speech and dynamic noise. A separator coupled to the encoder comprises a temporal convolutional network (TCN) that produces a separation mask from the magnitude spectrum. The TCN is trained using a frequency SNR function to calculate the loss. A mixer coupled to the separator multiplies the separation mask with the magnitude spectrum to separate the speech from the dynamic noise, yielding a denoised magnitude spectrum. A decoder coupled to the mixer and the encoder includes an inverse short-time Fourier transform module that reconstructs the input audio signal without the dynamic noise from the denoised magnitude spectrum and the phase spectrum.
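
    A minimal sketch of the encoder, mixer, and decoder data flow, using only the Python standard library. For brevity it assumes non-overlapping rectangular frames and a naive DFT in place of a windowed, overlapping STFT, and it takes the separation mask as an input; the TCN separator and the frequency SNR training loss are not reproduced. All function names are hypothetical.

```python
import cmath
import math


def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    # Taking .real assumes a real-valued result, which holds when the mask
    # applied to the magnitudes satisfies mask[k] == mask[n - k].
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def encode(signal, frame_len):
    """Encoder: split into non-overlapping frames and return the per-frame
    magnitude and phase spectra (trailing partial frame is dropped)."""
    mags, phases = [], []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        spec = dft(signal[start:start + frame_len])
        mags.append([abs(c) for c in spec])
        phases.append([cmath.phase(c) for c in spec])
    return mags, phases


def mix(mask, mags):
    """Mixer: element-wise product of the separation mask and the magnitudes."""
    return [[m * a for m, a in zip(mrow, arow)] for mrow, arow in zip(mask, mags)]


def decode(mags, phases):
    """Decoder: recombine the denoised magnitudes with the original (noisy)
    phase and invert each frame back to the time domain."""
    out = []
    for mrow, prow in zip(mags, phases):
        out.extend(idft([cmath.rect(m, p) for m, p in zip(mrow, prow)]))
    return out
```

    With an all-ones mask the decoder returns the input unchanged, which is a quick sanity check on the encoder/decoder pair; a trained separator would instead emit values near 0 in noise-dominated bins. Reusing the noisy phase, as the abstract describes, means only the magnitude is denoised.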

    Neural network based time-frequency mask estimation and beamforming for speech pre-processing

    Publication No.: US10573301B2

    Publication date: 2020-02-25

    Application No.: US16023455

    Filing date: 2018-06-29

    Abstract: Techniques are provided for pre-processing enhancement of a speech signal. A methodology implementing the techniques according to an embodiment includes performing de-reverberation processing on signals received from an array of microphones, the signals comprising speech and noise. The method also includes generating time-frequency masks (TFMs) for each of the signals. Each TFM element indicates the probability that the corresponding time-frequency component of the signal includes speech. The TFMs are generated by applying a recurrent neural network to the signals. The method further includes generating steering vectors based on speech covariance matrices and noise covariance matrices. The TFMs are used to filter the speech components of the signals, for calculating the speech covariance, and the noise components, for calculating the noise covariance. The method further includes performing beamforming on the signals, based on the steering vectors, to generate the enhanced speech signal.
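
    A minimal sketch of the mask-weighted covariance estimation and beamforming steps for a single frequency bin, assuming the per-frame speech probabilities have already been produced (the recurrent network and the de-reverberation stage are not reproduced). The abstract does not say how the steering vectors or beamformer weights are derived; this sketch assumes one common choice, taking the steering vector as the principal eigenvector of the speech covariance and computing MVDR weights from the noise covariance. All names are hypothetical.

```python
import math


def masked_covariances(frames, mask):
    """frames: per-frame lists of M complex STFT values (one frequency bin).
    mask: per-frame speech probability in [0, 1].
    Returns (speech covariance, noise covariance) as M x M lists."""
    m = len(frames[0])
    r_s = [[0j] * m for _ in range(m)]
    r_n = [[0j] * m for _ in range(m)]
    ws = wn = 0.0
    for x, p in zip(frames, mask):
        ws += p
        wn += 1.0 - p
        for i in range(m):
            for j in range(m):
                outer = x[i] * x[j].conjugate()
                r_s[i][j] += p * outer          # speech-weighted outer product
                r_n[i][j] += (1.0 - p) * outer  # noise-weighted outer product
    ws, wn = max(ws, 1e-12), max(wn, 1e-12)
    return ([[v / ws for v in row] for row in r_s],
            [[v / wn for v in row] for row in r_n])


def principal_eigvec(r, iters=50):
    """Power iteration: unit-norm dominant eigenvector of a Hermitian matrix."""
    m = len(r)
    v = [1 + 0j] * m
    for _ in range(iters):
        w = [sum(r[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(abs(c) ** 2 for c in w)) or 1.0
        v = [c / norm for c in w]
    return v


def solve(a, b):
    """Solve a x = b for a small complex system (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mvdr_weights(r_n, d, loading=1e-6):
    """MVDR weights w = R_n^-1 d / (d^H R_n^-1 d), with diagonal loading."""
    m = len(d)
    loaded = [[r_n[i][j] + (loading if i == j else 0.0) for j in range(m)]
              for i in range(m)]
    rinv_d = solve(loaded, d)
    denom = sum(di.conjugate() * ri for di, ri in zip(d, rinv_d))
    return [c / denom for c in rinv_d]


def beamform(w, x):
    """Beamformer output y = w^H x for one multichannel STFT value."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))
```

    The MVDR weights satisfy w^H d = 1, so the beamformer passes the steered direction with unit gain while minimizing the noise power at its output; the small diagonal loading keeps the noise covariance invertible when few noise frames are available. In a full system these steps would run once per frequency bin.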

    Simultaneous multi-user audio signal recognition and processing for far field audio

    Publication No.: US10438588B2

    Publication date: 2019-10-08

    Application No.: US15702490

    Filing date: 2017-09-12

    Abstract: A mechanism is described for facilitating simultaneous recognition and processing of multiple speeches from multiple users according to one embodiment. A method of embodiments, as described herein, includes facilitating a first microphone to detect a first speech from a first speaker and a second microphone to detect a second speech from a second speaker. The method may further include facilitating a first beamformer to receive and process the first speech and a second beamformer to receive and process the second speech, where the first and second speeches are received, processed, or both, simultaneously. The method may further include communicating a first output associated with the first speech and a second output associated with the second speech to the first speaker and the second speaker, respectively, using at least one of one or more speaker devices and one or more display devices.
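
    A minimal sketch of two beamformers running concurrently on the same microphone channels, one steered toward each speaker. It assumes a delay-and-sum beamformer with known integer sample delays per microphone; the abstract does not specify the beamformer type or how the delays are estimated, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def delay_and_sum(channels, delays):
    """Steer toward one talker whose sound reaches microphone k after
    delays[k] samples: advance each channel by its delay and average."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for t in range(n):
            src = t + d
            if 0 <= src < n:
                out[t] += ch[src]
    return [v / len(channels) for v in out]


def process_two_speakers(channels, delays_a, delays_b):
    """Run one beamformer per speaker at the same time on shared input."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        beam_a = pool.submit(delay_and_sum, channels, delays_a)
        beam_b = pool.submit(delay_and_sum, channels, delays_b)
        return beam_a.result(), beam_b.result()
```

    In CPython the two threads illustrate concurrent dispatch rather than a CPU speedup, because the global interpreter lock serializes pure-Python loops; a real-time system would more likely run each beam in native code or on separate cores.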

    Audio gait detection and identification

    Publication No.: US10170135B1

    Publication date: 2019-01-01

    Application No.: US15858849

    Filing date: 2017-12-29

    Abstract: Systems, apparatuses, and methods are provided for performing gait detection and identification. The system includes a pre-processing pipeline that processes audio input data from one or more microphones to combine and strengthen an audio gait signal. The pre-processing pipeline is coupled to a gait detector that detects the sound of one or more footsteps in the audio gait signal. The system also includes a person evaluator (e.g., identifier/verifier) that identifies the person associated with the one or more footsteps using a set of trained footstep identification (ID) classifiers. Each trained footstep ID classifier is mapped to the gait of a specific person in the home based on a particular combination of person, footwear, and floor surface within the home.
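
    A minimal sketch of the detection and identification stages, assuming a short-time energy threshold as the footstep detector and a nearest-centroid classifier keyed by (person, footwear, floor). Neither technique is stated in the abstract; the multi-microphone pre-processing and the feature extraction are omitted, and the feature vectors and function names are hypothetical.

```python
import math


def frame_energy(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def detect_footsteps(samples, frame_len, threshold):
    """Return the sample index of each frame where energy first rises above
    the threshold (one onset per footstep-like burst)."""
    onsets = []
    prev_above = False
    for i, e in enumerate(frame_energy(samples, frame_len)):
        above = e > threshold
        if above and not prev_above:
            onsets.append(i * frame_len)
        prev_above = above
    return onsets


def identify(features, classifiers):
    """classifiers maps (person, footwear, floor) -> centroid feature vector.
    Returns the key whose centroid is closest to the step's features."""
    return min(classifiers, key=lambda k: math.dist(features, classifiers[k]))
```

    Keying each classifier by the full (person, footwear, floor) combination mirrors the abstract's point that the same person can sound different in different shoes or on different floors; the person's identity is simply the first element of the winning key.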

    SECURE REAL TIME VOICE ANONYMIZATION AND RECOVERY

    Publication No.: US20250124171A1

    Publication date: 2025-04-17

    Application No.: US18999422

    Filing date: 2024-12-23

    Abstract: Voice anonymization systems and methods are provided. Voice anonymization is performed on the speaker's computing device and can prevent voice theft. The systems and methods are lightweight and run efficiently in real time on a computing device, providing speaker anonymity without degrading system performance during a teleconference or VoIP meeting. The anonymization system outputs a transformed speaker voice. It can also generate a voice embedding that can be used to reconstruct the original speaker voice; the embedding can be encrypted and transmitted to another device. In some cases, the voice embedding is not transmitted and the listener receives the anonymized voice. Systems and methods are also provided for detecting voice transformations in received audio, so a listener can be informed whether the speaker voice output by the listener's computing device is the original speaker's voice or a transformed version of it.

    METHODS AND APPARATUS TO MODEL SPEAKER AUDIO
    Invention publication

    Publication No.: US20240331705A1

    Publication date: 2024-10-03

    Application No.: US18194248

    Filing date: 2023-03-31

    CPC classification number: G10L17/04

    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed. An example apparatus includes: interface circuitry; instructions; and programmable circuitry to at least one of execute or instantiate the instructions to: calculate a sample embedding vector that characterizes a speaker based on a first audio signal; perform a first update of a personal embedding vector based on the sample embedding vector, the updated personal embedding vector to characterize the speaker based on a second audio signal and the first audio signal; and perform a second update of the personal embedding vector based on the first update and a universal embedding vector.
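
    A minimal sketch of the two-stage update, assuming that the first update is an exponential moving average of the personal and sample embeddings and that the second update interpolates the result toward the universal embedding. The abstract gives neither formula, so `alpha`, `beta`, and the function names are hypothetical, and computing the sample embedding from audio is out of scope.

```python
import math


def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def first_update(personal, sample, alpha):
    """Blend the new sample embedding into the personal embedding so that it
    reflects both earlier audio and the current audio."""
    return normalize([(1 - alpha) * p + alpha * s for p, s in zip(personal, sample)])


def second_update(personal, universal, beta):
    """Pull the once-updated personal embedding toward the speaker-independent
    universal embedding (assumed regularization step)."""
    return normalize([(1 - beta) * p + beta * u for p, u in zip(personal, universal)])
```

    A small `alpha` makes the personal embedding change slowly across utterances, while `beta` controls how strongly it is regularized toward the universal embedding; both results are re-normalized to unit length so cosine similarity remains meaningful.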
