Abstract:
Systems and methods are described for storing and reusing previously generated/calculated acoustic environment data. By reusing acoustic environment data, the systems and methods described herein may avoid the increased overhead in generating/calculating acoustic environment data for a location when this data has already been generated and is likely accurate. In particular, the time and complexity involved in determining reverberation/echo levels, noise levels, and noise types may be avoided when this information is available in storage. This previously stored acoustic environment data may not be limited to data generated/calculated by the same audio device. Instead, in some embodiments an audio device may access a centralized repository to leverage acoustic environment data generated/calculated by other audio devices.
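The storage-and-reuse idea above amounts to a cache keyed by location: compute acoustic environment data only when no stored entry exists. A minimal sketch, with all class, key, and field names being illustrative assumptions rather than terms from the abstract:

```python
# Sketch of reusing previously computed acoustic environment data,
# keyed by location. Names and fields are illustrative only.

class AcousticEnvironmentCache:
    def __init__(self):
        self._store = {}  # location id -> acoustic environment data

    def get_or_compute(self, location_id, compute_fn):
        """Return cached data for a location, computing it only on a miss."""
        if location_id not in self._store:
            # Expensive path: estimate reverberation/echo level,
            # noise level, and noise type for this location.
            self._store[location_id] = compute_fn()
        return self._store[location_id]

cache = AcousticEnvironmentCache()
env = cache.get_or_compute("kitchen", lambda: {"reverb_ms": 400, "noise_db": 35})
# Second call is a cache hit; the compute function is not invoked again.
reused = cache.get_or_compute("kitchen", lambda: {"reverb_ms": 999})
```

In the centralized-repository variant, `_store` would be a shared service populated by many audio devices rather than a local dictionary.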
Abstract:
A method for adapting a threshold used in multi-channel audio voice activity detection. Strengths of primary and secondary sound pick up channels are computed. A separation, being a measure of difference between the strengths of the primary and secondary channels, is also computed. An analysis of the peaks in separation is performed, e.g. using a leaky peak capture function that captures a peak in the separation and then decays over time, or using a sliding window min-max detector. A threshold that is to be used in a voice activity detection (VAD) process is adjusted, in accordance with the analysis of the peaks. Other embodiments are also described and claimed.
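The leaky peak capture mentioned above can be sketched as a tracker that latches onto peaks in the separation signal and decays between them; the decay constant and the fraction-of-peak threshold rule below are illustrative assumptions, not values from the abstract:

```python
# Leaky peak capture over the primary/secondary "separation" measure,
# used to adapt a VAD threshold. Decay and threshold rule are assumed.

def leaky_peak_capture(separation_series, decay=0.95):
    """Track peaks in separation; each captured peak decays over time."""
    peak = 0.0
    peaks = []
    for s in separation_series:
        peak = max(s, peak * decay)  # capture a new peak or decay the old one
        peaks.append(peak)
    return peaks

separation = [0.1, 0.2, 1.0, 0.3, 0.2, 0.1]
tracked = leaky_peak_capture(separation)
# Illustrative rule: set the VAD threshold to a fraction of the tracked peak.
thresholds = [0.5 * p for p in tracked]
```

A sliding window min-max detector would replace `leaky_peak_capture` with the max (or max minus min) over a fixed-length window of recent separation values.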
Abstract:
A method for controlling a speech enhancement process in a far-end device while engaged in a voice or video telephony communication session over a communication link with a near-end device. A near-end user speech signal is produced, using a microphone to pick up speech of a near-end user, and is analyzed by an automatic speech recognizer (ASR) without being triggered by an ASR trigger phrase or button. The recognized words are compared to a library of phrases to select a matching phrase, where each phrase is associated with a message that represents an audio signal processing operation. The message associated with the matching phrase is sent to the far-end device, where it configures the far-end device to adjust the speech enhancement process that produces the far-end speech signal. Other embodiments are also described.
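The phrase-to-message lookup described above can be sketched as a simple table match over the ASR output; the specific phrases and message names below are hypothetical, not taken from the patent:

```python
# Hedged sketch: match recognized words against a phrase library where
# each phrase maps to a message naming a speech-enhancement adjustment
# to be sent to the far-end device. All entries are hypothetical.

PHRASE_LIBRARY = {
    "i can't hear you": "INCREASE_GAIN",
    "you sound muffled": "REDUCE_NOISE_SUPPRESSION",
    "there's an echo": "ENABLE_ECHO_CANCELLATION",
}

def select_message(recognized_words):
    """Return the message for the first library phrase found in the words."""
    text = " ".join(recognized_words).lower()
    for phrase, message in PHRASE_LIBRARY.items():
        if phrase in text:
            return message  # in the method, this is sent to the far-end device
    return None

msg = select_message(["I", "can't", "hear", "you"])
```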
Abstract:
Signals are received from audio pickup channels that contain signals from multiple sound sources. The audio pickup channels may include one or more microphones and one or more accelerometers. Signals representative of multiple sound sources are generated using a blind source separation algorithm. It is then determined which of those signals is deemed to be a voice signal and which is deemed to be a noise signal. The output noise signal may be scaled to match a level of the output voice signal, and a clean speech signal is generated based on the output voice signal and the scaled noise signal. Other aspects are described.
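The level-matching step above (scaling the separated noise signal to the level of the separated voice signal) can be sketched with an RMS-based gain; RMS matching is an assumption for illustration, not the patent's specific scaling rule:

```python
# Sketch: after blind source separation yields a voice signal and a
# noise signal, scale the noise to the voice signal's level before
# forming the clean speech estimate. RMS matching is assumed.
import math

def rms(signal):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def scale_noise_to_voice(voice, noise):
    """Scale noise so its RMS matches the voice signal's RMS."""
    gain = rms(voice) / max(rms(noise), 1e-12)
    return [gain * n for n in noise]

voice = [0.5, -0.5, 0.5, -0.5]   # RMS = 0.5
noise = [0.1, -0.1, 0.1, -0.1]   # RMS = 0.1
scaled = scale_noise_to_voice(voice, noise)
```

The clean speech signal would then be derived from the voice signal and `scaled`, e.g. by a noise suppression stage that uses `scaled` as its noise reference.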
Abstract:
A method of speech enhancement using a neural network-based combined signal starts with training the neural network offline, which includes: (i) exciting at least one accelerometer and at least one microphone using a training accelerometer signal and a training acoustic signal, respectively. The training accelerometer signal and the training acoustic signal are correlated during clean speech segments. Training the neural network offline further includes (ii) selecting speech included in the training accelerometer signal and in the training acoustic signal, and (iii) spatially localizing the speech by setting a weight parameter in the neural network based on the selected speech included in the training accelerometer signal and in the training acoustic signal. The neural network that is trained offline is then used to generate a speech reference signal based on an accelerometer signal from the at least one accelerometer and an acoustic signal received from the at least one microphone. Other embodiments are described.
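At inference time, the trained network's role is to fuse the accelerometer and microphone channels into a speech reference signal. The toy sketch below stands in for that network with a single learned blend weight, which is a gross simplification for illustration only:

```python
# Toy stand-in for the trained network: blend an accelerometer channel
# with a microphone channel via one learned weight to form a speech
# reference. A real network would learn many weights, not one scalar.

def speech_reference(accel, mic, weight=0.3):
    """Blend per-sample accelerometer and microphone values."""
    return [weight * a + (1.0 - weight) * m for a, m in zip(accel, mic)]

accel = [0.2, 0.4, 0.2]  # accelerometer picks up voiced speech via bone conduction
mic = [0.1, 0.5, 0.3]    # microphone picks up speech plus ambient noise
ref = speech_reference(accel, mic)
```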
Abstract:
Digital signal processing techniques for automatically reducing audible noise from a sound recording that contains speech. A noise suppression system uses two types of noise estimators, a more aggressive one and a less aggressive one. Decisions are made on how to select or combine their outputs into a usable noise estimate under different speech and noise conditions. A 2-channel noise estimator is described. Other embodiments are also described and claimed.
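One way to combine the two estimators is to prefer the less aggressive estimate while speech is active, so that speech is not suppressed along with the noise; this selection rule is an illustrative assumption, as the abstract leaves the actual decision logic unspecified:

```python
# Sketch of selecting between an aggressive and a conservative noise
# estimate based on a speech-activity flag. The rule is illustrative;
# the described system makes richer select-or-combine decisions.

def combined_noise_estimate(aggressive, conservative, speech_active):
    """Use the conservative estimate during speech to avoid suppressing
    speech energy; otherwise use the aggressive estimate."""
    return conservative if speech_active else aggressive

during_speech = combined_noise_estimate(0.8, 0.3, speech_active=True)
during_silence = combined_noise_estimate(0.8, 0.3, speech_active=False)
```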
Abstract:
Aspects of the subject technology provide for generation of a self-voice signal by an electronic device that is operating in an active noise cancellation mode. In this way, during a phone call, a video conference, or while listening to audio content, a user of the electronic device may benefit from active cancellation of ambient noise while still being able to hear their own voice when they speak. In various implementations described herein, the concurrent self-voice and automatic noise cancellation features are facilitated by accelerometer-based control of sidetone and/or active noise cancellation operations.
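The accelerometer-based control mentioned above can be sketched as a gate: when accelerometer energy indicates the user is speaking (bone-conducted vibration), a sidetone path passes the self-voice signal; otherwise ambient noise is fully cancelled. The threshold and gain values below are illustrative assumptions:

```python
# Sketch of accelerometer-gated sidetone during active noise
# cancellation: pass self-voice only when the accelerometer suggests
# the user is speaking. Threshold and gain values are assumed.

def sidetone_gain(accel_energy, threshold=0.05, gain_when_speaking=0.5):
    """Enable sidetone only when accelerometer energy indicates self-speech."""
    return gain_when_speaking if accel_energy > threshold else 0.0

speaking_gain = sidetone_gain(0.2)   # user speaking: self-voice passed
silent_gain = sidetone_gain(0.01)    # silent: sidetone off, ANC unaffected
```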