Abstract:
Voice activity detection using multiple microphones can be based on a relationship between the energy at a speech reference microphone and the energy at a noise reference microphone. The energy output from each of the two microphones can be determined. A speech-to-noise energy ratio can be determined and compared to a predetermined voice activity threshold. In another embodiment, the absolute values of the autocorrelations of the speech and noise reference signals are determined, and a ratio based on the autocorrelation values is determined. Ratios that exceed the predetermined threshold can indicate the presence of a voice signal. The speech and noise energies or autocorrelations can be determined using a weighted average or over a discrete frame size.
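As a rough illustration of the energy-ratio test described in this abstract, the following Python sketch computes a per-frame energy and a weighted-average energy, then compares the speech-to-noise ratio to a threshold. The function names, the threshold of 2.0, the smoothing factor `alpha`, and the small `eps` guard are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def frame_energy(frame):
    """Energy of one discrete frame: the sum of squared samples."""
    frame = np.asarray(frame, dtype=float)
    return float(np.dot(frame, frame))

def smoothed_energy(prev_energy, sample, alpha=0.9):
    """Weighted-average energy update (alpha is an assumed smoothing factor)."""
    return alpha * prev_energy + (1.0 - alpha) * sample * sample

def voice_active(speech_frame, noise_frame, threshold=2.0, eps=1e-12):
    """Return True when the speech-to-noise energy ratio exceeds the threshold.

    threshold and eps are illustrative values, not taken from the abstract.
    """
    ratio = frame_energy(speech_frame) / (frame_energy(noise_frame) + eps)
    return ratio > threshold
```

A frame in which the speech reference carries far more energy than the noise reference would be flagged as voice, while two frames of similar energy would not.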
Abstract:
In a method of synthesizing voiced speech from pitch prototype waveforms by time-synchronous waveform interpolation (TSWI), one or more pitch prototypes are extracted from a speech signal or a residue signal. The extraction process is performed in such a way that the prototype has minimum energy at the boundary. Each prototype is circularly shifted so as to be time-synchronous with the original signal. A linear phase shift is applied to each extracted prototype relative to the previously extracted prototype so as to maximize the cross-correlation between successive extracted prototypes. A two-dimensional prototype-evolving surface is constructed by upsampling the prototypes to every sample point. The two-dimensional prototype-evolving surface is re-sampled to generate a one-dimensional, synthesized signal frame with sample points defined by piecewise continuous cubic phase contour functions computed from the pitch lags and the phase shifts added to the extracted prototypes. A pre-selection filter may be applied to determine whether to abandon the TSWI technique in favor of another algorithm for the current frame. A post-selection performance measure may be obtained and compared with a predetermined threshold to determine whether the TSWI algorithm is performing adequately.
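The alignment step in this abstract can be illustrated with a small Python sketch. A linear phase shift in the frequency domain corresponds to a circular shift in time, so the sketch searches all integer circular shifts of the current prototype and keeps the one with the largest cross-correlation against the previous prototype. It assumes, for simplicity, that both prototypes have the same length and that only integer shifts are considered; the function name `align_prototype` is hypothetical.

```python
import numpy as np

def align_prototype(prev_proto, cur_proto):
    """Circularly shift cur_proto to maximize its correlation with prev_proto.

    Simplifying assumptions: equal prototype lengths and integer shifts only.
    Returns (shift, aligned), where aligned == np.roll(cur_proto, shift).
    """
    prev_proto = np.asarray(prev_proto, dtype=float)
    cur_proto = np.asarray(cur_proto, dtype=float)
    if prev_proto.shape != cur_proto.shape:
        raise ValueError("this sketch assumes equal-length prototypes")
    # Score every candidate circular shift by its inner product with prev_proto.
    scores = [np.dot(prev_proto, np.roll(cur_proto, s))
              for s in range(len(cur_proto))]
    shift = int(np.argmax(scores))
    return shift, np.roll(cur_proto, shift)
```

An exhaustive search is used here because it is easy to verify; an FFT-based circular cross-correlation would give the same result with less computation for long prototypes.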
Abstract:
A mechanism is provided that monitors secondary microphone signals in a multi-microphone mobile device to warn the user if one or more secondary microphones are covered while the mobile device is in use. In one example, smoothed average power estimates of the secondary microphones may be computed, and a covered microphone may be detected by comparing those estimates to the noise floor estimate of the primary microphone. In another example, the noise floor estimates for the primary and secondary microphone signals may be compared to the difference in sensitivity between the primary and secondary microphones to determine whether the secondary microphone is covered. Once detection is made, a warning signal may be generated and issued to the user.
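The first example in this abstract might be sketched in Python as follows. The abstract does not state the smoothing method or the exact comparison rule, so the sketch assumes exponential smoothing and assumes that a covered microphone shows a smoothed power well below the primary microphone's noise floor. The names `smooth_power` and `secondary_covered`, the factor `alpha`, and the `margin` are hypothetical.

```python
def smooth_power(prev_power, sample, alpha=0.9):
    """Exponentially smoothed power estimate (alpha is an assumed factor)."""
    return alpha * prev_power + (1.0 - alpha) * sample * sample

def secondary_covered(secondary_power, primary_noise_floor, margin=0.5):
    """Flag a secondary microphone as covered.

    Assumed rule: the microphone is treated as covered when its smoothed
    power falls below margin * primary_noise_floor.
    """
    return secondary_power < margin * primary_noise_floor

def check_secondary(samples, primary_noise_floor, alpha=0.9, margin=0.5):
    """Run the smoother over a block of samples and report the final decision."""
    power = primary_noise_floor  # start from the floor to avoid a false alarm
    for s in samples:
        power = smooth_power(power, s, alpha)
    return secondary_covered(power, primary_noise_floor, margin)
```

A positive result would be the point at which a warning is issued to the user.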
Abstract:
In accordance with a method for providing a distinct perceptual location for an audio source within an audio mixture, a foreground signal may be processed to provide a foreground perceptual angle for the foreground signal. The foreground signal may also be processed to provide a desired attenuation level for the foreground signal. A background signal may be processed to provide a background perceptual angle for the background signal. The background signal may also be processed to provide a desired attenuation level for the background signal. The foreground signal and the background signal may be combined into an output audio source.
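The abstract does not say how a perceptual angle is produced. As a loose stand-in, the Python sketch below uses constant-power stereo amplitude panning plus a gain in decibels to place a foreground and a background signal at different angles before summing them. The angle range of -45 to +45 degrees, the default angles and attenuations, and the function names `place` and `mix` are all assumptions, and a real system might use a quite different spatialization technique.

```python
import numpy as np

def place(signal, angle_deg, attenuation_db):
    """Pan a mono signal to stereo and attenuate it.

    angle_deg is assumed to lie in [-45, 45]: -45 is hard left, +45 hard right.
    Returns an array of shape (2, N) holding the left and right channels.
    """
    signal = np.asarray(signal, dtype=float)
    gain = 10.0 ** (-attenuation_db / 20.0)
    theta = np.deg2rad(angle_deg + 45.0)  # map [-45, 45] to [0, 90] degrees
    left = gain * np.cos(theta) * signal
    right = gain * np.sin(theta) * signal
    return np.stack([left, right])

def mix(foreground, background, fg_angle=-30.0, bg_angle=30.0,
        fg_att_db=0.0, bg_att_db=12.0):
    """Combine a placed foreground and background into one stereo output.

    Both inputs are assumed to have the same number of samples.
    """
    return (place(foreground, fg_angle, fg_att_db)
            + place(background, bg_angle, bg_att_db))
```

Because the cosine and sine gains have a constant sum of squares, the total power of each source stays the same as its angle changes, so the attenuation parameter alone controls its level.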
Abstract:
Power savings in a mobile device are accomplished by generating audio samples by decoding a bitstream with a decoding system within the mobile device. The generated audio samples are transferred into at least one memory bank in a set of memory banks in a power saver block within the mobile device. Parts of the decoding system not involved in the storing of the generated audio samples are switched off after batch decoding a bitstream associated with multiple audio frames. The bitstream includes fewer bits than are found in one audio file. At least one of the memory banks in the set of memory banks is power collapsible. The fetching of the decoded by the decoding system can be synchronized with a paging channel of a modem in the mobile device. The transferred audio samples are losslessly compressed, and the transfer may occur after a re-encoding.
Abstract:
In general, this disclosure describes techniques for changing a sampling frequency of a digital signal. In particular, the techniques provide a more accurate way of determining a relative timing between a desired output sample and a corresponding input sample using a non-approximated integer representation of the relative timing. The relative timing between the desired output sample and corresponding input sample may be represented using a first component that identifies a latest input sample of the digital signal used to generate intermediate samples, a second component that identifies an intermediate sample, and a third component that identifies a timing difference between the desired output sample and the intermediate sample. Each of the components may be recursively updated using non-approximated integer values.
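One way to picture the three-component representation is the Python sketch below, which uses only integer arithmetic. It assumes that `up` intermediate samples are generated per input sample and expresses the timing of output sample `n` as an input sample index, an intermediate sample index, and an integer remainder in units of 1/`fs_out` of an intermediate sample period. The function names and this particular choice of units are assumptions; the abstract does not give the exact formulation.

```python
def timing_components(n, fs_in, fs_out, up):
    """Closed-form timing of output sample n as three exact integers.

    Returns (input index, intermediate index, remainder), where the
    remainder divided by fs_out is the fractional intermediate offset.
    """
    total, rem = divmod(n * fs_in * up, fs_out)
    in_idx, inter_idx = divmod(total, up)
    return in_idx, inter_idx, rem

def timing_sequence(count, fs_in, fs_out, up):
    """Recursively update the three components with integer carries."""
    step_total, step_rem = divmod(fs_in * up, fs_out)
    step_in, step_inter = divmod(step_total, up)
    in_idx = inter_idx = rem = 0
    out = []
    for _ in range(count):
        out.append((in_idx, inter_idx, rem))
        rem += step_rem
        if rem >= fs_out:        # carry from remainder into intermediate index
            rem -= fs_out
            inter_idx += 1
        inter_idx += step_inter
        if inter_idx >= up:      # carry from intermediate index into input index
            inter_idx -= up
            in_idx += 1
        in_idx += step_in
    return out
```

Since no floating-point values are involved, the recursive update does not accumulate rounding error and matches the closed form for every output index.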