Abstract:
Embodiments of the present disclosure can provide systems, methods, and computer-readable medium for adjusting audio and/or video information of a video clip based at least in part on facial feature and/or voice feature characteristics extracted from signals captured by hardware components. For example, in response to detecting a request to generate an avatar video clip of a virtual avatar, a video signal associated with a face in a field of view of a camera and an audio signal may be captured. Voice feature characteristics and facial feature characteristics may be extracted from the audio signal and the video signal, respectively. In some examples, in response to detecting a request to preview the avatar video clip, an adjusted audio signal may be generated based at least in part on the facial feature characteristics and the voice feature characteristics, and a preview of the video clip of the virtual avatar using the adjusted audio signal may be displayed.
Abstract:
Embodiments of the present disclosure can provide systems, methods, and computer-readable medium for providing audio and/or video effects based at least in part on facial features and/or voice feature characteristics of the user. For example, video and/or an audio signal of the user may be recorded by a device. Voice audio features and facial feature characteristics may be extracted from the voice audio signal and the video, respectively. The facial features of the user may be used to modify features of a virtual avatar to emulate the facial feature characteristics of the user. The extracted voice audio features may be modified to generate an adjusted audio signal, or an audio signal may be composed from the voice audio features. The adjusted/composed audio signal may simulate the voice of the virtual avatar. A preview of the modified video/audio may be provided at the user's device.
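The voice-adjustment step above could be sketched as follows. This is a toy illustration only: the feature names (`pitch_hz`, `gain`, `mouth_open`), the pitch-shift parameter, and the simple scaling model are all assumptions made for the sketch, not the disclosed implementation.

```python
# Hypothetical adjustment of extracted voice features using facial
# feature characteristics; all names and the pitch/gain model here are
# illustrative assumptions only.

def adjust_voice_features(voice: dict, face: dict, avatar_pitch_shift: float) -> dict:
    """Return adjusted audio parameters for previewing the avatar clip."""
    adjusted = dict(voice)
    # Shift the pitch toward the virtual avatar's characteristic voice.
    adjusted["pitch_hz"] = voice["pitch_hz"] * avatar_pitch_shift
    # Exaggerate loudness when the face shows a wide-open mouth.
    adjusted["gain"] = voice["gain"] * (1.0 + face["mouth_open"])
    return adjusted

print(adjust_voice_features({"pitch_hz": 180.0, "gain": 1.0},
                            {"mouth_open": 0.5},
                            avatar_pitch_shift=1.5))
# -> {'pitch_hz': 270.0, 'gain': 1.5}
```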
Abstract:
A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
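The selector stage described above can be sketched as follows. The `Candidate` structure and log-likelihood scoring are assumptions for illustration, not the disclosed design; the sketch only shows picking the most-likely transcription across the candidates from every beamformed look direction.

```python
# Illustrative sketch of the selector stage only; the Candidate structure
# and log-likelihood scoring are assumptions, not the disclosed design.

from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    log_likelihood: float  # higher means more likely

def select_transcription(per_stream_candidates: list) -> str:
    """Flatten the candidates produced for every look direction and
    return the text of the single highest-likelihood candidate."""
    all_candidates = [c for stream in per_stream_candidates for c in stream]
    best = max(all_candidates, key=lambda c: c.log_likelihood)
    return best.text

# Two beamformed streams of the same utterance, each with its own
# ranked candidates from the recognition engine.
streams = [
    [Candidate("turn on the lights", -4.2), Candidate("turn off the lights", -6.0)],
    [Candidate("turn on the lights", -3.1)],  # a cleaner look direction
]
print(select_transcription(streams))  # -> turn on the lights
```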
Abstract:
Systems and methods for controlling echo in audio communications between a near-end system and a far-end system are described. The systems and methods may intelligently assign a plurality of microphone beams to a limited number of echo cancellers for processing. The microphone beams may be classified based on generated statistics to determine beams of interest (e.g., beams with a high ratio of local voice to echo). Based on this ranking/classification of microphone beams, beams of greater interest may be assigned to echo cancellers, while less important beams may temporarily remain unprocessed until they become of higher importance/interest. Accordingly, a limited number of echo cancellers may be used to intelligently process a larger number of microphone beams based on interest in the beams and properties of echo cancellation performed for each beam.
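The ranking-and-assignment idea above can be sketched in a few lines. The statistic used (a local-voice-to-echo ratio per beam) comes from the abstract, but the data layout and function below are illustrative assumptions, not the disclosed implementation.

```python
# Illustrative sketch: rank microphone beams by their local-voice-to-echo
# ratio and assign only the top-ranked beams to the limited pool of echo
# cancellers; the rest stay unprocessed until their ranking rises.

def assign_beams(voice_to_echo: dict, num_cancellers: int) -> list:
    """Return the beam ids assigned to echo cancellers, best ratio first."""
    ranked = sorted(voice_to_echo, key=voice_to_echo.get, reverse=True)
    return ranked[:num_cancellers]

stats = {"beam0": 0.2, "beam1": 3.5, "beam2": 1.1, "beam3": 0.9}
print(assign_beams(stats, num_cancellers=2))  # -> ['beam1', 'beam2']
```

Re-running the assignment as the per-beam statistics change over time lets a beam that becomes "of interest" displace one that no longer is.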
Abstract:
A system for improving sound quality includes a loudspeaker, a microphone, an accelerometer, acoustic echo cancellers (AECs), and a double-talk detector (DTD). The loudspeaker outputs a loudspeaker signal including a downlink audio signal from a far-end speaker. The microphone generates a microphone uplink signal and receives at least one of: near-end speaker, ambient noise, and loudspeaker signals. The accelerometer generates an accelerometer uplink signal and receives at least one of: near-end speaker, ambient noise, and loudspeaker signals. A first AEC receives the downlink audio, microphone uplink, and double-talk control signals, and generates an AEC-microphone linear echo estimate and a corrected AEC-microphone uplink signal. A second AEC receives the downlink audio, accelerometer uplink, and double-talk control signals, and generates an AEC-accelerometer linear echo estimate and a corrected AEC-accelerometer uplink signal. The DTD receives the downlink audio signal, the uplink signals, the corrected uplink signals, and the linear echo estimates, and generates the double-talk control signal. An uplink audio signal including at least one of the corrected microphone uplink signal and the corrected accelerometer uplink signal is generated. Other embodiments are described.
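A minimal double-talk decision can be sketched as below. The energy-ratio rule and threshold are assumptions for the sketch, not the disclosed detector: the idea is that when significant energy survives echo cancellation relative to the linear echo estimate, near-end speech is likely present on top of the far-end echo.

```python
# Minimal double-talk decision sketch; the energy-ratio rule and the
# threshold value are illustrative assumptions, not the disclosed DTD.

def frame_energy(samples) -> float:
    """Sum-of-squares energy of one frame."""
    return sum(s * s for s in samples)

def detect_double_talk(corrected_uplink, echo_estimate, threshold=0.5) -> bool:
    """Declare double-talk when the energy surviving echo cancellation
    is large relative to the linear echo estimate."""
    echo = frame_energy(echo_estimate)
    residual = frame_energy(corrected_uplink)
    if echo == 0.0:
        # No estimated echo: any uplink energy must be near-end activity.
        return residual > 0.0
    return residual / echo > threshold

# Strong residual relative to the echo estimate -> double-talk.
print(detect_double_talk([1.0, 1.0], [0.1, 0.1]))  # -> True
# Tiny residual -> far-end only, no double-talk.
print(detect_double_talk([0.01, 0.0], [1.0, 1.0]))  # -> False
```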
Abstract:
Processing of ambience and speech can include extracting a primary speech signal and one or more ambience audio signals from audio signals. One or more spatial parameters can be generated that define spatial characteristics of the ambience sound in the one or more ambience audio signals. The primary speech signal, the one or more ambience audio signals, and the spatial parameters can be encoded into one or more encoded data streams. Other aspects are described and claimed.
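The encoding step above could be framed as follows. The abstract does not specify a bitstream format, so JSON serialization and the parameter names are used purely for illustration; the sketch only shows bundling the three components into one payload and recovering them.

```python
# Hypothetical framing of the encoded data stream; JSON and the field
# names are illustrative assumptions, not a disclosed bitstream format.

import json

def encode_streams(speech, ambience_channels, spatial_params) -> bytes:
    """Bundle the primary speech signal, the ambience audio signals, and
    the spatial parameters into one encoded payload."""
    payload = {
        "speech": speech,               # primary speech samples
        "ambience": ambience_channels,  # one or more ambience channels
        "spatial": spatial_params,      # e.g. per-band direction/diffuseness
    }
    return json.dumps(payload).encode("utf-8")

def decode_streams(blob: bytes) -> dict:
    """Recover the speech, ambience, and spatial parameters."""
    return json.loads(blob.decode("utf-8"))

blob = encode_streams([0.1, -0.2], [[0.0, 0.0]],
                      {"azimuth_deg": 30, "diffuseness": 0.7})
print(decode_streams(blob)["spatial"])  # -> {'azimuth_deg': 30, 'diffuseness': 0.7}
```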
Abstract:
A method performed by a processor of an audio source device. The method drives an audio output device of the audio source device to output a sound with an audio output signal. The method obtains a microphone signal from a microphone of the audio source device, the microphone signal capturing the outputted sound. The method determines whether the audio output device is a headset or a loudspeaker based on the microphone signal and configures an acoustic dosimetry process based on the determination.
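The headset-versus-loudspeaker decision could be sketched as below. The relative-pickup rule and its threshold are assumptions for illustration only: if the device's own microphone picks up the outputted sound strongly, the audio is likely playing over the loudspeaker; if it is barely audible, it is likely routed to a headset, and the dosimetry process can be configured accordingly.

```python
# Illustrative classification of the audio output device from how much
# of the playback the device's own microphone picks up; the ratio rule
# and threshold are assumptions, not the disclosed method.

def classify_output_device(mic_rms: float, playback_rms: float,
                           pickup_threshold: float = 0.1) -> str:
    """Return 'loudspeaker' or 'headset' from the relative pickup level."""
    if playback_rms == 0.0:
        raise ValueError("no playback level to compare against")
    ratio = mic_rms / playback_rms
    return "loudspeaker" if ratio >= pickup_threshold else "headset"

print(classify_output_device(mic_rms=0.05, playback_rms=0.2))   # -> loudspeaker
print(classify_output_device(mic_rms=0.001, playback_rms=0.2))  # -> headset
```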
Abstract:
An audio system includes one or more loudspeaker cabinets, each having loudspeakers. Sensing logic determines an acoustic environment of the loudspeaker cabinets. The sensing logic may include an echo canceller. A low frequency filter corrects an audio program based on the acoustic environment of the loudspeaker cabinets. The system outputs an omnidirectional sound pattern, which may be low frequency sound, to determine the acoustic environment. The system may produce a directional pattern superimposed on an omnidirectional pattern if the acoustic environment is in free space. The system may aim ambient content toward a wall and direct content away from the wall if the acoustic environment is not in free space. The sensing logic automatically determines the acoustic environment upon initial power up and when position changes of loudspeaker cabinets are detected. Accelerometers may detect position changes of the loudspeaker cabinets.
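The free-space-versus-wall branching above can be sketched as a small decision function. The boundary-gain statistic and its threshold are assumptions for the sketch (the abstract gives no quantities); only the two output behaviors come from the abstract.

```python
# Illustrative decision logic only; the boundary-gain measure and the
# threshold are assumptions, not the disclosed sensing logic.

def choose_beam_pattern(boundary_gain_db: float,
                        wall_threshold_db: float = 3.0) -> str:
    """Pick the output pattern from the low-frequency boundary
    reinforcement measured during the omnidirectional test sound;
    stronger reinforcement suggests a nearby wall."""
    if boundary_gain_db < wall_threshold_db:
        # Free space: directional pattern superimposed on omnidirectional.
        return "directional+omni"
    # Near a wall: ambient content toward the wall, direct content away.
    return "ambient-toward-wall"

print(choose_beam_pattern(1.0))  # -> directional+omni
print(choose_beam_pattern(6.0))  # -> ambient-toward-wall
```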