Abstract:
A method for processing an input signal having an audio component is described. The method includes obtaining a set of time parameters from a time-frequency transformation of the audio component of the input signal, the audio component being a mixture of audio signals comprising at least one first audio signal of a first audio source; determining at least one motion feature of the first audio source from a visual sequence corresponding to the first audio signal; obtaining a weight vector for the set of time parameters based on the motion feature; and determining a time-frequency transformation of the first audio signal based on the weight vector.
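A minimal sketch of one way to realize this pipeline, assuming the "set of time parameters" are the time activations of an NMF decomposition of the mixture spectrogram and the weight vector is the non-negative correlation of each activation with the motion feature; the function name and all parameter values are illustrative, not from the disclosure:

```python
# Hedged sketch: NMF time activations as the "time parameters", motion
# correlation as the weight vector, and a soft mask as the TF estimate.
import numpy as np
from scipy.signal import stft, istft

def separate_with_motion(mix, motion, fs=16000, n_comp=20, n_iter=100, seed=0):
    f, t, X = stft(mix, fs=fs, nperseg=1024)
    V = np.abs(X) + 1e-12                      # mixture magnitude spectrogram
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_comp)) + 1e-3
    H = rng.random((n_comp, V.shape[1])) + 1e-3
    for _ in range(n_iter):                    # multiplicative NMF updates
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    # resample the motion feature onto the STFT time axis
    m = np.interp(np.linspace(0, 1, V.shape[1]),
                  np.linspace(0, 1, len(motion)), motion)
    # weight vector: correlation of each time activation with the motion
    w = np.clip([np.corrcoef(H[k], m)[0, 1] for k in range(n_comp)], 0, None)
    mask = ((W * w) @ H) / (W @ H + 1e-12)     # motion-weighted soft mask
    _, src = istft(mask * X, fs=fs, nperseg=1024)
    return src                                 # estimate of the first audio signal
```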
Abstract:
A method and a system (20) for audio source separation are described. The method comprises: receiving (10) an audio mixture and at least one text query associated with the audio mixture; retrieving (11) at least one audio sample from an auxiliary audio database; evaluating (12) the retrieved audio samples; and separating (13) the audio mixture into a plurality of audio sources using the audio samples. The corresponding system (20) comprises a receiver (21) and a processor (22) configured to implement the method.
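A toy sketch of the receive/retrieve/separate flow, assuming retrieval is simple tag matching against an in-memory database and separation is NMF with a dictionary seeded from the retrieved samples and kept fixed; the evaluation step (12) is reduced here to pooling example spectra, and every name and parameter is hypothetical:

```python
# Hedged sketch of text-query-guided separation on a toy database.
import numpy as np
from scipy.signal import stft, istft

def retrieve(db, query):
    """db: iterable of (tags, waveform) pairs (assumed schema)."""
    return [wav for tags, wav in db if query.lower() in tags]

def separate(mix, examples, fs=16000, n_comp=10, n_bg=10, n_iter=100):
    _, _, X = stft(mix, fs=fs, nperseg=1024)
    V = np.abs(X) + 1e-12
    # dictionary for the queried source: spectra pooled from the examples
    frames = np.hstack([np.abs(stft(e, fs=fs, nperseg=1024)[2]) for e in examples])
    pick = np.random.default_rng(0).choice(frames.shape[1], n_comp, replace=False)
    W_src = frames[:, pick] + 1e-6             # fixed, example-derived atoms
    W_bg = np.random.default_rng(1).random((V.shape[0], n_bg)) + 1e-3
    H = np.random.default_rng(2).random((n_comp + n_bg, V.shape[1])) + 1e-3
    for _ in range(n_iter):                    # semi-supervised NMF updates
        W = np.hstack([W_src, W_bg])
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W_bg *= (V @ H[n_comp:].T) / (W @ H @ H[n_comp:].T + 1e-12)
    W = np.hstack([W_src, W_bg])
    mask = (W_src @ H[:n_comp]) / (W @ H + 1e-12)
    _, src = istft(mask * X, fs=fs, nperseg=1024)
    return src
```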
Abstract:
A method for encoding multiple audio signals comprises randomly sampling and quantizing each of the multiple audio signals, and encoding the sampled and quantized multiple audio signals as side information that can be used for decoding and separating the multiple audio signals from a mixture of said multiple audio signals. A method for decoding a mixture of multiple audio signals comprises decoding and demultiplexing side information, the side information comprising quantized samples of each of the multiple audio signals; receiving or retrieving from any data source a mixture of said multiple audio signals; and generating, using said quantized samples of each of the multiple audio signals, multiple estimated audio signals that approximate said multiple audio signals.
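A minimal sketch of both sides, assuming uniform scalar quantization of the random samples and, at the decoder, plain interpolation of the dequantized samples followed by a sum-to-mixture correction in place of the actual estimation procedure:

```python
# Hedged sketch, not the patented codec: random sampling + uniform
# quantization as side information, naive interpolation-based decoding.
import numpy as np

def encode(signals, rate=0.05, n_bits=8, seed=0):
    rng = np.random.default_rng(seed)
    side_info = []
    for x in signals:
        idx = np.sort(rng.choice(len(x), int(rate * len(x)), replace=False))
        step = (x.max() - x.min()) / (2 ** n_bits - 1) or 1.0
        q = np.round((x[idx] - x.min()) / step).astype(np.uint8)  # n_bits <= 8
        side_info.append((idx, q, x.min(), step))  # demultiplexable per source
    return side_info

def decode(mix, side_info):
    ests = []
    for idx, q, lo, step in side_info:
        vals = lo + q.astype(float) * step         # dequantize the samples
        ests.append(np.interp(np.arange(len(mix)), idx, vals))
    ests = np.array(ests)
    residual = mix - ests.sum(axis=0)              # enforce sum-to-mixture
    ests += residual / len(ests)                   # spread the residual evenly
    return ests                                    # estimated audio signals
```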
Abstract:
Separation of a singing voice source from an audio mixture by using auxiliary information related to the temporal activity of the different audio sources to improve the separation process. An audio signal is produced from a symbolic digital musical score and from symbolic digital lyrics information related to a singing voice in the audio mixture. By means of Non-negative Matrix Factorization (NMF), characteristics of the audio mixture and of the produced audio signal are used to produce an estimated singing voice and an estimated accompaniment through Wiener filtering.
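A short sketch of the final Wiener-filtering step, assuming the score- and lyrics-guided NMF stage has already produced power-spectrogram estimates P_voice and P_accomp for the two sources (names and parameters are illustrative):

```python
# Hedged sketch of the Wiener-filtering stage only.
import numpy as np
from scipy.signal import stft, istft

def wiener_separate(mix, P_voice, P_accomp, fs=16000, nperseg=1024):
    _, _, X = stft(mix, fs=fs, nperseg=nperseg)
    G = P_voice / (P_voice + P_accomp + 1e-12)   # Wiener gain per TF bin
    _, voice = istft(G * X, fs=fs, nperseg=nperseg)
    _, accomp = istft((1.0 - G) * X, fs=fs, nperseg=nperseg)
    return voice, accomp                         # estimated voice and accompaniment
```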
Abstract:
An apparatus and method for generating visual content from an audio signal are described. The method includes receiving (310) audio content, processing (320) the audio content to separate it into a first portion and a second portion, converting (330) the second portion into visual content, delaying (340) the first portion based on a time relationship between the audio content and the visual content, the delaying accounting for the time needed to process the first portion and convert the second portion, and providing (350) the visual content and audio content for reproduction. The apparatus includes a source separation module (210) that processes the received audio content to separate it into a first portion and a second portion, a converter module (220) that converts the second portion into visual content, and a synchronization module (230) that delays the first portion based on a time relationship between the audio content and the visual content.
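A minimal sketch of the synchronization idea, modeling the delay of the first portion as a fixed delay line sized to cover the worst-case separation and conversion time; the class and its sizing are assumptions, not the disclosed design:

```python
# Hedged sketch: a fixed delay line holds the first (audio) portion while
# the second portion is converted to visual content.
from collections import deque

class Synchronizer:
    def __init__(self, delay_chunks):
        self.buf = deque()
        self.delay = delay_chunks       # sized to cover processing + conversion

    def push(self, first_portion):
        self.buf.append(first_portion)
        if len(self.buf) > self.delay:
            return self.buf.popleft()   # audio now due for reproduction
        return None                     # still inside the delay window
```

Each call to push hands in the newest audio chunk and, once the line is full, returns the chunk that should play alongside the visuals just produced, keeping the two streams aligned.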
Abstract:
Methods and apparatus for generating a fingerprint of an audio signal are disclosed. The method comprises: detecting peaks in a time-frequency representation of the audio signal, a peak being defined as a point in the representation that has higher energy than its neighboring points; and generating the fingerprint of the audio signal as a function of a distribution of positions of the detected peaks along a frequency axis and a distribution of positions of the detected peaks along a time axis. The disclosed fingerprint is robust not only to many types of noise but also to time-scale modification and frequency shifting.
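A minimal sketch of such a fingerprint, assuming spectrogram local maxima as the peaks and per-axis histograms as the position distributions; the neighborhood size, threshold, and bin count are illustrative:

```python
# Hedged sketch of a peak-distribution fingerprint.
import numpy as np
from scipy.signal import stft
from scipy.ndimage import maximum_filter

def fingerprint(x, fs=16000, nperseg=1024, n_bins=32):
    _, _, X = stft(x, fs=fs, nperseg=nperseg)
    S = np.abs(X)
    # a peak has higher energy than every point in its neighborhood
    peaks = (S == maximum_filter(S, size=5)) & (S > S.mean())
    f_idx, t_idx = np.nonzero(peaks)
    # distributions of peak positions along each axis; normalizing the time
    # axis gives some tolerance to time-scale modification
    hf, _ = np.histogram(f_idx, bins=n_bins, range=(0, S.shape[0]))
    ht, _ = np.histogram(t_idx / max(S.shape[1], 1), bins=n_bins, range=(0, 1))
    fp = np.concatenate([hf, ht]).astype(float)
    return fp / (fp.sum() + 1e-12)              # level-independent fingerprint
```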
Abstract:
A particular implementation determines the parameters of a generative probabilistic model from visual descriptors extracted from at least one image. The extracted visual descriptors are quantized and encoded using model-based arithmetic coding, for storage or for transmission to a decoder. The model parameters are likewise stored for later use by a decoder, or transmitted directly to a decoder. A decoder uses the stored or received model parameters to reconstruct the generative probabilistic model and then to decode the visual descriptors. The visual descriptors are used for image analysis tasks such as image retrieval or object detection. A particular implementation uses a Gaussian mixture model as the generative probabilistic model.
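A sketch of the encoder-side modeling under the assumption that scikit-learn's GaussianMixture stands in for the generative model; the arithmetic coder itself is omitted, and only the ideal code length it could approach under the model is computed:

```python
# Hedged sketch: GMM fit + uniform quantization + ideal code-length estimate.
import numpy as np
from sklearn.mixture import GaussianMixture

def model_and_quantize(descriptors, n_components=16, step=0.05):
    gmm = GaussianMixture(n_components=n_components, covariance_type='diag')
    gmm.fit(descriptors)                       # model parameters go to the decoder
    q = np.round(descriptors / step) * step    # uniform quantization
    # ideal arithmetic-coding cost under the model, in bits per descriptor:
    # -log2 p(q), with p(q) approximated by GMM density times step^dim
    bits = -(gmm.score_samples(q) / np.log(2) + q.shape[1] * np.log2(step))
    return gmm, q, bits.mean()
```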
Abstract:
A solution for delivery of audiovisual content to a receiver device is provided. At the transmitter side, a transmission buffer is built up while fast channel change and fast trick modes are offered at the receiver side. At least one GoP, starting with a first I-frame, is sought in the content that is to be transmitted. The timing references of the data in the at least one GoP prepared for delivery to a receiver device are modified so that the data is decoded by the receiver at a slowed-down rate for a given duration. This creates a lag between the reading in of data by the transmitter and the decoding of data by the receiver. The transmitter uses the lag to fill the transmission buffer, while the receiver does not have to wait for the transmission buffer to fill before it starts decoding.
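A small illustration of the timestamp stretch that creates the lag (the 1.25x slowdown and 2 s GoP below are invented numbers, not from the disclosure):

```python
# Hedged sketch of re-stamping a GoP's timing references.
def restamp_gop(timestamps, slowdown=1.25):
    """Stretch frame timestamps (seconds) by `slowdown` relative to the first."""
    t0 = timestamps[0]
    return [t0 + (t - t0) * slowdown for t in timestamps]

# A 2 s GoP decoded 1.25x slower takes 2.5 s at the receiver, giving the
# transmitter a 0.5 s lag in which to fill its transmission buffer while
# the receiver is already decoding.
```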
Abstract:
A method is proposed for encoding at least two signals. The method includes mixing the at least two signals into a mixture; sampling, at sampling locations, a map Z representative of the locations of the at least two signals in a time-frequency plane, the sampling delivering a first list of values ZΩ; and transmitting the mixture of the at least two signals together with information representative of the first list of values ZΩ. The disclosure also relates to the corresponding method for separating the signals in a mixture, and to corresponding computer program products, devices and bitstreams.
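A toy sketch under the assumption that Z is a per-bin dominant-source index map, sampled at random time-frequency locations to form ZΩ and reconstructed at the decoder by nearest-neighbor interpolation before binary masking; all names and parameters are hypothetical:

```python
# Hedged sketch: sample a dominant-source map, rebuild it at the decoder.
import numpy as np
from scipy.interpolate import griddata
from scipy.signal import stft, istft

def encode_map(sources, fs=16000, nperseg=1024, n_samples=500, seed=0):
    specs = np.array([np.abs(stft(s, fs=fs, nperseg=nperseg)[2]) for s in sources])
    Z = specs.argmax(axis=0)                   # dominant source per TF bin
    rng = np.random.default_rng(seed)
    pts = np.column_stack([rng.integers(0, Z.shape[0], n_samples),
                           rng.integers(0, Z.shape[1], n_samples)])
    Z_omega = Z[pts[:, 0], pts[:, 1]]          # first list of values Z_Omega
    return sum(sources), pts, Z_omega          # mixture + side information

def decode_map(mix, pts, Z_omega, n_src, fs=16000, nperseg=1024):
    _, _, X = stft(mix, fs=fs, nperseg=nperseg)
    grid_f, grid_t = np.mgrid[0:X.shape[0], 0:X.shape[1]]
    Z_hat = griddata(pts, Z_omega, (grid_f, grid_t), method='nearest')
    outs = []
    for k in range(n_src):                     # binary-mask each source
        _, s = istft(np.where(Z_hat == k, X, 0), fs=fs, nperseg=nperseg)
        outs.append(s)
    return outs
```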
Abstract:
Separation of speech and background from an audio mixture by using a speech example, generated from a source associated with a speech component in the audio mixture, to guide the separation process.