Classifying audio scene using synthetic image features
Abstract:
A computing system includes an encoder, a decoder, a generator, a discriminator, and a classifier. The encoder receives an input image and encodes it into real image features, and the decoder decodes the real image features into a reconstructed image. The generator receives first audio data corresponding to the input image and generates first synthetic image features from it, and also receives second audio data and generates second synthetic image features from it. The discriminator receives both the real and the synthetic image features and determines whether a target feature is real or synthetic. The classifier classifies a scene of the second audio data based on the second synthetic image features.
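The data flow among the five components can be illustrated with a minimal sketch. It is not the patented implementation: every dimension, the `make_linear` helper, and the use of random linear maps in place of trained networks are assumptions made purely for illustration, and no training loop or loss is shown.

```python
import random

random.seed(0)

# Hypothetical sizes; the abstract does not specify any dimensions.
IMAGE_DIM = 8     # flattened input image
AUDIO_DIM = 6     # audio feature vector
FEATURE_DIM = 4   # shared image-feature space
NUM_SCENES = 3    # number of scene classes


def make_linear(in_dim, out_dim):
    """Return a random linear map standing in for a trained network."""
    weights = [[random.uniform(-1.0, 1.0) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def apply(x):
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    return apply


encoder = make_linear(IMAGE_DIM, FEATURE_DIM)      # image -> real image features
decoder = make_linear(FEATURE_DIM, IMAGE_DIM)      # real features -> reconstructed image
generator = make_linear(AUDIO_DIM, FEATURE_DIM)    # audio -> synthetic image features
_disc_score = make_linear(FEATURE_DIM, 1)          # features -> real/synthetic score
_scene_logits = make_linear(FEATURE_DIM, NUM_SCENES)


def discriminator(features):
    """Return True if the target feature is judged real, False if synthetic."""
    return _disc_score(features)[0] > 0.0


def classify_scene(audio):
    """Classify the scene of audio data from its synthetic image features."""
    synthetic = generator(audio)
    logits = _scene_logits(synthetic)
    return max(range(NUM_SCENES), key=lambda i: logits[i])


# Paired data: an input image and the first audio data corresponding to it.
image = [0.1 * i for i in range(IMAGE_DIM)]
first_audio = [0.2 * i for i in range(AUDIO_DIM)]
real_features = encoder(image)
reconstructed = decoder(real_features)
first_synthetic = generator(first_audio)
judgements = [discriminator(real_features), discriminator(first_synthetic)]

# Second audio data, with no image: classified via synthetic features only.
second_audio = [0.05 * (i + 1) for i in range(AUDIO_DIM)]
scene = classify_scene(second_audio)
```

In this sketch the encoder and generator map into the same feature space, which is what lets the discriminator compare real and synthetic features and lets the classifier operate on audio alone once an image is no longer available.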