Abstract:
System, method and/or computer-readable medium for audio-visual sound source separation that applies cross-modal meta consistency learning. Inputs include an audio spectrogram that represents first sounds and second sounds, a first video of a first sound-producing object, and a second video of a second sound-producing object. Audio features and audio tokens are generated by applying audio encoders to the audio spectrogram. Visual tokens are generated by applying a visual encoder to the first video and the second video. Respective audio-visual features are obtained by combining the audio tokens with the respective visual tokens. Based on the respective audio-visual features, first and second separated audio masks are generated by applying a decoder.
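The abstract ends with the decoder producing two separated audio masks and does not say how the masks are used afterwards. A minimal sketch of one common convention, assumed here rather than taken from the abstract, is to multiply each mask elementwise with the mixture spectrogram. The function name `apply_mask`, the toy values, and the choice of complementary masks are all illustrative assumptions.

```python
def apply_mask(spectrogram, mask):
    """Multiply a magnitude spectrogram elementwise by a mask of the same shape."""
    if len(spectrogram) != len(mask) or any(
        len(s_row) != len(m_row) for s_row, m_row in zip(spectrogram, mask)
    ):
        raise ValueError("spectrogram and mask must have the same shape")
    return [
        [s * m for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(spectrogram, mask)
    ]


# Toy 2x2 mixture spectrogram (frequency bins x time frames); values are made up.
mixture = [[1.0, 2.0], [4.0, 0.5]]

# Hypothetical decoder outputs. Complementary masks are an assumption of this
# sketch, not something the abstract states.
mask_first = [[1.0, 0.25], [0.5, 0.0]]
mask_second = [[1.0 - m for m in row] for row in mask_first]

first_sound = apply_mask(mixture, mask_first)
second_sound = apply_mask(mixture, mask_second)
```

With complementary masks the two separated spectrograms sum back to the mixture, which makes the toy example easy to check by hand.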
Abstract:
A method, device and computer-readable medium for generating a super-resolution (SR) version of a compressed video stream. By leveraging the motion information and residual information in compressed video streams, described examples are able to skip the time-consuming motion-estimation step for most frames and make the most of the SR results of key frames. A key frame SR module generates SR versions of I-frames and other key frames of a compressed video stream using techniques similar to existing multi-frame approaches to video super-resolution (VSR). A non-key frame SR module generates SR versions of the non-key inter frames between these key frames by making use of the motion information and residual information used to encode the inter frames in the compressed video stream.
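The non-key frame path can be pictured as ordinary motion compensation carried out at the higher resolution. The sketch below is only an illustration under simplifying assumptions that are not in the abstract: a single integer motion vector for the whole frame, nearest-neighbour upsampling of the residual, and border clamping. All function names are hypothetical.

```python
def upsample_nearest(image, scale):
    """Nearest-neighbour upsampling of a 2D list by an integer scale factor."""
    height, width = len(image), len(image[0])
    return [
        [image[y // scale][x // scale] for x in range(width * scale)]
        for y in range(height * scale)
    ]


def warp(image, dx, dy):
    """Shift an image by (dx, dy) pixels, clamping reads at the borders."""
    height, width = len(image), len(image[0])
    return [
        [
            image[min(max(y - dy, 0), height - 1)][min(max(x - dx, 0), width - 1)]
            for x in range(width)
        ]
        for y in range(height)
    ]


def sr_non_key_frame(sr_key_frame, motion_vector, residual_lr, scale):
    """Warp the SR key frame with the scaled motion vector, then add the upsampled residual."""
    dx, dy = motion_vector
    predicted = warp(sr_key_frame, dx * scale, dy * scale)
    residual_sr = upsample_nearest(residual_lr, scale)
    return [
        [p + r for p, r in zip(p_row, r_row)]
        for p_row, r_row in zip(predicted, residual_sr)
    ]


# Toy data: a 2x4 SR key frame, a 1x2 low-resolution residual, scale factor 2.
sr_key = [[1, 2, 3, 4], [5, 6, 7, 8]]
residual = [[10, 20]]
sr_inter = sr_non_key_frame(sr_key, motion_vector=(1, 0), residual_lr=residual, scale=2)
```

No motion estimation is run for the inter frame here; the motion vector and residual are taken as given, which is the saving the abstract describes.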
Abstract:
Methods, devices and computer-readable media for processing a compressed video to perform an inference task are disclosed. Processing the compressed video may include selecting a subset of frame encodings of the compressed video, or zero or more modalities (RGB, motion vectors, residuals) of a frame encoding, for further processing to perform the inference task. Pre-existing motion vector and/or residual information in frame encodings of the compressed video is leveraged to perform the inference task adaptively and efficiently. In some embodiments, the inference task is an action recognition task, such as a human action recognition task.
Abstract:
Systems and methods for multi-frame video frame interpolation. Higher-order motion modeling, such as cubic motion modeling, achieves predictions of intermediate optical flow between multiple interpolated frames, assisted by relaxation of the constraints imposed by the loss function used in initial optical flow estimation. A temporal pyramidal optical flow refinement module performs coarse-to-fine refinement of the optical flow maps used to generate the intermediate frames, focusing a proportionally greater amount of refinement attention to the optical flow maps for the high-error middle frames. A temporal pyramidal pixel refinement module performs coarse-to-fine refinement of the generated intermediate frames, focusing a proportionally greater amount of refinement attention to the high-error middle frames. A generative adversarial network (GAN) module calculates a loss function for training the neural networks used in the optical flow estimation module, temporal pyramidal optical flow refinement module, and/or temporal pyramidal pixel refinement module.
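To make "cubic motion modeling" concrete, the following sketch evaluates a cubic-in-time displacement for one pixel along one axis at evenly spaced intermediate timestamps. It assumes the per-pixel velocity, acceleration and jerk are already known; how those terms are estimated from the input frames is not covered by this sketch, and the function names are hypothetical.

```python
def cubic_displacement(t, velocity, acceleration, jerk):
    """Displacement at time t under constant jerk, which is a cubic polynomial in t."""
    return velocity * t + acceleration * t ** 2 / 2 + jerk * t ** 3 / 6


def intermediate_flows(num_frames, velocity, acceleration, jerk):
    """Displacement from the frame at t=0 to each of num_frames evenly spaced frames in (0, 1)."""
    return [
        cubic_displacement(k / (num_frames + 1), velocity, acceleration, jerk)
        for k in range(1, num_frames + 1)
    ]


# Three intermediate frames between two input frames; motion terms are made up.
flows = intermediate_flows(3, velocity=4.0, acceleration=8.0, jerk=48.0)
```

Setting `acceleration` and `jerk` to zero reduces the model to linear motion, which shows what the higher-order terms add.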
Abstract:
A detailed identification method for articles has an image-taking step and a comparison step. The image-taking step includes using a portable identification device having a close-up lens and a screen to take a detailed image of an undistinguishable article and displaying the detailed image on the screen at an enlarged size. The comparison step includes recalling a detailed image of a legal article from a database and displaying that detailed image on the screen of the portable identification device at an enlarged size. The two detailed images are displayed on the same screen so that a user can clearly compare them to judge whether the undistinguishable article is a fake. The method is portable and can be used conveniently and widely.
Abstract:
The present invention relates to a method for positioning a mobile station and a repeater thereof. In the method, the mobile communication network, upon receiving a positioning request from a mobile station, instructs repeaters to send auxiliary positioning signals; the mobile station performs measurements on the auxiliary positioning signals received from the repeaters and on the downlink signals sent from the base station; and the position of the mobile station is then estimated from the measurement results, thereby positioning the mobile station. The repeater is implemented by adding an auxiliary positioning unit to the downlink processing channel of a traditional repeater; the auxiliary positioning unit comprises a communication module, a frame timing recovery module, a timing control module, and a pilot modulating module. The present invention improves the accuracy of positioning a mobile station within the coverage area of repeaters; in addition, the repeater with the auxiliary positioning function is simple to implement and does not affect the structure or signaling flow of the traditional mobile station.
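The abstract does not say how the position is computed from the measurement results. As one possible illustration, the sketch below assumes the measurements have already been converted to distances from three transmitters at known 2D coordinates (for example a base station and two repeaters) and solves the resulting range equations by linearization. The function name, coordinates and distances are hypothetical.

```python
import math


def trilaterate(anchors, distances):
    """Estimate a 2D position from three known anchor points and the distances to them."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    # Subtracting the first range equation from the other two gives a linear system.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1 ** 2 - d2 ** 2 + x2 ** 2 + y2 ** 2 - x1 ** 2 - y1 ** 2
    b2 = d1 ** 2 - d3 ** 2 + x3 ** 2 + y3 ** 2 - x1 ** 2 - y1 ** 2
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("anchors are collinear; position is not uniquely determined")
    # Cramer's rule for the 2x2 system.
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


# Made-up geometry: one base station and two repeaters, mobile station at (3, 4).
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_position = (3.0, 4.0)
distances = [math.hypot(true_position[0] - ax, true_position[1] - ay) for ax, ay in anchors]
estimate = trilaterate(anchors, distances)
```

The extra transmitters matter in this picture because each repeater contributes another range equation; with fewer than three non-collinear anchors the system above has no unique solution.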
Abstract:
The invention provides a novel calcium-independent cytosolic phospholipase A2/B enzyme, polynucleotides encoding such enzyme, antibodies to such enzyme, and methods for screening unknown compounds for anti-inflammatory activity mediated by the arachidonic acid cascade.
Abstract:
The invention provides a novel calcium-independent cytosolic phospholipase A2/B enzyme, polynucleotides encoding such enzyme and methods for screening unknown compounds for anti-inflammatory activity mediated by the arachidonic acid cascade.
Abstract:
Systems, methods and computer-readable media for predicting a depth for a video frame are disclosed. An example method may include steps of: receiving a plurality of training data, each comprising a set of consecutive video frames and a depth representation of a video frame subsequent to the consecutive video frames; receiving a pre-trained neural network model fθ having a plurality of weights θ; while the pre-trained neural network model fθ has not converged: computing a plurality of second weights θi′, based on each set of consecutive video frames, and updating the plurality of weights θ, based on the plurality of training data and the plurality of second weights θi′; receiving a plurality of new consecutive video frames with consecutive timestamps; and predicting a depth representation of a video frame immediately subsequent to the new consecutive video frames based on the updated plurality of weights θ.
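The two-level update in the training loop (per-sample second weights θi′, then an update of θ) resembles gradient-based meta-learning. The sketch below shows that loop shape on a toy scalar model y ≈ θ·x, using a first-order approximation for the outer update. The model, the squared-error loss, the learning rates, the fixed step count in place of a convergence test, and all names are assumptions of this sketch, not details from the abstract.

```python
def grad(theta, samples):
    """Gradient of the mean squared error of the model y = theta * x over the samples."""
    return sum(2 * (theta * x - y) * x for x, y in samples) / len(samples)


def meta_train(theta, tasks, inner_lr=0.1, outer_lr=0.1, steps=200):
    """Run a fixed number of outer updates (stand-in for 'while not converged')."""
    for _ in range(steps):
        outer_grad = 0.0
        for support, query in tasks:
            # Inner step: second weights theta_i' adapted to this task's support data.
            theta_i = theta - inner_lr * grad(theta, support)
            # First-order approximation: evaluate the query gradient at theta_i'.
            outer_grad += grad(theta_i, query)
        # Outer step: update the shared weights theta using all second weights.
        theta -= outer_lr * outer_grad / len(tasks)
    return theta


# Two toy tasks with underlying slopes 1 and 3; each is a (support, query) pair.
tasks = [
    ([(1.0, 1.0)], [(1.0, 1.0)]),
    ([(1.0, 3.0)], [(1.0, 3.0)]),
]
theta = meta_train(0.0, tasks)
```

In this toy setting θ settles midway between the two task slopes, a starting point from which one inner step moves toward either task.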