Abstract:
Method and apparatus for deriving a motion vector at a video decoder. A block-based motion vector may be produced at the video decoder by utilizing motion estimation among available pixels relative to blocks in one or more reference frames. The available pixels could be, for example, spatially neighboring blocks in the sequential scan coding order of a current frame, blocks in a previously decoded frame, or blocks in a downsampled frame in a lower pyramid when layered coding has been used.
Abstract:
Methods and systems to apply motion estimation (ME) based on reconstructed reference pictures in a B frame or in a P frame at a video decoder. For a P frame, projective ME may be performed to obtain a motion vector (MV) for a current input block. In a B frame, both projective ME and mirror ME may be performed to obtain an MV for the current input block. The ME process can be performed on sub-partitions of the input block, which may reduce the prediction error without increasing the amount of MV information in the bitstream. Decoder-side ME can be applied for the prediction of existing inter frame coding modes, and traditional ME or the decoder-side ME can be adaptively selected to predict a coding mode based on a rate distribution optimization (RDO) criterion.
Abstract:
Systems, devices and methods are described including performing scalable video coding using inter-layer pixel sample prediction. Inter-layer pixel sample prediction in an enhancement layer coding unit, prediction unit, or transform unit may use reconstructed pixel samples obtained from a base layer or from a lower enhancement layer. The pixel samples may be subjected to upsample filtering and/or refinement filtering. The upsample or refinement filter coefficients may be predetermined or may be adaptively determined.
Abstract:
Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive data dependencies for one or more tasks comprising one or more producer tasks executing on the first processing resource and one or more consumer tasks executing on the second processing resource and move a data output from one or more producer tasks executing on the first processing resource to a cache memory communicatively coupled to the second processing resource. Other embodiments may be described and claimed.
Abstract:
Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a system includes a producer intellectual property (IP) (e.g., a media IP), a compute core (e.g., a GPU or an AI-specific core of the GPU), a streaming buffer logically interposed between the producer IP and the compute core. The producer IP is operable to consume data from memory and output results to the streaming buffer. The compute core is operable to perform AI inference processing based on data consumed from the streaming buffer and output AI inference processing results to the memory.
Abstract:
An embodiment of a semiconductor package apparatus may include technology to determine a residual error based on coding unit information, and determine a candidate coding unit and an associated rate distortion cost based on the residual error. An embodiment may additionally or alternatively include technology to partition a first coding unit into two or more smaller coding units based on a partition message, accelerate processing of at least one of the two or more smaller coding units, and estimate motion fora frame based at least partially on results of the accelerated processing. Other embodiments are disclosed and claimed.
Abstract:
Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive data dependencies for one or more tasks comprising one or more producer tasks executing on the first processing resource and one or more consumer tasks executing on the second processing resource and move a data output from one or more producer tasks executing on the first processing resource to a cache memory communicatively coupled to the second processing resource. Other embodiments may be described and claimed.
Abstract:
Embodiments described herein provided for an instruction and associated logic to enable a processing resource including a tensor accelerator to perform optimized computation of sparse submatrix operations. One embodiment provides hardware logic to apply a numerical transform to matrix data to increase the sparsity of the data. Increasing the sparsity may result in a higher compression ratio when the matrix data is compressed.
Abstract:
An example apparatus for enhancing video includes a decoder to decode a received 360-degree projection format video bitstream to generate a decoded 360-degree projection format video. The apparatus also includes a viewport generator to generate a viewport from the decoded 360-degree projection format video. The apparatus further includes a convolutional neural network (CNN)-based filter to remove an artifact from the viewport to generate an enhanced image. The apparatus further includes a displayer to send the enhanced image to a display.
Abstract:
Embodiments of a video codec may include technology to derive a conformance point for a sub-region of coded pictures in a coded video sequence in the video data, group any combination of sub-pictures that form a rectangular region into a sub-picture set, and/or derive a level indicator corresponding to a sub-picture based on a level of the coded video sequence and a relative size of the coded picture and the sub-picture set. Other embodiments are disclosed and claimed.