Abstract:
Methods, systems, and apparatus are presented for reducing distortion in an image, such as a video image. A video image can be captured by an image capture device, e.g., during a video conferencing session. Distortion correction processing, such as the application of one or more warping techniques, can be applied to the captured image to produce a distortion-corrected image, which can be transmitted to one or more participants. The warping techniques can be performed in accordance with one or more warp parameters specifying a transformation of the captured image. Further, the warp parameters can be generated in accordance with an orientation of the image capture device, which can be determined based on sensor data or can be a fixed value. Additionally or alternatively, the warp parameters can be determined in accordance with a reference image or model to which the captured image should be warped.
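As a hedged illustration of this warping step (not the patented implementation), the following Python sketch derives warp parameters, here a homography, from an assumed device tilt angle and applies them to a captured frame; the focal length, the rotation-only camera model, and all function names are assumptions, and OpenCV is assumed to be available.

```python
# Minimal sketch: undo perspective distortion caused by camera tilt using
# a homography H = K * R * K^-1 built from the device orientation.
# Assumptions: OpenCV/NumPy, a known focal length, tilt about the x-axis only.
import numpy as np
import cv2

def tilt_homography(tilt_deg: float, focal_px: float, w: int, h: int) -> np.ndarray:
    """Warp parameters: a homography that counter-rotates the image plane."""
    t = np.radians(tilt_deg)                       # orientation from sensor data (or fixed)
    K = np.array([[focal_px, 0.0, w / 2],
                  [0.0, focal_px, h / 2],
                  [0.0, 0.0, 1.0]])
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, np.cos(t), -np.sin(t)],
                  [0.0, np.sin(t), np.cos(t)]])
    return K @ R @ np.linalg.inv(K)

def correct_distortion(frame: np.ndarray, tilt_deg: float, focal_px: float = 1000.0) -> np.ndarray:
    h, w = frame.shape[:2]
    H = tilt_homography(tilt_deg, focal_px, w, h)  # generate warp parameters
    return cv2.warpPerspective(frame, H, (w, h))   # apply the warping technique
```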
Abstract:
An error recovery method may be engaged by an encoder to recover from misalignment between reference picture caches at the encoder and decoder. When a communication error is detected between an encoder and a decoder, a number of non-acknowledged reference frames present in the decoder's reference picture cache may be estimated. Thereafter, frames may be coded as reference frames in a number greater than or equal to the number of non-acknowledged reference frames that are estimated to be present in the decoder's reference picture cache. Thereafter, ordinary coding operations may resume. Typically, a final reference frame that is coded in the error recovery mode will be coded as a synchronization frame that has high coding quality. The coded reference frames that precede it may be coded at low quality (or may be coded as SKIP-coded frames). On reception and decoding, the preceding frames may cause the decoder to flush from its reference picture cache any non-acknowledged reference frames that otherwise might collide with the new synchronization frame. In this manner, alignment between the encoder and decoder may be restored.
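A toy Python simulation of this recovery flow follows; the cache size, class, and frame labels are invented for illustration and do not reflect a real codec API. It shows how coding at least as many new reference frames as the estimated non-acknowledged ones evicts the stale entries before the synchronization frame arrives.

```python
# Toy model: the decoder's reference picture cache is a FIFO. After an
# error, code est_non_acked cheap (low-quality/SKIP) reference frames and
# then one high-quality synchronization frame, flushing the stale refs.
from collections import deque

class ToyEncoder:
    def __init__(self, cache_size: int = 4):
        self.decoder_cache = deque(maxlen=cache_size)  # models the far-end cache

    def code_reference(self, label: str):
        self.decoder_cache.append(label)               # a new ref evicts the oldest

    def recover(self, est_non_acked: int):
        for i in range(est_non_acked):
            self.code_reference(f"skip-ref-{i}")       # low quality or SKIP-coded
        self.code_reference("sync-frame")              # high-quality sync; alignment restored

enc = ToyEncoder()
enc.decoder_cache.extend(["acked", "non-acked-1", "non-acked-2", "non-acked-3"])
enc.recover(est_non_acked=3)
print(list(enc.decoder_cache))  # ['skip-ref-0', 'skip-ref-1', 'skip-ref-2', 'sync-frame']
```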
Abstract:
A first improvement is described for prediction of motion vectors to be used in prediction of video data for enhancement layer data. Arbitrary pixelblock partitioning between base layer data and enhancement layer data raises problems in identifying base layer motion vectors to be used as prediction sources for enhancement layer motion vectors. The disclosed method develops enhancement layer motion vectors by scaling a base layer pixelblock partition map according to a size difference between the base layer video image and the enhancement layer video image, then identifying scaled base layer pixelblocks that are co-located with the enhancement layer pixelblocks for which motion vector prediction is to be performed. Motion vectors from the scaled co-located base layer pixelblocks are averaged, weighted according to the degree of overlap between the base layer pixelblocks and the enhancement layer pixelblock. Another improvement is obtained by filtering recovered base layer image data before it is provided to an enhancement layer decoder. When a specified filter requires image data outside a prediction region available from a base layer decoder, the prediction region data may be supplemented with previously-decoded data from an enhancement layer at a border of the prediction region.
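The overlap-weighted averaging can be sketched in a few lines of Python; the rectangle representation, the dyadic scale factor, and the helper names below are assumptions for illustration, not the disclosed encoder.

```python
import numpy as np

def overlap_area(a, b):
    """Intersection area of axis-aligned rectangles (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def predict_mv(enh_block, base_partitions, scale):
    """
    enh_block:       (x0, y0, x1, y1) in enhancement-layer coordinates.
    base_partitions: list of ((x0, y0, x1, y1), (mvx, mvy)) in base-layer coords.
    scale:           enhancement size / base size (e.g., 2.0 for dyadic layers).
    """
    acc, total = np.zeros(2), 0.0
    for rect, mv in base_partitions:
        scaled = tuple(c * scale for c in rect)       # scale the partition map
        w = overlap_area(enh_block, scaled)           # degree of overlap
        if w > 0:                                     # co-located base pixelblock
            acc += w * scale * np.asarray(mv, float)  # MVs scale with resolution
            total += w
    return acc / total if total else np.zeros(2)

# A 16x16 enhancement block straddling two scaled base-layer partitions:
base = [((0, 0, 8, 8), (2, 0)), ((8, 0, 16, 8), (4, 2))]
print(predict_mv((8, 0, 24, 16), base, scale=2.0))    # -> [6. 2.]
```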
Abstract:
A method includes receiving input data at a trained machine learning model that includes a common part and task-specific parts, receiving an execution instruction that identifies one or more processing tasks to be performed, processing the input data using the common part of the trained machine learning model to generate intermediate data, and processing the intermediate data using one or more of the task-specific parts of the trained machine learning model based on the execution instruction to generate one or more outputs.
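A minimal sketch of such a model, assuming PyTorch and an invented two-task architecture: a shared trunk produces the intermediate data once, and the execution instruction selects which task-specific heads consume it.

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.common = nn.Sequential(nn.Linear(64, 32), nn.ReLU())  # common part
        self.heads = nn.ModuleDict({                               # task-specific parts
            "segmentation": nn.Linear(32, 10),
            "depth": nn.Linear(32, 1),
        })

    def forward(self, x, tasks):
        z = self.common(x)                        # intermediate data, computed once
        return {t: self.heads[t](z) for t in tasks}

model = MultiTaskModel()
# The execution instruction selects one or more processing tasks to perform.
out = model(torch.randn(4, 64), tasks=["depth"])
print({name: tensor.shape for name, tensor in out.items()})
```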
Abstract:
Embodiments of the present invention provide apparatuses and methods of coding video. The apparatuses and methods may further provide coding a source video sequence according to a block-based coding process, estimating processing capabilities of a target decoder, and determining whether the estimated processing capabilities are sufficient to perform deblocking filtering. If they are not sufficient, the apparatuses and methods may provide computing deblocking filter strengths for pixel blocks of the source video sequence to be used at decoding, and transmitting the deblocking filter strengths in a coded video data signal with the coded video data. Moreover, if they are not sufficient, the apparatuses and methods may provide changing coding parameters including, but not limited to, block sizes, transform sizes, and the Qmatrix.
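The capability check might look like the following Python sketch; the MIPS threshold, field names, and payload layout are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class DecoderCaps:
    mips_available: float          # estimated processing budget of the target decoder

DEBLOCK_COST_MIPS = 150.0          # assumed cost of running in-loop deblocking

def filter_strength(block) -> int:
    # Placeholder boundary-strength computation (0..4, as in H.264-style filters).
    return min(4, sum(block) % 5)

def encode_frame(frame_blocks, caps: DecoderCaps):
    payload = {"coded_data": [list(b) for b in frame_blocks]}   # stand-in for coded blocks
    if caps.mips_available < DEBLOCK_COST_MIPS:
        # Decoder cannot afford the filter: compute strengths encoder-side and
        # transmit them in the coded video data signal alongside the video.
        payload["deblock_strengths"] = [filter_strength(b) for b in frame_blocks]
    return payload

print(encode_frame([(1, 2), (3, 4)], DecoderCaps(mips_available=100.0)))
```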
Abstract:
The subject technology receives a neural network model in a model format, the model format including information for a set of layers of the neural network model, each layer of the set of layers including a set of respective operations. The subject technology generates neural network (NN) code from the neural network model, the NN code being in a programming language distinct from the model format, and the NN code comprising a respective memory allocation for each respective layer of the set of layers of the neural network model, where the generating comprises determining the respective memory allocation for each respective layer based at least in part on a resource constraint of a target device. The subject technology compiles the NN code into a binary format. The subject technology generates a package for deploying the compiled NN code on the target device.
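As a rough illustration (with an invented toy "model format" and allocator names), the sketch below plans a per-layer memory allocation against a target device's resource constraint and emits placeholder NN code; a real pipeline would then compile the code to a binary and wrap it in a deployment package.

```python
BYTES_PER_ELEM = 4  # float32 activations

layers = [  # invented "model format": (layer name, output element count)
    ("conv1", 32 * 112 * 112),
    ("conv2", 64 * 56 * 56),
    ("fc", 1000),
]

def plan_allocations(layers, device_budget_bytes: int):
    plan = []
    for name, elems in layers:
        nbytes = elems * BYTES_PER_ELEM      # per-layer memory allocation
        if nbytes > device_budget_bytes:     # resource constraint of the target device
            raise MemoryError(f"layer {name} needs {nbytes} B, over budget")
        plan.append((name, nbytes))
    return plan

def emit_nn_code(plan) -> str:
    # Generated "NN code" as text, distinct from the model format itself.
    return "\n".join(f"buf_{name} = alloc({nbytes});" for name, nbytes in plan)

print(emit_nn_code(plan_allocations(layers, device_budget_bytes=8 * 2**20)))
```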
Abstract:
Techniques for encoding data based at least in part upon an awareness of the decoding complexity of the encoded data and the ability of a target decoder to decode the encoded data are disclosed. In some embodiments, a set of data is encoded based at least in part upon a state of a target decoder to which the encoded set of data is to be provided. In some embodiments, a set of data is encoded based at least in part upon the states of multiple decoders to which the encoded set of data is to be provided.
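A hedged Python sketch of decoder-state-aware parameter selection follows; the utilization field, the thresholds, and the parameter names are assumptions chosen for illustration.

```python
def choose_encoding_params(decoder_state):
    """decoder_state: dict with an estimated 'utilization' in [0, 1]."""
    if decoder_state["utilization"] > 0.8:
        # Heavily loaded decoder: favor cheap-to-decode settings.
        return {"entropy_coder": "cavlc", "b_frames": 0, "in_loop_filter": False}
    return {"entropy_coder": "cabac", "b_frames": 2, "in_loop_filter": True}

def choose_for_many(decoder_states):
    # With multiple target decoders, bound complexity by the busiest one.
    worst = max(s["utilization"] for s in decoder_states)
    return choose_encoding_params({"utilization": worst})

print(choose_for_many([{"utilization": 0.3}, {"utilization": 0.9}]))
```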
Abstract:
In the field of human-computer interaction (HCI), i.e., the study of the interfaces between people (i.e., users) and computers, understanding how the user wishes to interact with the computer is an important problem. The ability to understand human gestures, and in particular hand gestures, is an important aspect of understanding the user's intentions and desires in a wide variety of applications. In this disclosure, a novel system and method for three-dimensional hand tracking using depth sequences is described. The major contributions of the hand tracking system described herein include: 1) a robust hand detector that is invariant to scene background changes; 2) a bi-directional tracking algorithm that prevents detected hands from progressively drifting closer to the front of the scene (i.e., forward along the z-axis of the scene); and 3) various hand verification heuristics.
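The bi-directional tracking idea can be illustrated conceptually in Python; the one-step tracker below (which jumps to the nearest local depth minimum, the very behavior that causes forward drift) and the tolerance are invented stand-ins, not the disclosed algorithm.

```python
import numpy as np

def track_step(depth, pos):
    """Invented one-step tracker: jump to the nearest local depth minimum
    (a hand is typically the surface closest to the camera)."""
    y, x = pos
    y0, x0 = max(y - 5, 0), max(x - 5, 0)
    patch = depth[y0:y + 6, x0:x + 6]
    dy, dx = np.unravel_index(np.argmin(patch), patch.shape)
    return (y0 + int(dy), x0 + int(dx))

def bidirectional_step(prev_depth, cur_depth, pos, tol=3.0):
    fwd = track_step(cur_depth, pos)      # track forward into the new frame
    back = track_step(prev_depth, fwd)    # then track backward into the old one
    drift = np.hypot(back[0] - pos[0], back[1] - pos[1])
    return fwd if drift <= tol else pos   # reject updates that fail the round trip

rng = np.random.default_rng(0)
prev = rng.random((480, 640)) + 1.0       # synthetic depth frames (meters)
cur = prev.copy()
print(bidirectional_step(prev, cur, (240, 320)))
```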
Abstract:
Systems and methods for applying a new quality metric for coding video are provided. The metric, based on the Just Noticeable Difference (JND) distortion visibility model, allows for efficient selection of coding techniques that limit perceptible distortion in the video while still taking into account parameters, such as desired bit rate, that can enhance system performance. Additionally, the unique aspects of each input type, system, and display may be considered. Allowing for a programmable minimum viewing distance (MVD) parameter also ensures that distortion will not be noticeable at the specified MVD, even though it may be significant at a closer viewing distance.
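A simplified sketch of a JND-gated mode decision follows; the linear dependence of the threshold on viewing distance and all constants are assumptions, far cruder than the patented visibility model.

```python
import numpy as np

def jnd_threshold(base_jnd: float, mvd_m: float, ref_dist_m: float = 0.5) -> float:
    # A larger minimum viewing distance raises the visibility threshold,
    # because fine detail subtends a smaller visual angle.
    return base_jnd * (mvd_m / ref_dist_m)

def pick_mode(block, candidates, mvd_m, base_jnd=2.0):
    """candidates: list of (bits, reconstruction); choose the cheapest mode
    whose worst-case error stays below the JND threshold at the MVD."""
    thr = jnd_threshold(base_jnd, mvd_m)
    ok = [(bits, rec) for bits, rec in candidates
          if np.max(np.abs(block - rec)) < thr]
    pool = ok or candidates               # fall back if nothing is imperceptible
    return min(pool, key=lambda c: c[0])  # minimize bit rate

blk = np.full((8, 8), 128.0)
cands = [(100, blk + 5.0), (300, blk + 1.0)]   # (bits, reconstructed block)
print(pick_mode(blk, cands, mvd_m=0.5)[0])     # -> 300: only that mode is below JND
```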
Abstract:
Systems, apparatuses and methods whereby coded bitstreams are delivered to downstream end-user devices having various performance capabilities. A head-end encoder/video store generates a primary coded bitstream and metadata for delivery to an intermediate re-encoding system. The re-encoding system recodes the primary coded bitstream to generate secondary coded bitstreams based on coding parameters in the metadata. Each secondary coded bitstream is matched to a conformance point of a downstream end-user device. Coding parameters for each conformance point can be derived by having the head-end encoder encode the original source video to generate the secondary coded bitstreams and extracting information from the coding process and its results. The metadata can then be communicated as part of the primary coded bitstream (e.g., as SEI) or can be communicated separately. As a result, the complexity of each secondary coded bitstream is appropriately scaled to match the capabilities of the downstream end-user device to which it is delivered.
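The two-stage flow can be mocked up in Python as follows; the metadata field names and conformance-point labels are invented, and no actual SEI syntax is implied.

```python
primary_stream = {
    "coded_video": b"...",          # primary coded bitstream (placeholder)
    "metadata": {                   # e.g., carried in the bitstream as SEI
        "1080p_high": {"bitrate_kbps": 8000, "level": "4.1"},
        "720p_low":   {"bitrate_kbps": 1500, "level": "3.1"},
        "360p_min":   {"bitrate_kbps": 400,  "level": "2.1"},
    },
}

def reencode(primary, conformance_point):
    params = primary["metadata"][conformance_point]   # derived at the head-end
    # A real re-encoding system would transcode here; we only tag the output.
    return {"video": primary["coded_video"], "params": params}

# One secondary coded bitstream per downstream device class:
for point in primary_stream["metadata"]:
    print(point, reencode(primary_stream, point)["params"])
```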