Abstract:
Techniques and systems are provided for prioritizing objects for object recognition in one or more video frames. For example, a current video frame is obtained, and objects are detected in the current video frame. State information associated with the objects is determined. Priorities for the objects can also be determined. For example, a priority can be determined for an object based on state information associated with the object. Object recognition is performed for at least one of the objects based on the priority determined for the at least one object. For instance, object recognition can be performed for objects having higher priorities before objects having lower priorities.
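A minimal sketch of this prioritization flow, assuming a hypothetical detector that returns detections as dictionaries and a hypothetical `recognize` routine; the state fields and the priority rule shown here are illustrative, not the specific scoring described above.

```python
def prioritize_and_recognize(frame, detect_objects, recognize):
    """Detect objects, derive per-object state, and run recognition in
    descending priority order (hypothetical helpers assumed)."""
    objects = detect_objects(frame)  # list of dict detections with bounding boxes

    # Illustrative state: whether the object was already recognized, and
    # how many frames have passed since its last recognition.
    for obj in objects:
        obj.setdefault("recognized", False)
        obj.setdefault("frames_since_recognition", 0)

    # Illustrative priority rule: unrecognized objects first, then the
    # objects that have gone longest without being recognized.
    def priority(obj):
        return (0 if not obj["recognized"] else 1,
                -obj["frames_since_recognition"])

    # Higher-priority objects are recognized before lower-priority ones.
    for obj in sorted(objects, key=priority):
        obj["label"] = recognize(frame, obj)
        obj["recognized"] = True
        obj["frames_since_recognition"] = 0
    return objects
```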
Abstract:
Techniques and systems are provided for processing video data. For example, techniques and systems are provided for performing context-aware object or blob tracker updates (e.g., by updating a motion model of a blob tracker). In some cases, to perform a context-aware blob tracker update, a blob tracker is associated with a first blob. The first blob includes pixels of at least a portion of one or more foreground objects in one or more video frames. A split of the first blob and a second blob in a current video frame can be detected, and a motion model of the blob tracker is reset in response to detecting the split of the first blob and the second blob. In some cases, a motion model of a blob tracker associated with a merged blob is updated to include a predicted location of the blob tracker in a next video frame. The motion model can be updated by using a previously predicted location of the blob tracker as the predicted location of the blob tracker in the next video frame in response to the blob tracker being associated with the merged blob. The previously predicted location of the blob tracker can be determined using a blob location of a blob from a previous video frame.
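A minimal sketch of the context-aware tracker update, assuming a simple constant-velocity motion model as a stand-in for whatever predictor the blob trackers actually use; the field names and the split/merge handling shown are illustrative.

```python
class BlobTracker:
    """Blob tracker with a simple constant-velocity motion model
    (a stand-in for the tracker's actual predictor)."""

    def __init__(self, location):
        self.location = location            # (x, y) of the associated blob
        self.velocity = (0.0, 0.0)
        self.predicted_location = location  # predicted location in the next frame

    def predict(self):
        # Predict the next-frame location from the motion model.
        x, y = self.location
        vx, vy = self.velocity
        self.predicted_location = (x + vx, y + vy)
        return self.predicted_location

    def update_context_aware(self, blob_location, split=False, merged=False):
        if split:
            # Split detected: the accumulated motion history is unreliable,
            # so reset the motion model to the new blob location.
            self.velocity = (0.0, 0.0)
            self.location = blob_location
            self.predicted_location = blob_location
        elif merged:
            # Merged blob: keep the previously predicted location (computed
            # from a blob location in a previous frame) as the prediction
            # for the next frame; do not refresh it from the merged blob.
            self.location = blob_location
        else:
            # Normal update: refresh the velocity from the observed motion
            # and predict the location in the next frame.
            self.velocity = (blob_location[0] - self.location[0],
                             blob_location[1] - self.location[1])
            self.location = blob_location
            self.predict()
```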
Abstract:
Techniques and systems are provided for maintaining lost blob trackers for one or more video frames. In some examples, one or more blob trackers maintained for a sequence of video frames are identified. The one or more blob trackers are associated with one or more blobs of the sequence of video frames. A transition of a blob tracker from a first type of tracker to a lost tracker is detected at a first video frame. For example, the blob tracker can be transitioned from the first type of tracker to the lost tracker when a blob with which the blob tracker was associated in a previous frame is not detected in the first video frame. A recovery duration is determined for the lost tracker at the first video frame. For one or more subsequent video frames obtained after the first video frame, the lost tracker is removed from the one or more blob trackers maintained for the sequence of video frames when a lost duration for the lost tracker is greater than the recovery duration. The blob tracker can be transitioned back to the first type of tracker if the lost tracker is associated with a blob in a subsequent video frame prior to expiration of the recovery duration. Trackers and associated blobs are output as identified blob tracker-blob pairs when the trackers are converted from new trackers to trackers of the first type.
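A sketch of the per-frame bookkeeping this implies, using an illustrative tracker layout (dictionaries with `state`, `lost_duration`, and `recovery_duration` fields) and an illustrative recovery-duration rule; neither is the specific policy described above.

```python
def compute_recovery_duration(trk):
    # Illustrative rule: longer-lived trackers get more frames to recover.
    return min(30, 5 + trk.get("age", 0) // 2)

def update_lost_trackers(trackers, associated_ids):
    """One-frame update of lost-tracker state: transition a tracker to lost
    when its blob is not detected, recover it if it is re-associated before
    its recovery duration expires, and remove it otherwise."""
    removed = []
    for tid, trk in list(trackers.items()):
        if tid in associated_ids:
            # Associated with a blob again before the recovery duration
            # expired: transition back to the previous (non-lost) type.
            trk["state"] = trk.get("prev_state", "normal")
            trk["lost_duration"] = 0
        elif trk["state"] != "lost":
            # Blob not detected this frame: transition to a lost tracker and
            # determine the recovery duration at this first lost frame.
            trk["prev_state"] = trk["state"]
            trk["state"] = "lost"
            trk["lost_duration"] = 1
            trk["recovery_duration"] = compute_recovery_duration(trk)
        else:
            trk["lost_duration"] += 1
            if trk["lost_duration"] > trk["recovery_duration"]:
                removed.append(tid)
                del trackers[tid]
    return removed
```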
Abstract:
Techniques and systems are provided for processing video data. For example, techniques and systems are provided for performing content-adaptive object or blob tracking. To perform the content-adaptive object tracking, a blob tracker is associated with a blob generated for a video frame. The blob includes pixels of at least a portion of a foreground object in a video frame. A size of the blob can be determined to be greater than a blob size threshold. The blob tracker can be converted to a normal tracker based on the size of the blob being greater than the blob size threshold. The associated blob tracker and blob are output as an identified blob tracker-blob pair when the blob tracker is converted to the normal tracker.
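A short sketch of the size-based promotion, with illustrative field names and an arbitrary threshold value.

```python
def maybe_promote_tracker(tracker, blob, size_threshold=1024):
    """Convert a (e.g., new) blob tracker to a normal tracker once its
    associated blob is larger than the blob size threshold, and output the
    identified tracker-blob pair in that case."""
    blob_size = blob["width"] * blob["height"]
    if tracker["state"] != "normal" and blob_size > size_threshold:
        tracker["state"] = "normal"
    if tracker["state"] == "normal":
        # Only trackers converted to normal are output as identified pairs.
        return (tracker, blob)
    return None
```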
Abstract:
Techniques and systems are provided for encoding video data. For example, a method of encoding video data includes obtaining a background picture that is generated based on a plurality of pictures captured by an image sensor. The background picture is generated to include background portions identified in each of the captured pictures. The method further includes encoding, into a video bitstream, a group of pictures captured by the image sensor. The group of pictures includes at least one random access picture. Encoding the group of pictures includes encoding at least a portion of the at least one random access picture using inter-prediction based on the background picture.
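A sketch of the overall flow under two assumptions: the background picture is formed here as a per-pixel temporal median (one simple way to keep the background portions of the captured pictures), and the `encoder` object with its `add_long_term_reference` and `encode_picture` methods is hypothetical, standing in for a real codec API.

```python
import numpy as np

def build_background_picture(frames):
    """Form a background picture from a list of captured frames by taking
    the per-pixel temporal median (illustrative background extraction)."""
    stack = np.stack(frames, axis=0)  # shape (num_frames, H, W)
    return np.median(stack, axis=0).astype(stack.dtype)

def encode_group_of_pictures(frames, encoder):
    """Make the background picture available as a reference and code the
    random access picture at the start of the group with inter-prediction
    from it, rather than coding it fully intra (hypothetical encoder API)."""
    background = build_background_picture(frames)
    encoder.add_long_term_reference(background)          # hypothetical call
    for i, frame in enumerate(frames):
        if i == 0:
            # Random access picture: at least a portion of it is predicted
            # from the background picture.
            encoder.encode_picture(frame, references=[background])
        else:
            encoder.encode_picture(frame)                # normal inter coding
```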
Abstract:
In an example, a method of decoding video data includes selecting a motion information derivation mode from a plurality of motion information derivation modes for determining motion information for a current block, where each motion information derivation mode of the plurality comprises performing a motion search for a first set of reference data that corresponds to a second set of reference data outside of the current block, and where the motion information indicates motion of the current block relative to reference video data. The method also includes determining the motion information for the current block using the selected motion information derivation mode. The method also includes decoding the current block using the determined motion information and without decoding syntax elements representative of the motion information.
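As one concrete example of a motion information derivation mode, the sketch below performs decoder-side template matching: the motion vector is found by matching already-reconstructed pixels around the current block against the reference picture, so no motion syntax needs to be decoded. This is only one illustrative derivation mode, not the full mode-selection scheme described above, and it assumes the block is not at the top or left frame border.

```python
import numpy as np

def derive_motion_template_matching(ref, cur, block_pos, block_size,
                                    search_range=8, template=4):
    # block_pos = (y, x) of the current block; block_size = (h, w).
    y, x = block_pos
    h, w = block_size

    def template_pixels(img, ty, tx):
        # L-shaped template: `template` rows above and columns left of the block.
        top = img[ty - template:ty, tx:tx + w]
        left = img[ty:ty + h, tx - template:tx]
        return np.concatenate([top.ravel(), left.ravel()]).astype(np.int64)

    cur_tpl = template_pixels(cur, y, x)
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = y + dy, x + dx
            if (ry < template or rx < template or
                    ry + h > ref.shape[0] or rx + w > ref.shape[1]):
                continue
            # Sum of absolute differences between the two templates.
            cost = np.abs(cur_tpl - template_pixels(ref, ry, rx)).sum()
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv  # derived motion vector; no motion syntax was decoded
```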
Abstract:
In an example, a method of decoding video data may include receiving a first block of video data. The first block of video data may be a sub-block of a prediction unit. The method may include receiving one or more blocks of video data that neighbor the first block of video data. The method may include determining motion information of at least one of the one or more blocks of video data that neighbor the first block of video data. The method may include decoding, using overlapped block motion compensation, the first block of video data based at least in part on the motion information of the at least one of the one or more blocks that neighbor the first block of video data.
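A sketch of the blending step of overlapped block motion compensation, assuming `above_pred` and `left_pred` are predictions of the current sub-block generated with the motion information of the above and left neighbouring blocks; the four-sample overlap and the weights used here are illustrative, and 8-bit samples are assumed.

```python
import numpy as np

def obmc_blend(cur_pred, above_pred, left_pred,
               above_available=True, left_available=True):
    """Blend the current sub-block's prediction with predictions formed
    using the motion of the above and left neighbours near the block
    boundaries (illustrative weights, decreasing away from the boundary)."""
    out = cur_pred.astype(np.float64)
    weights = [0.25, 0.125, 0.0625, 0.03125]  # weight given to the neighbour

    if above_available:
        for r, w in enumerate(weights):
            out[r, :] = (1 - w) * out[r, :] + w * above_pred[r, :]
    if left_available:
        for c, w in enumerate(weights):
            out[:, c] = (1 - w) * out[:, c] + w * left_pred[:, c]
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```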
Abstract:
A video processing device may obtain, from a descriptor for a program comprising one or more elementary streams, a plurality of profile, tier, level (PTL) syntax element sets. The video processing device may obtain, from the descriptor, a plurality of operation point syntax element sets. For each respective operation point syntax element set of the plurality of operation point syntax element sets, the video processing device may determine, for each respective layer of the respective operation point specified by the respective operation point syntax element set, based on a respective syntax element in the respective operation point syntax element set, which of the PTL syntax element sets specifies the PTL information assigned to the respective layer, the respective operation point having a plurality of layers.
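A small illustration of the mapping this describes, using illustrative field names rather than the descriptor's actual syntax element names: each operation point syntax element set carries, for each of its layers, an index identifying which PTL syntax element set specifies the PTL information assigned to that layer.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class PTLSet:
    profile: int
    tier: int
    level: int

@dataclass
class OperationPoint:
    layer_ids: List[int]
    # Per layer, the index of the PTL syntax element set that carries the
    # profile/tier/level information assigned to that layer.
    ptl_set_indices: List[int]

def resolve_layer_ptl(ptl_sets: List[PTLSet],
                      op: OperationPoint) -> Dict[int, PTLSet]:
    # Map each layer of the operation point to its referenced PTL set.
    return {layer: ptl_sets[idx]
            for layer, idx in zip(op.layer_ids, op.ptl_set_indices)}
```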
Abstract:
This disclosure describes techniques for simplifying depth inter mode coding in a three-dimensional (3D) video coding process, such as 3D-HEVC. The techniques include generating a motion parameter candidate list, e.g., merging candidate list, for a current depth prediction unit (PU). In some examples, the described techniques include determining that a sub-PU motion parameter inheritance (MPI) motion parameter candidate is unavailable for inclusion in the motion parameter candidate list for the current depth PU if motion parameters of a texture block co-located with a representative block of the current depth PU are unavailable. In some examples, the described techniques include deriving a sub-PU MPI candidate for inclusion in the motion parameter candidate list for the current depth PU only if a partition mode of the current depth PU is 2Nx2N.
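A sketch of the two availability checks, with hypothetical helper and field names: the sub-PU MPI candidate is derived only for 2Nx2N depth PUs, and it is treated as unavailable when the texture block co-located with the PU's representative block has no motion parameters.

```python
def add_sub_pu_mpi_candidate(candidate_list, depth_pu,
                             get_colocated_texture_motion):
    """Append a sub-PU MPI merge candidate for a depth PU, subject to the
    partition-mode and co-located texture motion availability checks."""
    if depth_pu["partition_mode"] != "2Nx2N":
        return candidate_list  # sub-PU MPI candidate not derived

    # Representative block of the current depth PU (e.g., a centre sub-block).
    rep_block = depth_pu["representative_block"]
    motion = get_colocated_texture_motion(rep_block)
    if motion is None:
        return candidate_list  # candidate unavailable for inclusion

    candidate_list.append({"type": "sub_pu_mpi", "motion": motion})
    return candidate_list
```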
Abstract:
A device for processing three-dimensional (3D) video data may determine, based on direct dependent layers signaled in a video parameter set, that the current texture layer of the video data is dependent on a depth layer of the video data; and process the current texture layer using the depth layer.
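A minimal sketch of the dependency check, assuming an illustrative in-memory layout for the direct dependent layers signaled in the video parameter set.

```python
def get_depth_dependency(vps_direct_dependent_layers, texture_layer_id):
    """Determine, from the direct dependent layers signaled in the video
    parameter set, whether a texture layer depends on a depth layer, and
    return that depth layer's id so it can be used when processing the
    texture layer. The dict maps a layer id to a list of
    (dependent layer id, is_depth) pairs (illustrative layout)."""
    for dep_id, is_depth in vps_direct_dependent_layers.get(texture_layer_id, []):
        if is_depth:
            return dep_id
    return None
```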