Abstract:
A method, apparatus, and system for providing orientation and location estimates for a query ground image include determining spatial-aware features of a ground image and applying a model to the determined spatial-aware features to determine orientation and location estimates of the ground image. The model can be trained by collecting a set of ground images, determining spatial-aware features for the ground images, collecting a set of geo-referenced images, determining spatial-aware features for the geo-referenced images, determining a similarity of the spatial-aware features of the ground images and the geo-referenced images, pairing ground images and geo-referenced images based on the determined similarity, determining a loss function that jointly evaluates orientation and location information, creating a training set including the paired ground images and geo-referenced images and the loss function, and training the model, a neural network, to determine orientation and location estimates of ground images without the use of 3D data.
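A minimal sketch (PyTorch) of how the described pairing and joint loss might look. The encoder architecture, the cosine-similarity pairing, and the L1/MSE weighting in `joint_loss` are illustrative assumptions, not the patented implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialFeatureEncoder(nn.Module):
    """Toy CNN mapping an image to a unit-norm spatial-aware feature vector."""
    def __init__(self, out_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(self.backbone(x).flatten(1)), dim=1)

def pair_by_similarity(ground_feats: torch.Tensor, geo_feats: torch.Tensor) -> torch.Tensor:
    """Pair each ground image with its most similar geo-referenced image (cosine similarity)."""
    return (ground_feats @ geo_feats.t()).argmax(dim=1)

def joint_loss(pred_orient, pred_loc, true_orient, true_loc, w: float = 0.5) -> torch.Tensor:
    """Single loss term that jointly evaluates orientation and location error (assumed weighting)."""
    return w * F.l1_loss(pred_orient, true_orient) + (1.0 - w) * F.mse_loss(pred_loc, true_loc)
```

The pairing step builds the training set (ground image, best-matching geo-referenced image), and the joint loss is what allows a single network to be supervised on orientation and location at once.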
Abstract:
During GPS-denied/restricted navigation, images proximate a platform device are captured using a camera, and corresponding motion measurements of the platform device are captured using an IMU device. Features of a current frame of the images captured are extracted. Extracted features are matched and feature information between consecutive frames is tracked. The extracted features are compared to previously stored, geo-referenced visual features from a plurality of platform devices. If none of the extracted features matches a geo-referenced visual feature, a pose is determined for the platform device using IMU measurements propagated from a previous pose and relative motion information between consecutive frames, which is determined using the tracked feature information. If at least one of the extracted features matches a geo-referenced visual feature, a pose is determined for the platform device using location information associated with the matched, geo-referenced visual feature and relative motion information between consecutive frames.
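A minimal sketch (NumPy) of the branching pose update described above. The `Pose` type, the constant-acceleration IMU propagation, and the way the geo-referenced location is combined with relative motion are assumptions for illustration, not the patented navigation pipeline.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Pose:
    position: np.ndarray   # 3D position in a world frame
    yaw: float             # heading in radians

def propagate_with_imu(prev: Pose, accel: np.ndarray, gyro_z: float, dt: float,
                       rel_translation: np.ndarray) -> Pose:
    """Dead-reckon: previous pose + IMU-propagated motion, refined by visual relative motion."""
    yaw = prev.yaw + gyro_z * dt
    imu_step = 0.5 * accel * dt ** 2            # crude double integration of acceleration
    return Pose(prev.position + imu_step + rel_translation, yaw)

def anchor_to_georeference(geo_location: np.ndarray, rel_translation: np.ndarray,
                           prev_yaw: float) -> Pose:
    """Anchor the pose to the matched geo-referenced feature's known location."""
    return Pose(geo_location + rel_translation, prev_yaw)

def estimate_pose(matched_geo_location, prev_pose: Pose, accel: np.ndarray,
                  gyro_z: float, dt: float, rel_translation: np.ndarray) -> Pose:
    # No geo-reference match: propagate from the previous pose with IMU + frame-to-frame motion.
    if matched_geo_location is None:
        return propagate_with_imu(prev_pose, accel, gyro_z, dt, rel_translation)
    # Match found: use the geo-referenced location plus relative motion between consecutive frames.
    return anchor_to_georeference(matched_geo_location, rel_translation, prev_pose.yaw)
```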
Abstract:
Methods and apparatuses for tracking objects comprise one or more optical sensors for capturing one or more images of a scene, wherein the one or more optical sensors capture a wide field of view and a corresponding narrow field of view for the one or more images of the scene; a localization module, coupled to the one or more optical sensors, for determining the location of the apparatus and determining the location of one or more objects in the one or more images based on the location of the apparatus; and an augmented reality module, coupled to the localization module, for enhancing a view of the scene on a display based on the determined location of the one or more objects.
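A minimal sketch of how a located object might be projected into the display for augmented-reality enhancement. The pinhole camera model and the function name are assumptions, not the apparatus described above.

```python
import numpy as np

def project_to_display(object_world_xyz, camera_world_xyz, focal_px, image_size):
    """Project a world-frame object location into pixel coordinates with a pinhole model."""
    rel = np.asarray(object_world_xyz, dtype=float) - np.asarray(camera_world_xyz, dtype=float)
    if rel[2] <= 0:
        return None                                   # behind the camera, nothing to overlay
    u = focal_px * rel[0] / rel[2] + image_size[0] / 2
    v = focal_px * rel[1] / rel[2] + image_size[1] / 2
    return (u, v)

# Usage: draw a marker at the returned pixel location on the wide- or narrow-FOV view.
print(project_to_display((2.0, 0.5, 10.0), (0.0, 0.0, 0.0), 800.0, (1920, 1080)))
```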
Abstract:
A method, machine readable medium and system for RGBD semantic segmentation of video data includes determining semantic segmentation data and depth segmentation data for less than all classes for images of each frame of a first video; determining semantic segmentation data and depth segmentation data for images of each key frame of a second video, the second video including a synchronous combination of respective frames of an RGB video and a depth-aware video, in parallel with the determination of the semantic segmentation data and the depth segmentation data for each frame of the first video; temporally and geometrically aligning respective frames of the first video and the second video; and predicting semantic segmentation data and depth segmentation data for images of a subsequent frame of the first video based on the determination of the semantic segmentation data and depth segmentation data for images of a key frame of the second video.
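A minimal sketch of the two-rate scheme described above: a light per-frame path covers a subset of classes, a heavier RGB-D path runs only on key frames, and key-frame results are aligned forward to later frames. The placeholder models, the fixed key-frame interval, and the label fusion are assumptions, not the patented method.

```python
import numpy as np

def fast_segment(rgb_frame):
    """Per-frame path: cheap segmentation over a reduced class set (placeholder)."""
    return np.zeros(rgb_frame.shape[:2], dtype=np.int32)

def keyframe_segment(rgb_frame, depth_frame):
    """Key-frame path: full RGB-D segmentation over all classes (placeholder)."""
    return np.ones(rgb_frame.shape[:2], dtype=np.int32)

def warp_forward(key_labels, flow):
    """Temporally/geometrically align key-frame labels to the current frame (identity placeholder)."""
    return key_labels

def process_video(frames, depths, key_every=10):
    last_key = None
    outputs = []
    for i, (rgb, depth) in enumerate(zip(frames, depths)):
        fast = fast_segment(rgb)                          # runs on every frame
        if i % key_every == 0:
            last_key = keyframe_segment(rgb, depth)       # slow path on key frames only
        warped = warp_forward(last_key, flow=None) if last_key is not None else fast
        # Fuse: fast-path labels (subset of classes) override warped key-frame labels where present.
        outputs.append(np.where(fast > 0, fast, warped))
    return outputs
```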