Abstract:
Methods and apparatuses for tracking objects comprise one or more optical sensors for capturing one or more images of a scene, wherein the one or more optical sensors capture a wide field of view and a corresponding narrow field of view for the one or more images of the scene; a localization module, coupled to the one or more optical sensors, for determining the location of the apparatus and determining the location of one or more objects in the one or more images based on the location of the apparatus; and an augmented reality module, coupled to the localization module, for enhancing a view of the scene on a display based on the determined location of the one or more objects.
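As a rough illustration of the kind of geometry such an apparatus could use, the hypothetical sketch below derives an object's position from the platform's own location and heading plus the object's bearing in a narrow-field-of-view image; the function names, field-of-view value, and assumed range are illustrative, not part of the claimed method.

```python
# Minimal sketch (not the patented apparatus): estimating an object's world
# position from the capturing platform's pose plus the object's bearing in a
# narrow field-of-view image. All names and parameters are illustrative.
import math

def pixel_to_bearing(px_x, image_width, fov_deg, platform_heading_deg):
    """Convert a horizontal pixel coordinate into an absolute bearing."""
    # Offset from image center, scaled to half the field of view.
    offset = (px_x - image_width / 2) / (image_width / 2)
    return platform_heading_deg + offset * (fov_deg / 2)

def locate_object(platform_xy, bearing_deg, range_m):
    """Project the object position from the platform location and bearing."""
    theta = math.radians(bearing_deg)
    return (platform_xy[0] + range_m * math.sin(theta),
            platform_xy[1] + range_m * math.cos(theta))

bearing = pixel_to_bearing(px_x=900, image_width=1280, fov_deg=10.0,
                           platform_heading_deg=45.0)
print(locate_object((100.0, 200.0), bearing, range_m=50.0))
```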
Abstract:
A method, apparatus, and system for providing orientation and location estimates for a query ground image include determining spatial-aware features of a ground image and applying a model to the determined spatial-aware features to determine orientation and location estimates of the ground image. The model can be trained by collecting a set of ground images, determining spatial-aware features for the ground images, collecting a set of geo-referenced images, determining spatial-aware features for the geo-referenced images, determining a similarity of the spatial-aware features of the ground images and the geo-referenced images, pairing ground images and geo-referenced images based on the determined similarity, determining a loss function that jointly evaluates orientation and location information, creating a training set including the paired ground images and geo-referenced images and the loss function, and training a neural network on the training set to determine orientation and location estimates of ground images without the use of 3D data.
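For a concrete picture of a loss that jointly evaluates orientation and location, the following PyTorch sketch pairs ground and geo-referenced features by cosine similarity and combines a contrastive location term with an orientation regression term. The encoder, margin, and weighting are assumptions; the abstract does not specify the loss form.

```python
# Illustrative sketch only: a joint orientation/location training loss in the
# spirit of the abstract. The margin and weighting are assumptions, not the
# patented model.
import torch
import torch.nn.functional as F

def pair_by_similarity(ground_feats, geo_feats):
    """Pair each ground image with its most similar geo-referenced image."""
    sim = F.normalize(ground_feats, dim=1) @ F.normalize(geo_feats, dim=1).T
    return sim.argmax(dim=1), sim

def joint_loss(ground_feats, geo_feats, pred_orient, true_orient,
               margin=0.3, w_orient=1.0):
    """Contrastive location term plus an orientation regression term.

    Assumes row i of ground_feats is paired with row i of geo_feats.
    """
    sim = F.normalize(ground_feats, dim=1) @ F.normalize(geo_feats, dim=1).T
    pos = sim.diag()                                      # paired similarities
    n = sim.size(0)
    neg = (sim - 10.0 * torch.eye(n)).max(dim=1).values   # hardest non-match
    loc_term = F.relu(margin + neg - pos).mean()
    orient_term = F.mse_loss(pred_orient, true_orient)
    return loc_term + w_orient * orient_term

g, a = torch.randn(8, 128), torch.randn(8, 128)
pairs, _ = pair_by_similarity(g, a)
loss = joint_loss(g, a, torch.randn(8, 1), torch.randn(8, 1))
print(pairs.tolist(), float(loss))
```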
Abstract:
A method for AI-driven augmented reality mentoring includes determining semantic features of objects in at least one captured scene, determining 3D positional information of the objects, combining information regarding the identified objects with respective 3D positional information to determine at least one intermediate representation, completing the determined intermediate representation using machine learning to include additional objects or positional information of the objects not identifiable from the at least one captured scene, determining at least one task to be performed and determining steps to be performed using a knowledge database, generating at least one visual representation relating to the determined steps for performing the at least one task, determining a correct position for displaying the at least one visual representation, and displaying the at least one visual representation on a see-through display in the determined correct position as an augmented overlay to the view of at least one user.
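A minimal, hypothetical sketch of the data flow described above follows: detected objects and their 3D positions are fused into a scene-graph-style intermediate representation, task steps are drawn from a small knowledge base, and an anchor position is chosen for the overlay. The structures, labels, and offset are illustrative stand-ins, not the claimed pipeline.

```python
# Hypothetical sketch of the pipeline's data flow; names and structures are
# illustrative only.
KNOWLEDGE_BASE = {
    "replace_filter": ["open housing", "remove old filter", "insert new filter"],
}

def build_scene_graph(detections):
    """detections: list of (label, (x, y, z)) from the perception stage."""
    return {label: {"position": pos} for label, pos in detections}

def overlay_anchor(scene_graph, target_label, offset=(0.0, 0.1, 0.0)):
    """Place the visual cue slightly above the target object, if it is seen."""
    node = scene_graph.get(target_label)
    if node is None:
        return None  # object would have to be completed/inferred by the model
    x, y, z = node["position"]
    return (x + offset[0], y + offset[1], z + offset[2])

graph = build_scene_graph([("housing", (0.4, 0.0, 1.2)), ("filter", (0.5, 0.1, 1.1))])
for step in KNOWLEDGE_BASE["replace_filter"]:
    print(step, "->", overlay_anchor(graph, "filter"))
```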
Abstract:
A method, machine readable medium and system for RGBD semantic segmentation of video data includes determining semantic segmentation data and depth segmentation data for less than all classes for images of each frame of a first video, determining semantic segmentation data and depth segmentation data for images of each key frame of a second video including a synchronous combination of respective frames of the RGB video and the depth-aware video in parallel to the determination of the semantic segmentation data and the depth segmentation data for each frame of the first video, temporally and geometrically aligning respective frames of the first video and the second video, and predicting semantic segmentation data and depth segmentation data for images of a subsequent frame of the first video based on the determination of the semantic segmentation data and depth segmentation data for images of a key frame of the second video.
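The two-rate structure suggested by the abstract can be sketched as below: a fast per-frame pass over a reduced class set, a heavier pass over synchronized RGB and depth on key frames only, and propagation of the latest key-frame result to subsequent frames. The placeholder functions stand in for the actual networks, which the abstract does not specify.

```python
# Hedged sketch of the two-rate scheduling idea; the branch functions are
# stand-in callables, not the patented networks.
import numpy as np

def fast_branch(rgb):         # reduced class set, runs on every frame
    return (rgb.mean(axis=-1) > 128).astype(np.uint8)

def slow_branch(rgb, depth):  # full class set on key frames only
    return np.digitize(depth, bins=[1.0, 2.0, 4.0]).astype(np.uint8)

def process(frames, key_every=5):
    last_key_pred = None
    outputs = []
    for i, (rgb, depth) in enumerate(frames):
        fast = fast_branch(rgb)
        if i % key_every == 0:
            last_key_pred = slow_branch(rgb, depth)  # synchronous RGB+D pass
        # Predict current labels from the latest aligned key-frame output.
        outputs.append(np.where(fast > 0, fast, last_key_pred))
    return outputs

frames = [(np.random.randint(0, 256, (4, 4, 3)), np.random.rand(4, 4) * 5)
          for _ in range(10)]
print(len(process(frames)))
```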
Abstract:
A method, apparatus, and system for developing an understanding of at least one perceived environment includes determining semantic features and respective positional information of the semantic features from received data related to images and respective depth-related content of the at least one perceived environment on the fly as changes in the received data occur, for each perceived environment, combining information of the determined semantic features with the respective positional information to determine a compact representation of the perceived environment which provides information regarding positions of the semantic features in the perceived environment and at least spatial relationships among the semantic features, for each of the at least one perceived environment, combining information from the determined compact representation with information stored in a foundational model to determine a respective understanding of the perceived environment, and outputting an indication of the determined respective understanding.
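As an illustration only, the sketch below serializes a compact representation (object positions plus pairwise spatial relations) into a prompt for a foundational model; the relation set, prompt format, and stubbed model call are assumptions rather than the described system.

```python
# Illustrative only: turning a compact scene representation into a prompt for
# a foundational model. The model call is a stub.
def spatial_relation(p1, p2):
    return "left of" if p1[0] < p2[0] else "right of"

def compact_representation(detections):
    """detections: {label: (x, y, z)} built on the fly as the scene changes."""
    rels = []
    labels = list(detections)
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            rels.append(f"{a} is {spatial_relation(detections[a], detections[b])} {b}")
    return {"objects": detections, "relations": rels}

def query_foundation_model(rep, question):
    prompt = "Scene: " + "; ".join(rep["relations"]) + f"\nQuestion: {question}"
    return prompt  # stand-in for a call to a foundational model

rep = compact_representation({"mug": (0.2, 0.0, 1.0), "laptop": (0.6, 0.0, 1.1)})
print(query_foundation_model(rep, "Where is the mug relative to the laptop?"))
```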
Abstract:
Systems and methods for providing remote farm damage assessment are provided herein. In some embodiments, a system and method for providing remote farm damage assessment may include determining a set of damage assessment locales for damage assessment; incorporating the set of damage assessment locales into a workflow; providing the workflow to a user device; receiving a first set of damage assessment images from the user device based on the workflow provided, wherein each of the first set of damage assessment images includes geolocation information and camera information; determining a damage assessment based on the first set of damage assessment images using a damage assessment machine learning model; and outputting a damage assessment indication including one or more of whether there is damage, a confidence level associated with the damage assessment, or a confidence level associated with the level of damage.
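A hedged sketch of this flow might look like the following: a workflow is built from assessment locales, geotagged images are matched to locales, and a stand-in classifier produces a damage indication with a confidence. The proximity tolerance and classifier are placeholders, not the described model.

```python
# Placeholder sketch of the assessment flow; the classifier and tolerance are
# illustrative stand-ins.
import math

def build_workflow(locales):
    return [{"locale": loc, "status": "pending"} for loc in locales]

def near(p, q, tol_deg=0.01):
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= tol_deg

def assess(workflow, images, classify):
    """images: list of dicts with 'geo' (lat, lon) and 'pixels'."""
    results = []
    for item in workflow:
        shots = [im for im in images if near(im["geo"], item["locale"])]
        if not shots:
            continue
        damaged, conf = classify(shots)  # damage assessment model stand-in
        results.append({"locale": item["locale"], "damaged": damaged,
                        "confidence": conf})
    return results

stub_model = lambda shots: (True, 0.87)
wf = build_workflow([(38.54, -121.74)])
print(assess(wf, [{"geo": (38.541, -121.741), "pixels": None}], stub_model))
```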
Abstract:
A method, apparatus and system for object detection in sensor data having at least two modalities using a common embedding space includes creating first modality vector representations of features of sensor data having a first modality and second modality vector representations of features of sensor data having a second modality, projecting the first and second modality vector representations into the common embedding space such that related embedded modality vectors are closer together in the common embedding space than unrelated modality vectors, combining the projected first and second modality vector representations, and determining a similarity between the combined modality vector representations and respective embedded vector representations of features of objects in the common embedding space to identify at least one object depicted by the captured sensor data. In some instances, data manipulation of the method, apparatus and system can be guided by physics properties of a sensor and/or sensor data.
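The sketch below illustrates, under stated assumptions, one way such a common embedding space could be realized: separate projection heads map each modality into a shared space, the projected vectors are fused, and the fused vector is scored against object prototype embeddings by cosine similarity. The encoder shapes and fusion rule are illustrative choices, not the claimed method.

```python
# Minimal sketch (assumptions throughout) of projecting two sensor modalities
# into a shared embedding space and matching the fused vector against object
# prototype embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Projector(nn.Module):
    def __init__(self, in_dim, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, embed_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

proj_rgb, proj_radar = Projector(512), Projector(256)
object_prototypes = F.normalize(torch.randn(10, 64), dim=-1)  # 10 known objects

rgb_feat, radar_feat = torch.randn(1, 512), torch.randn(1, 256)
fused = F.normalize(proj_rgb(rgb_feat) + proj_radar(radar_feat), dim=-1)
scores = fused @ object_prototypes.T          # cosine similarity in the space
print("best matching object:", scores.argmax(dim=-1).item())
```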
Abstract:
A method, apparatus and system for efficient navigation in a navigation space includes determining semantic features and respective 3D positional information of the semantic features for scenes of captured image content and depth-related content in the navigation space, combining information of the determined semantic features of the scene with respective 3D positional information using neural networks to determine an intermediate representation of the scene which provides information regarding positions of the semantic features in the scene and spatial relationships among the semantic features, and using the information regarding the positions of the semantic features and the spatial relationships among the semantic features in a machine learning process to provide at least one of a navigation path in the navigation space, a model of the navigation space, and an explanation of a navigation action by a single, mobile agent in the navigation space.
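To make the navigation-path use concrete, the following sketch builds a coarse occupancy grid from semantic features with positions and plans a path over it with breadth-first search, which stands in here for the learned navigation component; the grid size and object labels are assumptions.

```python
# Hedged illustration: a coarse occupancy grid from positioned semantic
# features, with BFS standing in for the learned planner.
from collections import deque

def build_grid(objects, size=5):
    grid = [[0] * size for _ in range(size)]
    for label, (x, y) in objects:          # treat detected objects as blocked
        grid[y][x] = 1
    return grid

def plan(grid, start, goal):
    q, seen = deque([(start, [start])]), {start}
    while q:
        (x, y), path = q.popleft()
        if (x, y) == goal:
            return path
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if (0 <= nx < len(grid[0]) and 0 <= ny < len(grid)
                    and grid[ny][nx] == 0 and (nx, ny) not in seen):
                seen.add((nx, ny))
                q.append(((nx, ny), path + [(nx, ny)]))
    return None

grid = build_grid([("chair", (2, 1)), ("table", (2, 2))])
print(plan(grid, (0, 0), (4, 4)))
```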
Abstract:
During GPS-denied/restricted navigation, images proximate a platform device are captured using a camera, and corresponding motion measurements of the platform device are captured using an IMU device. Features of a current frame of the images captured are extracted. Extracted features are matched and feature information between consecutive frames is tracked. The extracted features are compared to previously stored, geo-referenced visual features from a plurality of platform devices. If one of the extracted features does not match a geo-referenced visual feature, a pose is determined for the platform device using IMU measurements propagated from a previous pose and relative motion information between consecutive frames, which is determined using the tracked feature information. If at least one of the extracted features matches a geo-referenced visual feature, a pose is determined for the platform device using location information associated with the matched, geo-referenced visual feature and relative motion information between consecutive frames.
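The pose-update decision can be sketched, with many simplifications, as below: the pose is dead-reckoned from IMU and relative visual motion by default, and is corrected from a geo-referenced landmark's known location when an extracted feature matches the database. The 2D pose, exact-key matching, and database contents are purely illustrative.

```python
# Simplified, assumption-laden sketch of the pose-update logic described in
# the abstract. Matching is by exact descriptor key purely for illustration.
GEO_DB = {"landmark_17": (120.0, 45.0)}        # descriptor -> known location

def update_pose(prev_pose, imu_delta, visual_delta, matched_descriptors):
    geo_hits = [d for d in matched_descriptors if d in GEO_DB]
    if geo_hits:
        # Correct the absolute position from the geo-referenced match, then
        # apply the frame-to-frame relative motion.
        gx, gy = GEO_DB[geo_hits[0]]
        return (gx + visual_delta[0], gy + visual_delta[1])
    # No geo match: propagate from the previous pose using IMU measurements
    # and tracked relative motion between consecutive frames.
    return (prev_pose[0] + imu_delta[0] + visual_delta[0],
            prev_pose[1] + imu_delta[1] + visual_delta[1])

print(update_pose((100.0, 40.0), (0.5, 0.2), (0.1, 0.0), ["landmark_17"]))
print(update_pose((100.0, 40.0), (0.5, 0.2), (0.1, 0.0), ["unknown_feat"]))
```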