Abstract:
A computing system includes a vision-based user interface platform to, among other things, analyze multi-modal user interactions, semantically correlate stored knowledge with visual features of a scene depicted in a video, determine relationships between different features of the scene, and selectively display virtual elements on the video depiction of the scene. The analysis of user interactions can be used to filter the information retrieval and the correlation of the visual features with the stored knowledge.
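For illustration only, the sketch below shows one way the interaction-driven filtering described above might work: a user-attention signal (e.g., gaze or pointing) narrows which detected scene features are correlated against stored knowledge before virtual elements are displayed. All names and data structures here are hypothetical and are not the platform's actual API.

```python
# Minimal sketch: user-interaction signals narrow which scene features are
# semantically matched against stored knowledge (hypothetical names throughout).
from dataclasses import dataclass

@dataclass
class SceneFeature:
    label: str   # e.g., "valve", "gauge"
    x: float     # normalized image coordinates of the feature
    y: float

@dataclass
class Interaction:
    x: float              # where the user is looking or pointing
    y: float
    radius: float = 0.15  # attention radius in normalized coordinates

def filter_by_attention(features, interaction):
    """Keep only features near the user's current focus of attention."""
    return [
        f for f in features
        if (f.x - interaction.x) ** 2 + (f.y - interaction.y) ** 2
           <= interaction.radius ** 2
    ]

def correlate(features, knowledge):
    """Look up stored knowledge for each attended feature (simple label match)."""
    return {f.label: knowledge[f.label] for f in features if f.label in knowledge}

# Example: only the feature the user attends to triggers a virtual overlay.
knowledge = {"valve": "Shut-off valve: turn clockwise to close.",
             "gauge": "Pressure gauge: nominal range 30-50 psi."}
scene = [SceneFeature("valve", 0.42, 0.55), SceneFeature("gauge", 0.80, 0.20)]
print(correlate(filter_by_attention(scene, Interaction(0.45, 0.50)), knowledge))
# -> {'valve': 'Shut-off valve: turn clockwise to close.'}
```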
Abstract:
A conversational assistant for a conversational engagement platform can contain various modules, including a user-model augmentation module, a dialogue management module, and a user-state analysis input/output module. The dialogue management module receives metrics tied to a user from the other modules, including a current topic and the user's emotions regarding that topic from the user-state analysis input/output module, and then adapts its dialogue to the user based on dialogue rules that factor in these different metrics. The dialogue rules also factor in both i) a duration of a conversational engagement with the user and ii) an attempt to maintain a positive experience for the user with the conversational engagement. A flexible ontology relationship representation about the user is built and stores metrics learned about the user over time with each conversational engagement; this representation, in combination with the dialogue rules, drives the conversations with the user.
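As an illustrative aside, the sketch below shows how dialogue rules might weigh the user-state metrics described above (current topic, emotion toward that topic, and engagement duration) to choose the next dialogue act. The specific rules, thresholds, and function names are assumptions for illustration, not the claimed rule set.

```python
# Hedged sketch of dialogue rules that factor in topic sentiment and
# engagement duration; thresholds are illustrative assumptions.

def next_dialogue_act(topic: str, sentiment: float, duration_min: float) -> str:
    """sentiment in [-1, 1]; duration_min is elapsed engagement time in minutes."""
    if sentiment < -0.3:
        # Negative affect about the current topic: steer away to keep the
        # experience positive.
        return f"change_topic_away_from:{topic}"
    if duration_min > 20:
        # Long engagement: begin wrapping up gracefully.
        return "summarize_and_close"
    if sentiment > 0.3:
        # User is enjoying the topic: deepen it.
        return f"elaborate_on:{topic}"
    return f"ask_follow_up_about:{topic}"

print(next_dialogue_act("gardening", sentiment=0.6, duration_min=5.0))
# -> 'elaborate_on:gardening'
```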
Abstract:
Embodiments of the disclosed technologies include a method of capturing, using a mobile device, a best-focused image of a skin surface of a subject, the method including: setting a camera of the mobile device to a fixed focal length; capturing, using the camera, a current image of a plurality of images of the skin surface, the plurality of images having a sequence and including a first previous image captured, using the camera, previously to the current image and a second previous image captured, using the camera, previously to the first previous image; producing a modified image from the current image; transforming the modified image, using a Laplacian pyramid, to produce a plurality of first luminance values from the modified image and a plurality of second luminance values from the plurality of first luminance values; averaging a plurality of first squared values, each including a square of a corresponding first luminance value of the plurality of first luminance values, to produce a first energy value; averaging a plurality of second squared values, each including a square of a corresponding second luminance value of the plurality of second luminance values, to produce a second energy value; calculating a first ratio of the first energy value to the second energy value; calculating, as an average first energy value of the first previous image, an average of the first energy value, a corresponding first energy value of the first previous image, and a corresponding first energy value of the second previous image; calculating, as an average first ratio of the first previous image, an average of the first ratio, a corresponding first ratio of the first previous image, and a corresponding first ratio of the second previous image; determining that the first previous image is one of a plurality of valid images, where each valid image of the plurality of valid images is an image of the plurality of images and has: a corresponding average first energy value above an energy threshold value; and a corresponding average first ratio approximately equal to 1.0; determining that a first valid image of the plurality of valid images is the best-focused image, where the first valid image has a corresponding average first energy value that is greater than the corresponding average first energy values of: a previous valid image captured immediately before the first valid image; and a subsequent valid image captured immediately after the first valid image; and performing an action associated with the best-focused image.
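A minimal sketch of this procedure, in Python with OpenCV and NumPy, is given below. It builds a two-level Laplacian pyramid per frame, computes the mean-squared ("energy") value at each level, takes their ratio, smooths energies and ratios over a three-frame window attributed to the middle frame, and selects the valid frame whose smoothed energy is a local peak. The energy threshold and the tolerance around the ratio of 1.0 are illustrative assumptions, not the claimed parameters.

```python
import cv2
import numpy as np

def frame_metrics(frame_bgr, roi=None):
    """Return (first_energy, ratio) for one frame; roi is (x, y, w, h)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    if roi is not None:
        x, y, w, h = roi
        gray = gray[y:y + h, x:x + w]          # crop to region of interest
    down1 = cv2.pyrDown(gray)
    down2 = cv2.pyrDown(down1)
    # Laplacian-pyramid levels: band-pass residuals at two scales.
    lap0 = gray - cv2.pyrUp(down1, dstsize=(gray.shape[1], gray.shape[0]))
    lap1 = down1 - cv2.pyrUp(down2, dstsize=(down1.shape[1], down1.shape[0]))
    e1 = float(np.mean(lap0 ** 2))             # first energy value
    e2 = float(np.mean(lap1 ** 2))             # second energy value
    return e1, e1 / max(e2, 1e-9)              # guard against divide-by-zero

def best_focused_index(frames, roi=None, energy_thresh=25.0, ratio_tol=0.25):
    e1s, ratios = zip(*(frame_metrics(f, roi) for f in frames))
    # Three-frame averages over frames (i-2, i-1, i), attributed to frame i-1.
    avg_e1 = [np.mean(e1s[i - 2:i + 1]) for i in range(2, len(frames))]
    avg_r = [np.mean(ratios[i - 2:i + 1]) for i in range(2, len(frames))]
    # Valid frames: averaged energy above threshold, averaged ratio near 1.0.
    valid = [i for i in range(len(avg_e1))
             if avg_e1[i] > energy_thresh and abs(avg_r[i] - 1.0) < ratio_tol]
    # Best-focused frame: a valid frame whose averaged energy exceeds that of
    # the valid frames immediately before and after it (a local peak).
    for k in range(1, len(valid) - 1):
        prev_i, i, next_i = valid[k - 1], valid[k], valid[k + 1]
        if avg_e1[i] > avg_e1[prev_i] and avg_e1[i] > avg_e1[next_i]:
            return i + 1   # map the average index back to its middle frame
    return None
```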
Abstract:
Device logic in a mobile device configures a processor to capture a series of images, such as a video, using a consumer-grade camera, and to analyze the images to determine the best-focused image of the series that captures a region of interest. The images may be of a textured surface, such as facial skin of a mobile device user. The processor sets a focal length of the camera to a fixed position for collecting the images. The processor may guide the user, using audible cues, to position the mobile device for capturing the images. For each image, the processor crops the image to the region of interest, extracts luminance information, and determines one or more energy levels of the luminance via a Laplacian pyramid. The energy levels may be filtered and are then compared to the energy levels of the other images to determine the best-focused image.
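A short sketch of the capture side of this pipeline is shown below: the focal length is locked, frames are read from the device camera (or a recorded video), and the collected frames can then be scored with a focus metric such as the one sketched after the previous abstract. Camera property support varies by device, so locking focus this way, and the specific property values used, are assumptions for illustration.

```python
import cv2

def capture_frames(source=0, n_frames=60):
    """Collect a burst of frames with autofocus disabled and focus fixed."""
    cap = cv2.VideoCapture(source)
    cap.set(cv2.CAP_PROP_AUTOFOCUS, 0)   # disable autofocus where supported
    cap.set(cv2.CAP_PROP_FOCUS, 30)      # fixed focal position (device-specific units)
    frames = []
    while len(frames) < n_frames:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
        # An audible cue (e.g., a beep) could be emitted here to guide the user
        # toward the distance at which the fixed focus is sharp.
    cap.release()
    return frames
```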