Abstract:
An evaluation engine has two or more modules to assist a driver of a vehicle. A driver drowsiness module analyzes monitored features of the driver to recognize two or more levels of drowsiness of the driver of the vehicle. The driver drowsiness module evaluates drowsiness of the driver based on observed body language and facial analysis of the driver. The driver drowsiness module is configured to analyze live multi-modal sensor inputs from sensors against at least one of i) a trained artificial intelligence model and ii) a rules-based model while the driver is driving the vehicle to produce an output comprising a driver drowsiness-level estimation. A driver assistance module provides one or more positive assistance mechanisms to the driver to return the driver to a state at or above a designated alertness level.
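The rules-based branch of such an estimator can be sketched as a simple scoring of monitored features. This is an illustrative sketch only: the feature names (eye-closure ratio, yawn rate, head nods), thresholds, and level labels are assumptions, not the patent's actual model.

```python
def estimate_drowsiness_level(perclos: float, yawns_per_min: float,
                              head_nods_per_min: float) -> str:
    """Map monitored facial/body features to a discrete drowsiness level."""
    score = 0
    if perclos > 0.15:            # eyes closed more than 15% of the time
        score += 2
    if yawns_per_min >= 3:        # frequent yawning
        score += 1
    if head_nods_per_min >= 1:    # head nodding observed
        score += 2
    if score >= 4:
        return "severe"
    if score >= 2:
        return "moderate"
    return "alert"
```

A trained model would replace this hand-tuned scoring, but the same discrete levels could drive the driver assistance module's choice of intervention.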
Abstract:
A method for assisting a user with one or more desired tasks is disclosed. For example, an executable, generic language understanding module and an executable, generic task reasoning module are provided for execution in a computer processing system. A set of run-time specifications, comprising one or more models specific to a domain, is provided to the generic language understanding module and the generic task reasoning module. A language input is then received from the user, an intention of the user is determined with respect to one or more desired tasks, and the user is assisted with the one or more desired tasks in accordance with that intention.
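The key idea, a generic module configured by domain-specific run-time specifications, can be sketched as follows. The class name, the keyword-to-intent dictionary standing in for a domain model, and the travel example are all illustrative assumptions.

```python
class GenericLanguageUnderstanding:
    """A generic module whose behavior is supplied at run time."""

    def __init__(self):
        self.domain_model = {}

    def load_specifications(self, domain_model: dict) -> None:
        """Accept run-time specifications for a particular domain."""
        self.domain_model = domain_model

    def determine_intention(self, utterance: str) -> str:
        """Determine the user's intention from a language input."""
        for keyword, intent in self.domain_model.items():
            if keyword in utterance.lower():
                return intent
        return "unknown"


# The same executable module serves any domain once its model is loaded.
travel_model = {"book": "book_flight", "cancel": "cancel_booking"}
nlu = GenericLanguageUnderstanding()
nlu.load_specifications(travel_model)
```

Swapping in a different domain model reconfigures the module without changing its executable code, which is the point of separating the generic modules from the run-time specifications.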
Abstract:
A computing system includes a vision-based user interface platform to, among other things, analyze multi-modal user interactions, semantically correlate stored knowledge with visual features of a scene depicted in a video, determine relationships between different features of the scene, and selectively display virtual elements on the video depiction of the scene. The analysis of user interactions can be used to filter the information retrieval and correlating of the visual features with the stored knowledge.
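The correlation-and-overlay step can be sketched minimally. Here scene "features" are assumed to be labeled detections with positions, stored knowledge a label-to-facts dictionary, and the user interaction a simple interest filter; all of these structures are assumptions for illustration.

```python
def overlay_elements(detections, knowledge, interest=None):
    """Pair detected scene features with semantically related knowledge."""
    elements = []
    for label, (x, y) in detections:
        if interest and interest not in label:
            continue                     # user interaction filters retrieval
        for fact in knowledge.get(label, []):
            elements.append({"at": (x, y), "text": fact})
    return elements
```

Each returned element could then be rendered as a virtual annotation at the feature's position in the video depiction of the scene.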
Abstract:
Disclosed techniques can generate content object summaries. Content of a content object can be parsed into a set of word groups. For each word group, at least one topic to which the word group pertains can be identified, and at least one corresponding weight can be determined via a user model. A score can then be determined for each word group based on the weight(s). A subset of the set of word groups can be selected based on the scores, and a summary of the content object can be generated that includes the subset but does not include the word groups that are not in the subset. At least part of the summary of the content object can be output.
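A minimal sketch of this scoring pipeline, under stated assumptions: word groups are sentences, topics come from a fixed keyword lookup, and the user model is a topic-to-weight dictionary. None of these choices are specified by the source.

```python
def summarize(content: str, topic_keywords: dict, user_weights: dict,
              max_groups: int = 2) -> str:
    """Select the highest-scoring word groups as the summary."""
    groups = [g.strip() for g in content.split(".") if g.strip()]
    scored = []
    for group in groups:
        # Identify topics to which the word group pertains.
        topics = {t for t, kws in topic_keywords.items()
                  if any(k in group.lower() for k in kws)}
        # Score the group using the user model's per-topic weights.
        score = sum(user_weights.get(t, 0.0) for t in topics)
        scored.append((score, group))
    # Keep the highest-scoring subset, preserving original order.
    top = sorted(scored, key=lambda s: -s[0])[:max_groups]
    keep = {g for _, g in top}
    return ". ".join(g for g in groups if g in keep) + "."
```

Because the weights come from a user model, two users reading the same content object could receive different summaries.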
Abstract:
Embodiments of the disclosed technologies include a method of capturing, using a mobile device, a best-focused image of a skin surface of a subject. A camera of the mobile device is set to a fixed focal length, and a current image is captured as part of a sequence of images of the skin surface that also includes a first previous image captured before the current image and a second previous image captured before the first previous image. A modified image is produced from the current image and transformed, using a Laplacian pyramid, to produce a plurality of first luminance values from the modified image and a plurality of second luminance values from the plurality of first luminance values. The squares of the first luminance values are averaged to produce a first energy value, the squares of the second luminance values are averaged to produce a second energy value, and a first ratio of the first energy value to the second energy value is calculated. An average first energy value of the first previous image is calculated as the average of the first energy values of the current, first previous, and second previous images, and an average first ratio of the first previous image is calculated analogously from the three first ratios. The first previous image is determined to be one of a plurality of valid images, where each valid image has a corresponding average first energy value above an energy threshold value and a corresponding average first ratio approximately equal to 1.0. A first valid image is determined to be the best-focused image when its average first energy value is greater than those of the valid image captured immediately before it and the valid image captured immediately after it. An action associated with the best-focused image is then performed.
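The selection logic of the two final steps can be sketched directly from the description, given per-frame smoothed first energy values and energy ratios (assumed precomputed as described above). The energy threshold and the tolerance interpreting "approximately equal to 1.0" are assumptions.

```python
def best_focused_index(avg_energy, avg_ratio,
                       energy_threshold=0.5, ratio_tol=0.1):
    """Return the index of the best-focused frame, or None."""
    # An image is "valid" if its averaged first energy exceeds the
    # threshold and its averaged first ratio is approximately 1.0.
    valid = [i for i in range(len(avg_energy))
             if avg_energy[i] > energy_threshold
             and abs(avg_ratio[i] - 1.0) <= ratio_tol]
    # Best-focused: a valid image whose energy exceeds that of the
    # valid images captured immediately before and after it.
    for k in range(1, len(valid) - 1):
        i_prev, i, i_next = valid[k - 1], valid[k], valid[k + 1]
        if (avg_energy[i] > avg_energy[i_prev]
                and avg_energy[i] > avg_energy[i_next]):
            return i
    return None
```

The three-frame averaging described above smooths out single-frame noise, so the peak found here tracks a sustained focus maximum rather than a transient spike.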
Abstract:
Device logic in a mobile device configures a processor to capture a series of images, such as a video, using a consumer-grade camera, and to analyze the images to determine the best-focused image, of the series of images, that captures a region of interest. The images may be of a textured surface, such as facial skin of a mobile device user. The processor sets a focal length of the camera to a fixed position for collecting the images. The processor may guide the user to position the mobile device for capturing the images, using audible cues. For each image, the processor crops the image to the region of interest, extracts luminance information, and determines one or more energy levels of the luminance via a Laplacian pyramid. The energy levels may be filtered, and then are compared to energy levels of the other images to determine the best-focused image.
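The per-image energy measure can be sketched with a two-level pyramid. As a simplifying assumption, a 2x2 box-filter reduce and nearest-neighbour expand stand in for the proper Gaussian filtering of a Laplacian pyramid.

```python
import numpy as np

def reduce2(img):
    """Downsample by 2 via 2x2 block averaging."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def expand2(img):
    """Upsample by 2 via nearest-neighbour replication."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def pyramid_energies(luma):
    """Return mean-squared energies of two Laplacian bands of `luma`."""
    g1 = reduce2(luma)
    g2 = reduce2(g1)
    lap1 = luma[:g1.shape[0] * 2, :g1.shape[1] * 2] - expand2(g1)
    lap2 = g1[:g2.shape[0] * 2, :g2.shape[1] * 2] - expand2(g2)
    return float(np.mean(lap1 ** 2)), float(np.mean(lap2 ** 2))
```

High-frequency texture such as focused skin detail concentrates energy in the first band, which is why comparing these energies across frames can pick out the best-focused image.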
Abstract:
Provided are systems, computer-implemented methods, and computer-program products for a multi-lingual device, capable of receiving verbal input in multiple languages, and further capable of providing conversational responses in multiple languages. In various implementations, the multi-lingual device includes an automatic speech recognition engine capable of receiving verbal input in a first natural language and providing a textual representation of the input and a confidence value for the recognition. The multi-lingual device can also include a machine translation engine, capable of translating textual input from the first natural language into a second natural language. The machine translation engine can output a confidence value for the translation. The multi-lingual device can further include a natural language processing engine capable of translating from the second natural language to a computer-based language. Input in the computer-based language can be processed, and the multi-lingual device can take an action based on the result of the processing.
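The staged pipeline with per-stage confidence values can be sketched end to end. The recognizer and translator below are toy stubs, and the confidence-gating threshold is an assumption; the device's real engines and API are not described at this level of detail.

```python
def asr(audio: str):
    """Stub ASR: returns (transcript, confidence)."""
    return audio, 0.92

def translate(text: str):
    """Stub MT from the first to the second natural language."""
    lexicon = {"hola": "hello", "adios": "goodbye"}
    words = [lexicon.get(w, w) for w in text.lower().split()]
    return " ".join(words), 0.85

def understand(text: str):
    """Stub NLU: map second-language text to a computer-based form."""
    return {"intent": "greet"} if "hello" in text else {"intent": "unknown"}

def handle(audio: str, min_conf: float = 0.8):
    """Run ASR -> MT -> NLU, gating on the stage confidence values."""
    text, c1 = asr(audio)
    translated, c2 = translate(text)
    if min(c1, c2) < min_conf:
        return {"intent": "clarify"}   # low confidence: ask the user again
    return understand(translated)
```

Propagating the confidence values to a single gate like this is one plausible use of them; a real device might instead weight alternative hypotheses at each stage.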
Abstract:
An electronic device for providing health information or assistance includes an input configured to receive at least one type of signal selected from sound signals, verbal signals, non-verbal signals, and combinations thereof. A communication module is configured to send information related to the at least one user and his or her environment, including the sound signals, non-verbal signals, and verbal signals, to a remote device, the remote device being configured to analyze a condition of the at least one user and communicate condition signals to the electronic device. A processing module is configured to receive the condition signals and to cause the electronic device to engage in a passive monitoring mode or an active engagement and monitoring mode, the active engagement and monitoring mode including, but not limited to, verbal communication with the at least one user. An output is configured to engage the at least one user in verbal communication.
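The mode switch driven by the remote device's condition signals can be sketched as a small state machine. The condition labels and the spoken prompt are illustrative assumptions standing in for the remote analysis.

```python
class HealthAssistant:
    """Switches between passive monitoring and active engagement."""

    def __init__(self):
        self.mode = "passive"

    def on_condition_signal(self, condition: str) -> str:
        """React to a condition signal from the remote device."""
        if condition in ("distress", "fall_detected", "no_response"):
            self.mode = "active"
            return "Are you okay? I am here to help."   # verbal engagement
        self.mode = "passive"
        return ""                                       # monitor silently
```

Keeping the condition analysis on the remote device lets the electronic device itself stay simple: it only streams signals out and reacts to the condition signals that come back.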
Abstract:
This disclosure describes machine learning techniques for capturing human knowledge for performing a task. In one example, a video device obtains video data of a first user performing the task and one or more sensors generate sensor data during performance of the task. An audio device obtains audio data describing performance of the task. A computation engine applies a machine learning system to correlate the video data to the audio data and sensor data to identify portions of the video, sensor, and audio data that depict the same step of a plurality of steps for performing the task. The machine learning system further processes the correlated data to update a domain model defining performance of the task. A training unit applies the domain model to generate training information for performing the task. An output device outputs the training information for use in training a second user to perform the task.
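A minimal sketch of the correlation step: timestamped records from each stream are binned to the step whose time interval contains them. The step boundaries are assumed given here; learning them, and the domain-model update, are not reproduced.

```python
def correlate_streams(step_bounds, *streams):
    """Group (timestamp, payload) records from each named stream by step.

    step_bounds: list of (start, end) intervals, one per task step.
    streams: ("name", [(timestamp, payload), ...]) pairs.
    """
    steps = [{} for _ in step_bounds]
    for name, records in streams:
        for t, payload in records:
            for i, (start, end) in enumerate(step_bounds):
                if start <= t < end:
                    steps[i].setdefault(name, []).append(payload)
    return steps
```

Once the video, sensor, and audio records for each step are grouped together, each group can serve as a multi-modal training example for that step of the task.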