Abstract:
A system for object detection and tracking includes technologies to, among other things, detect and track moving objects, such as pedestrians and/or vehicles, in a real-world environment, handle static and dynamic occlusions, and continue tracking moving objects across the fields of view of multiple different cameras.
Abstract:
A system for object detection and tracking includes technologies to, among other things, detect and track moving objects, such as pedestrians and/or vehicles, in a real-world environment, handle static and dynamic occlusions, and continue tracking moving objects across the fields of view of multiple different cameras.
Abstract:
Methods, computing devices, and computer-program products are provided for implementing a virtual personal assistant. In various implementations, a virtual personal assistant can be configured to receive sensory input, including at least two different types of information. The virtual personal assistant can further be configured to determine semantic information from the sensory input, and to identify a context-specific framework. The virtual personal assistant can further be configured to determine a current intent. Determining the current intent can include using the semantic information and the context-specific framework. The virtual personal assistant can further be configured to determine a current input state. Determining the current input state can include using the semantic information and one or more behavioral models. The behavioral models can include one or more interpretations of previously-provided semantic information. The virtual personal assistant can further be configured to determine an action using the current intent and the current input state.
Abstract:
This disclosure describes techniques for improving accuracy of machine learning systems in facial recognition. The techniques include generating, from a training image comprising a plurality of pixels and labeled with a plurality of facial landmarks, one or more facial contour heatmaps, wherein each of the one or more facial contour heatmaps depicts an estimate of a location of one or more facial contours within the training image. Techniques further include training a machine learning model to process the one or more facial contour heatmaps to predict the location of the one or more facial contours within the training image, wherein training the machine learning model comprises applying a loss function to minimize a distance between the predicted location of the one or more facial contours within the training image and corresponding contour data generated from facial landmarks of the plurality of facial landmarks with which the training image is labeled.
Abstract:
In some examples, a computer-implemented collaboration assessment model identifies actions of each of two or more individuals depicted in video data, identify, based at least on the identified actions of each of the two or more individuals depicted in the video data, first behaviors at a first collaboration assessment level, identify, based at least on the identified actions of each of the two or more individuals depicted in the video data, second behaviors at a second collaboration assessment level different from the first collaboration assessment level, and generate and output, based at least on the first behaviors at the first collaboration assessment level and the second behaviors at the second collaboration assessment level, an indication of at least one of an assessment of a collaboration effort of the two or more individuals or respective assessments of individual contributions of the two or more individuals to the collaboration effort.
Abstract:
Methods, computing devices, and computer-program products are provided for implementing a virtual personal assistant. In various implementations, a virtual personal assistant can be configured to receive sensory input, including at least two different types of information. The virtual personal assistant can further be configured to determine semantic information from the sensory input, and to identify a context-specific framework. The virtual personal assistant can further be configured to determine a current intent. Determining the current intent can include using the semantic information and the context-specific framework. The virtual personal assistant can further be configured to determine a current input state. Determining the current input state can include using the semantic information and one or more behavioral models. The behavioral models can include one or more interpretations of previously-provided semantic information. The virtual personal assistant can further be configured to determine an action using the current intent and the current input state.
Abstract:
Methods, computing devices, and computer-program products are provided for implementing a virtual personal assistant. In various implementations, a virtual personal assistant can be configured to receive sensory input, including at least two different types of information. The virtual personal assistant can further be configured to determine semantic information from the sensory input, and to identify a context-specific framework. The virtual personal assistant can further be configured to determine a current intent. Determining the current intent can include using the semantic information and the context-specific framework. The virtual personal assistant can further be configured to determine a current input state. Determining the current input state can include using the semantic information and one or more behavioral models. The behavioral models can include one or more interpretations of previously-provided semantic information. The virtual personal assistant can further be configured to determine an action using the current intent and the current input state.
Abstract:
This disclosure describes techniques that include generating, based on a description of a scene, a movie or animation that represents at least one possible version of a story corresponding to the description of the scene. This disclosure also describes techniques for training a machine learning model to generate predefined data structures from textual information, visual information, and/or other information about a story, an event, a scene, or a sequence of events or scenes within a story. This disclosure also describes techniques for using GANs to generate, from input, an animation of motion (e.g., an animation or a video clip). This disclosure also describes techniques for implementing an explainable artificial intelligence system that may provide end users with information (e.g., through a user interface) that enables an understanding of at least some of the decisions made by the AI system.
Abstract:
This disclosure describes techniques that include generating, based on a description of a scene, a movie or animation that represents at least one possible version of a story corresponding to the description of the scene. This disclosure also describes techniques for training a machine learning model to generate predefined data structures from textual information, visual information, and/or other information about a story, an event, a scene, or a sequence of events or scenes within a story. This disclosure also describes techniques for using GANs to generate, from input, an animation of motion (e.g., an animation or a video clip). This disclosure also describes techniques for implementing an explainable artificial intelligence system that may provide end users with information (e.g., through a user interface) that enables an understanding of at least some of the decisions made by the AI system.
Abstract:
A complex video event classification, search and retrieval system can generate a semantic representation of a video or of segments within the video, based on one or more complex events that are depicted in the video, without the need for manual tagging. The system can use the semantic representations to, among other things, provide enhanced video search and retrieval capabilities.