Abstract:
A computing system includes a vision-based user interface platform to, among other things, analyze multi-modal user interactions, semantically correlate stored knowledge with visual features of a scene depicted in a video, determine relationships between different features of the scene, and selectively display virtual elements on the video depiction of the scene. The analysis of user interactions can be used to filter both the retrieval of information and the correlation of the visual features with the stored knowledge.
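
The filtering step described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the class names, the word-matching topic filter, and the label-equality correlation are all assumptions made for illustration, standing in for whatever interaction analysis and semantic correlation the platform actually performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualFeature:
    """Hypothetical detector output for one feature of the scene."""
    label: str   # e.g. "engine"
    box: tuple   # (x, y, width, height) in the video frame


@dataclass(frozen=True)
class KnowledgeItem:
    """Hypothetical unit of stored knowledge."""
    label: str   # scene feature this item describes
    topic: str   # coarse topic used for filtering
    text: str    # content of the virtual element to display


def topics_from_interaction(utterance, gaze_label, known_topics):
    """Derive a filter from two assumed modalities: speech and gaze."""
    words = set(utterance.lower().split())
    topics = {t for t in known_topics if t in words}
    return topics, gaze_label


def select_overlays(features, knowledge, topics, gaze_label=None):
    """Correlate features with stored knowledge, filtered by the interaction.

    An empty topic set or a missing gaze label disables that filter.
    Returns (box, text) pairs, one per virtual element to display.
    """
    overlays = []
    for feature in features:
        if gaze_label is not None and feature.label != gaze_label:
            continue
        for item in knowledge:
            matches_feature = item.label == feature.label
            matches_topic = not topics or item.topic in topics
            if matches_feature and matches_topic:
                overlays.append((feature.box, item.text))
    return overlays
```

In this sketch the user's speech narrows the knowledge items by topic and the gaze target narrows the scene features, so only elements relevant to the current interaction are selected for display.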