Abstract:
Methods and systems for tracking movement include performing person detection in frames from multiple video streams to identify detection images. Visual and location information from the detection images is combined to generate scores for pairs of detection images across the multiple video streams and across frames of respective video streams. A pairwise detection graph is generated using the detection images as nodes and the scores as weighted edges. A current view of the multiple video streams is changed to a next view of the multiple video streams, responsive to a determination that a score between consecutive frames of the current view is below a threshold value and that a score between coincident frames of the current view and the next view is above the threshold value.
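The view-switching rule above can be sketched in a few lines. This is a minimal illustration, not the patented method: the threshold value, the view identifiers, and the tie-breaking choice (pick the highest-scoring coincident view) are all assumptions.

```python
THRESHOLD = 0.5  # assumed similarity threshold, illustrative only


def next_view(current_view, consecutive_score, coincident_scores, threshold=THRESHOLD):
    """Return the view to display next.

    consecutive_score: score between consecutive frames of the current view.
    coincident_scores: {view_id: score with the coincident frame of that view}.
    """
    if consecutive_score >= threshold:
        return current_view  # the person is still tracked in the current view
    # otherwise switch to the best coincident view whose score clears the threshold
    candidates = {v: s for v, s in coincident_scores.items() if s > threshold}
    if candidates:
        return max(candidates, key=candidates.get)
    return current_view  # no suitable next view; keep the current one


print(next_view("cam1", 0.2, {"cam2": 0.8, "cam3": 0.4}))  # switches to cam2
```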
Abstract:
A video device for predicting driving situations while a person drives a car is presented. The video device includes multi-modal sensors and knowledge data for extracting feature maps, a deep neural network trained with training data to recognize real-time traffic scenes (TSs) from a viewpoint of the car, and a user interface (UI) for displaying the real-time TSs. The real-time TSs are compared to predetermined TSs to predict the driving situations. The video device can be a video camera. The video camera can be mounted to a windshield of the car. Alternatively, the video camera can be incorporated into the dashboard or console area of the car. The video camera can calculate speed, velocity, type, and/or position information related to other cars within the real-time TS. The video camera can also include warning indicators, such as light emitting diodes (LEDs) that emit different colors for the different driving situations.
Abstract:
Semantic indexing methods and systems are disclosed. One such method is directed to training a semantic indexing model by employing an expanded query. The query can be expanded by merging the query with documents that are relevant to it, to compensate for a lack of training data. In accordance with another exemplary aspect, time difference features can be incorporated into a semantic indexing model to account for changes in query distributions over time.
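The query-expansion step can be illustrated with a toy merge of query terms and relevant-document terms. This is a hedged sketch: the simple term counts stand in for whatever weighting the actual model uses.

```python
from collections import Counter


def expand_query(query, relevant_docs):
    """Merge query terms with terms from documents relevant to the query."""
    expanded = Counter(query.lower().split())
    for doc in relevant_docs:
        expanded.update(doc.lower().split())
    return expanded


docs = ["deep learning for semantic indexing", "semantic search models"]
print(expand_query("semantic indexing", docs).most_common(2))
# [('semantic', 3), ('indexing', 2)]
```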
Abstract:
Systems and methods are disclosed for classifying histological tissues or specimens in two phases. In the first phase, the method includes providing off-line training using a processor, during which one or more classifiers are trained on examples by: finding a split of features into sets of increasing computational cost and assigning a computational cost to each set; training, for each set of features, a classifier using the training examples; and training, for each classifier, a utility function that scores the usefulness of extracting the next feature set for a given tissue unit using the training examples. In the second phase, the method includes applying the classifiers to an unknown tissue sample by: extracting the first set of features for all tissue units; deciding for which tissue unit to extract the next set of features by finding the tissue unit for which the score S = U − h·C is maximized, where U is the utility function, C is the cost of acquiring the feature set, and h is a weighting parameter; iterating until a stopping criterion is met or no more features can be computed; and issuing a tissue-level decision based on the current state.
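The second-phase selection rule reduces to an argmax over tissue units. A minimal sketch, assuming per-unit utility and cost values are already available and using an illustrative h:

```python
def select_tissue_unit(units, h=0.1):
    """units: {unit_id: (utility, cost)}. Return the id maximizing S = U - h*C."""
    return max(units, key=lambda u: units[u][0] - h * units[u][1])


# illustrative utilities and feature-acquisition costs
units = {"u1": (0.9, 5.0), "u2": (0.7, 1.0), "u3": (0.4, 0.5)}
print(select_tissue_unit(units, h=0.1))  # u2: 0.7 - 0.1*1.0 = 0.6 is the best score
```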
Abstract:
Systems and methods are disclosed for a multi-entity tracking transformer model (MCTR). To train the MCTR, a tracking module processes track embeddings and detection embeddings of video feeds obtained from multiple cameras to generate updated track embeddings. An association module associates the updated track embeddings with the detection embeddings to generate track-detection associations (TDA) for each camera view and camera frame. A cost module calculates a differentiable loss from the TDA by combining a detection loss, a track loss, and an auxiliary track loss. A model trainer trains the MCTR, using the differentiable loss and contiguous video segments sampled from a training dataset, to track multiple objects with multiple cameras.
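The cost module's combination step can be sketched as a weighted sum of the three loss terms. The weights are illustrative assumptions, not values from the abstract:

```python
def mctr_loss(detection_loss, track_loss, aux_track_loss,
              w_det=1.0, w_track=1.0, w_aux=0.5):
    """Combine the detection, track, and auxiliary track losses (weights assumed)."""
    return w_det * detection_loss + w_track * track_loss + w_aux * aux_track_loss


print(mctr_loss(0.8, 0.4, 0.2))  # 0.8 + 0.4 + 0.5*0.2 = 1.3
```

Since each term would be differentiable in practice, the weighted sum remains differentiable and can be backpropagated through the whole model.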
Abstract:
Methods and systems for answering a query include generating first tokens in response to an input query using a language model, the first tokens including a retrieval rule. The retrieval rule is used to search for information to generate dynamic tokens. The retrieval rule in the first tokens is replaced with the dynamic tokens to generate a dynamic partial response. Second tokens are then generated in response to the input query and appended to the dynamic partial response to generate an output responsive to the input query.
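The replace-and-continue flow above can be sketched with a toy retrieval backend. The `<search:...>` marker syntax and the in-memory store are illustrative assumptions, not the format used by the patented system:

```python
import re

STORE = {"capital of France": "Paris"}  # stand-in for the retrieval backend


def answer(first_tokens, second_tokens):
    """Replace a <search:...> retrieval rule with retrieved dynamic tokens,
    then append the second tokens to the dynamic partial response."""
    def run_rule(match):
        return STORE.get(match.group(1), "")  # dynamic tokens from retrieval

    dynamic_partial = re.sub(r"<search:([^>]+)>", run_rule, first_tokens)
    return dynamic_partial + second_tokens


print(answer("The answer is <search:capital of France>", ", of course."))
# The answer is Paris, of course.
```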
Abstract:
A computer-implemented method executed by a processor for training a neural network to recognize driving scenes from sensor data received from vehicle radar is presented. The computer-implemented method includes extracting substructures from the sensor data received from the vehicle radar to define a graph having a plurality of nodes and a plurality of edges, constructing a neural network for each extracted substructure, combining the outputs of each of the constructed neural networks for each of the plurality of edges into a single vector describing a driving scene of a vehicle, and classifying the single vector into a set of one or more dangerous situations involving the vehicle.
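The combine-and-classify pipeline can be sketched with stand-in components. Everything here is illustrative: the per-substructure "networks" are plain linear maps, and the norm-threshold classifier is a placeholder for the trained classifier.

```python
def substructure_net(features, weights):
    """Stand-in per-substructure network: one weighted sum per output unit."""
    return [sum(f * w for f, w in zip(features, row)) for row in weights]


def scene_vector(substructures, nets):
    """Concatenate each substructure network's output into a single scene vector."""
    vec = []
    for feats, weights in zip(substructures, nets):
        vec.extend(substructure_net(feats, weights))
    return vec


def classify(vec, threshold=1.0):
    """Toy classifier: flag a dangerous situation when the vector norm is high."""
    return "dangerous" if sum(x * x for x in vec) ** 0.5 > threshold else "safe"


subs = [[0.5, 1.0], [2.0]]              # features of two extracted substructures
nets = [[[1.0, 0.5]], [[0.8]]]          # illustrative weights, one net each
v = scene_vector(subs, nets)            # [1.0, 1.6]
print(classify(v))
```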
Abstract:
A computer-implemented method and system are provided. The system includes an image capture device configured to capture image data relative to an ambient environment of a user. The system further includes a processor configured to detect and localize objects, in a real-world map space, from the image data using a trainable object localization Convolutional Neural Network (CNN). The CNN is trained to detect and localize the objects from image and radar pairs that include the image data and radar data for different scenes of a natural environment. The processor is further configured to perform a user-perceptible action responsive to a detection and a localization of an object in an intended path of the user.
Abstract:
Methods and systems for configuring a machine learning model include selecting a head from a set of stored heads, responsive to an input, to implement a layer in a transformer machine learning model. The selected head is copied from persistent storage to active memory. The layer in the transformer machine learning model is executed on the input using the selected head to generate an output. An action is performed responsive to the output.
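The head-swapping scheme can be sketched with dictionaries standing in for persistent storage and active memory. The input-conditioned selector and the stand-in layer computation are assumptions for illustration only:

```python
STORED_HEADS = {  # stand-in for persistent storage: head_id -> parameters
    "head_a": {"scale": 2.0},
    "head_b": {"scale": 0.5},
}
ACTIVE_MEMORY = {}  # heads currently loaded for execution


def select_head(x):
    """Toy input-conditioned selector: choose a head by the input's sign."""
    return "head_a" if sum(x) >= 0 else "head_b"


def run_layer(x):
    """Select a head, copy it to active memory if absent, and run the layer."""
    head_id = select_head(x)
    if head_id not in ACTIVE_MEMORY:  # copy from persistent storage on demand
        ACTIVE_MEMORY[head_id] = dict(STORED_HEADS[head_id])
    head = ACTIVE_MEMORY[head_id]
    return [head["scale"] * v for v in x]  # stand-in layer computation


print(run_layer([1.0, 2.0]))  # head_a -> [2.0, 4.0]
```

Keeping only selected heads in active memory lets the stored set of heads grow much larger than what the device could hold resident at once.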