Abstract:
Systems and methods for predicting T-cell receptor (TCR)-peptide interaction, including training a deep learning model for the prediction of TCR-peptide interaction by determining a multiple sequence alignment (MSA) for TCR-peptide pair sequences from a dataset of TCR-peptide pair sequences using a sequence analyzer, building TCR structures and peptide structures from the MSA and corresponding structures from the Protein Data Bank (PDB) using MODELLER, and generating an extended TCR-peptide training dataset based on docking energy scores determined by docking peptides to TCRs using physical modeling based on the TCR structures and peptide structures built with MODELLER. TCR-peptide pairs are classified and labeled as positive or negative pairs using pseudo-labels based on the docking energy scores, and the deep learning model is iteratively retrained on the extended TCR-peptide training dataset and the pseudo-labels until convergence.
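The pseudo-labeling loop can be illustrated with a short sketch. In the code below, the random features, the docking energies, the `ENERGY_CUTOFF` threshold, and the simple logistic classifier are all illustrative assumptions standing in for the deep learning model and the physical docking step described in the abstract:

```python
# Hypothetical sketch of the pseudo-labeling / retraining loop.
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 200 TCR-peptide pairs, 64-dim sequence features,
# and a docking energy score per pair (lower = stronger binding).
features = rng.normal(size=(200, 64))
dock_energy = rng.normal(size=200)

ENERGY_CUTOFF = -0.5  # assumed threshold separating binders from non-binders
pseudo_labels = (dock_energy < ENERGY_CUTOFF).astype(float)

# A minimal logistic classifier standing in for the deep learning model.
w = np.zeros(64)
for epoch in range(100):  # retrain iteratively until convergence
    probs = 1.0 / (1.0 + np.exp(-(features @ w)))
    grad = features.T @ (probs - pseudo_labels) / len(pseudo_labels)
    w -= 0.1 * grad
    if np.linalg.norm(grad) < 1e-4:  # convergence check
        break

print("positive pairs:", int(pseudo_labels.sum()))
```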
Abstract:
A computer-implemented method executed by a processor for training a neural network to recognize driving scenes from sensor data received from vehicle radar is presented. The computer-implemented method includes extracting substructures from the sensor data received from the vehicle radar to define a graph having a plurality of nodes and a plurality of edges, constructing a neural network for each extracted substructure, combining the outputs of each of the constructed neural networks for each of the plurality of edges into a single vector describing a driving scene of a vehicle, and classifying the single vector into a set of one or more dangerous situations involving the vehicle.
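A minimal PyTorch sketch of the per-substructure networks follows. The substructure extraction is not shown (each substructure is represented here simply as a tensor of edge features), and the layer sizes, substructure count, and class count are assumptions:

```python
import torch
import torch.nn as nn

class SubstructureNet(nn.Module):
    """One small network per extracted substructure:
    edge features in, fixed-size embedding out."""
    def __init__(self, in_dim=8, hid=16, out_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU(),
                                 nn.Linear(hid, out_dim))

    def forward(self, edge_feats):           # (num_edges, in_dim)
        return self.mlp(edge_feats).sum(0)   # pool over edges -> (out_dim,)

class SceneClassifier(nn.Module):
    def __init__(self, num_substructures=4, num_classes=3):
        super().__init__()
        self.subnets = nn.ModuleList(SubstructureNet()
                                     for _ in range(num_substructures))
        self.head = nn.Linear(32 * num_substructures, num_classes)

    def forward(self, substructures):        # list of edge-feature tensors
        # Combine per-substructure outputs into a single scene vector.
        vec = torch.cat([net(s) for net, s in zip(self.subnets, substructures)])
        return self.head(vec)                # scores over dangerous situations

model = SceneClassifier()
fake_substructures = [torch.randn(5, 8) for _ in range(4)]
print(model(fake_substructures))
```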
Abstract:
Systems and methods for matching job descriptions with job applicants are provided. The method includes allocating each of one or more job applicants' curricula vitae (CVs) into sections 320; applying max-pooled word embedding 330 to each section of the applicants' CVs; using concatenated max-pooling and average-pooling 340 to compose the section embeddings into an applicant CV representation; allocating each of one or more job position descriptions into specified sections 220; applying max-pooled word embedding 230 to each section of the job position descriptions; using concatenated max-pooling and average-pooling 240 to compose the section embeddings into a job representation; calculating a cosine similarity 250, 350 between each of the job representations and each of the CV representations to perform job-to-applicant matching; and presenting an ordered list of the one or more job applicants 360 or an ordered list of the one or more job position descriptions 260 to a user.
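The section-embedding and matching steps can be sketched in a few lines of numpy. The random word vectors, embedding dimension, and section counts below are assumptions for demonstration; step numerals from the abstract are noted in comments:

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 50

def section_embedding(word_vectors):
    """Max-pooled word embedding for one section (steps 230/330)."""
    return word_vectors.max(axis=0)

def compose(section_embs):
    """Concatenated max- and average-pooling over section embeddings
    (steps 240/340)."""
    stacked = np.stack(section_embs)
    return np.concatenate([stacked.max(axis=0), stacked.mean(axis=0)])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# One job description with 3 sections; 5 candidate CVs with 4 sections each.
job_rep = compose([section_embedding(rng.normal(size=(20, DIM)))
                   for _ in range(3)])
cv_reps = [compose([section_embedding(rng.normal(size=(15, DIM)))
                    for _ in range(4)]) for _ in range(5)]

# Rank applicants by cosine similarity to the job (steps 250/350, 360).
ranking = sorted(range(5), key=lambda i: cosine(job_rep, cv_reps[i]),
                 reverse=True)
print("ordered applicants:", ranking)
```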
Abstract:
A computer-implemented method is provided for action localization. The method includes converting (510) one or more video frames into person keypoints and object keypoints. The method further includes embedding (520) position, timestamp, instance, and type information with the person keypoints and object keypoints to obtain keypoint embeddings. The method also includes predicting (530), by a hierarchical transformer encoder using the keypoint embeddings, human actions and bounding box information of when and where the human actions occur in the one or more video frames.
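A hedged PyTorch sketch of the keypoint-embedding step follows, with a flat transformer encoder standing in for the hierarchical one in the method. All sizes, vocabularies, and the mean-pooled prediction heads are assumptions:

```python
import torch
import torch.nn as nn

class KeypointEmbedder(nn.Module):
    """Embed position, timestamp, instance, and type information (520)."""
    def __init__(self, d=64, num_types=2, max_inst=10, max_t=16):
        super().__init__()
        self.pos = nn.Linear(2, d)               # (x, y) position
        self.time = nn.Embedding(max_t, d)       # timestamp
        self.inst = nn.Embedding(max_inst, d)    # person/object instance id
        self.kind = nn.Embedding(num_types, d)   # person vs. object keypoint

    def forward(self, xy, t, inst, kind):
        return self.pos(xy) + self.time(t) + self.inst(inst) + self.kind(kind)

class ActionLocalizer(nn.Module):
    def __init__(self, d=64, num_actions=5):
        super().__init__()
        self.embed = KeypointEmbedder(d)
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(d, num_actions)   # what action occurs
        self.box_head = nn.Linear(d, 4)                # where: (x, y, w, h)

    def forward(self, xy, t, inst, kind):
        h = self.encoder(self.embed(xy, t, inst, kind)).mean(dim=1)
        return self.action_head(h), self.box_head(h)

model = ActionLocalizer()
N = 30  # keypoints across a clip
xy = torch.rand(1, N, 2)
t = torch.randint(0, 16, (1, N))
inst = torch.randint(0, 10, (1, N))
kind = torch.randint(0, 2, (1, N))
actions, boxes = model(xy, t, inst, kind)
print(actions.shape, boxes.shape)
```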
Abstract:
A method for compositional reasoning of group activity in videos with keypoint-only modality is presented. The method includes obtaining video frames from a video stream received from a plurality of video image capturing devices, extracting keypoints of all persons detected in the video frames to define keypoint data, tokenizing the keypoint data with time and segment information, clustering groups of keypoint persons in the video frames and passing the clustered groups through multi-scale prediction, and performing a prediction to provide a group activity prediction of a scene in the video frames.
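A minimal sketch of the tokenization and person-grouping steps is shown below. The token layout, the k-means grouping, the two temporal segments, and the two prediction scales are illustrative assumptions rather than the method's exact design:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
T, P, K = 8, 6, 17            # frames, persons, keypoints per person

# Keypoints per person per frame: (T, P, K, 2).
keypoints = rng.uniform(size=(T, P, K, 2))

# Tokenize: flatten each person's keypoints and append time and
# segment information (two temporal segments assumed).
tokens = []
for t in range(T):
    for p in range(P):
        segment = 0 if t < T // 2 else 1
        tokens.append(np.concatenate([keypoints[t, p].ravel(),
                                      [t, segment]]))
tokens = np.stack(tokens)     # (T * P, K * 2 + 2)

# Cluster persons into groups by mean position (2 groups assumed).
centers = keypoints.mean(axis=(0, 2))            # (P, 2) per-person mean
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(centers)

# Multi-scale features: pool tokens per group and over the full scene,
# both of which would feed the group activity prediction.
group_feats = [tokens.reshape(T, P, -1)[:, groups == g].mean(axis=(0, 1))
               for g in range(2)]
scene_feat = tokens.mean(axis=0)
print(len(group_feats), scene_feat.shape)
```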
Abstract:
A method for learning disentangled representations of videos is presented. The method includes feeding (1001) each frame of video data into an encoder to produce a sequence of visual features, passing (1003) the sequence of visual features through a deep convolutional network to obtain a posterior of a dynamic latent variable and a posterior of a static latent variable, sampling (1005) static and dynamic representations from the posterior of the static latent variable and the posterior of the dynamic latent variable, respectively, concatenating (1007) the static and dynamic representations to be fed into a decoder to generate reconstructed sequences, and applying (1009) three regularizers to the dynamic and static latent variables to trigger representation disentanglement. To facilitate the disentangled sequential representation learning, orthogonal factorization in generative adversarial network (GAN) latent space is leveraged to pre-train a generator as a decoder in the method.
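A compact PyTorch sketch of the static/dynamic factorization follows. The encoder and decoder are toy linear layers and the posterior networks are simplified stand-ins (assumptions), not the method's actual architecture; the three regularizers are omitted for brevity:

```python
import torch
import torch.nn as nn

class DisentangledVAE(nn.Module):
    def __init__(self, x_dim=32, f_dim=8, z_dim=8):
        super().__init__()
        self.enc = nn.Linear(x_dim, 64)              # per-frame visual features
        self.static_post = nn.Linear(64, 2 * f_dim)  # q(f | x_1:T)
        self.dynamic_post = nn.LSTM(64, 2 * z_dim, batch_first=True)  # q(z_t | x)
        self.dec = nn.Linear(f_dim + z_dim, x_dim)   # reconstruct each frame

    @staticmethod
    def sample(stats):
        # Reparameterized sample from a Gaussian posterior.
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, x):                    # x: (B, T, x_dim)
        h = torch.relu(self.enc(x))          # (B, T, 64)
        f = self.sample(self.static_post(h.mean(dim=1)))   # static latent
        z = self.sample(self.dynamic_post(h)[0])           # dynamic latents
        f_rep = f.unsqueeze(1).expand(-1, x.size(1), -1)
        recon = self.dec(torch.cat([f_rep, z], dim=-1))    # concatenate + decode
        return recon, f, z

model = DisentangledVAE()
video = torch.randn(4, 10, 32)               # batch of 10-frame sequences
recon, f, z = model(video)
print(recon.shape, f.shape, z.shape)
```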
Abstract:
A video device for predicting driving situations while a person drives a car is presented. The video device includes multi-modal sensors and knowledge data for extracting feature maps, a deep neural network trained with training data to recognize real-time traffic scenes (TSs) from a viewpoint of the car, and a user interface (UI) for displaying the real-time TSs. The real-time TSs are compared to predetermined TSs to predict the driving situations. The video device can be a video camera. The video camera can be mounted to a windshield of the car. Alternatively, the video camera can be incorporated into the dashboard or console area of the car. The video camera can calculate speed, velocity, type, and/or position information related to other cars within the real-time TS. The video camera can also include warning indicators, such as light emitting diodes (LEDs) that emit different colors for the different driving situations.
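The comparison of a real-time TS against the predetermined TSs can be sketched as a nearest-neighbor match. The feature vectors, the scene library, and the situation labels below are assumptions for demonstration only:

```python
import numpy as np

rng = np.random.default_rng(3)

# Predetermined TSs, each paired with a driving situation label
# (all values here are hypothetical).
predetermined_ts = rng.normal(size=(4, 16))
situations = ["merge conflict", "hard braking ahead",
              "pedestrian crossing", "clear road"]

def predict_situation(real_time_ts):
    """Match the real-time TS to the closest predetermined TS."""
    dists = np.linalg.norm(predetermined_ts - real_time_ts, axis=1)
    return situations[int(dists.argmin())]

# A real-time TS feature vector as produced by the deep neural network.
current_scene = rng.normal(size=16)
print(predict_situation(current_scene))
```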
Abstract:
A computer-implemented method for training a deep neural network to recognize traffic scenes (TSs) from multi-modal sensors and knowledge data is presented. The computer-implemented method includes receiving data from the multi-modal sensors and the knowledge data and extracting feature maps from the multi-modal sensors and the knowledge data by using a traffic participant (TP) extractor to generate a first set of data, using a static objects extractor to generate a second set of data, and using an additional information extractor to generate a third set of data. The computer-implemented method further includes training the deep neural network, with training data, to recognize the TSs from a viewpoint of a vehicle.
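A minimal PyTorch sketch of the three-extractor layout follows (traffic participants, static objects, additional information). The extractor architectures, feature sizes, and the single training step are assumptions:

```python
import torch
import torch.nn as nn

class TrafficSceneNet(nn.Module):
    def __init__(self, num_scenes=4):
        super().__init__()
        self.tp_extractor = nn.Linear(32, 16)      # traffic participant features
        self.static_extractor = nn.Linear(32, 16)  # static object features
        self.extra_extractor = nn.Linear(8, 16)    # knowledge / additional info
        self.classifier = nn.Linear(48, num_scenes)

    def forward(self, sensors, knowledge):
        # Concatenate the three extracted feature sets and classify the TS.
        feats = torch.cat([self.tp_extractor(sensors),
                           self.static_extractor(sensors),
                           self.extra_extractor(knowledge)], dim=-1)
        return self.classifier(torch.relu(feats))

model = TrafficSceneNet()
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

sensors, knowledge = torch.randn(8, 32), torch.randn(8, 8)
labels = torch.randint(0, 4, (8,))
loss = loss_fn(model(sensors, knowledge), labels)  # one training step
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```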
Abstract:
Methods and systems for training a machine learning model include embedding (304) a state, including a peptide sequence and a protein, as a vector. An action, including a modification to an amino acid in the peptide sequence, is predicted (306) using a presentation score of the peptide sequence by the protein as a reward. A mutation policy model is trained (308), using the state and the reward, to generate modifications that increase the presentation score.
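A hedged REINFORCE-style sketch of the mutation policy loop follows. The one-hot state embedding, the random stand-in for the presentation score, and the tiny linear policy are all assumptions, not the method's actual models:

```python
import torch
import torch.nn as nn

AMINO_ACIDS = 20
PEP_LEN = 9

def embed_state(peptide, protein_id, num_proteins=5):
    """Embed the (peptide sequence, protein) state as a vector (304)."""
    pep = nn.functional.one_hot(peptide, AMINO_ACIDS).flatten().float()
    prot = nn.functional.one_hot(torch.tensor(protein_id), num_proteins).float()
    return torch.cat([pep, prot])

def presentation_score(peptide):
    """Stand-in reward; a real system would query a presentation model."""
    return torch.rand(()).item()

policy = nn.Linear(PEP_LEN * AMINO_ACIDS + 5, PEP_LEN * AMINO_ACIDS)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

peptide = torch.randint(0, AMINO_ACIDS, (PEP_LEN,))
for step in range(10):
    logits = policy(embed_state(peptide, protein_id=2))
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()                 # which position gets which residue
    pos, new_aa = action.item() // AMINO_ACIDS, action.item() % AMINO_ACIDS
    peptide = peptide.clone()
    peptide[pos] = new_aa                  # apply the predicted mutation (306)
    reward = presentation_score(peptide)   # presentation score as reward
    loss = -dist.log_prob(action) * reward # policy-gradient update (308)
    opt.zero_grad(); loss.backward(); opt.step()
```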
Abstract:
Methods and systems for training a model include encoding (203) training peptide sequences using an encoder model. A new peptide sequence is generated (202) using a generator model. The encoder model, the generator model, and a discriminator model are trained (206) to cause the generator model to generate new peptides that the discriminator mistakes for the training peptide sequences, including learning projection vectors with respective cross-entropy losses for binding sequences and non-binding sequences.
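A simplified PyTorch sketch of the adversarial setup follows, with learned per-class projection vectors (binding vs. non-binding) entering the discriminator score. The dimensions, the BCE formulation, the toy data, and the decision to train the encoder alongside the discriminator are all assumptions for illustration:

```python
import torch
import torch.nn as nn

DIM, NOISE = 32, 16
encoder = nn.Linear(DIM, DIM)         # encodes training peptide sequences (203)
generator = nn.Linear(NOISE, DIM)     # proposes new peptide sequences (202)
disc_body = nn.Linear(DIM, DIM)       # shared discriminator trunk
proj = nn.ParameterDict({             # learned projection vectors
    "bind": nn.Parameter(torch.randn(DIM)),
    "nonbind": nn.Parameter(torch.randn(DIM)),
})

def d_score(x, cls):
    """Projection-style score: trunk features dotted with a class vector."""
    return (torch.relu(disc_body(x)) * proj[cls]).sum(-1)

bce = nn.BCEWithLogitsLoss()
d_opt = torch.optim.Adam(list(disc_body.parameters()) + list(proj.values())
                         + list(encoder.parameters()), lr=1e-3)
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)

ones, zeros = torch.ones(8), torch.zeros(8)
binders, nonbinders = torch.randn(8, DIM), torch.randn(8, DIM)

# Discriminator step (206): real sequences are scored with their class
# projection; generated sequences are pushed toward the "fake" side.
fake = generator(torch.randn(8, NOISE))
d_loss = (bce(d_score(encoder(binders), "bind"), ones)
          + bce(d_score(encoder(nonbinders), "nonbind"), ones)
          + bce(d_score(fake.detach(), "bind"), zeros))
d_opt.zero_grad(); d_loss.backward(); d_opt.step()

# Generator step: make new peptides the discriminator mistakes for
# real binding sequences.
g_loss = bce(d_score(generator(torch.randn(8, NOISE)), "bind"), ones)
g_opt.zero_grad(); g_loss.backward(); g_opt.step()
print(float(d_loss), float(g_loss))
```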