-
11.
Publication number: US20250131027A1
Publication date: 2025-04-24
Application number: US18924763
Filing date: 2024-10-23
Applicant: SRI International
Inventor: Yangyi Chen, Karan Sikka, Michael A. Cogswell, Ajay Divakaran
IPC: G06F16/338, G06F16/33, G06F16/532
Abstract: In an example, a method for fine-tuning a Large Visual Language Model (LVLM) includes providing visual queries, wherein each of the visual queries comprises at least an image and a textual query related to the image; processing, by the LVLM, the visual queries to extract visual embeddings from the visual queries, wherein the LVLM comprises a Visual Language Model (VLM), a first Large Language Model (LLM), and a linear projection layer interconnecting the VLM and the first LLM; for each of the visual queries: i) generating, by the LVLM, a response to the corresponding visual query based on the corresponding visual embedding; ii) evaluating, by a second LLM, the generated response to verify that the generated response satisfies predefined criteria; and iii) providing, by the second LLM, feedback to the LVLM in response to evaluating the generated response; and fine-tuning the LVLM using aggregated feedback provided by the second LLM for the visual queries.
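The generate, evaluate, and aggregate steps in this abstract can be pictured with a minimal Python sketch. Everything named here (`VisualQuery`, `Feedback`, `collect_feedback`, and the toy generator and evaluator) is a hypothetical stand-in for illustration, not the patent's implementation; the actual fine-tuning step is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VisualQuery:
    image_id: str  # hypothetical stand-in for the image data
    question: str  # textual query related to the image


@dataclass
class Feedback:
    query: VisualQuery
    response: str
    score: float  # assumption: 1.0 means the response met the criteria


def collect_feedback(
    queries: List[VisualQuery],
    generate: Callable[[VisualQuery], str],
    evaluate: Callable[[VisualQuery, str], float],
) -> List[Feedback]:
    """Run generate -> evaluate for every query and aggregate the feedback."""
    aggregated = []
    for query in queries:
        response = generate(query)        # role of the LVLM
        score = evaluate(query, response)  # role of the second LLM
        aggregated.append(Feedback(query, response, score))
    return aggregated


# Toy stand-ins so the sketch runs without any model.
def toy_generate(query: VisualQuery) -> str:
    return f"answer about {query.image_id}"


def toy_evaluate(query: VisualQuery, response: str) -> float:
    return 1.0 if query.image_id in response else 0.0


queries = [
    VisualQuery("img-1", "What is shown?"),
    VisualQuery("img-2", "How many objects are visible?"),
]
aggregated = collect_feedback(queries, toy_generate, toy_evaluate)
# The aggregated feedback would then be handed to a fine-tuning routine.
```

Passing the generator and evaluator as callables keeps the loop independent of any particular model library.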
-
Publication number: US20250110989A1
Publication date: 2025-04-03
Application number: US18895080
Filing date: 2024-09-24
Applicant: SRI International
Inventor: Ajay Divakaran, Yi Yao, Julia Kruk, Jesse Hostetler, Jihua Huang
IPC: G06F16/901
Abstract: In general, various aspects of the techniques are directed to causal analysis using large scale time series data. A computing system may convert large scale time series data to first time period records and second time period records according to a multi-scale time resolution. The computing system may implement a hierarchical machine learning model to generate embeddings that capture temporal characteristics of features of the large scale time series data. The computing system may generate a graph data structure indicating cause and effect correlations between features of the large scale time series data based on temporal dynamics captured in the first and second time period records and/or the embeddings.
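As a rough illustration of converting one series into records at two time resolutions, the sketch below buckets timestamped samples into fixed-length periods and averages each bucket. The function name `to_period_records`, the use of a mean as the per-period summary, and the 60 and 120 second periods are assumptions made for this example; the abstract does not specify them.

```python
from collections import defaultdict
from statistics import mean


def to_period_records(samples, period_seconds):
    """Group (timestamp, value) samples into fixed-length periods.

    Returns a dict mapping each period's start time to the mean value
    observed in that period (the mean is an illustrative choice).
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        period_start = (timestamp // period_seconds) * period_seconds
        buckets[period_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Samples are (seconds since start, measured value).
samples = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0), (120, 9.0)]

fine = to_period_records(samples, 60)     # finer resolution records
coarse = to_period_records(samples, 120)  # coarser resolution records
```

The same samples yield three fine records and two coarse ones, which is the kind of paired input a multi-scale model could consume.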
-
Publication number: US20240212350A1
Publication date: 2024-06-27
Application number: US18331007
Filing date: 2023-06-07
Applicant: SRI International
Inventor: Subhodev Das, Ajay Divakaran, Ali Chaudhry, Julia Kruk, Bo Dong
Abstract: In general, the disclosure describes techniques for joint spatiotemporal Artificial Intelligence (AI) models that can encompass multiple space and time resolutions through self-supervised learning. In an example, a method includes, for each of a plurality of multimodal data, generating, by a computing system, using a first machine learning model, a respective modality feature vector representative of content of the multimodal data, wherein each of the generated modality feature vectors has a different modality; processing, by the computing system, each of the generated modality feature vectors with a second machine learning model comprising an encoder model to generate event data comprising a plurality of events and/or activities of interest; and analyzing, by the computing system, the event data to generate anomaly data indicative of detected anomalies in the multimodal data.
-
Publication number: US20190325243A1
Publication date: 2019-10-24
Application number: US16383447
Filing date: 2019-04-12
Applicant: SRI International
Inventor: Karan Sikka, Ajay Divakaran, Ankan Bansal
Abstract: A method, apparatus and system for zero shot object detection includes, in a semantic embedding space having embedded object class labels, training the space by embedding extracted features of bounding boxes and object class labels of labeled bounding boxes of known object classes into the space, determining regions in an image having unknown object classes on which to perform object detection as proposed bounding boxes, extracting features of the proposed bounding boxes, projecting the extracted features of the proposed bounding boxes into the space, computing a similarity measure between the projected features of the proposed bounding boxes and the embedded, extracted features of the bounding boxes of the known object classes in the space, and predicting an object class label for proposed bounding boxes by determining a nearest embedded object class label to the projected features of the proposed bounding boxes in the space based on the similarity measures.
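The final prediction step, choosing the nearest embedded class label for a projected bounding-box feature, can be sketched as follows. Cosine similarity is assumed here as the similarity measure (the abstract only says "a similarity measure"), and the three-dimensional vectors and label names are made-up illustrative values.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def predict_label(projected_feature, label_embeddings):
    """Return the class label whose embedding is most similar to the feature."""
    return max(
        label_embeddings,
        key=lambda label: cosine_similarity(projected_feature, label_embeddings[label]),
    )


# Hypothetical embeddings of class labels in the semantic space.
label_embeddings = {
    "zebra": [0.9, 0.1, 0.0],
    "truck": [0.0, 0.2, 0.9],
}

# Hypothetical feature of a proposed bounding box, already projected into the space.
box_feature = [0.8, 0.3, 0.1]

predicted = predict_label(box_feature, label_embeddings)
```

Because the label embeddings exist independently of any training boxes, a class never seen with a bounding box during training can still be selected, which is what makes the detection zero-shot.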
-
Publication number: US09734730B2
Publication date: 2017-08-15
Application number: US13755775
Filing date: 2013-01-31
Applicant: SRI International
Inventor: Ajay Divakaran, Behjat Siddiquie, Saad Khan, Jeffrey Lubin, Harpreet S. Sawhney
IPC: G09B19/00
CPC classification number: G09B19/00
Abstract: A multi-modal interaction modeling system can model a number of different aspects of a human interaction across one or more temporal interaction sequences. Some versions of the system can generate assessments of the nature or quality of the interaction or portions thereof, which can be used to, among other things, provide assistance to one or more of the participants in the interaction.
-
16.
Publication number: US09244924B2
Publication date: 2016-01-26
Application number: US13737607
Filing date: 2013-01-09
Applicant: SRI INTERNATIONAL
Inventor: Hui Cheng, Harpreet Singh Sawhney, Ajay Divakaran, Qian Yu, Jingen Liu, Amir Tamrakar, Saad Ali, Omar Javed
IPC: G06F17/30
CPC classification number: G06F17/30823, G06F17/30023, G06F17/30784, G06F17/30817
Abstract: A complex video event classification, search and retrieval system can generate a semantic representation of a video or of segments within the video, based on one or more complex events that are depicted in the video, without the need for manual tagging. The system can use the semantic representations to, among other things, provide enhanced video search and retrieval capabilities.
-
17.
Publication number: US20140347475A1
Publication date: 2014-11-27
Application number: US14286305
Filing date: 2014-05-23
Applicant: SRI International
Inventor: Ajay Divakaran, Qian Yu, Amir Tamrakar, Harpreet Singh Sawhney, Jiejie Zhu, Omar Javed, Jingen Liu, Hui Cheng, Jayakrishnan Eledath
IPC: G06K9/00
CPC classification number: G06K9/00771
Abstract: A system for object detection and tracking includes technologies to, among other things, detect and track moving objects, such as pedestrians and/or vehicles, in a real-world environment, handle static and dynamic occlusions, and continue tracking moving objects across the fields of view of multiple different cameras.
-
Publication number: US08860813B2
Publication date: 2014-10-14
Application number: US13711220
Filing date: 2012-12-11
Applicant: SRI International
Inventor: Sang-Hack Jung, Ajay Divakaran, Harpreet Singh Sawhney
IPC: H04N7/18
CPC classification number: G06K9/00771, G06K9/6206, G06K9/6211
Abstract: A computer-implemented method for matching objects is disclosed. At least two images are received, one having a first target object and another having a second target object. At least one first patch is extracted from the first target object and at least one second patch from the second target object. A distance-based part encoding is constructed between each of the at least one first patch and the at least one second patch, based upon a corresponding codebook of image parts including at least one of part type and pose. A viewpoint of one of the at least one first patch is warped to a viewpoint of the at least one second patch. A parts-level similarity measure, based on the view-invariant distance measure for each of the at least one first patch and the at least one second patch, is applied to determine whether the first target object and the second target object are the same or different objects.
-
Publication number: US20140212853A1
Publication date: 2014-07-31
Application number: US13755775
Filing date: 2013-01-31
Applicant: SRI International
Inventor: Ajay Divakaran, Behjat Siddiquie, Saad Khan, Jeffrey Lubin, Harpreet S. Sawhney
IPC: G09B19/00
CPC classification number: G09B19/00
Abstract: A multi-modal interaction modeling system can model a number of different aspects of a human interaction across one or more temporal interaction sequences. Some versions of the system can generate assessments of the nature or quality of the interaction or portions thereof, which can be used to, among other things, provide assistance to one or more of the participants in the interaction.
-
Publication number: US20250131212A1
Publication date: 2025-04-24
Application number: US18919630
Filing date: 2024-10-18
Applicant: SRI International
Inventor: Pengfei Yu, Yi Yao, Karan Sikka, Michael A. Cogswell, Ajay Divakaran
IPC: G06F40/56
Abstract: In an example, a method for generating responses by a Machine Learning (ML) system includes processing, by a first language model, a natural language instruction to generate an instruction representation based on a meaning of the natural language instruction; translating, by a translation module comprising an interface between the first language model and a second language model, the instruction representation into data indicating an intent of the natural language instruction, wherein the second language model is trained with domain specific knowledge; providing, by the translation module, the natural language instruction and the data indicating the intent of the natural language instruction to the second language model; and generating, by the second language model, a response based on the natural language instruction and the data indicating the intent of the natural language instruction.