-
Publication No.: US20250131212A1
Publication Date: 2025-04-24
Application No.: US18919630
Application Date: 2024-10-18
Applicant: SRI International
Inventor: Pengfei Yu, Yi Yao, Karan Sikka, Michael A. Cogswell, Ajay Divakaran
IPC: G06F40/56
Abstract: In an example, a method for generating responses by a Machine Learning (ML) system includes processing, by a first language model, a natural language instruction to generate an instruction representation based on a meaning of the natural language instruction; translating, by a translation module comprising an interface between the first language model and a second language model, the instruction representation into data indicating an intent of the natural language instruction, wherein the second language model is trained with domain specific knowledge; providing, by the translation module, the natural language instruction and the data indicating the intent of the natural language instruction to the second language model; and generating, by the second language model, a response based on the natural language instruction and the data indicating the intent of the natural language instruction.
-
Publication No.: US20250131027A1
Publication Date: 2025-04-24
Application No.: US18924763
Application Date: 2024-10-23
Applicant: SRI International
Inventor: Yangyi Chen, Karan Sikka, Michael A. Cogswell, Ajay Divakaran
IPC: G06F16/338, G06F16/33, G06F16/532
Abstract: In an example, a method for fine-tuning a Large Visual Language Model (LVLM) includes providing visual queries, each of the visual queries comprises at least an image and a textual query related to the image; processing, by the LVLM, the visual queries to extract visual embeddings from the visual queries, wherein the LVLM comprises a Visual Language Model (VLM), a first Large Language Model (LLM), and a linear projection layer interconnecting the VLM and the LLM; for visual queries: i) generating, by the LVLM, a response to the corresponding visual query based on the corresponding visual embedding; ii) evaluating, by a second LLM, the generated response to verify that the generated response satisfies predefined criteria; and iii) providing, by the second LLM, a feedback to the LVLM, in response to the evaluating the generated response; and fine-tuning the LVLM using aggregated feedback provided by the second LLM for the visual queries.
-
Publication No.: US20190325243A1
Publication Date: 2019-10-24
Application No.: US16383447
Application Date: 2019-04-12
Applicant: SRI International
Inventor: Karan Sikka, Ajay Divakaran, Ankan Bansal
Abstract: A method, apparatus and system for zero shot object detection includes, in a semantic embedding space having embedded object class labels, training the space by embedding extracted features of bounding boxes and object class labels of labeled bounding boxes of known object classes into the space, determining regions in an image having unknown object classes on which to perform object detection as proposed bounding boxes, extracting features of the proposed bounding boxes, projecting the extracted features of the proposed bounding boxes into the space, computing a similarity measure between the projected features of the proposed bounding boxes and the embedded, extracted features of the bounding boxes of the known object classes in the space, and predicting an object class label for proposed bounding boxes by determining a nearest embedded object class label to the projected features of the proposed bounding boxes in the space based on the similarity measures.
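The label-prediction step in this abstract (project box features into the embedding space, then pick the nearest embedded class label) can be illustrated with a minimal sketch. It is not the patented implementation: the linear projection, the use of cosine similarity as the similarity measure, and the names `project`, `predict_label`, `W`, and `LABELS` are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def project(features, weights):
    """Assumed linear projection of box features into the embedding space.
    `weights` is a list of rows, one row per embedding dimension."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def predict_label(box_features, weights, label_embeddings):
    """Return the class label whose embedding is most similar to the
    projected box features, together with all similarity scores."""
    z = project(box_features, weights)
    scores = {label: cosine(z, emb) for label, emb in label_embeddings.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy data (hypothetical): two class-label embeddings in a 2-D space and a
# projection that keeps the first two of three box-feature dimensions.
LABELS = {"cat": [1.0, 0.0], "zebra": [0.0, 1.0]}
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

label, scores = predict_label([0.2, 0.9, 0.5], W, LABELS)
print(label)
```

In a zero-shot setting the label embeddings for unseen classes would come from a pretrained word or text embedding, so a box can be assigned a class that had no labeled boxes during training.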
-
Publication No.: US11610384B2
Publication Date: 2023-03-21
Application No.: US17337093
Application Date: 2021-06-02
Applicant: SRI International
Inventor: Karan Sikka, Ajay Divakaran, Ankan Bansal
IPC: G06V10/22, G06K9/62, G06N20/00, G06N5/04, G06T11/20, G06V10/40, G06V10/20, G06V20/10, G06V30/262, G06V10/75
Abstract: A method, apparatus and system for zero shot object detection includes, in a semantic embedding space having embedded object class labels, training the space by embedding extracted features of bounding boxes and object class labels of labeled bounding boxes of known object classes into the space, determining regions in an image having unknown object classes on which to perform object detection as proposed bounding boxes, extracting features of the proposed bounding boxes, projecting the extracted features of the proposed bounding boxes into the space, computing a similarity measure between the projected features of the proposed bounding boxes and the embedded, extracted features of the bounding boxes of the known object classes in the space, and predicting an object class label for proposed bounding boxes by determining a nearest embedded object class label to the projected features of the proposed bounding boxes in the space based on the similarity measures.
-
Publication No.: US11055555B2
Publication Date: 2021-07-06
Application No.: US16383447
Application Date: 2019-04-12
Applicant: SRI International
Inventor: Karan Sikka, Ajay Divakaran, Ankan Bansal
Abstract: A method, apparatus and system for zero shot object detection includes, in a semantic embedding space having embedded object class labels, training the space by embedding extracted features of bounding boxes and object class labels of labeled bounding boxes of known object classes into the space, determining regions in an image having unknown object classes on which to perform object detection as proposed bounding boxes, extracting features of the proposed bounding boxes, projecting the extracted features of the proposed bounding boxes into the space, computing a similarity measure between the projected features of the proposed bounding boxes and the embedded, extracted features of the bounding boxes of the known object classes in the space, and predicting an object class label for proposed bounding boxes by determining a nearest embedded object class label to the projected features of the proposed bounding boxes in the space based on the similarity measures.
-
Publication No.: US10824916B2
Publication Date: 2020-11-03
Application No.: US16126748
Application Date: 2018-09-10
Applicant: SRI International
Inventor: Karan Sikka, Ajay Divakaran, Parneet Kaur
Abstract: Systems and methods for improving the accuracy of a computer system for object identification/classification through the use of weakly supervised learning are provided herein. In some embodiments, the method includes (a) receiving at least one set of curated data, wherein the curated data includes labeled images, (b) using the curated data to train a deep network model for identifying objects within images, wherein the trained deep network model has a first accuracy level for identifying objects, receiving a first target accuracy level for object identification of the deep network model, determining, automatically via the computer system, an amount of weakly labeled data needed to train the deep network model to achieve the first target accuracy level, and augmenting the deep network model using weakly supervised learning and the weakly labeled data to achieve the first target accuracy level for object identification by the deep network model.
-
Publication No.: US20190325342A1
Publication Date: 2019-10-24
Application No.: US16383429
Application Date: 2019-04-12
Applicant: SRI International
Inventor: Karan Sikka, Ajay Divakaran, Julia Kruk
Abstract: Embedding multimodal content in a common geometric space includes for each of a plurality of content of the multimodal content, creating a respective, first modality feature vector representative of content of the multimodal content having a first modality using a first machine learning model; for each of a plurality of content of the multimodal content, creating a respective, second modality feature vector representative of content of the multimodal content having a second modality using a second machine learning model; and semantically embedding the respective, first modality feature vectors and the respective, second modality feature vectors in a common geometric space that provides logarithm-like warping of distance space in the geometric space to capture hierarchical relationships between seemingly disparate, embedded modality feature vectors of content in the geometric space; wherein embedded modality feature vectors that are related, across modalities, are closer together in the geometric space than unrelated modality feature vectors.
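The abstract does not name the geometric space. One well-known space whose distances behave in a logarithm-like way and suit hierarchies is the Poincaré ball model of hyperbolic geometry; the sketch below uses it purely as an assumed example, and the function name `poincare_distance` is hypothetical.

```python
import math

def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball under the
    Poincare ball model:
    d(u, v) = arcosh(1 + 2*|u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))

origin = [0.0, 0.0]
near = poincare_distance(origin, [0.45, 0.0])
far = poincare_distance(origin, [0.9, 0.0])
# Doubling the Euclidean radius more than doubles the hyperbolic distance.
print(near, far)
```

Distances grow rapidly toward the boundary of the ball, which leaves room for general concepts near the origin and many specific ones near the edge. Under this assumption, feature vectors from both modalities would be mapped into the ball and compared with this distance.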
-
Publication No.: US11238631B2
Publication Date: 2022-02-01
Application No.: US16855362
Application Date: 2020-04-22
Applicant: SRI International
Inventor: Karan Sikka, Ajay Divakaran, Samyak Datta
Abstract: A method, apparatus and system for visual grounding of a caption in an image include projecting at least two parsed phrases of the caption into a trained semantic embedding space, projecting extracted region proposals of the image into the trained semantic embedding space, aligning the extracted region proposals and the at least two parsed phrases, aggregating the aligned region proposals and the at least two parsed phrases to determine a caption-conditioned image representation and projecting the caption-conditioned image representation and the caption into a semantic embedding space to align the caption-conditioned image representation and the caption. The method, apparatus and system can further include a parser for parsing the caption into the at least two parsed phrases and a region proposal module for extracting the region proposals from the image.
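The align, aggregate, and match steps of this abstract can be sketched as follows. The choices here are assumptions for illustration only: each phrase is aligned to its single most similar region by cosine similarity, aligned regions are aggregated by mean pooling, and all vectors are taken to be already projected into the same embedding space. The name `ground_caption` is hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def ground_caption(phrase_vecs, region_vecs, caption_vec):
    """Align each phrase to its best region, pool the aligned regions into a
    caption-conditioned image representation, and score it against the caption."""
    # Alignment (assumed): index of the most similar region for each phrase.
    aligned = [
        max(range(len(region_vecs)), key=lambda i: cosine(p, region_vecs[i]))
        for p in phrase_vecs
    ]
    # Aggregation (assumed): mean of the aligned region embeddings.
    dim = len(caption_vec)
    pooled = [
        sum(region_vecs[i][d] for i in aligned) / len(aligned) for d in range(dim)
    ]
    return aligned, cosine(pooled, caption_vec)

# Toy embeddings (hypothetical): two phrases, three region proposals.
phrases = [[1.0, 0.0], [0.0, 1.0]]
regions = [[0.9, 0.1], [0.1, 0.8], [-1.0, 0.0]]
caption = [0.7, 0.7]

aligned, score = ground_caption(phrases, regions, caption)
print(aligned, score)
```

The `aligned` indices give the phrase-to-region grounding, while the final score measures how well the caption-conditioned image representation matches the whole caption.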
-
Publication No.: US20210295082A1
Publication Date: 2021-09-23
Application No.: US17337093
Application Date: 2021-06-02
Applicant: SRI International
Inventor: Karan Sikka, Ajay Divakaran, Ankan Bansal
Abstract: A method, apparatus and system for zero shot object detection includes, in a semantic embedding space having embedded object class labels, training the space by embedding extracted features of bounding boxes and object class labels of labeled bounding boxes of known object classes into the space, determining regions in an image having unknown object classes on which to perform object detection as proposed bounding boxes, extracting features of the proposed bounding boxes, projecting the extracted features of the proposed bounding boxes into the space, computing a similarity measure between the projected features of the proposed bounding boxes and the embedded, extracted features of the bounding boxes of the known object classes in the space, and predicting an object class label for proposed bounding boxes by determining a nearest embedded object class label to the projected features of the proposed bounding boxes in the space based on the similarity measures.
-
Publication No.: US20210056742A1
Publication Date: 2021-02-25
Application No.: US16855362
Application Date: 2020-04-22
Applicant: SRI International
Inventor: Karan Sikka, Ajay Divakaran, Samyak Datta
Abstract: A method, apparatus and system for visual grounding of a caption in an image include projecting at least two parsed phrases of the caption into a trained semantic embedding space, projecting extracted region proposals of the image into the trained semantic embedding space, aligning the extracted region proposals and the at least two parsed phrases, aggregating the aligned region proposals and the at least two parsed phrases to determine a caption-conditioned image representation and projecting the caption-conditioned image representation and the caption into a semantic embedding space to align the caption-conditioned image representation and the caption. The method, apparatus and system can further include a parser for parsing the caption into the at least two parsed phrases and a region proposal module for extracting the region proposals from the image.