INSTRUCTION-GUIDED VISUAL EMBEDDINGS AND FEEDBACK-BASED LEARNING IN LARGE VISION-LANGUAGE MODELS

    Publication No.: US20250131027A1

    Publication date: 2025-04-24

    Application No.: US18924763

    Filing date: 2024-10-23

    Abstract: In an example, a method for fine-tuning a Large Visual Language Model (LVLM) includes providing visual queries, each of the visual queries comprising at least an image and a textual query related to the image; processing, by the LVLM, the visual queries to extract visual embeddings from the visual queries, wherein the LVLM comprises a Visual Language Model (VLM), a first Large Language Model (LLM), and a linear projection layer interconnecting the VLM and the first LLM; for each of the visual queries: i) generating, by the LVLM, a response to the corresponding visual query based on the corresponding visual embedding; ii) evaluating, by a second LLM, the generated response to verify that the generated response satisfies predefined criteria; and iii) providing, by the second LLM, feedback to the LVLM in response to evaluating the generated response; and fine-tuning the LVLM using aggregated feedback provided by the second LLM for the visual queries.
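The generate, evaluate, and feedback steps in this abstract can be pictured as a simple collection loop. The following is a minimal sketch only, not the patented implementation: `VisualQuery`, `Feedback`, `collect_feedback`, and the two toy callables are hypothetical names standing in for the LVLM and the second LLM, and the fine-tuning step that would consume the aggregated feedback is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class VisualQuery:
    image: bytes  # placeholder for image data (hypothetical representation)
    text: str     # textual query related to the image


@dataclass
class Feedback:
    query: VisualQuery
    response: str
    satisfied: bool  # whether the response met the predefined criteria
    comment: str     # evaluator's note for the model being tuned


def collect_feedback(
    queries: List[VisualQuery],
    generate: Callable[[VisualQuery], str],
    evaluate: Callable[[VisualQuery, str], Tuple[bool, str]],
) -> List[Feedback]:
    """Run generate -> evaluate for every query and aggregate the feedback."""
    aggregated = []
    for query in queries:
        response = generate(query)                     # step i: model under tuning answers
        satisfied, comment = evaluate(query, response) # step ii: second model checks criteria
        aggregated.append(Feedback(query, response, satisfied, comment))  # step iii
    return aggregated


# Toy stand-ins (assumptions): a real system would call the LVLM and a second LLM.
def toy_generate(query: VisualQuery) -> str:
    return "a red car" if "color" in query.text else "unknown"


def toy_evaluate(query: VisualQuery, response: str) -> Tuple[bool, str]:
    ok = response != "unknown"
    return ok, "ok" if ok else "response is uninformative"


queries = [
    VisualQuery(b"", "What color is the car?"),
    VisualQuery(b"", "How many people are visible?"),
]
batch = collect_feedback(queries, toy_generate, toy_evaluate)
```

The aggregated `batch` is what a fine-tuning routine would take as input; how the feedback is turned into a training signal is not stated in the abstract and is not assumed here.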

    CAUSAL ANALYSIS WITH TIME SERIES DATA

    Publication No.: US20250110989A1

    Publication date: 2025-04-03

    Application No.: US18895080

    Filing date: 2024-09-24

    Abstract: In general, various aspects of the techniques are directed to causal analysis using large scale time series data. A computing system may convert large scale time series data to first time period records and second time period records according to a multi-scale time resolution. The computing system may implement a hierarchical machine learning model to generate embeddings that capture temporal characteristics of features of the large scale time series data. The computing system may generate a graph data structure indicating cause and effect correlations between features of the large scale time series data based on temporal dynamics captured in the first and second time period records and/or the embeddings.
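One way to picture the conversion into first and second time period records is to resample the same series at two window sizes. This is an illustrative sketch under stated assumptions: the abstract does not say how records are formed, so the averaging over fixed, non-overlapping windows and the name `to_period_records` are hypothetical choices.

```python
from statistics import mean
from typing import List


def to_period_records(values: List[float], window: int) -> List[float]:
    """Average consecutive non-overlapping windows of `window` samples.

    A trailing partial window is dropped (an assumption made for simplicity).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        mean(values[start:start + window])
        for start in range(0, len(values) - window + 1, window)
    ]


samples = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]

# Finer resolution: one record per 2 samples.
first_period_records = to_period_records(samples, 2)   # [2.0, 6.0, 10.0, 14.0]

# Coarser resolution: one record per 4 samples.
second_period_records = to_period_records(samples, 4)  # [4.0, 12.0]
```

Both record sets describe the same underlying series, so downstream models can compare short-term and long-term dynamics of a feature.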

    SPATIAL-TEMPORAL ANOMALY AND EVENT DETECTION USING NIGHT VISION SENSORS

    Publication No.: US20240212350A1

    Publication date: 2024-06-27

    Application No.: US18331007

    Filing date: 2023-06-07

    CPC classification number: G06V20/44 G06V10/44 H04N23/21

    Abstract: In general, the disclosure describes techniques for joint spatiotemporal Artificial Intelligence (AI) models that can encompass multiple space and time resolutions through self-supervised learning. In an example, a method includes, for each of a plurality of multimodal data, generating, by a computing system, using a first machine learning model, a respective modality feature vector representative of content of the multimodal data, wherein each of the generated modality feature vectors has a different modality; processing, by the computing system, each of the generated modality feature vectors with a second machine learning model comprising an encoder model to generate event data comprising a plurality of events and/or activities of interest; and analyzing, by the computing system, the event data to generate anomaly data indicative of detected anomalies in the multimodal data.

    ZERO-SHOT OBJECT DETECTION
    Invention application

    Publication No.: US20190325243A1

    Publication date: 2019-10-24

    Application No.: US16383447

    Filing date: 2019-04-12

    Abstract: A method, apparatus and system for zero shot object detection includes, in a semantic embedding space having embedded object class labels, training the space by embedding extracted features of bounding boxes and object class labels of labeled bounding boxes of known object classes into the space, determining regions in an image having unknown object classes on which to perform object detection as proposed bounding boxes, extracting features of the proposed bounding boxes, projecting the extracted features of the proposed bounding boxes into the space, computing a similarity measure between the projected features of the proposed bounding boxes and the embedded, extracted features of the bounding boxes of the known object classes in the space, and predicting an object class label for proposed bounding boxes by determining a nearest embedded object class label to the projected features of the proposed bounding boxes in the space based on the similarity measures.
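The final prediction step, picking the nearest embedded class label for a projected box feature, can be sketched as a nearest-neighbor lookup. This is a minimal illustration, not the patented method: cosine similarity is an assumed choice of similarity measure, and the two-dimensional vectors, label names, and function names are hypothetical.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def predict_label(projected: List[float], label_embeddings: Dict[str, List[float]]) -> str:
    """Return the class label whose embedding is most similar to the projected feature."""
    return max(label_embeddings, key=lambda name: cosine(projected, label_embeddings[name]))


# Hypothetical embedding space: "zebra" stands in for a class with no training boxes,
# present only through its label embedding.
label_embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.0, 1.0],
    "zebra": [0.7, 0.7],
}

# Hypothetical feature of a proposed bounding box after projection into the space.
predicted = predict_label([0.6, 0.8], label_embeddings)
```

Because unseen classes are represented only by their label embeddings, a box can be assigned a label that never appeared among the training boxes.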

    Method for pose invariant fingerprinting
    Granted patent (in force)

    Publication No.: US08860813B2

    Publication date: 2014-10-14

    Application No.: US13711220

    Filing date: 2012-12-11

    CPC classification number: G06K9/00771 G06K9/6206 G06K9/6211

    Abstract: A computer-implemented method for matching objects is disclosed. At least two images where one of the at least two images has a first target object and a second of the at least two images has a second target object are received. At least one first patch from the first target object and at least one second patch from the second target object are extracted. A distance-based part encoding between each of the at least one first patch and the at least one second patch based upon a corresponding codebook of image parts including at least one of part type and pose is constructed. A viewpoint of one of the at least one first patch is warped to a viewpoint of the at least one second patch. A parts level similarity measure based on the view-invariant distance measure for each of the at least one first patch and the at least one second patch is applied to determine whether the first target object and the second target object are the same or different objects.

    LARGE LANGUAGE MODEL AUGMENTATION WITH KNOWLEDGE LANGUAGE MODELS

    Publication No.: US20250131212A1

    Publication date: 2025-04-24

    Application No.: US18919630

    Filing date: 2024-10-18

    Abstract: In an example, a method for generating responses by a Machine Learning (ML) system includes processing, by a first language model, a natural language instruction to generate an instruction representation based on a meaning of the natural language instruction; translating, by a translation module comprising an interface between the first language model and a second language model, the instruction representation into data indicating an intent of the natural language instruction, wherein the second language model is trained with domain specific knowledge; providing, by the translation module, the natural language instruction and the data indicating the intent of the natural language instruction to the second language model; and generating, by the second language model, a response based on the natural language instruction and the data indicating the intent of the natural language instruction.
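The data flow in this abstract (first model, translation module, second model) can be sketched as three chained callables. This is an assumption-laden illustration: `answer` and the toy functions are hypothetical stand-ins, and a real system would use actual language models rather than the string handling shown here.

```python
from typing import Callable, Dict, List


def answer(
    instruction: str,
    encode: Callable[[str], List[str]],
    translate: Callable[[List[str]], Dict[str, str]],
    knowledge_model: Callable[[str, Dict[str, str]], str],
) -> str:
    """Chain the stages: instruction -> representation -> intent data -> response."""
    representation = encode(instruction)         # first language model
    intent = translate(representation)           # translation module (interface)
    return knowledge_model(instruction, intent)  # second, domain-trained model


# Toy stand-ins (assumptions) for each stage.
def toy_encode(instruction: str) -> List[str]:
    return instruction.lower().rstrip("?.!").split()


def toy_translate(tokens: List[str]) -> Dict[str, str]:
    return {"intent": "dosage_lookup" if "dose" in tokens else "general"}


def toy_knowledge_model(instruction: str, intent: Dict[str, str]) -> str:
    return f"[{intent['intent']}] {instruction}"


reply = answer("What dose is safe?", toy_encode, toy_translate, toy_knowledge_model)
```

Note that the second model receives both the original instruction and the intent data, matching the abstract's statement that the translation module provides both.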
