SINGLE-STAGE OPEN-VOCABULARY PANOPTIC SEGMENTATION

    Publication No.: US20250045929A1

    Publication Date: 2025-02-06

    Application No.: US18365060

    Filing Date: 2023-08-03

    Applicant: Lemon Inc.

    Abstract: Single-stage frameworks for open-vocabulary panoptic segmentation are provided. One aspect provides a computing system comprising a processor and memory storing instructions that, when executed by the processor, cause the processor to: receive an image; extract a plurality of feature maps from the image using a convolutional neural network-based vision-language model; generate a plurality of pixel features from the plurality of feature maps; generate a plurality of mask predictions from the plurality of pixel features; generate a plurality of in-vocabulary class predictions corresponding to the plurality of mask predictions using the plurality of pixel features; generate a plurality of out-of-vocabulary class predictions using the plurality of feature maps; perform geometric ensembling on the plurality of in-vocabulary class predictions and the plurality of out-of-vocabulary class predictions to generate a plurality of final class predictions; and output the plurality of mask predictions and the plurality of final class predictions.
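
The geometric ensembling step named in the abstract can be illustrated as a weighted geometric mean of the two class-probability vectors for one mask. This is a minimal sketch, not the claimed implementation: the abstract does not state the weighting or any normalization, so the function name `geometric_ensemble`, the weight `alpha`, and the renormalization step are all assumptions made for illustration.

```python
import math


def geometric_ensemble(p_in, p_out, alpha=0.5):
    """Fuse two class-probability vectors with a weighted geometric mean.

    p_in  : in-vocabulary class probabilities for one mask
    p_out : out-of-vocabulary class probabilities for the same mask
    alpha : assumed weight on p_out (0 keeps p_in only, 1 keeps p_out only)
    """
    if len(p_in) != len(p_out):
        raise ValueError("both predictions must cover the same classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # Per-class weighted geometric mean: p_in^(1 - alpha) * p_out^alpha.
    fused = [
        math.pow(a, 1.0 - alpha) * math.pow(b, alpha)
        for a, b in zip(p_in, p_out)
    ]

    # Renormalize so the fused scores sum to 1 (an assumption of this sketch).
    total = sum(fused)
    if total == 0.0:
        raise ValueError("fused scores are all zero")
    return [f / total for f in fused]
```

With `alpha=0.0` the sketch returns the in-vocabulary prediction unchanged, and with two predictions that disagree symmetrically (for example `[0.9, 0.1]` and `[0.1, 0.9]` at `alpha=0.5`) the fused scores come out equal.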

SEMANTIC LABELING OF IMAGES WITH GENERATIVE LANGUAGE MODEL

    Publication No.: US20250157235A1

    Publication Date: 2025-05-15

    Application No.: US18509072

    Filing Date: 2023-11-14

    Applicant: Lemon Inc.

    Abstract: A computing system including one or more processing devices configured to receive an image. The processing devices are further configured to compute a segmentation mask that identifies a region of interest included in the image. At a feature extractor, the processing devices are further configured to compute encoded image features based on the image. The processing devices are further configured to receive a text instruction. At a visual resampler, the processing devices are further configured to compute a mask query based on the segmentation mask, the encoded image features, and the text instruction. At a generative language model, the processing devices are further configured to receive a natural language query that includes the mask query and the text instruction. Based on the natural language query, at the generative language model, the processing devices are further configured to generate and output a semantic label associated with the region of interest.

IMPLEMENTING VIDEO SEGMENTATION

    Document Type: Invention application

    Publication No.: US20250113087A1

    Publication Date: 2025-04-03

    Application No.: US18395356

    Filing Date: 2023-12-22

    Applicant: Lemon Inc.

    Abstract: The present disclosure describes techniques for implementing video segmentation. A video is divided into a plurality of clips. Each of the plurality of clips comprises several frames. Axial-trajectory attention is applied to each of the plurality of clips by a first sub-model. Clip features corresponding to each of the plurality of clips are generated by the first sub-model. A set of object queries corresponding to each of the plurality of clips is generated based on the clip features by a transformer decoder. Trajectory attention is applied to refine sets of object queries corresponding to the plurality of clips by a second sub-model. Video-level segmentation results are generated based on the refined object queries.
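
The first step in the abstract, dividing a video into clips of several frames each, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say whether clips overlap or how a trailing partial clip is handled, so the helper name `split_into_clips`, the `clip_len` parameter, the non-overlapping split, and the shorter final clip are all choices made for this sketch.

```python
def split_into_clips(frames, clip_len):
    """Split a sequence of frames into consecutive, non-overlapping clips.

    frames   : list of frames (any objects, e.g. arrays or frame indices)
    clip_len : assumed number of frames per clip; the last clip may be shorter
    """
    if clip_len < 1:
        raise ValueError("clip_len must be at least 1")
    # Step through the frame list in strides of clip_len.
    return [frames[i:i + clip_len] for i in range(0, len(frames), clip_len)]


# Usage: five frame indices grouped into clips of two frames.
clips = split_into_clips(list(range(5)), clip_len=2)
```

In a pipeline like the one described, each clip would then be passed to the first sub-model separately, and the per-clip object queries would be refined jointly afterwards.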
