Multi-scale transformer for image analysis

    公开(公告)号:US12217382B2

    公开(公告)日:2025-02-04

    申请号:US18527528

    申请日:2023-12-04

    Applicant: Google LLC

    Abstract: The technology employs a patch-based multi-scale Transformer (300) that is usable with various imaging applications. This avoids constraints on image fixed input size and predicts the quality effectively on a native resolution image. A native resolution image (304) is transformed into a multi-scale representation (302), enabling the Transformer's self-attention mechanism to capture information on both fine-grained detailed patches and coarse-grained global patches. Spatial embedding (316) is employed to map patch positions to a fixed grid, in which patch locations at each scale are hashed to the same grid. A separate scale embedding (318) is employed to distinguish patches coming from different scales in the multiscale representation. Self-attention (508) is performed to create a final image representation. In some instances, prior to performing self-attention, the system may prepend a learnable classification token (322) to the set of input tokens.

    Multi-scale Transformer for Image Analysis
    2.
    发明公开

    公开(公告)号:US20240119555A1

    公开(公告)日:2024-04-11

    申请号:US18527528

    申请日:2023-12-04

    Applicant: Google LLC

    Abstract: The technology employs a patch-based multi-scale Transformer (300) that is usable with various imaging applications. This avoids constraints on image fixed input size and predicts the quality effectively on a native resolution image. A native resolution image (304) is transformed into a multi-scale representation (302), enabling the Transformer's self-attention mechanism to capture information on both fine-grained detailed patches and coarse-grained global patches. Spatial embedding (316) is employed to map patch positions to a fixed grid, in which patch locations at each scale are hashed to the same grid. A separate scale embedding (318) is employed to distinguish patches coming from different scales in the multiscale representation. Self-attention (508) is performed to create a final image representation. In some instances, prior to performing self-attention, the system may prepend a learnable classification token (322) to the set of input tokens.

    Learnable cost volume for determining pixel correspondence

    公开(公告)号:US11790550B2

    公开(公告)日:2023-10-17

    申请号:US17292647

    申请日:2020-07-08

    Applicant: Google LLC

    CPC classification number: G06T7/593 G06T7/215 G06T2207/10012 G06T2207/20081

    Abstract: A method includes obtaining a first plurality of feature vectors associated with a first image and a second plurality of feature vectors associated with a second image. The method also includes generating a plurality of transformed feature vectors by transforming each respective feature vector of the first plurality of feature vectors by a kernel matrix trained to define an elliptical inner product space. The method additionally includes generating a cost volume by determining, for each respective transformed feature vector of the plurality of transformed feature vectors, a plurality of inner products, wherein each respective inner product of the plurality of inner products is between the respective transformed feature vector and a corresponding candidate feature vector of a corresponding subset of the second plurality of feature vectors. The method further includes determining, based on the cost volume, a pixel correspondence between the first image and the second image.

    Multi-scale Transformer for Image Analysis

    公开(公告)号:US20250124537A1

    公开(公告)日:2025-04-17

    申请号:US18999336

    申请日:2024-12-23

    Applicant: Google LLC

    Abstract: The technology employs a patch-based multi-scale Transformer (300) that is usable with various imaging applications. This avoids constraints on image fixed input size and predicts the quality effectively on a native resolution image. A native resolution image (304) is transformed into a multi-scale representation (302), enabling the Transformer's self-attention mechanism to capture information on both fine-grained detailed patches and coarse-grained global patches. Spatial embedding (316) is employed to map patch positions to a fixed grid, in which patch locations at each scale are hashed to the same grid. A separate scale embedding (318) is employed to distinguish patches coming from different scales in the multiscale representation. Self-attention (508) is performed to create a final image representation. In some instances, prior to performing self-attention, the system may prepend a learnable classification token (322) to the set of input tokens.

    MULTI-SCALE TRANSFORMER FOR IMAGE ANALYSIS
    8.
    发明公开

    公开(公告)号:US20230222623A1

    公开(公告)日:2023-07-13

    申请号:US17787699

    申请日:2021-07-01

    Applicant: Google LLC

    Abstract: The technology employs a patch-based multi-scale Transformer (300) that is usable with various imaging applications. This avoids constraints on image fixed input size and predicts the quality effectively on a native resolution image. A native resolution image (304) is transformed into a multi-scale representation (302), enabling the Transformer's self-attention mechanism to capture information on both fine-grained detailed patches and coarse-grained global patches. Spatial embedding (316) is employed to map patch positions to a fixed grid, in which patch locations at each scale are hashed to the same grid. A separate scale embedding (318) is employed to distinguish patches coming from different scales in the multiscale representation. Self-attention (508) is performed to create a final image representation. In some instances, prior to performing self-attention, the system may prepend a learnable classification token (322) to the set of input tokens.

    Multi-scale transformer for image analysis

    公开(公告)号:US11887270B2

    公开(公告)日:2024-01-30

    申请号:US17787699

    申请日:2021-07-01

    Applicant: Google LLC

    Abstract: The technology employs a patch-based multi-scale Transformer (300) that is usable with various imaging applications. This avoids constraints on image fixed input size and predicts the quality effectively on a native resolution image. A native resolution image (304) is transformed into a multi-scale representation (302), enabling the Transformer's self-attention mechanism to capture information on both fine-grained detailed patches and coarse-grained global patches. Spatial embedding (316) is employed to map patch positions to a fixed grid, in which patch locations at each scale are hashed to the same grid. A separate scale embedding (318) is employed to distinguish patches coming from different scales in the multiscale representation. Self-attention (508) is performed to create a final image representation. In some instances, prior to performing self-attention, the system may prepend a learnable classification token (322) to the set of input tokens.

Patent Agency Ranking