Invention Grant
- Patent Title: Method and apparatus for length-aware local tiling in a sparse attention module in a transformer
- Application No.: US17529208
- Application Date: 2021-11-17
- Publication No.: US12001510B2
- Publication Date: 2024-06-04
- Inventor: Zhendong Wang, Yongxiong Ren, Yang Liu, Lingzhi Liu
- Applicant: KWAI INC.
- Applicant Address: Palo Alto, CA, US
- Assignee: BEIJING TRANSTREAMS TECHNOLOGY CO. LTD.
- Current Assignee: BEIJING TRANSTREAMS TECHNOLOGY CO. LTD.
- Current Assignee Address: Beijing, CN
- Agency: Arch & Lake LLP
- Main IPC: G06F18/2134
- IPC: G06F18/2134 ; G06F18/2431 ; G06N3/04 ; G06T1/20 ; G06V10/82

Abstract:
A method and an apparatus are provided for length-aware local tiling in a sparse attention module in a transformer on heterogeneous devices. In the method, a heterogeneous device including one or more GPUs: divides a transformed sparsity mask into a plurality of first tiles and obtains one or more effective first tiles from the plurality of first tiles, where each effective first tile includes at least one non-zero element; loads the one or more effective first tiles into a shared memory in the one or more GPUs, and loads a plurality of elements in a first matrix corresponding to the one or more effective first tiles into the shared memory; and performs multiplication with a first sampled dense-dense matrix multiplication (SDDMM) kernel in the sparse attention module of the transformer by fetching the one or more effective first tiles and the plurality of elements from the shared memory.
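The abstract describes a tile-level sparsity filter: only mask tiles containing at least one non-zero element ("effective tiles") are loaded and multiplied. The following is a minimal NumPy sketch of that idea, not the patented GPU implementation; the function names, the tile size, and the use of dense NumPy arrays in place of GPU shared memory are illustrative assumptions.

```python
import numpy as np

def effective_tiles(mask, tile):
    """Divide a sparsity mask into tile x tile blocks and return the
    top-left coordinates of 'effective' tiles, i.e. blocks that
    contain at least one non-zero element (per the abstract)."""
    rows, cols = mask.shape
    coords = []
    for i in range(0, rows, tile):
        for j in range(0, cols, tile):
            if np.any(mask[i:i + tile, j:j + tile]):
                coords.append((i, j))
    return coords

def tiled_sddmm(A, B, mask, tile):
    """Sampled dense-dense matrix multiplication restricted to
    effective tiles: compute A @ B only inside mask tiles that hold
    non-zeros, leaving all other tiles zero."""
    out = np.zeros(mask.shape, dtype=A.dtype)
    for i, j in effective_tiles(mask, tile):
        # On a GPU these operands would be staged in shared memory;
        # here each effective tile is simply multiplied densely and
        # re-masked so zeros inside the tile stay zero.
        block = A[i:i + tile, :] @ B[:, j:j + tile]
        out[i:i + tile, j:j + tile] = block * mask[i:i + tile, j:j + tile]
    return out
```

Skipping whole tiles (rather than individual zeros) is what makes the scheme GPU-friendly: each retained tile is a dense block that maps cleanly onto a thread block's shared memory.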
Public/Granted literature
- US20230153381A1: METHOD AND APPARATUS FOR LENGTH-AWARE LOCAL TILING IN A SPARSE ATTENTION MODULE IN A TRANSFORMER (published 2023-05-18)