Method and apparatus for length-aware local tiling in a sparse attention module in a transformer
Abstract:
A method and an apparatus for length-aware local tiling in a sparse attention module in a transformer on heterogeneous devices are provided. The method includes that a heterogeneous device including one or more GPUs: divides a transformed sparsity mask into a plurality of first tiles and obtains one or more effective first tiles from the plurality of first tiles, where each effective first tile includes at least one non-zero element; loads the one or more effective first tiles into a shared memory in the one or more GPUs and loads a plurality of elements in a first matrix corresponding to the one or more effective first tiles into the shared memory; and performs multiplication by a first sampled dense-dense matrix multiplication (SDDMM) kernel in the sparse attention module in the transformer by fetching the one or more effective first tiles and the plurality of elements from the shared memory.
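
To make the tiling scheme concrete, below is a minimal CUDA sketch of the two steps the abstract describes: a host-side pass that divides the sparsity mask into fixed-size tiles and keeps only the effective (non-all-zero) ones, and an SDDMM kernel that stages each effective mask tile and the corresponding slices of the dense matrices in shared memory before computing the masked dot products. The tile size TILE, the helper collect_effective_tiles, the kernel name sddmm_tiled_kernel, and the row-major layouts are illustrative assumptions, not the claimed implementation.

    // Hypothetical sketch of length-aware local tiling for SDDMM.
    // All names and sizes here are illustrative assumptions.
    #include <cuda_runtime.h>
    #include <cstdint>
    #include <vector>

    constexpr int TILE = 16;   // assumed tile edge; the abstract fixes no size

    struct TileCoord { int row; int col; };

    // Host side: divide an M x N sparsity mask into TILE x TILE "first tiles"
    // and keep only the effective ones (at least one non-zero element).
    std::vector<TileCoord> collect_effective_tiles(const uint8_t* mask, int M, int N) {
        std::vector<TileCoord> effective;
        for (int tr = 0; tr < M; tr += TILE) {
            for (int tc = 0; tc < N; tc += TILE) {
                bool nonzero = false;
                for (int r = tr; r < tr + TILE && r < M && !nonzero; ++r)
                    for (int c = tc; c < tc + TILE && c < N; ++c)
                        if (mask[r * N + c]) { nonzero = true; break; }
                if (nonzero) effective.push_back({tr, tc});
            }
        }
        return effective;   // all-zero tiles are dropped and never launched
    }

    // Device side: one thread block per effective tile. The mask tile and the
    // matching slices of A and B are staged in shared memory, then
    // out[r][c] = mask[r][c] ? dot(A[r,:], B[c,:]) : untouched  (SDDMM).
    __global__ void sddmm_tiled_kernel(const float* A,       // M x K, row-major
                                       const float* B,       // N x K, row-major
                                       const uint8_t* mask,  // M x N
                                       const TileCoord* tiles,
                                       float* out,           // M x N
                                       int M, int N, int K) {
        __shared__ uint8_t s_mask[TILE][TILE];
        __shared__ float   s_A[TILE][TILE];
        __shared__ float   s_B[TILE][TILE];

        TileCoord t = tiles[blockIdx.x];
        int r = t.row + threadIdx.y;       // global row of this thread's output
        int c = t.col + threadIdx.x;       // global column

        s_mask[threadIdx.y][threadIdx.x] = (r < M && c < N) ? mask[r * N + c] : 0;
        __syncthreads();

        float acc = 0.0f;
        for (int k0 = 0; k0 < K; k0 += TILE) {
            // cooperatively load TILE-wide slices of A and B into shared memory
            s_A[threadIdx.y][threadIdx.x] =
                (r < M && k0 + threadIdx.x < K) ? A[r * K + k0 + threadIdx.x] : 0.0f;
            s_B[threadIdx.y][threadIdx.x] =
                (t.col + threadIdx.y < N && k0 + threadIdx.x < K)
                    ? B[(t.col + threadIdx.y) * K + k0 + threadIdx.x] : 0.0f;
            __syncthreads();
            // only positions the mask marks as non-zero accumulate a dot product
            if (s_mask[threadIdx.y][threadIdx.x])
                for (int k = 0; k < TILE; ++k)
                    acc += s_A[threadIdx.y][k] * s_B[threadIdx.x][k];
            __syncthreads();
        }
        if (r < M && c < N && s_mask[threadIdx.y][threadIdx.x])
            out[r * N + c] = acc;
    }

A launch of one thread block per effective tile, e.g. sddmm_tiled_kernel<<<num_effective, dim3(TILE, TILE)>>>(...), never visits the all-zero tiles at all, which is the payoff of the host-side filtering step.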