Invention Grant
- Patent Title: Efficient matrix multiplication on a parallel processing device
- Patent Title (中): 在并行处理设备上有效的矩阵乘法
- Application No.: US11454411
- Application Date: 2006-06-16
- Publication No.: US07792895B1
- Publication Date: 2010-09-07
- Inventor: Norbert Juffa, Radoslav Danilak
- Applicant: Norbert Juffa, Radoslav Danilak
- Applicant Address: US CA Santa Clara
- Assignee: NVIDIA Corporation
- Current Assignee: NVIDIA Corporation
- Current Assignee Address: US CA Santa Clara
- Agency: Patterson & Sheridan, LLP
- Main IPC: G06F7/52
- IPC: G06F7/52

Abstract:
The present invention enables efficient matrix multiplication operations on parallel processing devices. One embodiment is a method for mapping cooperative thread arrays (CTAs) to result matrix tiles for matrix multiplication operations. Another embodiment is a second method for mapping CTAs to result tiles. Yet other embodiments are methods for mapping the individual threads of a CTA to the elements of a tile for result tile computations, source tile copy operations, and source tile copy and transpose operations. The present invention advantageously enables result matrix elements to be computed on a tile-by-tile basis using multiple CTAs executing concurrently on different streaming multiprocessors, enables source tiles to be copied to local memory to reduce the number of accesses to the global memory when computing a result tile, and enables coalesced read operations from the global memory as well as write operations to the local memory without bank conflicts.
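
The tile-by-tile scheme described in the abstract can be illustrated with a small CPU-side sketch. This is not the patented method and does not reproduce its specific CTA or thread mappings. It is a minimal Python model, assuming a hypothetical square tile size `TILE` and a hypothetical function name `tiled_matmul`. Each `(tile_row, tile_col)` pair stands in for one CTA, the lists `a_local` and `b_local` stand in for local memory, and the serial loops stand in for threads that would run concurrently on a real device. Coalescing and bank conflicts are hardware effects that this sketch does not model.

```python
TILE = 4  # hypothetical tile edge length; not taken from the patent


def tiled_matmul(A, B, tile=TILE):
    """Compute C = A * B one result tile at a time (CPU model of the idea)."""
    m, k, n = len(A), len(A[0]), len(B[0])
    C = [[0.0] * n for _ in range(m)]

    # Each (tile_row, tile_col) pair plays the role of one CTA that owns
    # one tile of the result matrix.
    for tile_row in range(0, m, tile):
        for tile_col in range(0, n, tile):
            # One accumulator per "thread", i.e. per result tile element.
            acc = [[0.0] * tile for _ in range(tile)]

            # Walk along the shared dimension one source tile pair at a time.
            for tile_k in range(0, k, tile):
                # Copy the source tiles into "local memory", padding with
                # zeros where a tile hangs over the matrix edge.
                a_local = [
                    [A[tile_row + i][tile_k + j]
                     if tile_row + i < m and tile_k + j < k else 0.0
                     for j in range(tile)]
                    for i in range(tile)
                ]
                b_local = [
                    [B[tile_k + i][tile_col + j]
                     if tile_k + i < k and tile_col + j < n else 0.0
                     for j in range(tile)]
                    for i in range(tile)
                ]

                # Every "thread" (i, j) reads only the local copies, so each
                # global element is fetched once per tile, not once per use.
                for i in range(tile):
                    for j in range(tile):
                        total = acc[i][j]
                        for p in range(tile):
                            total += a_local[i][p] * b_local[p][j]
                        acc[i][j] = total

            # Write the finished result tile back, skipping the padding.
            for i in range(tile):
                for j in range(tile):
                    if tile_row + i < m and tile_col + j < n:
                        C[tile_row + i][tile_col + j] = acc[i][j]
    return C
```

With tile edge T, each source element copied into the local buffers is reused T times from there, which is the reduction in global memory accesses that the abstract refers to.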