Invention Grant
US08782115B1 Hardware architecture and scheduling for high performance and low resource solution for QR decomposition
Status: Active
- Patent Title: Hardware architecture and scheduling for high performance and low resource solution for QR decomposition
- Application No.: US12148385
- Application Date: 2008-04-18
- Publication No.: US08782115B1
- Publication Date: 2014-07-15
- Inventor: Kulwinder Dhanoa
- Applicant: Kulwinder Dhanoa
- Applicant Address: US CA San Jose
- Assignee: Altera Corporation
- Current Assignee: Altera Corporation
- Current Assignee Address: US CA San Jose
- Agency: Mauriel Kapouytian Woods LLP
- Agent: Avarat Kapouytian
- Main IPC: G06F7/32
- IPC: G06F7/32; G06F7/38; H04K1/10; H04L27/28; H04B7/02; H04L1/02

Abstract:
A matrix decomposition circuit is described. In one implementation, the matrix decomposition circuit includes a processing element to process a plurality of processing cells and a scheduler coupled to the processing element, where the scheduler instructs the processing element to process only required processing cells of the plurality of processing cells. In one specific implementation, the required processing cells are processing cells with non-zero inputs. Also, in one specific implementation, the matrix decomposition circuit includes an internal memory that has a rotation angles memory that stores rotation angle values calculated by the processing element, where the rotation angles memory is a first-in first-out (FIFO) memory; a systolic cell internal input values memory that stores systolic cell internal input values, where the systolic cell internal input values memory is a FIFO memory; and a systolic cell values memory that stores systolic cell values, where the systolic cell values memory is an addressable memory. In one specific implementation, where a group of Mtotal input matrices are to be decomposed to Mtotal output matrices, where Mtotal is an integer greater than one, M input matrices are fed into a decomposition circuit to decompose in parallel, where M is an integer less than or equal to Mtotal and is a minimum number required to ensure that processing element latency is hidden.
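The scheduling idea in the abstract, processing only cells with non-zero inputs, can be illustrated in software. The sketch below is not the patented circuit: it assumes a plain Givens-rotation QR decomposition in Python, and the function name `givens_qr` and the rotation counter are hypothetical names introduced here. A rotation is skipped whenever the entry it would eliminate is already zero, which is loosely analogous to a scheduler that never dispatches cells with zero inputs.

```python
import math


def givens_qr(a):
    """Illustrative Givens-rotation QR for a matrix given as a list of rows.

    Returns (q, r, applied), where a = q * r and `applied` counts the
    rotations actually performed. Rotations whose input is already zero
    are skipped. This is an assumed software analogue, not the hardware
    schedule described in the patent.
    """
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    applied = 0

    for col in range(min(m, n)):
        for row in range(col + 1, m):
            if r[row][col] == 0.0:
                continue  # zero input: no rotation is required for this cell

            # Rotation angle expressed as cosine/sine of the pair (x, y).
            x, y = r[col][col], r[row][col]
            h = math.hypot(x, y)
            c, s = x / h, y / h

            # Apply the rotation to the two affected rows of R.
            for k in range(n):
                top, bot = r[col][k], r[row][k]
                r[col][k] = c * top + s * bot
                r[row][k] = -s * top + c * bot
            r[row][col] = 0.0  # exact zero, removing rounding residue

            # Accumulate Q by applying the transposed rotation to its columns.
            for k in range(m):
                left, right = q[k][col], q[k][row]
                q[k][col] = c * left + s * right
                q[k][row] = -s * left + c * right

            applied += 1

    return q, r, applied
```

For an input that is already upper triangular, `applied` is 0 and `q` is the identity matrix, so the work done grows with the number of non-zero sub-diagonal inputs rather than with the full matrix size.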