Invention Grant
US08782115B1 Hardware architecture and scheduling for high performance and low resource solution for QR decomposition (In force)

Abstract:
A matrix decomposition circuit is described. In one implementation, the matrix decomposition circuit includes a processing element to process a plurality of processing cells and a scheduler coupled to the processing element, where the scheduler instructs the processing element to process only the required processing cells of the plurality of processing cells. In one specific implementation, the required processing cells are the processing cells with non-zero inputs. Also, in one specific implementation, the matrix decomposition circuit includes an internal memory that has a rotation angles memory that stores rotation angle values calculated by the processing element, where the rotation angles memory is a first-in first-out (FIFO) memory; a systolic cell internal input values memory that stores systolic cell internal input values, where the systolic cell internal input values memory is a FIFO memory; and a systolic cell values memory that stores systolic cell values, where the systolic cell values memory is an addressable memory. In one specific implementation, a group of Mtotal input matrices is to be decomposed into Mtotal output matrices, where Mtotal is an integer greater than one. In that case, M input matrices are fed into the decomposition circuit and decomposed in parallel, where M is an integer less than or equal to Mtotal and is the minimum number required to ensure that the processing element latency is hidden.
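The scheduling idea in the abstract can be illustrated in software. The sketch below is not the patented circuit: it assumes the rotation angles come from Givens rotations (the abstract mentions rotation angles and systolic cells but does not name the method), models the rotation angles memory as a `collections.deque` used as a FIFO, and treats "skip cells with zero inputs" as "skip any rotation whose target element is already zero". The function names `givens_qr` and `q_from_angles` are hypothetical and chosen only for this illustration.

```python
import math
from collections import deque


def givens_qr(a):
    """Triangularize a square matrix (list of rows) with Givens rotations.

    Returns (r, angles, skipped): the upper-triangular result, a FIFO of
    (pivot_row, target_row, theta) tuples, and the number of rotations
    skipped because the element to eliminate was already zero.
    """
    r = [row[:] for row in a]          # work on a copy of the input
    n_rows, n_cols = len(r), len(r[0])
    angles = deque()                   # assumed stand-in for the FIFO angle memory
    skipped = 0
    for col in range(n_cols):
        for row in range(col + 1, n_rows):
            if r[row][col] == 0.0:
                skipped += 1           # zero input: no work is scheduled
                continue
            theta = math.atan2(r[row][col], r[col][col])
            c, s = math.cos(theta), math.sin(theta)
            for k in range(col, n_cols):
                top, bot = r[col][k], r[row][k]
                r[col][k] = c * top + s * bot
                r[row][k] = -s * top + c * bot
            r[row][col] = 0.0          # clear rounding residue
            angles.append((col, row, theta))
    return r, angles, skipped


def q_from_angles(angles, n):
    """Rebuild Q by replaying the stored rotations in FIFO order."""
    qt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for piv, tgt, theta in angles:     # iteration does not consume the deque
        c, s = math.cos(theta), math.sin(theta)
        for k in range(n):
            top, bot = qt[piv][k], qt[tgt][k]
            qt[piv][k] = c * top + s * bot
            qt[tgt][k] = -s * top + c * bot
    # The replay yields Q transposed, so transpose before returning.
    return [[qt[j][i] for j in range(n)] for i in range(n)]
```

For a 3x3 input with one zero below the diagonal, only two of the three possible rotations are computed and stored. A hardware scheduler would make the equivalent decision per processing cell rather than per matrix element.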