Processor architecture for executing transfers between wide operand memories
    82.
    Patent application
    Status: Under examination (published)

    Publication No.: US20090089540A1

    Publication date: 2009-04-02

    Application No.: US11982202

    Filing date: 2007-10-31

    Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
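
The register-as-specifier idea in this abstract can be illustrated with a minimal sketch. The bit layout below (log2 of size in bits 56-63, log2 of row width in bits 48-55, address in the low 48 bits), the 128-bit data path, and all function names are hypothetical choices made for illustration; the abstract does not disclose an encoding, and this is not the patent's.

```python
DATA_PATH_BYTES = 16  # assumption: a 128-bit data path


def _log2_exact(value):
    """Return log2(value), rejecting anything that is not a power of two."""
    if value <= 0 or value & (value - 1):
        raise ValueError("size and row width must be powers of two")
    return value.bit_length() - 1


def pack_specifier(address, size_bytes, row_bytes):
    """Pack address, size and shape into one 64-bit register value.

    Hypothetical layout, chosen only for this sketch.
    """
    if address >= 1 << 48:
        raise ValueError("address does not fit in 48 bits")
    return (_log2_exact(size_bytes) << 56) | (_log2_exact(row_bytes) << 48) | address


def unpack_specifier(reg):
    """Recover (address, size_bytes, row_bytes) from the register value."""
    size_bytes = 1 << ((reg >> 56) & 0xFF)
    row_bytes = 1 << ((reg >> 48) & 0xFF)
    address = reg & ((1 << 48) - 1)
    return address, size_bytes, row_bytes


def read_wide_operand(memory, reg):
    """Read the operand named by reg and split it into rows.

    Also returns how many data-path-width transfers the read would need.
    """
    address, size_bytes, row_bytes = unpack_specifier(reg)
    block = memory[address:address + size_bytes]
    rows = [block[i:i + row_bytes] for i in range(0, size_bytes, row_bytes)]
    transfers = -(-size_bytes // DATA_PATH_BYTES)  # ceiling division
    return rows, transfers
```

For example, `pack_specifier(64, 128, 16)` names a 128-byte operand at address 64 arranged as eight 16-byte rows, which is eight data path widths of data selected by a single register value.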

    System and method to implement a matrix multiply unit of a broadband processor
    83.
    Granted patent
    Status: Lapsed

    Publication No.: US07483935B2

    Publication date: 2009-01-27

    Application No.: US10233779

    Filing date: 2002-09-04

    Abstract: The present invention provides a system and method for improving the performance of general-purpose processors by implementing a functional unit that computes the product of a matrix operand with a vector operand, producing a vector result. The functional unit fully utilizes the entire resources of a 128b by 128b multiplier regardless of the operand size, as the number of elements of the matrix and vector operands increases as operand size is reduced. The unit performs both fixed-point and floating-point multiplications and additions with the highest-possible intermediate accuracy with modest resources.
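
The utilization claim can be checked with simple arithmetic. The sketch below assumes one reading of the abstract: for w-bit elements the vector holds 128/w elements and the matrix is square, (128/w) by (128/w). That geometry and the function names are assumptions, not details taken from the patent. Under that reading, each w-by-w multiply occupies w² bit-product cells, so the total is always 128 × 128 = 16384 cells.

```python
MULTIPLIER_BITS = 128  # multiplier width stated in the abstract


def matvec_geometry(element_bits):
    """Return (lanes, products, bit_cells) for a given element width.

    Assumes a square (lanes x lanes) matrix; this is an illustrative reading.
    """
    if element_bits <= 0 or MULTIPLIER_BITS % element_bits:
        raise ValueError("element width must divide the multiplier width")
    lanes = MULTIPLIER_BITS // element_bits          # vector elements
    products = lanes * lanes                         # element multiplies
    bit_cells = products * element_bits * element_bits
    return lanes, products, bit_cells


def matvec(matrix, vector):
    """Plain matrix-vector product; each row yields one result element."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]
```

Halving the element width doubles the vector length and quadruples the number of multiplies, while the bit-cell count stays constant. The `matvec` helper uses exact Python integers and is a functional reference only, not a model of the hardware's intermediate precision.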

    Method and Apparatus for Programmable Processor
    85.
    Patent application
    Status: Under examination (published)

    Publication No.: US20080072020A1

    Publication date: 2008-03-20

    Application No.: US11841964

    Filing date: 2007-08-20

    Abstract: Systems and apparatuses are presented relating to a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on multiple data elements stored in registers in a register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results, wherein the execution unit is capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions.
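
A group operation of the kind described here can be sketched in a few lines. The example below treats a register as a Python integer, partitions it into equal-width elements, operates on each element independently, and catenates the individual results. The 128-bit default register width, the wrap-around (modular) addition, and the function names are assumptions made for illustration; the abstract does not specify them.

```python
def split(reg, width, reg_bits=128):
    """Partition a register value into elements, lowest element first."""
    if width <= 0 or reg_bits % width:
        raise ValueError("element width must divide the register width")
    mask = (1 << width) - 1
    return [(reg >> shift) & mask for shift in range(0, reg_bits, width)]


def catenate(elements, width):
    """Join elements back into one register value, lowest element first."""
    mask = (1 << width) - 1
    result = 0
    for index, element in enumerate(elements):
        result |= (element & mask) << (index * width)
    return result


def group_add(a, b, width, reg_bits=128):
    """Add element by element; carries do not cross element boundaries."""
    sums = [x + y for x, y in zip(split(a, width, reg_bits), split(b, width, reg_bits))]
    return catenate(sums, width)  # masking in catenate gives modular wrap


def group_reverse(a, width, reg_bits=128):
    """A simple data handling operation: reverse the element order."""
    return catenate(list(reversed(split(a, width, reg_bits))), width)
```

The same register contents give different results depending on the element width: with 8-bit elements a sum of 0xFF + 0x01 wraps to 0x00 inside its own lane, while with 16-bit elements the carry propagates into the upper byte of that lane.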
