VECTOR MASK DRIVEN CLOCK GATING FOR POWER EFFICIENCY OF A PROCESSOR
    Invention application (status: pending, published)

    Publication No.: US20150220345A1

    Publication date: 2015-08-06

    Application No.: US13997791

    Filing date: 2012-12-19

    Abstract: A processor includes an instruction schedule and dispatch (schedule/dispatch) unit to receive a single instruction multiple data (SIMD) instruction to perform an operation on multiple data elements stored in a storage location indicated by a first source operand. Based on a second source operand, the instruction schedule/dispatch unit is to determine a first of the data elements that will not be operated on to generate a result written to a destination operand. The processor further includes multiple processing elements coupled to the instruction schedule/dispatch unit to process the data elements of the SIMD instruction in a vector manner, and a power management unit coupled to the instruction schedule/dispatch unit to reduce power consumption of a first of the processing elements, namely the one configured to process the first data element.
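    The masking decision in this abstract can be sketched as a small software model. It is an illustrative assumption, not the patented circuit. The function name `masked_simd`, the integer-bitmask encoding of the second source operand, and merge-masking (unselected destination lanes keep their old values) are all hypothetical choices. The model reports which lanes' processing elements would be candidates for clock gating.

```python
from typing import Callable, List, Tuple


def masked_simd(op: Callable[[int], int], src: List[int], mask: int,
                dest: List[int]) -> Tuple[List[int], List[int]]:
    """Hypothetical model: apply op to each lane of src whose mask bit is set.

    Lanes with a clear mask bit are not operated on. Their destination values
    are left unchanged (merge-masking, an assumption), and their indices are
    returned as clock-gating candidates.
    """
    if len(src) != len(dest):
        raise ValueError("src and dest must have the same number of lanes")
    result = list(dest)
    gated: List[int] = []
    for lane, value in enumerate(src):
        if (mask >> lane) & 1:
            result[lane] = op(value)  # active lane: its processing element computes
        else:
            gated.append(lane)  # masked lane: its processing element could be gated
    return result, gated


values, gated = masked_simd(lambda x: x * 2, [1, 2, 3, 4], 0b0101, [0, 0, 0, 0])
print(values, gated)  # [2, 0, 6, 0] [1, 3]
```

    In hardware the saving would come from not clocking the gated lanes. This model only identifies which lanes qualify.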

    Apparatus and method for complex matrix multiplication

    Publication No.: US12174911B2

    Publication date: 2024-12-24

    Application No.: US17133473

    Filing date: 2020-12-23

    Abstract: An apparatus and method for complex matrix multiplication. For example, one embodiment of a processor comprises: a decoder to decode a first complex matrix multiplication instruction; and execution circuitry to execute the first complex matrix multiplication instruction, the execution circuitry comprising parallel multiplication circuitry to multiply real values from a first plurality of real and imaginary values with corresponding real values from a second plurality of real and imaginary values to generate a first plurality of real products, and to multiply imaginary values from the first plurality of real and imaginary values with corresponding imaginary values from the second plurality of real and imaginary values to generate a second plurality of real products; and addition/subtraction circuitry to subtract each real product in the second plurality of real products from a corresponding real product in the first plurality of real products to produce a corresponding real value in the result matrix. The decoder may also decode, and the execution circuitry may execute, a second complex matrix multiplication instruction to multiply real and imaginary values from the first plurality with corresponding imaginary and real values, respectively, from the second plurality to generate first and second pluralities of imaginary products, and to add corresponding imaginary products to produce a corresponding imaginary value in the result matrix.
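    The two-instruction split described above, with one instruction producing real parts (real×real minus imaginary×imaginary) and one producing imaginary parts (real×imaginary plus imaginary×real), can be sketched as follows. This is an illustrative assumption: the function names are hypothetical, matrices are stored as separate real and imaginary planes, and accumulation over the inner dimension is assumed, since the abstract does not say how products are summed across a matrix row.

```python
from typing import List

Matrix = List[List[int]]


def cmatmul_real(a_re: Matrix, a_im: Matrix, b_re: Matrix, b_im: Matrix) -> Matrix:
    """Hypothetical 'first instruction': real part of A @ B.

    Each term is (real product) - (imaginary product); terms are summed over
    the inner dimension k (assumed accumulation).
    """
    rows, inner, cols = len(a_re), len(b_re), len(b_re[0])
    return [[sum(a_re[i][k] * b_re[k][j] - a_im[i][k] * b_im[k][j]
                 for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]


def cmatmul_imag(a_re: Matrix, a_im: Matrix, b_re: Matrix, b_im: Matrix) -> Matrix:
    """Hypothetical 'second instruction': imaginary part of A @ B.

    Each term is real*imaginary + imaginary*real, summed over k.
    """
    rows, inner, cols = len(a_re), len(b_re), len(b_re[0])
    return [[sum(a_re[i][k] * b_im[k][j] + a_im[i][k] * b_re[k][j]
                 for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]


# A = [[1+2j, 3-1j]], B = [[2+0j], [1+1j]]  ->  A @ B = [[6+6j]]
print(cmatmul_real([[1, 3]], [[2, -1]], [[2], [1]], [[0], [1]]))  # [[6]]
print(cmatmul_imag([[1, 3]], [[2, -1]], [[2], [1]], [[0], [1]]))  # [[6]]
```

    Splitting real and imaginary results lets each instruction reuse the same multiplier array, which is one plausible reason for the two-instruction design.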

    Apparatuses, methods, and systems for instructions for downconverting a tile row and interleaving with a register

    Publication No.: US12086595B2

    Publication date: 2024-09-10

    Application No.: US17214853

    Filing date: 2021-03-27

    CPC classification number: G06F9/3016 G06F9/30025 G06F9/30098

    Abstract: Systems, methods, and apparatuses relating to interleaving data values. An embodiment includes decoding circuitry to decode a single instruction, the instruction having one or more fields to specify an opcode, one or more fields to specify a location of a first source operand, one or more fields to specify a location of a second source operand, one or more fields to specify a location of a destination operand, and one or more fields to specify an index value to be used to index a row in the first source operand, wherein the opcode is to indicate execution circuitry is to downconvert data elements of the indexed row of the first source operand, interleave the downconverted elements with data elements of the second source operand, and store the interleaved elements in the destination operand; and execution circuitry to execute the decoded instruction according to the opcode.
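    The downconvert-and-interleave semantics can be sketched as a software model. Several details here are assumptions not stated in the abstract: the source tile holds FP32-range values that are downconverted to FP16, downconverted elements take the even destination positions and second-source elements the odd ones, and out-of-range values raise an error rather than saturating. The function names are hypothetical. Python's `struct` `'e'` format is used for the IEEE binary16 rounding.

```python
import struct
from typing import List, Sequence


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE binary16 value (OverflowError if out of range)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]


def downconvert_row_interleave(tile: Sequence[Sequence[float]], row_index: int,
                               src2: Sequence[float]) -> List[float]:
    """Hypothetical model: downconvert tile[row_index] to FP16, interleave with src2."""
    row = tile[row_index]  # the index value selects one row of the first source (tile)
    if len(row) != len(src2):
        raise ValueError("tile row and second source must have equal length")
    dest: List[float] = []
    for converted, other in zip((to_fp16(v) for v in row), src2):
        dest.append(converted)  # even positions: downconverted tile elements (assumed order)
        dest.append(other)      # odd positions: second-source elements
    return dest


tile = [[1.0, 2.0], [0.1, 3.5]]
print(downconvert_row_interleave(tile, 1, [7.0, 8.0]))
# [0.0999755859375, 7.0, 3.5, 8.0]  (0.1 is not exactly representable in FP16)
```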

    INSTRUCTIONS TO CONVERT FROM FP16 TO FP8
    Invention publication

    Publication No.: US20240045677A1

    Publication date: 2024-02-08

    Application No.: US17958378

    Filing date: 2022-10-01

    CPC classification number: G06F9/30025 G06F9/3016

    Abstract: Techniques for converting FP16 or FP32 data elements to FP8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision or single-precision floating-point data from the identified source operand to packed FP8 (bfloat8) data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision or single-precision floating-point data from the identified source operand to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.
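    A bit-level sketch of the FP16-to-FP8 path follows. It assumes, without confirmation from the abstract, that "bfloat8" means the E5M2 layout (1 sign, 5 exponent, 2 mantissa bits). That layout shares FP16's exponent field and bias, so conversion reduces to rounding away the low 8 bits of the FP16 pattern. Round-to-nearest-even, overflow to infinity (no saturation), and the quiet-NaN encoding are further assumptions. Only FP16 input is modeled, and inputs are assumed to be exactly representable in FP16. All function names are hypothetical.

```python
import struct
from typing import Iterable


def fp16_bits(x: float) -> int:
    """Return the IEEE binary16 bit pattern of x."""
    return struct.unpack('<H', struct.pack('<e', x))[0]


def fp16_bits_to_e5m2(h: int) -> int:
    """Round a binary16 pattern to E5M2 (assumed bfloat8): nearest-even, no saturation."""
    if (h & 0x7C00) == 0x7C00 and (h & 0x03FF):
        return ((h >> 8) | 0x02) & 0xFF  # NaN: keep sign, force a nonzero (quiet) mantissa
    lsb = (h >> 8) & 1                   # lowest mantissa bit that survives in E5M2
    return ((h + 0x7F + lsb) >> 8) & 0xFF  # add rounding bias, keep the high byte


def convert_fp16_to_bf8(values: Iterable[float]) -> bytes:
    """Pack a sequence of FP16-representable floats into E5M2 bytes."""
    return bytes(fp16_bits_to_e5m2(fp16_bits(v)) for v in values)


def e5m2_to_float(b: int) -> float:
    """Decode one E5M2 byte by widening it back to a binary16 pattern."""
    return struct.unpack('<e', struct.pack('<H', b << 8))[0]


packed = convert_fp16_to_bf8([1.0, 1.125, 1.375, -2.0, 65504.0])
print(packed.hex())  # 3c3c3ec07c
```

    Here 1.125 lies exactly halfway between 1.0 and 1.25 and rounds to the even neighbor 1.0, while 1.375 rounds to 1.5. The largest finite FP16 value, 65504, overflows to infinity under these assumptions.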
