SYSTEMS, APPARATUSES, AND METHODS FOR ADDITION OF PARTIAL PRODUCTS

    Publication number: US20230048998A1

    Publication date: 2023-02-16

    Application number: US17964964

    Filing date: 2022-10-13

    Abstract: Embodiments of systems, apparatuses, and methods for fused multiply-add are described. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for first, second, and third packed data source operands, wherein packed data elements of the first and second packed data source operands are of a first size that differs from a second size of packed data elements of the third packed data source operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand, a multiplication of M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, an addition of the results of these multiplications to a full-sized packed data element at that packed data element position of the third packed data source, and storage of the addition result in the packed data element position of the destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element size divided by N.
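
The per-element behaviour described in this abstract can be illustrated with a short Python sketch. This is only an emulation under stated assumptions, not the claimed hardware: the function name `fused_multiply_add`, the default widths (32-bit full-sized elements, 8-bit narrow elements), unsigned inputs, and wrap-around on overflow are all choices made for the illustration and do not come from the abstract.

```python
def fused_multiply_add(src1, src2, src3, full_bits=32, n_bits=8):
    """Emulate one multiply-accumulate of partial products per wide element.

    src1, src2: lists of unsigned n_bits-wide elements (the narrow sources).
    src3:       list of full_bits-wide elements (the wide accumulator source).
    Returns the destination as a list with one wide element per src3 element.
    """
    m = full_bits // n_bits                  # M = full-sized element size / N
    assert len(src1) == len(src2) == m * len(src3)
    mask = (1 << full_bits) - 1              # assumption: wrap around, no saturation
    dest = []
    for i, acc in enumerate(src3):
        first = i * m                        # first narrow element under wide position i
        products = [src1[first + j] * src2[first + j] for j in range(m)]
        dest.append((acc + sum(products)) & mask)
    return dest


# Two 32-bit accumulators, each fed by M = 4 products of 8-bit elements.
result = fused_multiply_add(
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1, 1, 1, 1, 2, 2, 2, 2],
    [10, 100],
)
print(result)  # [20, 152]
```

With these defaults, M is 4, so each destination element receives four products plus the corresponding element of the third source.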

    Architectural register replacement for instructions that use multiple architectural registers

    Publication number: US10255072B2

    Publication date: 2019-04-09

    Application number: US15201310

    Filing date: 2016-07-01

    Abstract: A processor of an aspect includes a decode unit to decode an instruction. The instruction is to explicitly specify a first architectural register and is to implicitly indicate at least a second architectural register. The second architectural register is implicitly to be at a higher register number than the first architectural register. The processor also includes an architectural register replacement unit coupled with the decode unit. The architectural register replacement unit is to replace the first architectural register with a third architectural register, and is to replace the second architectural register with a fourth architectural register. The third architectural register is to be at a lower register number than the first architectural register. The fourth architectural register is to be at a lower register number than the second architectural register. Other processors are also disclosed, as are methods and systems.
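
The register relationships in this abstract can be sketched in Python. The abstract only states that the implicit register has a higher number than the explicit one and that each replacement has a lower number than the register it replaces. The sketch additionally assumes, purely for illustration, that the implicit registers are consecutive and that replacement subtracts a fixed offset; `replace_registers`, `implicit_count`, and `offset` are hypothetical names.

```python
def replace_registers(explicit_reg, implicit_count=1, offset=16):
    """Map an explicitly named register and its implied successors to lower numbers.

    Assumptions (not from the abstract): implied registers are the next
    consecutive register numbers, and every register is replaced by
    subtracting the same fixed offset.
    """
    if not 0 < offset <= explicit_reg:
        raise ValueError("offset must move every register to a lower, valid number")
    originals = [explicit_reg + k for k in range(implicit_count + 1)]
    return {reg: reg - offset for reg in originals}


# The instruction names register 16 and implies register 17.
mapping = replace_registers(16, implicit_count=1, offset=16)
print(mapping)  # {16: 0, 17: 1}
```

Every replacement register in the returned mapping has a lower number than the register it replaces, which is the property the abstract requires.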

    GENERATING VECTOR BASED SELECTION CONTROL STATEMENTS

    Publication number: US20180181404A1

    Publication date: 2018-06-28

    Application number: US15391915

    Filing date: 2016-12-28

    CPC classification number: G06F9/3844 G06F9/30058 G06F9/3806 G06F15/76

    Abstract: In one example, a system for generating vector based selection control statements can include a processor to determine that a vector cost of a selection control statement is below a scalar cost and to determine that the selection control statement is to be executed in a sorted order based on dependencies between branch instructions of the selection control statement. The processor can also determine that a program ordering of labels of the selection control statement does not match a mathematical ordering of the labels and execute the selection control statement with a vector of values, wherein the selection control statement is to be executed based on a jump table and a sorted unique value technique, wherein the sorted unique value technique comprises selecting at least one of the branch instructions from the jump table.
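
One way to picture executing a selection control statement over a vector of values is the Python sketch below. It is a loose interpretation, not the patented method: the names `vector_switch`, `selectors`, `data`, `jump_table`, and `default` are hypothetical, the cost comparison and dependency analysis are omitted, and the inner loop stands in for a masked vector operation.

```python
def vector_switch(selectors, data, jump_table, default):
    """Apply a switch-like statement to every lane of a vector.

    selectors:  one switch value per lane.
    data:       one operand per lane.
    jump_table: dict mapping a case label to a handler function.
    default:    handler for values without a label.
    """
    results = [None] * len(selectors)
    for value in sorted(set(selectors)):          # sorted unique values
        handler = jump_table.get(value, default)  # select the branch from the jump table
        for lane, selector in enumerate(selectors):
            if selector == value:                 # emulated lane mask
                results[lane] = handler(data[lane])
    return results


# Labels appear in program order 3, 1, which differs from numeric order 1, 3.
table = {3: lambda x: x + 1, 1: lambda x: x * 2}
out = vector_switch([1, 3, 1, 7], [10, 20, 30, 40], table, default=lambda x: 0)
print(out)  # [20, 21, 60, 0]
```

Each distinct selector value is dispatched once, so the number of jump-table lookups depends on the number of unique values rather than on the vector length.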

    Vector address conflict resolution with vector population count functionality

    Patent type: Granted invention patent (in force)

    Publication number: US09411592B2

    Publication date: 2016-08-09

    Application number: US13731005

    Filing date: 2012-12-29

    Abstract: Instructions and logic provide SIMD address conflict resolution with vector population count functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store a variable second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of bits set to one for corresponding data fields. Responsive to decoding a vector population count instruction, execution units count the number of bits set to one for each of the data fields in the register, and store the counts in corresponding data fields of the destination register. Vector population count instructions can be used with variable sized elements and conflict masks to generate iteration counts and completion masks to be used each iteration to resolve dependencies in gather-modify-scatter SIMD operations.
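
The interplay of conflict masks and population counts can be sketched in Python. This is a minimal emulation under assumptions, not the claimed instruction semantics: `popcount_vector`, `conflict_masks`, and `histogram_update` are hypothetical names, the conflict mask for a lane is assumed to mark earlier lanes with the same index, and the gather-modify-scatter operation is assumed to be a simple increment.

```python
def popcount_vector(fields):
    """Count the bits set to one in each data field (non-negative integers)."""
    return [bin(field).count("1") for field in fields]


def conflict_masks(indices):
    """For each lane, set bit j when an earlier lane j holds the same index."""
    masks = []
    for i, index in enumerate(indices):
        mask = 0
        for j in range(i):
            if indices[j] == index:
                mask |= 1 << j
        masks.append(mask)
    return masks


def histogram_update(table, indices):
    """Increment table[index] once per lane, resolving duplicates iteratively."""
    counts = popcount_vector(conflict_masks(indices))   # earlier duplicates per lane
    for iteration in range(max(counts, default=-1) + 1):
        # Lanes whose earlier conflicts have all been handled in prior iterations.
        lanes = [i for i, c in enumerate(counts) if c == iteration]
        gathered = [table[indices[i]] for i in lanes]   # gather
        modified = [value + 1 for value in gathered]    # modify
        for i, value in zip(lanes, modified):           # scatter
            table[indices[i]] = value
    return table


print(conflict_masks([2, 5, 2, 2, 5]))                 # [0, 0, 1, 5, 2]
print(popcount_vector(conflict_masks([2, 5, 2, 2, 5])))  # [0, 0, 1, 2, 1]
print(histogram_update([0] * 8, [2, 5, 2, 2, 5]))      # [0, 0, 3, 0, 0, 2, 0, 0]
```

In this sketch the population count of a lane's conflict mask is the iteration in which that lane can safely run, so lanes sharing an index never execute in the same iteration.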

    SYSTEMS, APPARATUSES, AND METHODS FOR ADDITION OF PARTIAL PRODUCTS

    Publication number: US20250004763A1

    Publication date: 2025-01-02

    Application number: US18886639

    Filing date: 2024-09-16

    Abstract: Embodiments of systems, apparatuses, and methods for fused multiply-add are described. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for first, second, and third packed data source operands, wherein packed data elements of the first and second packed data source operands are of a first size that differs from a second size of packed data elements of the third packed data source operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand, a multiplication of M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, an addition of the results of these multiplications to a full-sized packed data element at that packed data element position of the third packed data source, and storage of the addition result in the packed data element position of the destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element size divided by N.
