GATHER USING INDEX ARRAY AND FINITE STATE MACHINE

    Publication number: US20170192934A1

    Publication date: 2017-07-06

    Application number: US14616323

    Filing date: 2015-02-06

    Abstract: Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiments of the apparatus may comprise decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element has the first value. The data element is written at an in-register position in a destination vector register according to the respective in-register position of the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
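    Example sketch: the masked gather described in the abstract can be modeled in software roughly as follows. This is a minimal illustration, not the patented hardware: the state names, the `base`/`scale` address formula, and the `max_loads` parameter (used here to mimic an interruption part-way through) are all assumptions made for the example.

```python
from enum import Enum, auto


class State(Enum):
    SELECT = auto()  # find the next index whose mask element is still set
    LOAD = auto()    # generate the address and load one element
    DONE = auto()


def gather(memory, base, scale, indices, mask, dest, max_loads=None):
    """Masked gather driven by a small state machine (hypothetical model).

    mask[i] == 1 plays the role of the "first value" (load still pending);
    it is changed to 0 (the "second value") once the load completes.
    max_loads optionally stops early to mimic an interruption.
    """
    state = State.SELECT
    pos = 0
    loads = 0
    while state is not State.DONE:
        if state is State.SELECT:
            # Skip elements whose mask element is already cleared.
            while pos < len(indices) and mask[pos] == 0:
                pos += 1
            out_of_budget = max_loads is not None and loads >= max_loads
            state = State.DONE if pos == len(indices) or out_of_budget else State.LOAD
        elif state is State.LOAD:
            address = base + indices[pos] * scale  # assumed address formula
            dest[pos] = memory[address]            # same position as the index
            mask[pos] = 0                          # record completion
            loads += 1
            state = State.SELECT
    return dest, mask
```

    Because completed elements clear their mask entries, calling `gather` again with the same `mask` and `dest` after an early stop resumes with only the remaining elements.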

    Vector shuffle instructions operating on multiple lanes each having a plurality of data elements using a same set of per-lane control bits

    Publication number: US09672034B2

    Publication date: 2017-06-06

    Application number: US13838048

    Filing date: 2013-03-15

    CPC classification number: G06F9/30032 G06F9/30036 G06F9/3885 G06F9/3887

    Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
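    Example sketch: the idea of reusing one set of control bits in every lane can be illustrated as below. The lane size of four elements, the two-bit selectors, and the low-half/high-half split in the two-source variant are assumptions chosen for the example; the function names are hypothetical.

```python
def shuffle_in_lane(src, control, lane_size=4):
    """Shuffle each lane of src using the same control bits for every lane.

    lane_size is assumed to be a power of two; destination element i of a
    lane is chosen by the i-th selector field of `control`.
    """
    bits = (lane_size - 1).bit_length()  # selector width, 2 bits for 4 elements
    dest = []
    for start in range(0, len(src), lane_size):
        lane = src[start:start + lane_size]
        for i in range(lane_size):
            selector = (control >> (bits * i)) & (lane_size - 1)
            dest.append(lane[selector])
    return dest


def shuffle_two_sources(src1, src2, control, lane_size=4):
    """Two-source variant: the low half of each destination lane is taken
    from src1 and the high half from src2 (an assumed layout)."""
    bits = (lane_size - 1).bit_length()
    dest = []
    for start in range(0, len(src1), lane_size):
        lane1 = src1[start:start + lane_size]
        lane2 = src2[start:start + lane_size]
        for i in range(lane_size):
            selector = (control >> (bits * i)) & (lane_size - 1)
            source_lane = lane1 if i < lane_size // 2 else lane2
            dest.append(source_lane[selector])
    return dest
```

    With `control = 0b00011011`, each four-element lane is reversed independently, and no element crosses a lane boundary.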

    METHODS, APPARATUS, INSTRUCTIONS AND LOGIC TO PROVIDE VECTOR PACKED TUPLE CROSS-COMPARISON FUNCTIONALITY
    Patent application (status: under examination, published)

    Publication number: US20160188336A1

    Publication date: 2016-06-30

    Application number: US14588247

    Filing date: 2014-12-31

    CPC classification number: G06F9/30036 G06F9/30018 G06F9/30021 G06F9/3834

    Abstract: Instructions and logic provide SIMD vector packed tuple cross-comparison functionality. Some processor embodiments include first and second registers with a variable plurality of data fields, each of the data fields to store an element of a first data type. The processor executes a SIMD instruction for vector packed tuple cross-comparison in some embodiments, which for each data field of a portion of data fields in a tuple of the first register, compares its corresponding element with every element of a corresponding portion of data fields in a tuple of the second register and sets a mask bit corresponding to each element of the second register portion, in a bit-mask corresponding to each unmasked element of the corresponding first register portion, according to the corresponding comparison. In some embodiments bit-masks are shifted by corresponding elements in data fields of a third register. The comparison type is indicated by an immediate operand.
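    Example sketch: one possible reading of the cross-comparison, in which every element of a tuple in the first register is compared with every element of the matching tuple in the second register. The immediate-to-comparison encoding, the function name, and the `write_mask` parameter are hypothetical; the optional shift by a third register mentioned in the abstract is not modeled.

```python
import operator

# Hypothetical encoding of the immediate operand.
COMPARISONS = {0: operator.eq, 1: operator.lt, 2: operator.le, 3: operator.ne}


def tuple_cross_compare(first, second, tuple_size, imm, write_mask=None):
    """Return one bit-mask per element of `first`.

    Bit j of the mask for element i is set when the comparison of first[i]
    with the j-th element of the matching tuple in `second` holds.
    Masked-off elements (write_mask[i] == 0) produce an all-zero bit-mask.
    len(first) is assumed to be a multiple of tuple_size.
    """
    compare = COMPARISONS[imm]
    if write_mask is None:
        write_mask = [1] * len(first)
    result = []
    for start in range(0, len(first), tuple_size):
        other_tuple = second[start:start + tuple_size]
        for i in range(start, start + tuple_size):
            bits = 0
            if write_mask[i]:
                for j, other in enumerate(other_tuple):
                    if compare(first[i], other):
                        bits |= 1 << j
            result.append(bits)
    return result
```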

    THREAD PAUSE PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS
    Patent application (status: under examination, published)

    Publication number: US20160019063A1

    Publication date: 2016-01-21

    Application number: US14336596

    Filing date: 2014-07-21

    CPC classification number: G06F9/3851 G06F9/30 G06F9/30058 G06F9/3009

    Abstract: A processor of an aspect includes a decode unit to decode a thread pause instruction from a first thread. A back-end portion of the processor is coupled with the decode unit. The back-end portion of the processor, in response to the thread pause instruction, is to pause processing of subsequent instructions of the first thread for execution. The subsequent instructions occur after the thread pause instruction in program order. The back-end portion, in response to the thread pause instruction, is also to keep at least a majority of the back-end portion of the processor empty of instructions of the first thread, except for the thread pause instruction, for a predetermined period of time. The majority may include a plurality of execution units and an instruction queue unit.

    SCATTER USING INDEX ARRAY AND FINITE STATE MACHINE
    Patent application (status: granted)

    Publication number: US20150074373A1

    Publication date: 2015-03-12

    Application number: US13977727

    Filing date: 2012-06-02

    Abstract: Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiments of the apparatus may comprise decode logic to decode scatter/gather instructions and generate micro-operations. An index array holds a set of indices and a corresponding set of mask elements. A finite state machine facilitates the scatter operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. Storage is allocated in a buffer for each of the set of addresses being generated. Data elements corresponding to the set of addresses being generated are copied to the buffer. Addresses from the set are accessed to store data elements if a corresponding mask element has said first value, and the mask element is changed to a second value responsive to completion of the respective store.
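    Example sketch: the two phases described in the abstract (staging addresses and data in a buffer, then draining the buffer to memory while clearing mask elements) might be modeled like this. The `base`/`scale` address formula and the buffer layout are assumptions for illustration only.

```python
def scatter(memory, base, scale, indices, mask, source):
    """Masked scatter with an intermediate buffer (hypothetical model).

    mask[i] == 1 means the store for element i is still pending; it is
    cleared to 0 once that store has completed.
    """
    # Phase 1: generate an address for each pending element and copy the
    # corresponding data element into a buffer entry.
    buffer = []
    for pos, index in enumerate(indices):
        if mask[pos]:
            address = base + index * scale  # assumed address formula
            buffer.append((pos, address, source[pos]))

    # Phase 2: drain the buffer, performing each store and then clearing
    # the mask element to record completion.
    for pos, address, data in buffer:
        memory[address] = data
        mask[pos] = 0
    return memory, mask
```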

    SYSTEMS, APPARATUSES, AND METHODS FOR ADDITION OF PARTIAL PRODUCTS

    Publication number: US20250004763A1

    Publication date: 2025-01-02

    Application number: US18886639

    Filing date: 2024-09-16

    Abstract: Embodiments of systems, apparatuses, and methods for fused multiply add are described. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operands are of a first size that differs from a second size of packed data elements of the third packed data source operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand, a multiplication of M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, an addition of the results of these multiplications to a full-sized packed data element at that packed data element position of the third packed data source, and storage of the addition result in a destination packed data element position corresponding to the packed data element position of the third packed data source, wherein M is equal to the size of the full-sized packed data element divided by N.
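    Example sketch: the per-position arithmetic in the abstract amounts to multiplying M pairs of narrow elements and adding the products to one wide accumulator element. The sketch below assumes, for example, 8-bit source elements and 32-bit accumulators (so M = 32 / 8 = 4); the function name is hypothetical, and overflow, saturation, and signedness are not modeled because Python integers are unbounded.

```python
def multiply_add_partial_products(src1, src2, src3, m):
    """For each wide element of src3, add the products of the m narrow
    element pairs from src1 and src2 that share its position."""
    if not (len(src1) == len(src2) == m * len(src3)):
        raise ValueError("src1 and src2 must each hold m elements per src3 element")
    dest = []
    for pos, accumulator in enumerate(src3):
        group = range(pos * m, (pos + 1) * m)
        products = sum(src1[k] * src2[k] for k in group)
        dest.append(accumulator + products)
    return dest
```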
