-
公开(公告)号:US10223112B2
公开(公告)日:2019-03-05
申请号:US15721799
申请日:2017-09-30
Applicant: Intel Corporation
Inventor: Seth Abraham , Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Zeev Sperber , Amit Gradstein
Abstract: A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.
-
公开(公告)号:US20190042255A1
公开(公告)日:2019-02-07
申请号:US15858937
申请日:2017-12-29
Applicant: Intel Corporation
Inventor: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
-
公开(公告)号:US20190042254A1
公开(公告)日:2019-02-07
申请号:US15858932
申请日:2017-12-29
Applicant: Intel Corporation
Inventor: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.
-
公开(公告)号:US20190042235A1
公开(公告)日:2019-02-07
申请号:US15858916
申请日:2017-12-29
Applicant: Intel Corporation
Inventor: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman
IPC: G06F9/30
Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (M,N) of the identified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the identified first source matrix by a corresponding nibble of a doubleword element (K,N) of the identified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element (M,N).
-
公开(公告)号:US10156884B2
公开(公告)日:2018-12-18
申请号:US15354018
申请日:2016-11-17
Applicant: INTEL CORPORATION
Inventor: Michael Mishaeli , Ron Gabor , Robert C. Valentine , Alex Gerber , Zeev Sperber
Abstract: Technologies for local power gate (LPG) interfaces for power-aware operations are described. A system on chip (SoC) includes a first functional unit, a second functional unit, and local power gate (LPG) hardware coupled to the first functional unit and the second functional unit. The LPG hardware is to power gate the first functional unit according to local power states of the LPG hardware. The second functional unit decodes a first instruction to perform a first power-aware operation of a specified length, including computing an execution code path for execution. The second functional unit monitors a current local power state of the LPG hardware, selects a code path based on the current local power state, the specified length, and a specified threshold, and issues a hint to the LPG hardware to power up the first functional unit and continues execution of the first power-aware operation without waiting for the first functional unit to be powered up.
-
公开(公告)号:US09996320B2
公开(公告)日:2018-06-12
申请号:US14757942
申请日:2015-12-23
Applicant: Intel Corporation
Inventor: Cristina S. Anderson , Marius A. Cornea-Hasegan , Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Jesus Corbal , Nikita Astafev , Mark J. Charney , Milind B. Girkar , Amit Gradstein , Simon Rubanovich , Zeev Sperber
CPC classification number: G06F7/4876 , G06F7/485 , G06F7/49915
Abstract: An example processor includes a register and a fused multiply-add (FMA) low functional unit. The register stores first, second, and third floating point (FP) values. The FMA low functional unit receives a request to perform an FMA low operation: multiplies the first FP value with the second FP value to obtain a first product value; adds the first product with the third FP value to generate a first result value; rounds the first result to generate a first FMA value; multiplies the first FP value with the second FP value to obtain a second product value; adds the second product value with the third FP value to generate a second result value; and subtracts the FMA value from the second result value to obtain a third result value, which can then be normalized and rounded (FMA low result) and sent the FMA low result to an application.
-
公开(公告)号:US20180129266A1
公开(公告)日:2018-05-10
申请号:US15849838
申请日:2017-12-21
Applicant: Intel Corporation
Inventor: Nadav Bonen , Ron Gabor , Zeev Sperber , Vjekoslav Svilan , David N. Mackintosh , Jose A. Baiocchi Paredes , Naveen Kumar , Shantanu Gupta
CPC classification number: G06F1/3243 , G06F1/3287 , G06F9/30083 , G06F9/3869 , G06F9/3885 , Y02B70/123 , Y02B70/126 , Y02D10/152 , Y02D10/171
Abstract: In an embodiment, the present invention includes an execution unit to execute instructions of a first type, a local power gate circuit coupled to the execution unit to power gate the execution unit while a second execution unit is to execute instructions of a second type, and a controller coupled to the local power gate circuit to cause it to power gate the execution unit when an instruction stream does not include the first type of instructions. Other embodiments are described and claimed.
-
公开(公告)号:US20180113711A1
公开(公告)日:2018-04-26
申请号:US15849711
申请日:2017-12-21
Applicant: Intel Corporation
Inventor: Zeev Sperber , Robert Valentine , Benny Eitan , Doron Orenstein
Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
-
公开(公告)号:US20160103785A1
公开(公告)日:2016-04-14
申请号:US14881111
申请日:2015-10-12
Applicant: Intel Corporation
Inventor: Zeev Sperber , Robert Valentine , Guy Patkin , Stanislav Shwartsman , Shlomo Raikin , Igor Yanover , Gal Ofir
CPC classification number: G06F15/8007 , G06F9/30018 , G06F9/30036 , G06F9/30043 , G06F9/30145 , G06F9/345 , G06F9/3887
Abstract: Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
-
公开(公告)号:US09268565B2
公开(公告)日:2016-02-23
申请号:US14684412
申请日:2015-04-12
Applicant: Intel Corporation
Inventor: Rajiv Kapoor , Ronen Zohar , Mark Buxton , Zeev Sperber , Koby Gottlieb
CPC classification number: G06F9/30021 , G06F7/026 , G06F9/3001 , G06F9/30029 , G06F9/30036 , G06F9/30058 , G06F9/30094 , G06F9/30098 , G06F9/30145 , G06F9/30149 , G06F9/3016 , G06F9/3887 , G06F12/0875 , G06F2212/452
Abstract: A method and apparatus for including in a processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location.
-
-
-
-
-
-
-
-
-