-
Publication No.: US10514912B2
Publication Date: 2019-12-24
Application No.: US16133269
Application Date: 2018-09-17
Applicant: Intel Corporation
Inventor: Shay Gueron, Vlad Krasnov, Robert Valentine, Zeev Sperber, Amit Gradstein, Simon Rubanovich
Abstract: An apparatus is described having an instruction execution pipeline that has a vector functional unit to support a vector multiply add instruction. The vector multiply add instruction to multiply respective K bit elements of two vectors and accumulate a portion of each of their respective products with another respective input operand in an X bit accumulator, where X is greater than K.
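
The operation described in this abstract can be illustrated with a small software model. The following is only a sketch, not the patented circuit: the widths K = 52 and X = 64, the choice between the low and high portion of the product, and the name `vec_mul_add` are all assumptions made for illustration.

```python
K = 52  # assumed element width in bits (illustrative, not from the abstract)
X = 64  # assumed accumulator width in bits, chosen so that X > K


def vec_mul_add(a, b, acc, high=False):
    """Multiply K-bit elements pairwise and add one K-bit portion of each
    product (low or high half) to the matching X-bit accumulator lane."""
    mask_k = (1 << K) - 1
    mask_x = (1 << X) - 1
    out = []
    for x, y, c in zip(a, b, acc):
        product = (x & mask_k) * (y & mask_k)      # up to 2*K bits wide
        if high:
            portion = (product >> K) & mask_k      # upper K bits of the product
        else:
            portion = product & mask_k             # lower K bits of the product
        out.append((c + portion) & mask_x)         # wrap at the accumulator width
    return out
```

Because each added portion is only K bits wide and the accumulator is X bits wide, several products can be accumulated before the lane overflows, which is the apparent motivation for requiring X greater than K.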
-
Publication No.: US10474459B2
Publication Date: 2019-11-12
Application No.: US15808788
Application Date: 2017-11-09
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
IPC: G06F9/30
Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
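
The routing-plus-masking behavior described here can be modeled in software roughly as follows. This is a hedged sketch: the abstract does not name the three bit widths, so 16, 32 and 64 bits (2, 4 and 8 bytes) are assumed, masked-out elements are assumed to be zeroed, and `permute_masked` is a hypothetical name.

```python
def permute_masked(src, indices, mask, width):
    """Route source elements to output positions, then mask the result.

    src     -- bytes object holding the packed source vector
    indices -- for each output element, the index of the source element to route
    mask    -- one boolean per output element (False means masked out)
    width   -- element size in bytes; 2, 4 and 8 are the assumed widths
    """
    if width not in (2, 4, 8):
        raise ValueError("unsupported element width")
    count = len(src) // width
    elems = [src[i * width:(i + 1) * width] for i in range(count)]
    out = bytearray()
    for idx, keep in zip(indices, mask):
        # Masking granularity follows the element width (zeroing is assumed).
        out += elems[idx % count] if keep else bytes(width)
    return bytes(out)
```

The same routine serves all three widths, which mirrors the idea of one routing network and one masking layer shared by three different instructions.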
-
Publication No.: US10324857B2
Publication Date: 2019-06-18
Application No.: US15416549
Application Date: 2017-01-26
Applicant: Intel Corporation
Inventor: Joseph Nuzman, Raanan Sade, Igor Yanover, Ron Gabor, Amit Gradstein
IPC: G06F12/10, G06F12/1036, G06F12/1027
Abstract: A processing device including a linear address transformation circuit to determine that a metadata value stored in a portion of a linear address falls within a pre-defined metadata range. The metadata value corresponds to a plurality of metadata bits. The linear address transformation circuit to replace each of the plurality of the metadata bits with a constant value.
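
A minimal software sketch of this transformation is shown below. The field position (bits 57 to 62 of a 64-bit address), the permitted metadata range, the fill constant of 0, and the name `strip_metadata` are all assumptions for illustration; the abstract does not specify them.

```python
META_SHIFT = 57                 # assumed position of the metadata field
META_BITS = 6                   # assumed width of the metadata field
META_RANGE = range(0x01, 0x40)  # assumed pre-defined range of metadata values
FILL_BIT = 0                    # assumed constant written into every metadata bit


def strip_metadata(addr):
    """Replace the metadata bits of a linear address with a constant when the
    metadata value falls inside the pre-defined range."""
    field_mask = ((1 << META_BITS) - 1) << META_SHIFT
    meta = (addr & field_mask) >> META_SHIFT
    if meta not in META_RANGE:
        return addr                 # outside the range: address left unchanged
    if FILL_BIT:
        return addr | field_mask    # every metadata bit becomes 1
    return addr & ~field_mask       # every metadata bit becomes 0
```

For example, under these assumptions `strip_metadata((5 << 57) | 0xABC)` returns `0xABC`, so software can carry a tag in the address while the translated address ignores it.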
-
Publication No.: US10324718B2
Publication Date: 2019-06-18
Application No.: US15864158
Application Date: 2018-01-08
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal San Andrian, Suleyman Sair, Bret L. Toll, Zeev Sperber, Amit Gradstein, Asaf Rubinstein
IPC: G06F9/30
Abstract: A method of an aspect includes receiving a masked packed rotate instruction. The instruction indicates a first source packed data including a plurality of packed data elements, a packed data operation mask having a plurality of mask elements, at least one rotation amount, and a destination storage location. A result packed data is stored in the destination storage location in response to the instruction. The result packed data includes result data elements that each correspond to a different one of the mask elements in a corresponding relative position. Result data elements that are not masked out by the corresponding mask element include one of the data elements of the first source packed data in a corresponding position that has been rotated. Result data elements that are masked out by the corresponding mask element include a masked out value. Other methods, apparatus, systems, and instructions are disclosed.
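
The masked packed rotate can be modeled with a short routine. This is a sketch under assumptions: 32-bit elements, a left rotation, a single rotation amount shared by all elements, and a masked-out value of 0; the function name `masked_rotate_left` is hypothetical.

```python
def masked_rotate_left(src, mask, amount, width=32, masked_value=0):
    """Rotate each unmasked element left by `amount` bits; masked-out
    positions receive `masked_value` instead."""
    lane_mask = (1 << width) - 1
    r = amount % width
    result = []
    for x, keep in zip(src, mask):
        x &= lane_mask
        if keep:
            # Bits shifted out on the left re-enter on the right.
            result.append(((x << r) | (x >> (width - r))) & lane_mask)
        else:
            result.append(masked_value)
    return result
```

A merging variant, in which masked-out positions keep the previous destination contents, could be modeled by passing the old destination element instead of a constant.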
-
Publication No.: US10223112B2
Publication Date: 2019-03-05
Application No.: US15721799
Application Date: 2017-09-30
Applicant: Intel Corporation
Inventor: Seth Abraham, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Zeev Sperber, Amit Gradstein
Abstract: A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.
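
The result described here is an arithmetic sequence, which can be sketched in a few lines. The element count of 4 as the default, the restriction to a non-negative stride, and the name `make_sequence` are assumptions for illustration.

```python
def make_sequence(stride, offset, count=4):
    """Return `count` integers starting at `offset`, with consecutive
    entries differing by `stride` (assumed non-negative here)."""
    if count < 4:
        raise ValueError("the abstract describes at least four integers")
    return [offset + i * stride for i in range(count)]
```

A sequence of this kind is commonly used as a vector of indices, for example to address every `stride`-th element of an array starting at position `offset`.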
-
Publication No.: US20190042255A1
Publication Date: 2019-02-07
Application No.: US15858937
Application Date: 2017-12-29
Applicant: Intel Corporation
Inventor: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
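
The chunked store can be pictured with a small model. This is only a sketch: representing a tile pair as a dictionary with `left` and `right` lists of rows, applying the chunking per tile row, and the name `store_tile_pair` are all assumptions, not details taken from the abstract.

```python
def store_tile_pair(src, dst, chunk):
    """Copy every element of the left and right tiles of `src` into the
    matching positions of `dst`, moving `chunk` elements of a row at a time."""
    for side in ("left", "right"):
        for r, row in enumerate(src[side]):
            for start in range(0, len(row), chunk):
                # One chunk of C elements from one row per step.
                dst[side][r][start:start + chunk] = row[start:start + chunk]
    return dst
```

Usage, assuming 2-by-3 tiles and a chunk size of 2: build `dst` with the same shape as `src`, call `store_tile_pair(src, dst, 2)`, and `dst` then holds a copy of both tiles.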
-
Publication No.: US20190042254A1
Publication Date: 2019-02-07
Application No.: US15858932
Application Date: 2017-12-29
Applicant: Intel Corporation
Inventor: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.
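
The row-by-row load can be sketched as follows. The memory layout is an assumption made for illustration: memory is modeled as a flat list in which each matrix row holds the left tile's elements followed by the right tile's elements. The name `load_tile_pair` is hypothetical.

```python
def load_tile_pair(memory, base, rows, cols):
    """Read a (left, right) tile pair from a flat list, one row at a time,
    starting with the first row."""
    left, right = [], []
    for r in range(rows):
        row_start = base + r * 2 * cols           # each row spans both tiles
        left.append(memory[row_start:row_start + cols])
        right.append(memory[row_start + cols:row_start + 2 * cols])
    return left, right
```

With `memory = list(range(12))`, calling `load_tile_pair(memory, 0, 2, 3)` yields the left tile `[[0, 1, 2], [6, 7, 8]]` and the right tile `[[3, 4, 5], [9, 10, 11]]` under this assumed layout.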
-
Publication No.: US20190042235A1
Publication Date: 2019-02-07
Application No.: US15858916
Application Date: 2017-12-29
Applicant: Intel Corporation
Inventor: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
IPC: G06F9/30
Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (M,N) of the identified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the identified first source matrix by a corresponding nibble of a doubleword element (K,N) of the identified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element (M,N).
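
The flow in this abstract can be modeled in software. The sketch below assumes unsigned nibbles and unsigned 32-bit saturation, neither of which the abstract states; `nibbles` and `tile_dot_nibbles` are hypothetical names.

```python
def nibbles(dword):
    """Split a 32-bit doubleword into its eight 4-bit nibbles (low to high)."""
    return [(dword >> (4 * i)) & 0xF for i in range(8)]


def tile_dot_nibbles(c, a, b):
    """For each destination element (m, n), repeat K times: form eight nibble
    products from a[m][k] and b[k][n], then accumulate them into c[m][n]
    with saturation (unsigned 32-bit saturation is assumed)."""
    M, K, N = len(a), len(b), len(b[0])
    for m in range(M):
        for n in range(N):
            acc = c[m][n]
            for k in range(K):
                products = [x * y for x, y in zip(nibbles(a[m][k]), nibbles(b[k][n]))]
                acc = min(acc + sum(products), 0xFFFFFFFF)
            c[m][n] = acc
    return c
```

For instance, with `a = [[0x11111111]]`, `b = [[0x22222222]]` and `c = [[5]]`, each of the eight products is 2, so the destination element becomes 5 + 16 = 21.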
-
Publication No.: US10073695B2
Publication Date: 2018-09-11
Application No.: US15366320
Application Date: 2016-12-01
Applicant: Intel Corporation
Inventor: Cristina S. Anderson, Bret L. Toll, Robert Valentine, Simon Rubanovich, Amit Gradstein
CPC classification number: G06F9/3001, G06F7/483, G06F7/49947, G06F9/30014, G06F9/30025, G06F9/30036, G06F9/30109, G06F9/3013
Abstract: A method of an aspect includes receiving a floating point round-off amount determination instruction. The instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point, and indicates a destination storage location. A result including one or more result floating point data elements is stored in the destination storage location in response to the floating point round-off amount determination instruction. Each of the one or more result floating point data elements includes a difference between a corresponding floating point data element of the source in a corresponding position, and a rounded version of the corresponding floating point data element of the source that has been rounded to the indicated number of the fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
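
The round-off amount can be illustrated for a single element. This is a sketch only: it uses Python's built-in `round`, which rounds halves to even, whereas the rounding mode of the actual instruction is not given in the abstract; `round_off_amount` is a hypothetical name.

```python
def round_off_amount(x, fraction_bits):
    """Return x minus x rounded to `fraction_bits` binary fraction bits.

    Rounding uses round-half-to-even, which is an assumption here.
    """
    scale = 2.0 ** fraction_bits
    rounded = round(x * scale) / scale   # keep only the indicated fraction bits
    return x - rounded
```

For example, `round_off_amount(1.3125, 2)` rounds 1.3125 to 1.25 (two fraction bits) and returns the remainder 0.0625.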
-
Publication No.: US20180210842A1
Publication Date: 2018-07-26
Application No.: US15416549
Application Date: 2017-01-26
Applicant: Intel Corporation
Inventor: Joseph Nuzman, Raanan Sade, Igor Yanover, Ron Gabor, Amit Gradstein
IPC: G06F12/1036
CPC classification number: G06F12/1036, G06F12/1027, G06F2212/1016, G06F2212/657, G06F2212/683, G06F2212/684
Abstract: A processing device including a linear address transformation circuit to determine that a metadata value stored in a portion of a linear address falls within a pre-defined metadata range. The metadata value corresponds to a plurality of metadata bits. The linear address transformation circuit to replace each of the plurality of the metadata bits with a constant value.