Patent search ap:("Intel Corporation") AND inv:"Amit Gradstein" Page 2

11.

发明申请
APPARATUSES, METHODS, AND SYSTEMS FOR HASHING INSTRUCTIONS 有权

公开(公告)号：US20220147356A1

公开(公告)日：2022-05-12

申请号：US17537373

申请日：2021-11-29

Applicant: Intel Corporation

Inventor： Regev Shemy , Zeev Sperber , Wajdi Feghali , Vinodh Gopal , Amit Gradstein , Simon Rubanovich , Sean Gulley , Ilya Albrekht , Jacob Doweck , Jose Yallouz , Ittai Anati

IPC: G06F9/30 , G06F9/38 , H04L9/06

Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.

12.

发明申请
APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR MOVING DATA BETWEEN TILES OF A MATRIX OPERATIONS ACCELERATOR AND VECTOR REGISTERS 有权

公开(公告)号：US20210406018A1

公开(公告)日：2021-12-30

申请号：US16914347

申请日：2020-06-27

Applicant: INTEL CORPORATION

Inventor： Menachem Adelman , Robert Valentine , Barukh Ziv , Yaroslav Pollak , Gideon Stupp , Amit Gradstein , Simon Rubanovich , Zeev Sperber , Mark Charney , Christopher Hughes , Alexander Heinecke

IPC: G06F9/30 , G06F17/16

Abstract: Systems, methods, and apparatuses relating to one or more instructions that utilize direct paths for loading data into a tile from a vector register and/or storing data from a tile into a vector register are described. In one embodiment, a system includes a matrix operations accelerator circuit comprising a two-dimensional grid of processing elements, a plurality of registers that represents a two-dimensional matrix coupled to the two-dimensional grid of processing elements, and a coupling to a cache; and a hardware processor core comprising: a vector register, a decoder to decode a single instruction into a decoded single instruction, the single instruction including a first field that identifies the two-dimensional matrix, a second field that identifies a set of elements of the two-dimensional matrix, and a third field that identifies the vector register, and an execution circuit to execute the decoded single instruction to cause a store of the set of elements from the plurality of registers that represents the two-dimensional matrix into the vector register by a coupling of the hardware processor core to the matrix operations accelerator circuit that is separate from the coupling to the cache.

13.

发明申请
LOADING AND STORING MATRIX DATA WITH DATATYPE CONVERSION 有权

公开(公告)号：US20210406012A1

公开(公告)日：2021-12-30

申请号：US16914317

申请日：2020-06-27

Applicant: Intel Corporation

Inventor： Menachem Adelman , Robert Valentine , Gideon Stupp , Yaroslav Pollak , Amit Gradstein , Simon Rubanovich , Zeev Sperber , Mark J. Charney , Christopher J. Hughes , Alexander F. Heinecke , Evangelos Georganas

IPC: G06F9/30 , G06F17/16

Abstract: Embodiments for loading and storing matrix data with datatype conversion are disclosed. In an embodiment, a processor includes a decoder and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a first destination matrix location, and a first source operand field to specify a first source matrix location. The execution circuitry is to, in response to the decoded instruction, convert data elements from a plurality of source element locations of a first source matrix specified by the first source matrix location from a first datatype to a second datatype to generate a plurality of converted data elements and to store each of the plurality of converted data elements in one of a plurality of destination element locations in a first destination matrix specified by the first destination matrix location.

14.

发明申请
Systems, Apparatuses, And Methods For Fused Multiply Add 有权

公开(公告)号：US20210406011A1

公开(公告)日：2021-12-30

申请号：US17468258

申请日：2021-09-07

Applicant: Intel Corporation

Inventor： Robert Valentine , Galina Ryvchin , Piotr Majcher , Mark J. Charney , Elmoustapha Ould-Ahmed-Vall , Jesus Corbal , Milind B. Girkar , Zeev Sperber , Simon Rubanovich , Amit Gradstein

IPC: G06F9/30 , G06F7/544 , G06F9/38

Abstract: Embodiments of systems, apparatuses, and methods for fused multiple add. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operand are of a first, different size than a second size of packed data elements of the third packed data operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand, a multiplication of a M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, add of results from these multiplications to a full-sized packed data element of a packed data element position of the third packed data source, and storage of the addition result in a packed data element position destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element divided by N.

15.

发明授权
Apparatuses, methods, and systems for hashing instructions 有权

公开(公告)号：US11188335B2

公开(公告)日：2021-11-30

申请号：US17087536

申请日：2020-11-02

Applicant: Intel Corporation

Inventor： Regev Shemy , Zeev Sperber , Wajdi Feghali , Vinodh Gopal , Amit Gradstein , Simon Rubanovich , Sean Gulley , Ilya Albrekht , Jacob Doweck , Jose Yallouz , Ittai Anati

IPC: G06F9/30 , G06F9/38 , H04L9/06

Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.

16.

发明申请
ENABLING REMOVAL AND RECONSTRUCTION OF FLAG OPERATIONS IN A PROCESSOR 有权

公开(公告)号：US20210357216A1

公开(公告)日：2021-11-18

申请号：US17335284

申请日：2021-06-01

Applicant: Intel Corporation

Inventor： Zeev Sperber , Tomer Weiner , Amit Gradstein , Simon Rubanovich , Alex Gerber , Itai Ravid

IPC: G06F9/30

Abstract: In one embodiment, a processor includes a fetch logic to fetch instructions, a decode logic to decode the fetched instructions, and an execution logic to execute at least some of the instructions. The decode logic may determine whether a flag portion of a first instruction to be folded is to be performed, and if not, accumulate a first immediate value of the first instruction with a folded immediate value obtained from an entry of an immediate buffer. Other embodiments are described and claimed.

17.

发明授权
Efficient rotate adder for implementing cryptographic basic operations 有权

公开(公告)号：US11176278B2

公开(公告)日：2021-11-16

申请号：US16236450

申请日：2018-12-29

Applicant: Intel Corporation

Inventor： Amit Gradstein , Simon Rubanovich , Regev Shemy , Onkar P Desai , Jose Yallouz

IPC: G06F21/72 , H04L9/06

Abstract: Integrated circuits to compute a result of summing m values, rotating the sum by k bits, and adding a summation of n values Bi to Bn to the rotated sum. An embodiment includes: a first carry save adder to add up the m values to generate a first carry and a first sum; rotator circuitry to rotate both the first carry and the first sum by k bits to generate a second carry and a second sum; a second carry save adder to add up the second carry, the second sum, and the summation of values Bi to Bn to generate a third carry and a third sum; two parallel adders to generate a first intermediate result and a second intermediary result based on the third carry and the third sum; and a multiplexer to generate the result utilizing various portions of the first and second intermediate results.

18.

发明申请
PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO GENERATE SEQUENCES OF INTEGERS IN NUMERICAL ORDER THAT DIFFER BY A CONSTANT STRIDE 有权

公开(公告)号：US20210263743A1

公开(公告)日：2021-08-26

申请号：US17121740

申请日：2020-12-14

Applicant: Intel Corporation

Inventor： Elmoustapha Ould-Ahmed-Vall , Seth Abraham , Robert Valentine , Zeev Sperber , Amit Gradstein

IPC: G06F9/30

Abstract: A method of an aspect includes receiving an instruction indicating a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four non-negative integers in numerical order with all integers in consecutive positions differing by a constant stride of at least two. In an aspect, storing the result including the sequence of the at least four integers is performed without calculating the at least four integers using a result of a preceding instruction. Other methods, apparatus, systems, and instructions are disclosed.

19.

发明授权
Systems and methods to load a tile register pair 有权

公开(公告)号：US11093247B2

公开(公告)日：2021-08-17

申请号：US15858932

申请日：2017-12-29

Applicant: Intel Corporation

Inventor： Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman

IPC: G06F15/00 , G06F9/30

Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.

20.

发明授权
Systems and methods for performing instructions to convert to 16-bit floating-point format 有权

公开(公告)号：US11068262B2

公开(公告)日：2021-07-20

申请号：US17133078

申请日：2020-12-23

Applicant: Intel Corporation

Inventor： Alexander F. Heinecke , Robert Valentine , Mark J. Charney , Raanan Sade , Menachem Adelman , Zeev Sperber , Amit Gradstein , Simon Rubanovich

IPC: G06F9/30 , G06F9/38

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to convert to 16-bit floating-point format. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a first source vector comprising N single-precision elements, and a destination vector comprising at least N 16-bit floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the specified source vector to 16-bit floating-point, the conversion to include truncation and rounding, as necessary, and to store each converted element into a corresponding location of the specified destination vector, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification