Patent search ap:("Intel Corporation") AND inv:"Amit Gradstein" Page 8

71.

Granted invention patent
Systems and methods to store a tile register pair to memory (in force)

Publication number: US11809869B2

Publication date: 2023-11-07

Application number: US15858937

Application date: 2017-12-29

Applicant: Intel Corporation

Inventor: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman

IPC: G06F9/30

CPC classification number: G06F9/30145 , G06F9/30036 , G06F9/30043

Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
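To make the chunked store concrete, here is a minimal Python sketch that models each tile as a list of rows and copies the left and right tiles of a pair C elements of one row at a time. The function name `store_tile_pair`, the list-of-rows layout, and the transfer count are illustrative assumptions, not the patented circuitry or a real memory layout.

```python
def store_tile_pair(src_left, src_right, dst_left, dst_right, c):
    """Copy both halves of a paired tile, C elements of one row at a time.

    Tiles are modeled as lists of rows (hypothetical layout).
    Returns the number of chunk transfers performed.
    """
    transfers = 0
    for src, dst in ((src_left, dst_left), (src_right, dst_right)):
        for r, row in enumerate(src):
            # Each transfer moves one chunk of at most C elements of row r.
            for start in range(0, len(row), c):
                dst[r][start:start + c] = row[start:start + c]
                transfers += 1
    return transfers


left = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
right = [[-x for x in row] for row in left]
out_left = [[0] * 5 for _ in range(2)]
out_right = [[0] * 5 for _ in range(2)]
n = store_tile_pair(left, right, out_left, out_right, c=2)
```

With 2 by 5 tiles and C = 2, each row needs three chunks, so the call performs 12 transfers in total.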

72.

Granted invention patent
Systems and methods for computing dot products of nibbles in two tile operands (in force)

Publication number: US11789729B2

Publication date: 2023-10-17

Application number: US15858916

Application date: 2017-12-29

Applicant: Intel Corporation

Inventor: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall

IPC: G06F9/30 , G06F9/38

CPC classification number: G06F9/3001 , G06F9/3005 , G06F9/3016 , G06F9/30036 , G06F9/30043 , G06F9/30076 , G06F9/30109 , G06F9/30123 , G06F9/30145 , G06F9/383 , G06F9/3824

Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify an M by N destination matrix, a first source identifier to identify an M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (M,N) of the identified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the identified first source matrix by a corresponding nibble of a doubleword element (K,N) of the identified second source matrix, and to accumulate and saturate the eight products with the previous contents of the doubleword element (M,N).
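The per-element flow can be sketched in Python as below. It assumes the nibbles are signed two's-complement values by default (the abstract does not say) and that the running doubleword is saturated to the signed 32-bit range after each of the K steps; `tile_dot_nibbles` and its helpers are hypothetical names.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def nibbles(dword, signed=True):
    """Split a 32-bit doubleword into its eight 4-bit nibbles, least significant first."""
    out = []
    for i in range(8):
        v = (dword >> (4 * i)) & 0xF
        if signed and v >= 8:
            v -= 16  # assumption: two's-complement signed nibbles
        out.append(v)
    return out


def saturate32(x):
    """Clamp to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, x))


def tile_dot_nibbles(c, a, b, signed=True):
    """c[m][n] accumulates, for each k, the eight nibble products of a[m][k] and b[k][n]."""
    rows, inner, cols = len(a), len(b), len(b[0])
    for m in range(rows):
        for n in range(cols):
            acc = c[m][n]
            for k in range(inner):  # the flow is performed K times per element
                products = [x * y for x, y in zip(nibbles(a[m][k], signed), nibbles(b[k][n], signed))]
                acc = saturate32(acc + sum(products))
            c[m][n] = acc
    return c
```

For example, a doubleword of all-1 nibbles dotted with a doubleword of all-2 nibbles adds 8 × 2 = 16 to the destination element.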

73.

Granted invention patent
Enabling removal and reconstruction of flag operations in a processor (in force)

Publication number: US11709678B2

Publication date: 2023-07-25

Application number: US17335284

Application date: 2021-06-01

Applicant: Intel Corporation

Inventor: Zeev Sperber, Tomer Weiner, Amit Gradstein, Simon Rubanovich, Alex Gerber, Itai Ravid

IPC: G06F9/30 , G06F9/38

CPC classification number: G06F9/3016 , G06F9/3001 , G06F9/30094 , G06F9/30145 , G06F9/30167 , G06F9/384 , G06F9/3861

Abstract: In one embodiment, a processor includes fetch logic to fetch instructions, decode logic to decode the fetched instructions, and execution logic to execute at least some of the instructions. The decode logic may determine whether a flag portion of a first instruction to be folded is to be performed, and if not, accumulate a first immediate value of the first instruction with a folded immediate value obtained from an entry of an immediate buffer.
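A toy Python model of the folding decision follows. It assumes a stream of hypothetical `reg += imm` instructions for which the decoder already knows whether each one's flags are consumed. When the flag portion is not needed, the immediate is accumulated into a per-register immediate buffer instead of being emitted. Emitting the folded value when flags are needed and flushing the buffer at the end are simplifying assumptions, and flag reconstruction is not modeled.

```python
from dataclasses import dataclass


@dataclass
class AddImm:
    """Hypothetical 'reg += imm' instruction with an optional flag-producing portion."""
    reg: str
    imm: int
    flags_used: bool  # assumption: decode already knows whether any consumer reads the flags


def fold_immediates(instrs):
    imm_buffer = {}  # hypothetical immediate buffer: register -> folded immediate
    emitted = []
    for ins in instrs:
        # Combine this instruction's immediate with any buffered (folded) immediate.
        folded = imm_buffer.pop(ins.reg, 0) + ins.imm
        if ins.flags_used:
            # Flag portion must be performed: emit one op carrying the folded immediate.
            emitted.append(AddImm(ins.reg, folded, True))
        else:
            # Flag portion skipped: keep accumulating instead of emitting an op.
            imm_buffer[ins.reg] = folded
    # Simplification: flush whatever is still buffered at the end of the sequence.
    emitted.extend(AddImm(reg, imm, False) for reg, imm in imm_buffer.items())
    return emitted
```

Three back-to-back updates of `r1` by 8, 8, and 4, where only the last one's flags are read, collapse into a single `r1 += 20` that still produces flags.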

74.

Published invention application
ADAPTIVE DYNAMIC DISPATCH OF MICRO-OPERATIONS (under examination, published)

Publication number: US20230205538A1

Publication date: 2023-06-29

Application number: US17561394

Application date: 2021-12-23

Applicant: Intel Corporation

Inventor: Or Beit Aharon, Zeev Sperber, Gavri Berger, Amit Gradstein, Nofar Hasson

IPC: G06F9/38 , G06F9/30

CPC classification number: G06F9/3836 , G06F9/30145 , G06F9/3001

Abstract: Embodiments of apparatuses, methods, and systems for adaptive dynamic dispatch of micro-operations are disclosed. In an embodiment, an apparatus includes a plurality of redundant execution units, a dispatcher, control hardware, a first counter, and a second counter. The dispatcher is to dispatch micro-operations to one or more of the plurality of redundant execution units, the micro-operations having a plurality of micro-operation types. The first counter is to generate a first count of dispatches, during a window, of micro-operations having a first type of the plurality of micro-operation types. The second counter is to generate a second count of dispatches, during the window, of micro-operations having any type of the plurality of micro-operation types. The control hardware is to cause a switch between a first mode and a second mode based in part on the first count and the second count. In the first mode, the dispatcher is to dispatch micro-operations having the first type to only a subset of the plurality of redundant execution units. In the second mode, the dispatcher is to dispatch micro-operations having the first type to all of the plurality of redundant execution units.
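The two counters and the mode switch can be modeled as below. The abstract does not state the switching rule, so this sketch assumes one: at the end of each window, if the first type's share of dispatches reaches a hypothetical `threshold`, first-type micro-operations are spread over all redundant units; otherwise they are confined to the subset. The class and parameter names are illustrative.

```python
FIRST_TYPE = "first"  # hypothetical label for the micro-operation type being tracked


class AdaptiveDispatcher:
    def __init__(self, num_units, subset_size, window, threshold):
        self.units = list(range(num_units))
        self.subset = self.units[:subset_size]
        self.window = window          # dispatches per counting window
        self.threshold = threshold    # assumed policy knob, not from the abstract
        self.first_count = 0          # first counter: first-type dispatches
        self.total_count = 0          # second counter: dispatches of any type
        self.all_units_mode = False   # start in the first (subset-only) mode

    def dispatch(self, uop_type):
        """Return the units eligible for this micro-op, then update the counters."""
        if uop_type == FIRST_TYPE and not self.all_units_mode:
            units = self.subset
        else:
            units = self.units
        self.first_count += int(uop_type == FIRST_TYPE)
        self.total_count += 1
        if self.total_count == self.window:
            # End of window: choose the mode from both counts, then restart counting.
            share = self.first_count / self.total_count
            self.all_units_mode = share >= self.threshold
            self.first_count = self.total_count = 0
        return units
```

Counting per window rather than per cycle keeps the decision cheap and avoids flipping modes on every micro-operation.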

75.

Granted invention patent
Systems and methods to zero a tile register pair (in force)

Publication number: US11645077B2

Publication date: 2023-05-09

Application number: US17335377

Application date: 2021-06-01

Applicant: Intel Corporation

Inventor: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman, Eyal Hadas

IPC: G06F9/30

CPC classification number: G06F9/30145 , G06F9/30043

Abstract: Embodiments detailed herein relate to systems and methods to zero a tile register pair. In one example, a processor includes decode circuitry to decode a matrix pair zeroing instruction having fields for an opcode and an identifier to identify a destination matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded matrix pair zeroing instruction to zero every element of a left matrix and a right matrix of the identified destination matrix.

76.

Invention application
SYSTEMS, APPARATUSES, AND METHODS FOR ADDITION OF PARTIAL PRODUCTS (in force)

Publication number: US20230048998A1

Publication date: 2023-02-16

Application number: US17964964

Application date: 2022-10-13

Applicant: Intel Corporation

Inventor: Robert Valentine, Galina Ryvchin, Piotr Majcher, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Milind B. Girkar, Zeev Sperber, Simon Rubanovich, Amit Gradstein

IPC: G06F9/30 , G06F7/544 , G06F9/38

Abstract: Embodiments of systems, apparatuses, and methods for fused multiply add are described. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for first, second, and third packed data source operands, wherein the packed data elements of the first and second packed data source operands are of a first size that differs from a second size of the packed data elements of the third packed data source operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand: a multiplication of M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source; an addition of the results of these multiplications to a full-sized packed data element at that packed data element position of the third packed data source; and storage of the addition result in the destination packed data element position corresponding to that packed data element position of the third packed data source, wherein M equals the size of the full-sized packed data element divided by N.
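For the common case of N = 8-bit elements accumulated into 32-bit lanes, M = 32 / 8 = 4 products feed each destination element. The Python sketch below shows that arithmetic; the function name, the unsigned interpretation, and the wraparound (rather than saturating) addition are assumptions, since the abstract does not specify them.

```python
def fma_partial_products(src1, src2, acc, full_bits=32, narrow_bits=8):
    """dest[i] = acc[i] + the sum of the M narrow products lined up with acc[i]."""
    m = full_bits // narrow_bits  # M = full element size / N
    if len(src1) != len(src2) or len(src1) != m * len(acc):
        raise ValueError("each full-sized element needs exactly M narrow elements per source")
    mask = (1 << full_bits) - 1
    dest = []
    for i, c in enumerate(acc):
        group = range(i * m, (i + 1) * m)  # narrow positions under full-sized element i
        products = sum(src1[k] * src2[k] for k in group)
        dest.append((c + products) & mask)  # assumption: unsigned, wraps modulo 2**full_bits
    return dest
```

For instance, with sources `[1, 2, 3, 4, 5, 6, 7, 8]` and all-ones, and accumulators `[10, 20]`, the two lanes become 10 + 10 = 20 and 20 + 26 = 46.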

77.

Granted invention patent
Systems and methods to perform floating-point addition with selected rounding (in force)

Publication number: US11175891B2

Publication date: 2021-11-16

Application number: US16370966

Application date: 2019-03-30

Applicant: Intel Corporation

Inventor: Simon Rubanovich, Amit Gradstein, Zeev Sperber, Mrinmay Dutta

IPC: G06F7/499 , G06F7/483 , G06F9/38 , G06F17/16

Abstract: Disclosed embodiments relate to performing floating-point addition with selected rounding. In one example, a processor includes circuitry to decode and execute an instruction specifying locations of first and second floating-point (FP) sources, and an opcode indicating the processor is to: bring the FP sources into alignment by shifting the mantissa of the smaller source FP operand to the right by the difference between their exponents, generating rounding controls based on any bits that escape; simultaneously generate a sum of the FP sources and a sum of the FP sources plus one, both sums having a fuzzy-Jbit format with an additional Jbit into which a carry-out, if any, propagates; select one of the sums based on the rounding controls; and generate a result comprising a mantissa-wide number of most-significant bits of the selected sum, starting with the most significant non-zero Jbit.
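The sum versus sum-plus-one selection can be illustrated with a bit-level Python sketch on integer mantissas. It assumes two positive operands, each a p-bit mantissa with its J-bit (bit p - 1) set and scaled by a power-of-two exponent, and only two rounding modes: round to nearest even (`"rne"`) and round toward zero (`"rtz"`). The escaped bits are condensed into conventional guard and sticky bits; that encoding and the name `fp_add_selected_rounding` are illustrative, not taken from the patent.

```python
def fp_add_selected_rounding(ma, ea, mb, eb, p, mode="rne"):
    """Add two positive values ma * 2**ea and mb * 2**eb with p-bit normalized mantissas.

    Returns (mantissa, exponent) of the rounded p-bit result.
    """
    if ea < eb:  # make operand a the one with the larger exponent
        ma, ea, mb, eb = mb, eb, ma, ea
    d = ea - eb
    aligned = mb >> d                 # shift the smaller operand right
    escaped = mb & ((1 << d) - 1)     # bits that escape during alignment
    # Rounding controls derived from the escaped bits.
    guard = (escaped >> (d - 1)) & 1 if d > 0 else 0
    sticky = d > 1 and (escaped & ((1 << (d - 1)) - 1)) != 0
    sum0 = ma + aligned               # may carry into the extra J-bit (bit p)
    sum1 = sum0 + 1                   # computed in parallel in hardware
    if sum0 >> p:
        # Carry-out: bit 0 of the sum also escapes when p bits are kept.
        guard, sticky, lsb = sum0 & 1, guard or sticky, (sum0 >> 1) & 1
    else:
        lsb = sum0 & 1
    round_up = mode == "rne" and guard and (sticky or lsb)
    selected = sum1 if round_up else sum0
    # Keep p most-significant bits, starting at the leading non-zero J-bit.
    shift = selected.bit_length() - p
    return selected >> shift, ea + shift
```

When the sum carries into the extra J-bit, bit 0 of the sum also escapes. Picking sum + 1 is still correct for round-to-nearest-even in that case, because rounding up only happens when that escaped bit is 1, so adding one at bit 0 carries into the retained least-significant bit.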

78.

Granted invention patent
Systems, apparatuses, and methods for fused multiply add (in force)

Publication number: US11169802B2

Publication date: 2021-11-09

Application number: US16338324

Application date: 2016-10-20

Applicant: Intel Corporation

Inventor: Robert Valentine, Galina Ryvchin, Piotr Majcher, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Milind B. Girkar, Zeev Sperber, Simon Rubanovich, Amit Gradstein

IPC: G06F9/30 , G06F7/544 , G06F9/38

Abstract: In some embodiments, the packed data elements of first and second packed data source operands are of a first size that differs from a second size of the packed data elements of a third packed data source operand. Execution circuitry executes a decoded single instruction to perform, for each packed data element position of a destination operand: a multiplication of M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source; an addition of the results of these multiplications to a full-sized packed data element at that packed data element position of the third packed data source; and storage of the addition result in the destination packed data element position corresponding to that packed data element position of the third packed data source, wherein M equals the size of the full-sized packed data element divided by N.

79.

Granted invention patent
Accelerator systems and methods for matrix operations (in force)

Publication number: US10942738B2

Publication date: 2021-03-09

Application number: US16368973

Application date: 2019-03-29

Applicant: Intel Corporation

Inventor: Zeev Sperber, Amit Gradstein, Simon Rubanovich, Igor Yanover, Gavri Berger, Eyal Hadas, Saeed Kharouf, Ron Schneider, Sagi Meller, Jose Yallouz

IPC: G06F9/30 , G06F9/38

Abstract: The present disclosure is directed to systems and methods for performing one or more operations on a two-dimensional tile register using an accelerator that includes a tiled matrix multiplication unit (TMU). The processor circuitry includes reservation station (RS) circuitry to communicatively couple the processor circuitry to the TMU. The RS circuitry coordinates the operations performed by the TMU. TMU dispatch queue (TDQ) circuitry in the TMU maintains the operations received from the RS circuitry in the order in which they are received. Since the duration of each operation is not known prior to execution by the TMU, the RS circuitry maintains shadow dispatch queue (RS-TDQ) circuitry that mirrors the operations in the TDQ circuitry. Communication between the RS circuitry and the TMU notifies the RS circuitry of successfully executed operations and allows the RS circuitry to cancel operations when they are associated with branch mispredictions and/or non-retired speculatively executed instructions.
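The mirroring between the TDQ and the RS-side shadow queue can be sketched with two Python deques. This toy model assumes that every operation carries a sequence number that increases in program order, that the TMU completes operations strictly in queue order, and that a misprediction squashes everything younger than a given sequence number; `TmuModel`, `RsModel`, and their methods are hypothetical.

```python
from collections import deque


class TmuModel:
    """Hypothetical TMU holding an in-order dispatch queue (TDQ)."""

    def __init__(self):
        self.tdq = deque()


class RsModel:
    """Hypothetical RS that mirrors the TDQ in a shadow queue (RS-TDQ)."""

    def __init__(self, tmu):
        self.tmu = tmu
        self.rs_tdq = deque()

    def dispatch(self, seq):
        # Ops enter both queues in the same order, so the shadow stays an exact mirror.
        self.tmu.tdq.append(seq)
        self.rs_tdq.append(seq)

    def notify_completed(self):
        # The TMU reports that its oldest op finished; its duration was unknown in advance.
        done = self.tmu.tdq.popleft()
        shadow = self.rs_tdq.popleft()
        assert done == shadow, "shadow queue out of sync"
        return done

    def cancel_younger_than(self, seq):
        # Squash every queued op younger than `seq` (e.g. after a branch mispredict).
        squashed = []
        while self.rs_tdq and self.rs_tdq[-1] > seq:
            squashed.append(self.rs_tdq.pop())
            self.tmu.tdq.pop()
        return squashed[::-1]
```

Because both queues stay in the same order, the RS can work out which TMU entries to cancel from its own shadow copy, without querying the TMU.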

80.

Invention application
APPARATUSES, METHODS, AND SYSTEMS FOR HASHING INSTRUCTIONS (under examination, published)

Publication number: US20200310802A1

Publication date: 2020-10-01

Application number: US16370459

Application date: 2019-03-29

Applicant: Intel Corporation

Inventor: Regev Shemy, Zeev Sperber, Wajdi Feghali, Vinodh Gopal, Amit Gradstein, Simon Rubanovich, Sean Gulley, Ilya Albrekht, Jacob Doweck, Jose Yallouz, Ittai Anati

IPC: G06F9/30 , G06F9/38 , H04L9/06

Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to an SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.
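The pre-rotation of C, D, G, and H makes more sense once the round recurrences are written out. Below is a minimal Python sketch, not Intel's implementation, assuming the caller keeps C and D stored rotated right by 9 bits and G and H rotated right by 19 bits, so that the two-round step rotates them back exactly as the abstract describes. `sm3_rounds2`, `compress`, and `sm3_hex` are hypothetical names; the round constants, FF/GG, P0/P1, and message expansion follow the SM3 standard (GB/T 32905-2016), with W' computed as W[j] XOR W[j+4].

```python
MASK = 0xFFFFFFFF
IV = [0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E]


def rol(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def ror(x, n):
    return rol(x, 32 - n)


def p0(x):
    return x ^ rol(x, 9) ^ rol(x, 17)


def p1(x):
    return x ^ rol(x, 15) ^ rol(x, 23)


def ff(j, x, y, z):
    return x ^ y ^ z if j < 16 else (x & y) | (x & z) | (y & z)


def gg(j, x, y, z):
    return x ^ y ^ z if j < 16 else (x & y) | (~x & z)


def sm3_rounds2(state, w, w_prime, j):
    """SM3 rounds j and j+1; C, D arrive rotated right by 9, G, H by 19 (assumed)."""
    a, b, c, d, e, f, g, h = state
    c, d, g, h = rol(c, 9), rol(d, 9), rol(g, 19), rol(h, 19)  # the abstract's pre-rotation
    for i in range(2):
        t = 0x79CC4519 if j + i < 16 else 0x7A879D8A
        a12 = rol(a, 12)
        ss1 = rol((a12 + e + rol(t, j + i)) & MASK, 7)
        ss2 = ss1 ^ a12
        tt1 = (ff(j + i, a, b, c) + d + ss2 + w_prime[i]) & MASK
        tt2 = (gg(j + i, e, f, g) + h + ss1 + w[i]) & MASK
        a, b, c, d = tt1, a, rol(b, 9), c
        e, f, g, h = p0(tt2), e, rol(f, 19), g
    return a, b, e, f  # only the updated A, B, E, F are written back


def compress(v, block):
    w = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rol(w[j - 3], 15)) ^ rol(w[j - 13], 7) ^ w[j - 6])
    a, b, c, d, e, f, g, h = v
    c, d, g, h = ror(c, 9), ror(d, 9), ror(g, 19), ror(h, 19)
    for j in range(0, 64, 2):
        w_prime = [w[j] ^ w[j + 4], w[j + 1] ^ w[j + 5]]
        na, nb, ne, nf = sm3_rounds2((a, b, c, d, e, f, g, h), w[j:j + 2], w_prime, j)
        # Two rounds leave C = A<<<9, D = B<<<9, G = E<<<19, H = F<<<19, so the
        # old A, B, E, F are exactly the next pre-rotated C, D, G, H.
        a, b, c, d, e, f, g, h = na, nb, a, b, ne, nf, e, f
    c, d, g, h = rol(c, 9), rol(d, 9), rol(g, 19), rol(h, 19)
    return [x ^ y for x, y in zip((a, b, c, d, e, f, g, h), v)]


def sm3_hex(msg: bytes) -> str:
    bit_len = 8 * len(msg)
    msg = msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64) + bit_len.to_bytes(8, "big")
    v = IV
    for i in range(0, len(msg), 64):
        v = compress(v, msg[i:i + 64])
    return "".join(f"{x:08x}" for x in v)
```

Under this storage convention, the old A, B, E, and F feed straight back in as the next C, D, G, and H, which is one plausible reason a two-round instruction needs to write back only four state elements. `sm3_hex(b"abc")` should reproduce the standard's first test vector.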
