Patent search ap:("Intel Corporation") AND inv:"Zeev Sperber" Page 15

141.

发明授权
Systems, methods, and apparatuses for dot product operations 有权

公开(公告)号：US11669326B2

公开(公告)日：2023-06-06

申请号：US15859271

申请日：2017-12-29

Applicant: Intel Corporation

Inventor： Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman

IPC: G06F9/30 , G06F17/16

CPC classification number: G06F9/30014 , G06F9/30109 , G06F9/30145 , G06F17/16

Abstract: Embodiments detailed herein relate to matrix operations. For example, embodiments of instruction support for matrix (tile) dot product operations are detailed. Exemplary instructions including computing a dot product of signed words and accumulating in a quadword data elements of a matrix pair. Additionally, in some instances, non-accumulating quadword data elements of the matrix pair are set to zero.

142.

发明授权
Detecting a dynamic control flow re-convergence point for conditional branches in hardware 有权

公开(公告)号：US11645078B2

公开(公告)日：2023-05-09

申请号：US16729349

申请日：2019-12-28

Applicant: Intel Corporation

Inventor： Adarsh Chauhan , Franck Sala , Jayesh Gaur , Zeev Sperber , Lihu Rappoport , Adi Yoaz , Sreenivas Subramoney

IPC: G06F9/38 , G06F9/30

CPC classification number: G06F9/3806 , G06F9/30058 , G06F9/30145

Abstract: Systems, methods, and apparatuses relating to hardware for auto-predication of critical branches. In one embodiment, a processor core includes a decoder to decode instructions into decoded instructions, an execution unit to execute the decoded instructions, a branch predictor circuit to predict a future outcome of a branch instruction, and a branch predication manager circuit to disable use of the predicted future outcome for a conditional critical branch comprising the branch instruction.

143.

发明授权
Systems and methods to load a tile register pair 有权

公开(公告)号：US11609762B2

公开(公告)日：2023-03-21

申请号：US17398927

申请日：2021-08-10

Applicant: Intel Corporation

Inventor： Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman

IPC: G06F9/30 , G06F9/38

Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.

144.

发明授权
Apparatuses, methods, and systems for hashing instructions 有权

公开(公告)号：US11567772B2

公开(公告)日：2023-01-31

申请号：US17537373

申请日：2021-11-29

Applicant: Intel Corporation

Inventor： Regev Shemy , Zeev Sperber , Wajdi Feghali , Vinodh Gopal , Amit Gradstein , Simon Rubanovich , Sean Gulley , Ilya Albrekht , Jacob Doweck , Jose Yallouz , Ittai Anati

IPC: G06F9/30 , G06F9/38 , H04L9/06

Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.

145.

发明申请
INSTRUCTIONS TO CONVERT FROM FP16 TO BF8 有权

公开(公告)号：US20220206743A1

公开(公告)日：2022-06-30

申请号：US17134358

申请日：2020-12-26

Applicant: Intel Corporation

Inventor： Alexander Heinecke , Naveen Mellempudi , Robert Valentine , Mark Charney , Christopher Hughes , Evangelos Georganas , Zeev Sperber , Amit Gradstein , Simon Rubanovich

IPC: G06F5/00

Abstract: Techniques for converting FP16 to BF8 using bias are described. An exemplary embodiment utilizes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a first source operand, one or more fields to identify a second source operand, one or more fields to identify a source/destination operand, and one or more fields for an opcode, wherein the opcode is to indicate that execution circuitry is to convert packed half-precision data from the identified first and second sources to packed bfloat8 data using bias terms from the identified source/destination operand and store the packed bfloat8 data into corresponding data element positions of the identified source/destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision data from the identified first and second sources to packed bfloat8 data using bias terms from the identified source/destination operand and store the packed bfloat8 data into corresponding data element positions of the identified source/destination operand.

146.

发明授权
Apparatus and method of improved insert instructions 有权

公开(公告)号：US11347502B2

公开(公告)日：2022-05-31

申请号：US15476356

申请日：2017-03-31

Applicant: Intel Corporation

Inventor： Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Jesus Corbal , Bret L. Toll , Mark J. Charney , Zeev Sperber , Amit Gradstein

IPC: G06F9/30 , G06F12/06 , G06F9/38

Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and, mask the second and fourth instructions at a second resultant vector granularity.

147.

发明申请
SYSTEMS AND METHODS TO LOAD A TILE REGISTER PAIR 有权

公开(公告)号：US20220091848A1

公开(公告)日：2022-03-24

申请号：US17398927

申请日：2021-08-10

Applicant: Intel Corporation

Inventor： Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman

IPC: G06F9/30

Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.

148.

发明授权
Interleaved pipeline of floating-point adders 有权

公开(公告)号：US11269630B2

公开(公告)日：2022-03-08

申请号：US16369743

申请日：2019-03-29

Applicant: Intel Corporation

Inventor： Simon Rubanovich , Amit Gradstein , Zeev Sperber

IPC: G06F9/30 , G06F9/38 , G06F7/483 , G06F17/16 , G06F7/544

Abstract: Disclosed embodiments relate to an interleaved pipeline of floating-point (FP) adders. In one example, a processor is to execute an instruction specifying an opcode and locations of a M by K first source matrix, a K by N second source matrix, and a M by N destination matrix, the opcode indicating execution circuitry, for each FP element (M, N) of the destination matrix, is to: launch K instances of a pipeline having a first, MULTIPLY stage, during which a FP element (M, K) of the first source matrix and a corresponding FP element (K, N) of the second source matrix are multiplied; concurrently, in an EXPDIFF stage, determine an exponent difference between the product and a previous FP value of the element (M, N) of the destination matrix; and in a second, ADD-BYPASS stage, accumulate the product with the previous FP value and, concurrently, bypassing the accumulated sum to a subsequent pipeline instance.

149.

发明授权
Systems and methods for performing 16-bit floating-point vector dot product instructions 有权

公开(公告)号：US11263009B2

公开(公告)日：2022-03-01

申请号：US17167863

申请日：2021-02-04

Applicant: Intel Corporation

Inventor： Alexander F. Heinecke , Robert Valentine , Mark J. Charney , Raanan Sade , Menachem Adelman , Zeev Sperber , Amit Gradstein , Simon Rubanovich

IPC: G06F9/30 , G06F9/38

Abstract: Disclosed embodiments relate to systems and methods for performing 16-bit floating-point vector dot product instructions. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first source, second source, and destination vectors, the opcode to indicate execution circuitry is to multiply N pairs of 16-bit floating-point formatted elements of the specified first and second sources, and accumulate the resulting products with previous contents of a corresponding single-precision element of the specified destination, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.

150.

发明授权
Systems, methods, and apparatuses for tile broadcast 有权

公开(公告)号：US11263008B2

公开(公告)日：2022-03-01

申请号：US16487774

申请日：2017-07-01

Applicant: Intel Corporation

Inventor： Robert Valentine , Zeev Sperber , Mark J. Charney , Bret L. Toll , Jesus Corbal , Alexander Heinecke , Barukh Ziv , Dan Baum , Elmoustapha Ould-Ahmed-Vall , Stanislav Shwartsman

IPC: G06F9/30 , G06F7/485 , G06F7/487 , G06F17/16 , G06F7/76 , G06F9/38

Abstract: Embodiments detailed herein relate to matrix operations. In particular, embodiment of broadcasting elements are described. For example, some embodiments describe broadcasting a scalar to all configured data element positons of a destination matrix (tile). For example, some embodiments describe broadcasting a row to all configured data element positons of a destination matrix (tile). For example, some embodiments describe broadcasting a column to all configured data element positons of a destination matrix (tile).

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification