Patent search ap:("Intel Corporation") AND inv:"Zeev Sperber" Page 11

101.

发明授权
Efficient implementation of complex vector fused multiply add and complex vector multiply 有权

公开(公告)号：US11455167B2

公开(公告)日：2022-09-27

申请号：US16701082

申请日：2019-12-02

Applicant: Intel Corporation

Inventor： Raanan Sade , Thierry Pons , Amit Gradstein , Zeev Sperber , Mark J. Charney , Robert Valentine , Eyal Oz-Sinay

IPC: G06F9/30 , G06F9/38 , G06F17/16

Abstract: Disclosed embodiments relate to efficient complex vector multiplication. In one example, an apparatus includes execution circuitry, responsive to an instruction having fields to specify multiplier, multiplicand, and summand complex vectors, to perform two operations: first, to generate a double-even multiplicand by duplicating even elements of the specified multiplicand, and to generate a temporary vector using a fused multiply-add (FMA) circuit having A, B, and C inputs set to the specified multiplier, the double-even multiplicand, and the specified summand, respectively, and second, to generate a double-odd multiplicand by duplicating odd elements of the specified multiplicand, to generate a swapped multiplier by swapping even and odd elements of the specified multiplier, and to generate a result using a second FMA circuit having its even product negated, and having A, B, and C inputs set to the swapped multiplier, the double-odd multiplicand, and the temporary vector, respectively.

102.

发明申请
Technology For Dynamically Tuning Processor Features 有权

公开(公告)号：US20220206925A1

公开(公告)日：2022-06-30

申请号：US17582051

申请日：2022-01-24

Applicant: Intel Corporation

Inventor： Adarsh Chauhan , Jayesh Gaur , Franck Sala , Lihu Rappoport , Zeev Sperber , Adi Yoaz , Sreenivas Subramoney

IPC: G06F11/34 , G06F9/24 , G06F9/38 , G06F11/30 , G06F15/78

Abstract: A processor comprises a microarchitectural feature and dynamic tuning unit (DTU) circuitry. The processor executes a program for first and second execution windows with the microarchitectural feature disabled and enabled, respectively. The DTU circuitry automatically determines whether the processor achieved worse performance in the second execution window. In response to determining that the processor achieved worse performance in the second execution window, the DTU circuitry updates a usefulness state for a selected address of the program to denote worse performance. In response to multiple consecutive determinations that the processor achieved worse performance with the microarchitectural feature enabled, the DTU circuitry automatically updates the usefulness state to denote a confirmed bad state. In response to the usefulness state denoting the confirmed bad state, the DTU circuitry automatically disables the microarchitectural feature for the selected address for execution windows after the second execution window. Other embodiments are described and claimed.

103.

发明申请
APPARATUSES, METHODS, AND SYSTEMS FOR 8-BIT FLOATING-POINT MATRIX DOT PRODUCT INSTRUCTIONS 有权

公开(公告)号：US20220206801A1

公开(公告)日：2022-06-30

申请号：US17134373

申请日：2020-12-26

Applicant: Intel Corporation

Inventor： Naveen Mellempudi , Alexander F. Heinecke , Robert Valentine , Mark J. Charney , Christopher J. Hughes , Evangelos Georganas , Zeev Sperber , Amit Gradstein , Simon Rubanovich

IPC: G06F9/30 , G06F7/499 , G06F9/38

Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described. A processor embodiment includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a destination matrix having single-precision elements, a first source matrix, and a second source matrix, the source matrices having elements that each comprise a quadruple of 8-bit floating-point values, the opcode to indicate execution circuitry is to cause, for each element of the first source matrix and corresponding element of the second source matrix, a conversion of the 8-bit floating-point values to single-precision values, a multiplication of different pairs of converted single-precision values to generate plurality of results, and an accumulation of the results with previous contents of a corresponding element of the destination matrix, decode circuitry to decode the fetched instruction, and the execution circuitry to respond to the decoded instruction as specified by the opcode.

104.

发明授权
Enabling removal and reconstruction of flag operations in a processor 有权

公开(公告)号：US11036509B2

公开(公告)日：2021-06-15

申请号：US14930848

申请日：2015-11-03

Applicant: Intel Corporation

Inventor： Zeev Sperber , Tomer Weiner , Amit Gradstein , Simon Rubanovich , Alex Gerber , Itai Ravid

IPC: G06F9/30

Abstract: In one embodiment, a processor includes a fetch logic to fetch instructions, a decode logic to decode the fetched instructions, and an execution logic to execute at least some of the instructions. The decode logic may determine whether a flag portion of a first instruction to be folded is to be performed, and if not, accumulate a first immediate value of the first instruction with a folded immediate value obtained from an entry of an immediate buffer. Other embodiments are described and claimed.

105.

发明授权
Apparatuses, methods, and systems for transpose instructions of a matrix operations accelerator 有权

公开(公告)号：US10990397B2

公开(公告)日：2021-04-27

申请号：US16370894

申请日：2019-03-30

Applicant: Intel Corporation

Inventor： Amit Gradstein , Simon Rubanovich , Sagi Meller , Zeev Sperber , Jose Yallouz , Robert Valentine

IPC: G06F9/30 , G06F9/38 , G06F17/16

Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits; a first plurality of registers that represents an input two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode an instruction into a decoded instruction; and an execution circuit of the core to execute the decoded instruction to cause the two-dimensional grid of fused multiply accumulate circuits to form a transpose of the input two-dimensional matrix when the matrix operations accelerator circuit is in a transpose mode.

106.

发明授权
Systems and methods for performing 16-bit floating-point matrix dot product instructions 有权

公开(公告)号：US10963246B2

公开(公告)日：2021-03-30

申请号：US16186387

申请日：2018-11-09

Applicant: Intel Corporation

Inventor： Alexander F. Heinecke , Robert Valentine , Mark J. Charney , Raanan Sade , Menachem Adelman , Zeev Sperber , Amit Gradstein , Simon Rubanovich

IPC: G06F9/30 , G06F9/38

Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.

107.

发明申请
APPARATUSES, METHODS, AND SYSTEMS FOR HASHING INSTRUCTIONS 有权

公开(公告)号：US20210049013A1

公开(公告)日：2021-02-18

申请号：US17087536

申请日：2020-11-02

Applicant: Intel Corporation

Inventor： Regev Shemy , Zeev Sperber , Wajdi Feghali , Vinodh Gopal , Amit Gradstein , Simon Rubanovich , Sean Gulley , Ilya Albrekht , Jacob Doweck , Jose Yallouz , Ittai Anati

IPC: G06F9/30 , G06F9/38 , H04L9/06

Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.

108.

发明授权
Technology for dynamically tuning processor features 有权

公开(公告)号：US10915421B1

公开(公告)日：2021-02-09

申请号：US16575535

申请日：2019-09-19

Applicant: Intel Corporation

Inventor： Adarsh Chauhan , Jayesh Gaur , Franck Sala , Lihu Rappoport , Zeev Sperber , Adi Yoaz , Sreenivas Subramoney

IPC: G06F11/34 , G06F15/78 , G06F11/30 , G06F9/38 , G06F9/24

Abstract: A processor comprises a microarchitectural feature and dynamic tuning unit (DTU) circuitry. The processor executes a program for first and second execution windows with the microarchitectural feature disabled and enabled, respectively. The DTU circuitry automatically determines whether the processor achieved worse performance in the second execution window. In response to determining that the processor achieved worse performance in the second execution window, the DTU circuitry updates a usefulness state for a selected address of the program to denote worse performance. In response to multiple consecutive determinations that the processor achieved worse performance with the microarchitectural feature enabled, the DTU circuitry automatically updates the usefulness state to denote a confirmed bad state. In response to the usefulness state denoting the confirmed bad state, the DTU circuitry automatically disables the microarchitectural feature for the selected address for execution windows after the second execution window. Other embodiments are described and claimed.

109.

发明授权
In-lane vector shuffle instructions 有权

公开(公告)号：US10831477B2

公开(公告)日：2020-11-10

申请号：US15849715

申请日：2017-12-21

Applicant: Intel Corporation

Inventor： Zeev Sperber , Robert Valentine , Benny Eitan , Doron Orenstein

IPC: G06F9/30 , G06F9/38 , G06F9/315

Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.

110.

发明授权
Efficient implementation of complex vector fused multiply add and complex vector multiply 有权

公开(公告)号：US10521226B2

公开(公告)日：2019-12-31

申请号：US15941531

申请日：2018-03-30

Applicant: Intel Corporation

Inventor： Raanan Sade , Thierry Pons , Amit Gradstein , Zeev Sperber , Mark J. Charney , Robert Valentine , Eyal Oz-Sinay

IPC: G06F9/30 , G06F17/16 , G06F9/38

Abstract: Disclosed embodiments relate to efficient complex vector multiplication. In one example, an apparatus includes execution circuitry, responsive to an instruction having fields to specify multiplier, multiplicand, and summand complex vectors, to perform two operations: first, to generate a double-even multiplicand by duplicating even elements of the specified multiplicand, and to generate a temporary vector using a fused multiply-add (FMA) circuit having A, B, and C inputs set to the specified multiplier, the double-even multiplicand, and the specified summand, respectively, and second, to generate a double-odd multiplicand by duplicating odd elements of the specified multiplicand, to generate a swapped multiplier by swapping even and odd elements of the specified multiplier, and to generate a result using a second FMA circuit having its even product negated, and having A, B, and C inputs set to the swapped multiplier, the double-odd multiplicand, and the temporary vector, respectively.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification