Patent search ap:("Intel Corporation") AND inv:"Amit Gradstein" Page 7

61.

Invention application
VECTOR MASK DRIVEN CLOCK GATING FOR POWER EFFICIENCY OF A PROCESSOR [Under examination, published]

Publication number: US20150220345A1

Publication date: 2015-08-06

Application number: US13997791

Application date: 2012-12-19

Applicant: INTEL CORPORATION

Inventor: Jesus Corbal , Dennis R. Bradford , Jonathan C. Hall , Thomas D. Fletcher , Brian J. Hickmann , Dror Markovich , Amit Gradstein

IPC: G06F9/38 , G06F9/30

CPC classification number: G06F9/3836 , G06F1/3243 , G06F1/329 , G06F9/3001 , G06F9/30036 , Y02D10/152 , Y02D10/24

Abstract: A processor includes an instruction schedule and dispatch (schedule/dispatch) unit to receive a single instruction multiple data (SIMD) instruction to perform an operation on multiple data elements stored in a storage location indicated by a first source operand. The instruction schedule/dispatch unit is to determine a first of the data elements that will not be operated to generate a result written to a destination operand based on a second source operand. The processor further includes multiple processing elements coupled to the instruction schedule/dispatch unit to process the data elements of the SIMD instruction in a vector manner, and a power management unit coupled to the instruction schedule/dispatch unit to reduce power consumption of a first of the processing elements configured to process the first data element.
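The masking idea in the abstract above can be illustrated in software. The following is a minimal sketch, not the patented hardware: `masked_op` is a hypothetical name, the mask stands in for the second source operand, and keeping the old destination value in inactive lanes (merge-masking) is an assumption. The returned list of inactive lanes marks the processing elements that a power management unit could clock-gate.

```python
def masked_op(dest, src, mask, op):
    """Apply op to the lanes of src whose mask bit is 1.

    Returns (result, gated_lanes). Lanes whose mask bit is 0 keep the old
    destination value (an assumed merge-masking policy) and are reported
    as candidates for clock gating.
    """
    result = list(dest)
    gated_lanes = []
    for lane, value in enumerate(src):
        if (mask >> lane) & 1:
            result[lane] = op(value)   # active lane: do the work
        else:
            gated_lanes.append(lane)   # inactive lane: no result is produced
    return result, gated_lanes
```

For example, with a 4-lane vector and mask 0b0101, only lanes 0 and 2 are computed, and lanes 1 and 3 are reported as gateable.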

62.

Invention grant
Apparatus and method for complex matrix multiplication [In force]

Publication number: US12174911B2

Publication date: 2024-12-24

Application number: US17133473

Application date: 2020-12-23

Applicant: Intel Corporation

Inventor: Menachem Adelman , Robert Valentine , Daniel Towner , Amit Gradstein , Mark Jay Charney

IPC: G06F17/16

Abstract: An apparatus and method for complex matrix multiplication. For example, one embodiment of a processor comprises: a decoder to decode a first complex matrix multiplication instruction; execution circuitry to execute the first complex matrix multiplication instruction, the execution circuitry comprising parallel multiplication circuitry to multiply real values from the first plurality of real and imaginary values with corresponding real values from the second plurality of real and imaginary values to generate a first plurality of real products, to multiply imaginary values from the first plurality of real and imaginary values with corresponding imaginary values from the second plurality of real and imaginary values to generate a second plurality of real products; and addition/subtraction circuitry to subtract each real product in the second plurality of real products from a corresponding real product in the first plurality of real products to produce a corresponding real value in the result matrix. The decoder may also decode and the execution circuitry may execute a second complex matrix multiplication instruction to multiply real and imaginary values from the first plurality with corresponding imaginary and real values, respectively, from the second plurality to generate first and second pluralities of imaginary products, and to add corresponding imaginary products to produce a corresponding imaginary value in the result matrix.
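The split into a "real" and an "imaginary" instruction described in the abstract above follows the usual complex product rule: the real part is re*re - im*im and the imaginary part is re*im + im*re. A minimal sketch of that arithmetic follows. The function names and the representation of each element as a `(real, imag)` tuple are assumptions for illustration, and the accumulation over the inner dimension is the standard matrix-multiply reading rather than something the abstract spells out.

```python
def cmatmul_real(a, b):
    """Real part of the complex matrix product a x b.

    a and b are lists of rows; each element is a (real, imag) tuple.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            for k in range(inner):
                a_re, a_im = a[m][k]
                b_re, b_im = b[k][n]
                # real product minus imaginary product
                out[m][n] += a_re * b_re - a_im * b_im
    return out


def cmatmul_imag(a, b):
    """Imaginary part of the complex matrix product a x b."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            for k in range(inner):
                a_re, a_im = a[m][k]
                b_re, b_im = b[k][n]
                # the two cross products are added
                out[m][n] += a_re * b_im + a_im * b_re
    return out
```

Running both functions on the same inputs and pairing the outputs gives the full complex result matrix.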

63.

Invention grant
Apparatuses, methods, and systems for instructions for downconverting a tile row and interleaving with a register [In force]

Publication number: US12086595B2

Publication date: 2024-09-10

Application number: US17214853

Application date: 2021-03-27

Applicant: Intel Corporation

Inventor: Menachem Adelman , Robert Valentine , Amit Gradstein , Daniel Towner , Mark Charney

IPC: G06F9/30

CPC classification number: G06F9/3016 , G06F9/30025 , G06F9/30098

Abstract: Systems, methods, and apparatuses relating to interleaving data values. An embodiment includes decoding circuitry to decode a single instruction, the instruction having one or more fields to specify an opcode, one or more fields to specify a location of a first source operand, one or more fields to specify a location of a second source operand, one or more fields to specify a location of a destination operand, and one or more fields to specify an index value to be used to index a row in the first source operand, wherein the opcode is to indicate execution circuitry is to downconvert data elements of the indexed row of the first source operand, interleave the downconverted elements with data elements of the second source operand, and store the interleaved elements in the destination operand; and execution circuitry to execute the decoded instruction according to the opcode.

64.

Invention grant
Matrix transpose and multiply [In force]

Publication number: US11972230B2

Publication date: 2024-04-30

Application number: US16914318

Application date: 2020-06-27

Applicant: Intel Corporation

Inventor: Menachem Adelman , Robert Valentine , Barukh Ziv , Amit Gradstein , Simon Rubanovich , Zeev Sperber , Mark J. Charney , Christopher J. Hughes , Alexander F. Heinecke , Evangelos Georganas , Binh Pham

IPC: G06F7/78 , G06F9/30 , G06F17/16

CPC classification number: G06F7/78 , G06F9/3001 , G06F9/3016 , G06F17/16

Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, a processor includes a decoder and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a destination matrix location, a first source operand field to specify a first source matrix location, and a second source operand field to specify a second source matrix location. The execution circuitry is to, in response to the decoded instruction, transpose the first source matrix to generate a transposed first source matrix, perform a matrix multiplication using the transposed first source matrix and the second source matrix to generate a result, and store the result in a destination matrix location.
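The operation described in the abstract above amounts to computing (first source)^T x (second source) in one step. A minimal software sketch follows, assuming plain Python lists of rows as the matrix representation; the function names are hypothetical and nothing here reflects the actual instruction encoding.

```python
def transpose(matrix):
    """Return the transpose of a list-of-rows matrix."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Plain triple-loop matrix multiplication of a (M x K) by b (K x N)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[m][k] * b[k][n] for k in range(inner)) for n in range(cols)]
        for m in range(rows)
    ]


def transpose_and_multiply(src1, src2):
    """Transpose src1, then multiply the transposed matrix by src2."""
    return matmul(transpose(src1), src2)
```

Note that src1 must have as many rows as src2 for the product of the transposed matrix to be defined.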

65.

Invention publication
8-BIT FLOATING POINT SQUARE ROOT AND/OR RECIPROCAL SQUARE ROOT INSTRUCTIONS [Under examination, published]

Publication number: US20240045683A1

Publication date: 2024-02-08

Application number: US17958371

Application date: 2022-10-01

Applicant: Intel Corporation

Inventor: Alexander Heinecke , Menachem Adelman , Evangelos Georganas , Amit Gradstein , Christopher Hughes , Naveen Mellempudi , Simon Rubanovich , Uri Sherman , Zeev Sperber

IPC: G06F9/30

CPC classification number: G06F9/30145 , G06F9/30036 , G06F9/3001

Abstract: Techniques for performing square root or reciprocal square root calculations on FP8 data elements in response to an instruction are described. An example of an instruction is one that includes fields for an opcode, an identification of a location of a packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operand, a calculation of a square root value of a FP8 data element in that position and store a result of each square root into a corresponding data element position of the packed data destination operand.

66.

Invention publication
INSTRUCTIONS TO CONVERT FROM FP16 TO FP8 [Under examination, published]

Publication number: US20240045677A1

Publication date: 2024-02-08

Application number: US17958378

Application date: 2022-10-01

Applicant: Intel Corporation

Inventor: Alexander Heinecke , Menachem Adelman , Mark Charney , Evangelos Georganas , Amit Gradstein , Christopher Hughes , Naveen Mellempudi , Simon Rubanovich , Uri Sherman , Zeev Sperber , Robert Valentine

IPC: G06F9/30

CPC classification number: G06F9/30025 , G06F9/3016

Abstract: Techniques for converting FP16 or FP32 data elements to FP8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include a one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision floating-point data or single-precision floating point data from the identified source to packed FP8 data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision floating-point data or single-precision floating point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.

67.

Invention publication
8-BIT FLOATING POINT SOURCE ARITHMETIC INSTRUCTIONS [Under examination, published]

Publication number: US20240045654A1

Publication date: 2024-02-08

Application number: US17958373

Application date: 2022-10-01

Applicant: Intel Corporation

Inventor: Alexander Heinecke , Menachem Adelman , Evangelos Georganas , Amit Gradstein , Christopher Hughes , Naveen Mellempudi , Simon Rubanovich , Uri Sherman , Zeev Sperber

IPC: G06F7/483

CPC classification number: G06F7/483

Abstract: Techniques for performing arithmetic operations on FP8 values are described. An exemplary instruction includes fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of location of a packed data destination operand, wherein the opcode is to indicate an arithmetic operation execution circuitry is to perform, for each data element position of the identified packed data source operands, the arithmetic operation on FP8 data elements in that data element position in FP8 format and store a result of each arithmetic operation into a corresponding data element position of the identified packed data destination operand.

68.

Invention grant
Systems and methods for performing 16-bit floating-point matrix dot product instructions [In force]

Publication number: US11893389B2

Publication date: 2024-02-06

Application number: US18190761

Application date: 2023-03-27

Applicant: Intel Corporation

Inventor: Alexander F. Heinecke , Robert Valentine , Mark J. Charney , Raanan Sade , Menachem Adelman , Zeev Sperber , Amit Gradstein , Simon Rubanovich

IPC: G06F9/30 , G06F9/38

CPC classification number: G06F9/30036 , G06F9/3001 , G06F9/3016 , G06F9/3802

Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
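The flow described in the abstract above can be sketched in software as follows. This is an illustration under stated assumptions, not the claimed circuitry: nibbles are treated as unsigned 4-bit values, the accumulator saturates to the signed 32-bit range, and the function names are hypothetical. The abstract does not state signedness or the saturation bounds.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def nibbles(dword):
    """Split a 32-bit doubleword into eight 4-bit nibbles, lowest first."""
    return [(dword >> (4 * i)) & 0xF for i in range(8)]


def saturate_int32(value):
    """Clamp value to the signed 32-bit range (assumed saturation rule)."""
    return max(INT32_MIN, min(INT32_MAX, value))


def tile_nibble_dot(dest, src1, src2):
    """For each destination element (m, n), repeat K times: form eight
    nibble products from src1[m][k] and src2[k][n], then accumulate them
    into the destination element with saturation."""
    rows, inner, cols = len(src1), len(src2), len(src2[0])
    out = [row[:] for row in dest]
    for m in range(rows):
        for n in range(cols):
            for k in range(inner):
                products = [
                    x * y
                    for x, y in zip(nibbles(src1[m][k]), nibbles(src2[k][n]))
                ]
                out[m][n] = saturate_int32(out[m][n] + sum(products))
    return out
```

The destination is copied rather than modified in place so that the previous contents of each doubleword element remain available to the caller.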

69.

Invention publication
DEVICE, METHOD AND SYSTEM FOR EXECUTING A TILE LOAD AND EXPAND INSTRUCTION [Under examination, published]

Publication number: US20230409326A1

Publication date: 2023-12-21

Application number: US17841558

Application date: 2022-06-15

Applicant: Intel Corporation

Inventor: Menachem Adelman , Amit Gradstein , Simon Rubanovich , Barukh Ziv , Uri Sherman , Dana Rip , Shahar Mizrahi , Dan Baum , Rinat Rappoport , Nilesh Jain , Zeev Sperber , Gideon Stupp , Alexander Heinecke , Christopher Hughes , Evangelos Georganas

IPC: G06F9/30 , G06F9/38 , G06N3/04

CPC classification number: G06F9/30145 , G06F9/30178 , G06F9/30047 , G06F9/3887 , G06N3/04

Abstract: Techniques and mechanisms for processor circuitry to execute a load and expand instruction of an instruction set to generate decompressed matrix data. In an embodiment, the instruction comprises a source operand which indicates a location from which compressed matrix data, and corresponding metadata, are to be accessed. A destination operand of the instruction indicates a location which is to receive decompressed metadata, which is generated, during execution of the instruction, based on the compressed matrix data and the corresponding metadata. The metadata comprises compression mask information which identifies which elements of the matrix have been masked from the compressed matrix data. In another embodiment, the instruction further comprises a count operand which identifies a total number of the unmasked matrix elements which are represented in the compressed matrix data.
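The mask-driven expansion described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: a mask bit of 1 means the element is present in the compressed stream, masked-out positions are filled with zero, elements are laid out in row-major order, and `load_and_expand` is a hypothetical name. The length check plays the role of the count of unmasked elements mentioned in the abstract.

```python
def load_and_expand(compressed, mask_bits, rows, cols, fill=0):
    """Rebuild a rows x cols matrix from compressed data plus a mask.

    mask_bits holds one 0/1 flag per matrix position in row-major order.
    A 1 takes the next value from `compressed`; a 0 inserts `fill`.
    """
    if len(mask_bits) != rows * cols:
        raise ValueError("mask must have one bit per matrix element")
    if sum(mask_bits) != len(compressed):
        raise ValueError("count of unmasked elements does not match the data")
    values = iter(compressed)
    flat = [next(values) if bit else fill for bit in mask_bits]
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]
```

For example, three stored values with the mask 1,0,0,1,1,0 expand into a 2 x 3 matrix with zeros in the masked positions.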

70.

Invention grant
Systems, methods, and apparatuses for matrix operations [In force]

Publication number: US11816483B2

Publication date: 2023-11-14

Application number: US15859268

Application date: 2017-12-29

Applicant: Intel Corporation

Inventor: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman

IPC: G06F9/30 , G06F17/16

CPC classification number: G06F9/30036 , G06F9/30101 , G06F17/16

Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address, and execution circuitry to execute the decoded instruction to store configuration information about usage of storage for two-dimensional data structures at the memory address.
