Patent search ap:("INTEL CORPORATION") AND inv:"Christopher J. HUGHES"

1.

Invention publication
SYSTEMS FOR PERFORMING INSTRUCTIONS TO QUICKLY CONVERT AND USE TILES AS 1D VECTORS

Status: Under examination (published)

Publication (announcement) number: US20240126551A1

Publication (announcement) date: 2024-04-18

Application number: US18399014

Application date: 2023-12-28

Applicant: Intel Corporation

Inventor: Bret TOLL , Christopher J. HUGHES , Dan BAUM , Elmoustapha OULD-AHMED-VALL , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Alexander F. HEINECKE

IPC: G06F9/30

CPC classification number: G06F9/30145 , G06F9/30032 , G06F9/30036 , G06F9/30109

Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
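
The move described in this abstract can be pictured with a small software model. This is only a sketch, assuming a tile is a list of rows and that the group is flattened in row-major order; the function names and the (row0, col0, n_rows, n_cols) group encoding are hypothetical and are not taken from the patent.

```python
def move_group_to_vector(tile, row0, col0, n_rows, n_cols):
    """Flatten a rectangular group of a 2D tile into a 1D vector (row-major)."""
    return [tile[r][c]
            for r in range(row0, row0 + n_rows)
            for c in range(col0, col0 + n_cols)]


def move_vector_to_group(tile, vector, row0, col0, n_rows, n_cols):
    """Write a 1D vector into a rectangular group of the tile, in place."""
    if len(vector) != n_rows * n_cols:
        raise ValueError("vector length must match the group size")
    values = iter(vector)
    for r in range(row0, row0 + n_rows):
        for c in range(col0, col0 + n_cols):
            tile[r][c] = next(values)


# A 3x4 tile holding 0..11.
tile = [[r * 4 + c for c in range(4)] for r in range(3)]
row1 = move_group_to_vector(tile, 1, 0, 1, 4)             # one full row
col2 = move_group_to_vector(tile, 0, 2, 3, 1)             # one full column
move_vector_to_group(tile, [-1, -2, -3, -4], 0, 1, 2, 2)  # a 2x2 sub-tile
```

A row, a column, and a rectangular sub-tile are all expressed here as the same rectangle with different extents, which is one simple way to cover the group shapes the abstract lists.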

2.

Invention application
SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS TO TRANSFORM MATRICES INTO ROW-INTERLEAVED FORMAT

Status: Granted

Publication (announcement) number: US20220357950A1

Publication (announcement) date: 2022-11-10

Application number: US17865849

Application date: 2022-07-15

Applicant: Intel Corporation

Inventor: Raanan SADE , Robert VALENTINE , Bret TOLL , Christopher J. HUGHES , Alexander F. HEINECKE , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY

IPC: G06F9/30

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.
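
As a rough illustration of the row-interleaved layout, the sketch below handles only the special case where K equals J, so each J-element sub-column of the source becomes J adjacent elements in a single destination row. The function name and the restriction to K = J are assumptions made for this example; the abstract covers more general J and K.

```python
def to_row_interleaved(src, j):
    """Interleave each j-element sub-column of src into j adjacent columns.

    src has M rows and N columns, with M divisible by j. The result has
    M // j rows and N * j columns.
    """
    rows, cols = len(src), len(src[0])
    if rows % j != 0:
        raise ValueError("row count must be a multiple of j")
    dst = []
    for r0 in range(0, rows, j):
        out_row = []
        for c in range(cols):
            # The sub-column src[r0 .. r0+j-1][c] is laid out side by side.
            out_row.extend(src[r0 + k][c] for k in range(j))
        dst.append(out_row)
    return dst


src = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9],
       [10, 11, 12]]
interleaved = to_row_interleaved(src, 2)
# interleaved == [[1, 4, 2, 5, 3, 6], [7, 10, 8, 11, 9, 12]]
```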

3.

Invention application
SYSTEMS AND METHODS OF INSTRUCTIONS TO ACCELERATE MULTIPLICATION OF SPARSE MATRICES USING BITMASKS THAT IDENTIFY NON-ZERO ELEMENTS

Status: Granted

Publication (announcement) number: US20220012305A1

Publication (announcement) date: 2022-01-13

Application number: US17485055

Application date: 2021-09-24

Applicant: Intel Corporation

Inventor: Dan BAUM , Chen KOREN , Elmoustapha OULD-AHMED-VALL , Michael ESPIG , Christopher J. HUGHES , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Alexander F. HEINECKE

IPC: G06F17/16 , G06F9/38 , G06F9/30

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in subsequent multiplications.
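
The role of the NZ bitmasks can be sketched in software: a row mask and a column mask are ANDed, and only positions where both operands are non-zero contribute a multiply-accumulate. This is a sequential model with hypothetical function names; it does not model the PE grid, the broadcast of up to two NZ elements at a time, or the per-PE storage mentioned in the abstract.

```python
def nz_bitmask(values):
    """Return an integer whose bit i is set when values[i] is non-zero."""
    mask = 0
    for i, v in enumerate(values):
        if v != 0:
            mask |= 1 << i
    return mask


def sparse_multiply_accumulate(a, b, c):
    """Compute c += a @ b in place, skipping products with a zero operand."""
    inner = len(b)
    row_masks = [nz_bitmask(row) for row in a]
    col_masks = [nz_bitmask([b[k][j] for k in range(inner)])
                 for j in range(len(b[0]))]
    for i, row_mask in enumerate(row_masks):
        for j, col_mask in enumerate(col_masks):
            match = row_mask & col_mask  # positions non-zero in both operands
            k = 0
            while match:
                if match & 1:
                    c[i][j] += a[i][k] * b[k][j]
                match >>= 1
                k += 1
    return c


a = [[1, 0, 2],
     [0, 0, 3]]
b = [[0, 4],
     [5, 0],
     [6, 7]]
c = sparse_multiply_accumulate(a, b, [[1, 1], [1, 1]])
# c == [[13, 19], [19, 22]]
```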

4.

Invention application
SYSTEMS AND METHODS FOR PERFORMING MATRIX COMPRESS AND DECOMPRESS INSTRUCTIONS

Status: Under examination (published)

Publication (announcement) number: US20200348937A1

Publication (announcement) date: 2020-11-05

Application number: US16934003

Application date: 2020-07-20

Applicant: Intel Corporation

Inventor: Dan BAUM , Michael ESPIG , James GUILFORD , Wajdi K. FEGHALI , Raanan SADE , Christopher J. HUGHES , Robert VALENTINE , Bret TOLL , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY , Vinodh GOPAL , Ronen ZOHAR , Alexander F. HEINECKE

IPC: G06F9/30 , G06F9/38

Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
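
The first of the two compress options in this abstract (pack the non-zero elements and record their positions in a header) can be sketched as below. The dictionary layout, the use of a flat row-major position per element, and the function names are all assumptions for illustration; the abstract does not specify a header encoding.

```python
def compress(matrix):
    """Pack non-zero elements and record each one's flat position in a header."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    header, values = [], []
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                header.append(r * n_cols + c)  # row-major position
                values.append(v)
    return {"shape": (n_rows, n_cols), "header": header, "values": values}


def decompress(blob):
    """Rebuild the dense matrix from the packed values and the header."""
    n_rows, n_cols = blob["shape"]
    out = [[0] * n_cols for _ in range(n_rows)]
    for pos, v in zip(blob["header"], blob["values"]):
        out[pos // n_cols][pos % n_cols] = v
    return out


dense = [[0, 5, 0],
         [7, 0, 0]]
packed = compress(dense)
# packed["header"] == [1, 3] and packed["values"] == [5, 7]
restored = decompress(packed)
```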

5.

Invention application
SYSTEMS AND METHODS FOR IMPLEMENTING CHAINED TILE OPERATIONS

Status: Under examination (published)

Publication (announcement) number: US20190303167A1

Publication (announcement) date: 2019-10-03

Application number: US15942201

Application date: 2018-03-30

Applicant: Intel Corporation

Inventor: Christopher J. HUGHES , Alexander F. HEINECKE , Robert VALENTINE , Bret TOLL , Jesus CORBAL , Elmoustapha OULD-AHMED-VALL

IPC: G06F9/38 , G06F9/30

Abstract: Disclosed embodiments relate to systems and methods for implementing chained tile operations. In one example, a processor includes fetch circuitry to fetch one or more instructions until a plurality of instructions has been fetched, each instruction to specify source and destination tile operands, decode circuitry to decode the fetched instructions, and execution circuitry, responsive to the decoded instructions, to: identify first and second decoded instructions belonging to a chain of instructions, dynamically select and configure a SIMD path comprising first and second processing engines (PE) to execute the first and second decoded instructions, and set aside the specified destination of the first decoded instruction, and instead route a result of the first decoded instruction from the first PE to be used by the second PE to perform the second decoded instruction.
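
The forwarding behaviour in this abstract can be modelled loosely in software: when the second instruction consumes the first one's destination, the intermediate result is handed over directly and the first destination tile is left unwritten. The instruction tuple format, the two operations, and the function names are hypothetical; the real mechanism is a dynamically configured SIMD path, which this sketch does not model.

```python
OPS = {
    "add": lambda x, y: x + y,
    "mul": lambda x, y: x * y,
}


def apply_op(op, t1, t2):
    """Apply a binary operation element-wise to two equally sized tiles."""
    return [[OPS[op](a, b) for a, b in zip(r1, r2)]
            for r1, r2 in zip(t1, t2)]


def run_chained(first, second, tiles):
    """Execute two chained instructions of the form (op, dst, src1, src2).

    The first result is forwarded to the second operation and is not
    written to the first instruction's destination tile.
    """
    op1, dst1, a1, b1 = first
    op2, dst2, a2, b2 = second
    if dst1 not in (a2, b2):
        raise ValueError("second instruction does not consume the first result")
    forwarded = apply_op(op1, tiles[a1], tiles[b1])
    src_a = forwarded if a2 == dst1 else tiles[a2]
    src_b = forwarded if b2 == dst1 else tiles[b2]
    tiles[dst2] = apply_op(op2, src_a, src_b)
    return tiles


tiles = {"t0": [[1, 2]], "t1": [[3, 4]], "t2": [[10, 10]],
         "t3": [[0, 0]], "t4": [[0, 0]]}
run_chained(("add", "t3", "t0", "t1"), ("mul", "t4", "t3", "t2"), tiles)
# tiles["t4"] == [[40, 60]]; tiles["t3"] is still [[0, 0]]
```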

6.

Invention application
SYSTEMS, APPARATUSES, AND METHODS FOR DATA SPECULATION EXECUTION

Status: Under examination (published)

Publication (announcement) number: US20190121644A1

Publication (announcement) date: 2019-04-25

Application number: US14582859

Application date: 2014-12-24

Applicant: Intel Corporation

Inventor: Elmoustapha OULD-AHMED-VALL , Christopher J. HUGHES , Robert VALENTINE , Milind B. GIRKAR

IPC: G06F9/38 , G06F9/35

Abstract: Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for DSX comprises execution hardware to execute instructions to begin and end a data speculative execution (DSX) and speculative instructions during the DSX, and DSX tracking hardware to track speculative memory accesses and detect ordering violations in a DSX of speculative instructions using a sequence number, addresses of instruction accesses, and whether an instruction being tracked is a write, and to trigger a mis-speculation upon an ordering violation.
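
One way to read the tracking described here is as a table of (sequence number, address, is-write) entries that is checked on every speculative access. The sketch below assumes that an ordering violation means an access conflicts with an already-performed access that has a higher sequence number, where at least one of the two is a write. That rule, the class name, and the method names are assumptions for illustration, not the patent's definition.

```python
class DsxTracker:
    """Toy model of tracking speculative memory accesses inside one DSX."""

    def __init__(self):
        self.entries = []          # (sequence_number, address, is_write)
        self.mis_speculated = False

    def access(self, seq, address, is_write):
        """Record an access; return False once a mis-speculation is flagged."""
        for prev_seq, prev_addr, prev_write in self.entries:
            performed_out_of_order = prev_seq > seq
            conflicts = prev_addr == address and (prev_write or is_write)
            if performed_out_of_order and conflicts:
                self.mis_speculated = True
        self.entries.append((seq, address, is_write))
        return not self.mis_speculated


tracker = DsxTracker()
tracker.access(2, 0x100, is_write=False)  # later access performed first
tracker.access(1, 0x200, is_write=True)   # different address, no conflict
ok = tracker.access(1, 0x100, is_write=True)
# ok is False: an earlier-sequence write hit an address already read
# by a later-sequence access.
```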

7.

Invention application
SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS SPECIFYING TERNARY TILE LOGIC OPERATIONS

Status: Under examination (published)

Publication (announcement) number: US20190042260A1

Publication (announcement) date: 2019-02-07

Application number: US16131376

Application date: 2018-09-14

Applicant: Intel Corporation

Inventor: Elmoustapha OULD-AHMED-VALL , Christopher J. HUGHES , Bret TOLL , Dan BAUM , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Alexander F. HEINECKE

IPC: G06F9/38 , G06F17/16 , G06F9/30

Abstract: Disclosed embodiments relate to systems and methods for performing instructions specifying ternary tile operations. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction specifying a ternary tile operation, and locations of destination and first, second, and third source matrices, each of the matrices having M rows by N columns; and execution circuitry to respond to the decoded instruction by, for each equal-sized group of K elements of the specified first, second, and third source matrices, generate K results by performing the ternary tile operation in parallel on K corresponding elements of the specified first, second, and third source matrices, and store each of the K results to a corresponding element of the specified destination matrix, wherein corresponding elements of the specified source and destination matrices occupy a same relative position within their associated matrix.
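
An element-wise ternary operation over three equally sized tiles can be sketched as follows. The abstract does not say how the operation is selected; this example assumes an 8-bit truth table in the style of the existing vector instruction VPTERNLOG, where the three input bits form an index into the table. The function names and the default 8-bit element width are hypothetical.

```python
def ternary_logic(a, b, c, imm8, bits=8):
    """Bitwise three-input logic: result bit = imm8[(a_bit, b_bit, c_bit)]."""
    result = 0
    for i in range(bits):
        index = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((imm8 >> index) & 1) << i
    return result


def tile_ternary(t1, t2, t3, imm8, bits=8):
    """Apply the ternary operation to corresponding elements of three tiles."""
    return [[ternary_logic(a, b, c, imm8, bits)
             for a, b, c in zip(r1, r2, r3)]
            for r1, r2, r3 in zip(t1, t2, t3)]


XOR3 = 0x96      # truth table for a ^ b ^ c
MAJORITY = 0xE8  # truth table for "at least two of three bits set"

t1 = [[0b1100, 0b1111]]
t2 = [[0b1010, 0b0000]]
t3 = [[0b0001, 0b0101]]
xor_tile = tile_ternary(t1, t2, t3, XOR3)
maj_tile = tile_ternary(t1, t2, t3, MAJORITY)
```

Each destination element sits at the same relative position as its three source elements, matching the last clause of the abstract.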

8.

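Invention application
COALESCING ADJACENT GATHER/SCATTER OPERATIONS

Status: Under examination (published)

Publication (announcement) number: US20160103789A1

Publication (announcement) date: 2016-04-14

Application number: US14975327

Application date: 2015-12-18

Applicant: Intel Corporation

Inventor: Andrew T. FORSYTH , Brian J. HICKMANN , Jonathan C. HALL , Christopher J. HUGHES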

IPC: G06F15/80 , G06F9/30 , G06F12/08

CPC classification number: G06F9/3853 , G06F9/30018 , G06F9/30036 , G06F9/30043 , G06F9/30098 , G06F9/30105 , G06F9/30145 , G06F9/3804 , G06F9/3824 , G06F9/3836 , G06F9/3887 , G06F12/0875 , G06F12/1027 , G06F13/4282 , G06F15/8007 , G06F2212/1016 , G06F2212/452 , G06F2212/68

Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous first and second data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and the second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
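
The coalescing idea can be sketched as a gather over interleaved pairs: each index triggers one contiguous read of two elements, and the halves land in matching entries of two destinations. Modelling memory as a flat list and the function name are assumptions made for this example.

```python
def coalesced_gather_pairs(memory, indices):
    """For each index, read two adjacent elements with one contiguous access.

    Returns two destination vectors: entry i of the first holds
    memory[indices[i]], entry i of the second holds memory[indices[i] + 1].
    """
    first, second = [], []
    for idx in indices:
        if idx < 0 or idx + 2 > len(memory):
            raise IndexError("pair read out of bounds")
        lo, hi = memory[idx:idx + 2]  # single contiguous read
        first.append(lo)
        second.append(hi)
    return first, second


# Interleaved (x, y) pairs: x0, y0, x1, y1, ...
memory = [10, 11, 20, 21, 30, 31, 40, 41]
xs, ys = coalesced_gather_pairs(memory, [4, 0, 6])
# xs == [30, 10, 40], ys == [31, 11, 41]
```

Without coalescing, the same result would take two separate gathers over the same indices, one for the first element of each pair and one for the second.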

9.

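Invention application
COALESCING ADJACENT GATHER/SCATTER OPERATIONS

Status: Under examination (published)

Publication (announcement) number: US20160103786A1

Publication (announcement) date: 2016-04-14

Application number: US14975222

Application date: 2015-12-18

Applicant: Intel Corporation

Inventor: Andrew T. FORSYTH , Brian J. HICKMANN , Jonathan C. HALL , Christopher J. HUGHES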

IPC: G06F15/80 , G06F9/38 , G06F9/30

CPC classification number: G06F9/3853 , G06F9/30018 , G06F9/30036 , G06F9/30043 , G06F9/30098 , G06F9/30105 , G06F9/30145 , G06F9/3804 , G06F9/3824 , G06F9/3836 , G06F9/3887 , G06F12/0875 , G06F12/1027 , G06F13/4282 , G06F15/8007 , G06F2212/1016 , G06F2212/452 , G06F2212/68

Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous first and second data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and the second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.

10.

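Invention application
APPARATUS AND METHOD FOR PARTITIONED SHUFFLES

Status: Granted

Publication (announcement) number: US20250103337A1

Publication (announcement) date: 2025-03-27

Application number: US18373900

Application date: 2023-09-27

Applicant: Intel Corporation

Inventor: Simon PENNYCOOK , Christopher J. HUGHES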

IPC: G06F9/30

Abstract: An apparatus and method for partitioned shuffling of data elements. A first partition is associated with a first number of source data elements corresponding to a first plurality of lanes having a first plurality of lane identifiers (IDs) and a second partition is associated with a second number of source data elements corresponding to a second plurality of lanes having a second plurality of lane IDs. A bounded offset vector is generated based on allowable ranges for a plurality of offset values associated with the source data elements. An index vector is generated by permuting the first and second plurality of lane IDs in accordance with the bounded offset vector.
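
A partitioned shuffle can be sketched as follows, assuming that "bounding" an offset means clamping it so that lane ID plus offset stays inside the lane's own partition. The clamping rule, the function name, and the list-based lane model are assumptions for illustration; the abstract does not state how the allowable ranges are derived.

```python
def partitioned_shuffle(src, partition_sizes, offsets):
    """Shuffle src so that every lane reads only from its own partition.

    Returns (shuffled, index), where index[i] is the source lane ID that
    lane i reads after its offset has been bounded.
    """
    if sum(partition_sizes) != len(src) or len(offsets) != len(src):
        raise ValueError("partition sizes and offsets must cover every lane")
    index = []
    start = 0
    for size in partition_sizes:
        for lane in range(start, start + size):
            lowest = start - lane              # most negative allowed offset
            highest = start + size - 1 - lane  # most positive allowed offset
            bounded = max(lowest, min(highest, offsets[lane]))
            index.append(lane + bounded)
        start += size
    return [src[i] for i in index], index


src = list("abcdefgh")
shuffled, index = partitioned_shuffle(
    src, [3, 5], [1, 1, 1, -1, -1, 2, 9, -9])
# index == [1, 2, 2, 3, 3, 7, 7, 3]
# shuffled == ['b', 'c', 'c', 'd', 'd', 'h', 'h', 'd']
```

The two partitions here have different sizes (3 and 5 lanes), in line with the abstract's separate "first number" and "second number" of source data elements.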
