Patent search ap:("INTEL CORPORATION") AND inv:"Christopher J. Hughes" Page 1

1.

Granted invention patent
Adaptive remote atomics (In force)

Publication number: US12216579B2

Publication date: 2025-02-04

Application number: US17134254

Application date: 2020-12-25

Applicant: Intel Corporation

Inventors: Carl J. Beckmann , Samantika S. Sury , Christopher J. Hughes , Lingxiang Xiang , Rahul Agrawal

IPC: G06F12/0811 , G06F12/0817 , G06F12/084 , G06F12/0862

Abstract: Disclosed embodiments relate to atomic memory operations. In one example, an apparatus includes multiple processor cores, a cache hierarchy, a local execution unit, a remote execution unit, and an adaptive remote atomic operation unit. The cache hierarchy includes a local cache at a first level and a shared cache at a second level. The local execution unit is to perform an atomic operation at the first level if the local cache is storing a cache line including data for the atomic operation. The remote execution unit is to perform the atomic operation at the second level. The adaptive remote atomic operation unit is to determine whether to perform the atomic operation at the first level or at the second level and whether to copy the cache line from the shared cache to the local cache.
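To make the local-versus-remote choice concrete, here is a toy single-core Python model, not Intel's circuitry. The abstract does not say what information the adaptive unit uses, so the per-line reuse counter and its threshold below are hypothetical: a line that keeps being targeted is copied into the local cache, while a rarely used line is updated in place at the shared level.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveAtomicUnit:
    """Toy model of deciding where an atomic add executes.

    The reuse counter and threshold are hypothetical heuristics,
    not taken from the patent.
    """
    local_cache: dict = field(default_factory=dict)   # first level: addr -> value
    shared_cache: dict = field(default_factory=dict)  # second level: addr -> value
    reuse: dict = field(default_factory=dict)         # addr -> remote atomics seen so far
    threshold: int = 2                                # hypothetical tuning knob

    def atomic_add(self, addr: int, delta: int) -> str:
        if addr in self.local_cache:
            # Line already held locally: execute at the first level.
            self.local_cache[addr] += delta
            return "local"
        self.reuse[addr] = self.reuse.get(addr, 0) + 1
        if self.reuse[addr] >= self.threshold:
            # Repeated use: copy the line into the local cache, then execute there.
            # (Simplified: the line is moved out of the shared cache.)
            self.local_cache[addr] = self.shared_cache.pop(addr, 0) + delta
            return "local-after-fill"
        # Otherwise execute at the second level and leave the line where it is.
        self.shared_cache[addr] = self.shared_cache.get(addr, 0) + delta
        return "remote"
```

With the default threshold, the first add to a line runs remotely, the second pulls the line into the local cache, and later adds stay local.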

2.

Granted invention patent
Inter-cluster shared data management in sub-NUMA cluster (In force)

Publication number: US12210446B2

Publication date: 2025-01-28

Application number: US18284265

Application date: 2021-06-21

Applicant: Intel Corporation

Inventors: Zhe Wang , Lingxiang Xiang , Christopher J. Hughes

IPC: G06F12/00 , G06F9/30 , G06F12/02 , G06F12/0811

Abstract: An embodiment of an integrated circuit may comprise circuitry communicatively coupled to two or more sub-non-uniform memory access clusters (SNCs) to allocate a specified memory space in the two or more SNCs in accordance with an SNC memory allocation policy indicated from a request to initialize the specified memory space. An embodiment of an apparatus may comprise decode circuitry to decode a single instruction, the single instruction to include a field for an opcode, and execution circuitry to execute the decoded instruction according to the opcode to provide an indicated SNC memory allocation policy (e.g., an SNC policy hint). Other embodiments are disclosed and claimed.
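A policy-driven allocation can be pictured as a function from (pages, clusters, policy) to a placement. The sketch below is an assumption-laden illustration: the two policy names, `LOCAL` and `INTERLEAVE`, and the round-robin rule are hypothetical and are not the policies the patent defines.

```python
from enum import Enum


class SncPolicy(Enum):
    LOCAL = "local"            # hypothetical: keep every page in one SNC
    INTERLEAVE = "interleave"  # hypothetical: spread pages across all SNCs


def allocate(num_pages: int, num_sncs: int, policy: SncPolicy, home_snc: int = 0) -> list:
    """Return the SNC chosen for each page of a memory space being initialized."""
    if policy is SncPolicy.LOCAL:
        return [home_snc] * num_pages
    # Round-robin placement starting at the requester's own SNC.
    return [(home_snc + page) % num_sncs for page in range(num_pages)]
```

A local placement typically suits data touched by one cluster, while interleaving spreads data shared across clusters so that no single cluster's memory becomes a hotspot.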

3.

Published invention application
SYSTEM, METHOD AND APPARATUS FOR CONDITIONALLY OFFLOADING INSTRUCTION EXECUTION (Pending, published)

Publication number: US20240354107A1

Publication date: 2024-10-24

Application number: US18754447

Application date: 2024-06-26

Applicant: Intel Corporation

Inventors: Frank Hady , Christopher J. Hughes , Scott Peterson

IPC: G06F9/30 , G06F9/32 , G06F9/38

CPC classification number: G06F9/30047 , G06F9/321 , G06F9/3836

Abstract: In one example, a processor includes: at least one core to execute instructions; and at least one cache memory coupled to the at least one core, the at least one cache memory to store data, at least some of the data being a copy of data stored in a memory. The at least one core is to determine whether to conditionally offload a sequence of instructions for execution on a compute circuit associated with the memory, based at least in part on whether one or more first data is present in the at least one cache memory, the one or more first data for use during execution of the sequence of instructions. Other embodiments are described and claimed.
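The offload decision depends on cache residency of the sequence's input data. As a minimal sketch, the rule below offloads when only a small fraction of the input lines are cached. The fraction-based test and the `max_cached_fraction` threshold are assumptions for illustration; the abstract only says the decision is "based at least in part" on cache presence.

```python
def should_offload(input_lines, cached_lines, max_cached_fraction=0.25):
    """Decide whether to ship an instruction sequence to a near-memory compute circuit.

    input_lines: cache-line addresses the sequence will read.
    cached_lines: set of line addresses currently present in the core's caches.
    max_cached_fraction is a hypothetical threshold, not a value from the patent.
    """
    if not input_lines:
        return False  # nothing to fetch, so nothing to gain by offloading
    hits = sum(1 for line in input_lines if line in cached_lines)
    # Mostly cached: running on the core is cheap. Mostly uncached: compute near memory.
    return hits / len(input_lines) <= max_cached_fraction
```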

4.

Granted invention patent
Processor instructions for data compression and decompression (In force)

Publication number: US12106104B2

Publication date: 2024-10-01

Application number: US17133328

Application date: 2020-12-23

Applicant: Intel Corporation

Inventors: Zhe Wang , Alaa R. Alameldeen , Christopher J. Hughes

IPC: G06F9/30 , G06F12/0862 , H03M7/30

CPC classification number: G06F9/30047 , G06F9/30145 , G06F12/0862 , H03M7/30 , G06F2212/602

Abstract: A processor that includes compression instructions to compress multiple adjacent data blocks of uncompressed read-only data stored in memory into one compressed read-only data block and store the compressed read-only data block in multiple adjacent blocks in the memory is provided. During execution of an application to operate on the read-only data, one of the multiple adjacent blocks storing the compressed read-only block is read from memory, stored in a prefetch buffer and decompressed in the memory controller. In response to a subsequent request during execution of the application for an adjacent data block in the compressed read-only data block, the uncompressed adjacent block is read directly from the prefetch buffer.
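The read path (one memory read, decompression in the controller, neighbour served from a prefetch buffer) can be modeled in a few lines. This toy assumes 64-byte blocks, a 2:1 ratio, and a copy of the compressed block at both addresses. Python's `zlib` stands in for whatever compression the hardware actually uses.

```python
import zlib

BLOCK = 64  # assumed block size in bytes


class ToyMemoryController:
    """Models 2:1 compression of adjacent read-only blocks with a prefetch buffer."""

    def __init__(self):
        self.memory = {}           # block address -> stored (compressed) bytes
        self.prefetch_buffer = {}  # block address -> decompressed block
        self.memory_reads = 0

    def compress_pair(self, addr: int, data: bytes) -> None:
        """Compress blocks addr and addr + 1 into one compressed block.

        Storing a copy at both addresses is an assumption of this sketch.
        """
        if addr % 2 or len(data) != 2 * BLOCK:
            raise ValueError("need an aligned pair of blocks")
        packed = zlib.compress(data)
        if len(packed) > BLOCK:
            raise ValueError("pair does not fit in one block once compressed")
        self.memory[addr] = self.memory[addr + 1] = packed

    def read(self, addr: int) -> bytes:
        if addr in self.prefetch_buffer:
            # Adjacent block was decompressed earlier: no memory access needed.
            return self.prefetch_buffer.pop(addr)
        self.memory_reads += 1
        raw = zlib.decompress(self.memory[addr])
        base = addr - addr % 2
        blocks = {base: raw[:BLOCK], base + 1: raw[BLOCK:]}
        neighbour = base + 1 if addr == base else base
        self.prefetch_buffer[neighbour] = blocks[neighbour]
        return blocks[addr]
```

Reading block 1 and then block 0 of a compressed pair costs a single memory read; the second request is satisfied from the prefetch buffer.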

5.

Granted invention patent
Matrix transpose and multiply (In force)

Publication number: US11972230B2

Publication date: 2024-04-30

Application number: US16914318

Application date: 2020-06-27

Applicant: Intel Corporation

Inventors: Menachem Adelman , Robert Valentine , Barukh Ziv , Amit Gradstein , Simon Rubanovich , Zeev Sperber , Mark J. Charney , Christopher J. Hughes , Alexander F. Heinecke , Evangelos Georganas , Binh Pham

IPC: G06F7/78 , G06F9/30 , G06F17/16

CPC classification number: G06F7/78 , G06F9/3001 , G06F9/3016 , G06F17/16

Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, a processor includes a decoder and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a destination matrix location, a first source operand field to specify a first source matrix location, and a second source operand field to specify a second source matrix location. The execution circuitry is to, in response to the decoded instruction, transpose the first source matrix to generate a transposed first source matrix, perform a matrix multiplication using the transposed first source matrix and the second source matrix to generate a result, and store the result in a destination matrix location.
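The instruction computes C = AᵀB in one operation. The reference model below performs the same arithmetic as two explicit steps on plain Python lists. It illustrates the semantics only and says nothing about how the hardware fuses the steps; avoiding a separate transpose pass that writes an intermediate matrix is one plausible motivation for the fused form.

```python
def transpose_multiply(a, b):
    """Return transpose(a) @ b, with matrices given as lists of rows.

    a is K x M and b is K x N, so the result is M x N.
    """
    k = len(a)
    if k != len(b):
        raise ValueError("a and b must have the same number of rows")
    m, n = len(a[0]), len(b[0])
    # Step 1: transpose the first source matrix.
    a_t = [[a[r][c] for r in range(k)] for c in range(m)]
    # Step 2: ordinary matrix multiply of the transposed matrix with b.
    return [[sum(a_t[i][t] * b[t][j] for t in range(k)) for j in range(n)]
            for i in range(m)]
```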

6.

Granted invention patent
No-locality hint vector memory access processors, methods, systems, and instructions (In force)

Publication number: US11892952B2

Publication date: 2024-02-06

Application number: US17867673

Application date: 2022-07-18

Applicant: Intel Corporation

Inventor: Christopher J. Hughes

IPC: G06F12/0877 , G06F9/30 , G06F12/0862 , G06F12/0811 , G06F15/80 , G06F12/0897

CPC classification number: G06F12/0877 , G06F9/30 , G06F9/30036 , G06F12/0811 , G06F12/0862 , G06F12/0897 , G06F15/8069 , G06F2212/1016 , G06F2212/1024 , G06F2212/27 , G06F2212/283 , G06F2212/6028

Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode a no-locality hint vector memory access instruction. The no-locality hint vector memory access instruction is to indicate a packed data register of the plurality of packed data registers that is to have source packed memory indices. The source packed memory indices are to have a plurality of memory indices. The no-locality hint vector memory access instruction is to provide a no-locality hint to the processor for data elements that are to be accessed with the memory indices. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the no-locality hint vector memory access instruction, is to access the data elements at memory locations that are based on the memory indices.
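A gather reads one element per packed index; the hint tells the processor those elements are unlikely to be reused soon. The toy model below represents the cache as a set of indices and simply skips allocation when the hint is present. That is a simplification: real hardware may instead bypass only some cache levels or insert lines with low priority.

```python
def gather(memory, indices, cache, no_locality=False):
    """Gather memory[i] for each packed index i.

    cache is a set of indices standing in for the cache hierarchy. With the
    no-locality hint the toy cache is left untouched.
    """
    elements = []
    for i in indices:
        elements.append(memory[i])
        if not no_locality:
            cache.add(i)  # normal access: the element's line is allocated
    return elements
```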

7.

Granted invention patent
Systems and methods of instructions to accelerate multiplication of sparse matrices using bitmasks that identify non-zero elements (In force)

Publication number: US11847185B2

Publication date: 2023-12-19

Application number: US17485055

Application date: 2021-09-24

Applicant: Intel Corporation

Inventors: Dan Baum , Chen Koren , Elmoustapha Ould-Ahmed-Vall , Michael Espig , Christopher J. Hughes , Raanan Sade , Robert Valentine , Mark J. Charney , Alexander F. Heinecke

IPC: G06F17/16 , G06F9/38 , G06F9/30

CPC classification number: G06F17/16 , G06F9/3001 , G06F9/3016 , G06F9/30101 , G06F9/3802

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and to execute the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices and broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in subsequent multiplications.
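The core idea is that ANDing a row's NZ bitmask with a column's NZ bitmask yields exactly the positions where a product can be non-zero. This scalar Python sketch computes C + AB that way and skips every other position. The PE grid, the two-element broadcast, and the stored NZ element from the abstract are not modeled.

```python
def nz_bitmask(values):
    """Bit t is set when values[t] is non-zero."""
    mask = 0
    for t, v in enumerate(values):
        if v != 0:
            mask |= 1 << t
    return mask


def sparse_mac(a, b, c):
    """Return c + a @ b, multiplying only positions where both operands are non-zero."""
    k = len(b)
    cols = [[b[t][j] for t in range(k)] for j in range(len(b[0]))]
    col_masks = [nz_bitmask(col) for col in cols]
    result = [row[:] for row in c]  # leave the third matrix itself unmodified
    for i, row in enumerate(a):
        row_mask = nz_bitmask(row)
        for j, col in enumerate(cols):
            match = row_mask & col_masks[j]  # matching NZ positions only
            while match:
                t = (match & -match).bit_length() - 1  # index of lowest set bit
                result[i][j] += row[t] * col[t]
                match &= match - 1  # clear that bit
    return result
```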

8.

Granted invention patent
Systems and methods for performing instructions to transform matrices into row-interleaved format (In force)

Publication number: US11675590B2

Publication date: 2023-06-13

Application number: US17865849

Application date: 2022-07-15

Applicant: Intel Corporation

Inventors: Raanan Sade , Robert Valentine , Bret Toll , Christopher J. Hughes , Alexander F. Heinecke , Elmoustapha Ould-Ahmed-Vall , Mark J. Charney

IPC: G06F12/128 , G06T1/00 , G06F9/30

CPC classification number: G06F9/30167 , G06F9/30101 , G06F9/30149

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.
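For the simplest case, assume K equals J and the interleaving is row-major. Then each J-element sub-column of the source becomes J adjacent elements in one destination row, so a source of R rows by C columns becomes R/J rows by C·J columns. This resembles the pair- and quad-interleaved layouts used for low-precision matrix tiles. The sketch below covers only that assumed case, not every variant the abstract allows.

```python
def row_interleave(src, j):
    """Toy row-interleave transform assuming K == J and row-major order.

    Each J-element sub-column src[r0:r0+J][c] becomes J adjacent
    elements of destination row r0 // J.
    """
    rows, cols = len(src), len(src[0])
    if rows % j:
        raise ValueError("row count must be a multiple of J")
    dst = [[None] * (cols * j) for _ in range(rows // j)]
    for r in range(rows):
        for c in range(cols):
            dst[r // j][c * j + r % j] = src[r][c]
    return dst
```

For example, with J = 2 the 4×2 matrix with rows `[1, 2]`, `[3, 4]`, `[5, 6]`, `[7, 8]` becomes the 2×4 matrix with rows `[1, 3, 2, 4]` and `[5, 7, 6, 8]`.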

9.

Granted invention patent
Processor and method implementing a cacheline demote machine instruction (In force)

Publication number: US11513957B2

Publication date: 2022-11-29

Application number: US17027248

Application date: 2020-09-21

Applicant: Intel Corporation

Inventors: Ren Wang , Andrew J. Herdrich , Yen-cheng Liu , Herbert H. Hum , Jong Soo Park , Christopher J. Hughes , Namakkal N. Venkatesan , Adrian C. Moga , Aamer Jaleel , Zeshan A. Chishti , Mesut A. Ergin , Jr-shian Tsai , Alexander W. Min , Tsung-yuan C. Tai , Christian Maciocco , Rajesh Sankaran

IPC: G06F12/0842 , G06F12/0893 , G06F12/109 , G06F12/0813 , G06F12/0831 , G06F9/455

Abstract: Methods and apparatus implementing hardware/software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus includes multi-core processors with multi-level cache hierarchies including an L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture systems, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosed for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
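The benefit for producer-consumer workloads is that a demoted line can be found in the shared LLC, so the consumer does not need to snoop the producer's private caches. The toy hierarchy below collapses each core's L1/L2 into one dictionary and treats demotion as a move to the LLC. Both choices are simplifications; a real demote instruction is a hint, and the line may also remain in the private caches.

```python
class ToyHierarchy:
    """Per-core private caches (L1/L2 collapsed) plus one shared LLC."""

    def __init__(self, cores: int):
        self.private = {c: {} for c in range(cores)}  # core -> {addr: value}
        self.llc = {}                                 # addr -> value

    def write(self, core: int, addr: int, value) -> None:
        self.private[core][addr] = value

    def demote(self, core: int, addr: int) -> None:
        """Push the line from the core's private caches down to the shared LLC."""
        if addr in self.private[core]:
            self.llc[addr] = self.private[core].pop(addr)

    def read(self, core: int, addr: int):
        """Return (value, where it was found)."""
        if addr in self.private[core]:
            return self.private[core][addr], "private"
        if addr in self.llc:
            return self.llc[addr], "llc"
        for other, lines in self.private.items():
            if addr in lines:
                return lines[addr], f"snoop core {other}"  # costly cross-core lookup
        raise KeyError(addr)
```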

10.

Invention application
APPARATUSES AND METHODS FOR A PROCESSOR ARCHITECTURE (In force)

Publication number: US20220237123A1

Publication date: 2022-07-28

Application number: US17712632

Application date: 2022-04-04

Applicant: Intel Corporation

Inventors: Jason W. Brandt , Robert S. Chappell , Jesus Corbal , Edward T. Grochowski , Stephen H. Gunther , Buford M. Guy , Thomas R. Huff , Christopher J. Hughes , Elmoustapha Ould-Ahmed-Vall , Ronak Singhal , Seyed Yahya Sotoudeh , Bret L. Toll , Lihu Rappoport , David B. Papworth , James D. Allen

IPC: G06F12/0831 , G06F12/1027 , G06F12/1009 , G06F9/30

Abstract: Embodiments of an invention for a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.
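The sketch below models zeroing a cache line in a MESI-style system. It uses textbook MESI reasoning and does not attempt to reproduce the exact sequence the abstract describes. If the requester already holds the line in M or E, no other cache has a valid copy. Otherwise every other copy is snooped and invalidated. Because a full line of zeros is written, the old contents never need to be fetched. The 64-byte line size is an assumption.

```python
LINE = 64  # assumed cache-line size in bytes


def zero_cache_line(caches, requester, addr):
    """caches: list of dicts mapping addr -> (MESI state, data).

    Returns the ids of the other caches that had to be snooped.
    """
    state, _ = caches[requester].get(addr, ("I", None))
    snooped = []
    if state not in ("M", "E"):
        # No exclusive ownership: invalidate every other copy first.
        for cid, cache in enumerate(caches):
            if cid != requester:
                snooped.append(cid)
                cache.pop(addr, None)
    # Write a full line of zeros; the requester now owns a dirty line.
    caches[requester][addr] = ("M", bytes(LINE))
    return snooped
```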
