Patent search ap:("INTEL CORPORATION") AND inv:"Sriram Aananthakrishnan" Page 1

1.

Granted invention patent
Cache support for indirect loads and indirect stores in graph applications (In force)

Publication (announcement) No.: US12204901B2

Publication (announcement) date: 2025-01-21

Application No.: US17359305

Application date: 2021-06-25

Applicant: Intel Corporation

Inventor: Robert Pawlowski , Sriram Aananthakrishnan , Jason Howard , Joshua Fryman

IPC: G06F9/30 , G06F9/38 , G06F12/0875

Abstract: Techniques for operating on an indirect memory access instruction, where the instruction accesses a memory location via at least one indirect address. A pipeline processes the instruction and a memory operation engine generates a first access to the at least one indirect address and a second access to a target address determined by the at least one indirect address. A cache memory used with the pipeline and the memory operation engine caches pointers. In response to a cache hit when executing the indirect memory access instruction, operations dereference a pointer to obtain the at least one indirect address, not set a cache bit, and return data for the instruction without storing the data in the cache memory; and in response to a cache miss, operations set the cache bit, obtain, and store a cache line for a missed pointer, and return data without storing the data in the cache memory.
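The hit and miss behavior described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `PointerCache`, the dictionary-based memory, the one-pointer-per-line simplification, and the returned `cache_bit` flag are all hypothetical and do not come from the patent. It shows a cache that holds pointers only, while the data reached through a pointer is returned without being cached.

```python
class PointerCache:
    """Toy model (hypothetical): the cache holds pointer values, never target data."""

    def __init__(self, memory):
        self.memory = memory  # address -> stored value (assumed flat memory)
        self.lines = {}       # pointer address -> cached pointer value

    def indirect_load(self, ptr_addr):
        """Return (data, cache_bit) for a load through the pointer at ptr_addr."""
        if ptr_addr in self.lines:
            # Hit: dereference the cached pointer; the cache bit is not set.
            cache_bit = False
            target = self.lines[ptr_addr]
        else:
            # Miss: set the cache bit, fetch the pointer, and store it in the cache.
            cache_bit = True
            target = self.memory[ptr_addr]
            self.lines[ptr_addr] = target
        # Second access: read the target data and return it without caching it.
        data = self.memory[target]
        return data, cache_bit


memory = {0: 100, 100: 42}   # address 0 holds a pointer to address 100
cache = PointerCache(memory)
first = cache.indirect_load(0)
second = cache.indirect_load(0)
```

In this model the first load misses and installs the pointer, the second load hits, and address 100 (the data) never appears in `cache.lines`.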

2.

Granted invention patent
Array broadcast and reduction systems and methods (In force)

Publication (announcement) No.: US10983793B2

Publication (announcement) date: 2021-04-20

Application No.: US16369846

Application date: 2019-03-29

Applicant: INTEL CORPORATION

Inventor: Joshua Fryman , Ankit More , Jason Howard , Robert Pawlowski , Yigit Demir , Nick Pepperling , Fabrizio Petrini , Sriram Aananthakrishnan , Shaden Smith

IPC: G06F9/30 , G06F13/28 , G06F9/32 , G06F9/455

Abstract: The present disclosure is directed to systems and methods of performing one or more broadcast or reduction operations using direct memory access (DMA) control circuitry. The DMA control circuitry executes a modified instruction set architecture (ISA) that facilitates the broadcast distribution of data to a plurality of destination addresses in system memory circuitry. The broadcast instruction may include broadcast of a single data value to each destination address. The broadcast instruction may include broadcast of a data array to each destination address. The DMA control circuitry may also execute a reduction instruction that facilitates the retrieval of data from a plurality of source addresses in system memory and performing one or more operations using the retrieved data. Since the DMA control circuitry, rather than the processor circuitry performs the broadcast and reduction operations, system speed and efficiency is beneficially enhanced.
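The three operations named in this abstract (scalar broadcast, array broadcast, and reduction) can be pictured with a software analogy. The sketch below is an assumption-laden illustration, not the patented DMA instruction set: the function names, the flat Python list standing in for system memory, and the default addition operator are all hypothetical.

```python
from functools import reduce
import operator


def broadcast(memory, value, dest_addrs):
    """Write one value to every destination address."""
    for addr in dest_addrs:
        memory[addr] = value


def broadcast_array(memory, values, dest_addrs):
    """Copy the whole array to each destination base address.

    Assumes every destination range fits inside the existing memory list.
    """
    for addr in dest_addrs:
        memory[addr:addr + len(values)] = values


def reduce_from(memory, src_addrs, op=operator.add):
    """Gather one value from each source address and combine them with op."""
    return reduce(op, (memory[addr] for addr in src_addrs))


memory = [0] * 8
broadcast(memory, 7, [1, 3])
broadcast_array(memory, [1, 2], [4, 6])
total = reduce_from(memory, [1, 3, 4])
```

In the abstract these loops run in the DMA control circuitry rather than on the processor; the Python loops only show the data movement, not where it executes.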

3.

Granted invention patent
Memory system architecture for multi-threaded processors (In force)

Publication (announcement) No.: US11630691B2

Publication (announcement) date: 2023-04-18

Application No.: US17410818

Application date: 2021-08-24

Applicant: Intel Corporation

Inventor: Robert Pawlowski , Ankit More , Jason M. Howard , Joshua B. Fryman , Tina C. Zhong , Shaden Smith , Sowmya Pitchaimoorthy , Samkit Jain , Vincent Cave , Sriram Aananthakrishnan , Bharadwaj Krishnamurthy

IPC: G06F9/30 , G06F9/35 , G06F9/48 , G06F12/0815 , G06F9/38 , G06F13/28

Abstract: Disclosed embodiments relate to an improved memory system architecture for multi-threaded processors. In one example, a system includes a system comprising a multi-threaded processor core (MTPC), the MTPC comprising: P pipelines, each to concurrently process T threads; a crossbar to communicatively couple the P pipelines; a memory for use by the P pipelines, a scheduler to optimize reduction operations by assigning multiple threads to generate results of commutative arithmetic operations, and then accumulate the generated results, and a memory controller (MC) to connect with external storage and other MTPCs, the MC further comprising at least one optimization selected from: an instruction set architecture including a dual-memory operation; a direct memory access (DMA) engine; a buffer to store multiple pending instruction cache requests; multiple channels across which to stripe memory requests; and a shadow-tag coherency management unit.

4.

Invention publication
INSTRUCTION SET ARCHITECTURE AND HARDWARE SUPPORT FOR HASH OPERATIONS (Under examination, published)

Publication (announcement) No.: US20240241645A1

Publication (announcement) date: 2024-07-18

Application No.: US18621437

Application date: 2024-03-29

Applicant: Intel Corporation

Inventor: Robert Pawlowski , Shruti Sharma , Fabio Checconi , Sriram Aananthakrishnan , Jesmin Jahan Tithi , Jordi Wolfson-Pou , Joshua B. Fryman

IPC: G06F3/06

CPC classification number: G06F3/0613 , G06F3/0656 , G06F3/0673

Abstract: Systems, apparatuses and methods may provide for technology that includes a plurality of hash management buffers corresponding to a plurality of pipelines, wherein each hash management buffer in the plurality of hash management buffers is adjacent to a pipeline in the plurality of pipelines, and wherein a first hash management buffer is to issue one or more hash packets associated with one or more hash operations on a hash table. The technology may also include a plurality of hash engines corresponding to a plurality of dynamic random access memories (DRAMs), wherein each hash engine in the plurality of hash engines is adjacent to a DRAM in the plurality of DRAMs, and wherein one or more of the hash engines is to initialize a target memory destination associated with the hash table and conduct the one or more hash operations in response to the one or more hash packets.

5.

Granted invention patent
Memory system architecture for multi-threaded processors (In force)

Publication (announcement) No.: US11106494B2

Publication (announcement) date: 2021-08-31

Application No.: US16147302

Application date: 2018-09-28

Applicant: Intel Corporation

Inventor: Robert Pawlowski , Ankit More , Jason M. Howard , Joshua B. Fryman , Tina C. Zhong , Shaden Smith , Sowmya Pitchaimoorthy , Samkit Jain , Vincent Cave , Sriram Aananthakrishnan , Bharadwaj Krishnamurthy

IPC: G06F9/30 , G06F9/38 , G06F9/48 , G06F12/0815 , G06F9/35 , G06F13/28

Abstract: Disclosed embodiments relate to an improved memory system architecture for multi-threaded processors. In one example, a system includes a system comprising a multi-threaded processor core (MTPC), the MTPC comprising: P pipelines, each to concurrently process T threads; a crossbar to communicatively couple the P pipelines; a memory for use by the P pipelines, a scheduler to optimize reduction operations by assigning multiple threads to generate results of commutative arithmetic operations, and then accumulate the generated results, and a memory controller (MC) to connect with external storage and other MTPCs, the MC further comprising at least one optimization selected from: an instruction set architecture including a dual-memory operation; a direct memory access (DMA) engine; a buffer to store multiple pending instruction cache requests; multiple channels across which to stripe memory requests; and a shadow-tag coherency management unit.

6.

Granted invention patent
System, apparatus and method for barrier synchronization in a multi-threaded processor (In force)

Publication (announcement) No.: US11061742B2

Publication (announcement) date: 2021-07-13

Application No.: US16019685

Application date: 2018-06-27

Applicant: Intel Corporation

Inventor: Robert Pawlowski , Ankit More , Shaden Smith , Sowmya Pitchaimoorthy , Samkit Jain , Vincent Cavé , Sriram Aananthakrishnan , Jason M. Howard , Joshua B. Fryman

IPC: G06F9/52 , G06F9/30 , G06F9/38

Abstract: In one embodiment, a first processor core includes: a plurality of execution pipelines each to execute instructions of one or more threads; a plurality of pipeline barrier circuits coupled to the plurality of execution pipelines, each of the plurality of pipeline barrier circuits associated with one of the plurality of execution pipelines to maintain status information for a plurality of barrier groups, each of the plurality of barrier groups formed of at least two threads; and a core barrier circuit to control operation of the plurality of pipeline barrier circuits and to inform the plurality of pipeline barrier circuits when a first barrier has been reached by a first barrier group of the plurality of barrier groups. Other embodiments are described and claimed.

7.

Invention application
SYSTEM, APPARATUS AND METHOD FOR BARRIER SYNCHRONIZATION IN A MULTI-THREADED PROCESSOR (Under examination, published)

Publication (announcement) No.: US20200004602A1

Publication (announcement) date: 2020-01-02

Application No.: US16019685

Application date: 2018-06-27

Applicant: Intel Corporation

Inventor: Robert Pawlowski , Ankit More , Shaden Smith , Sowmya Pitchaimoorthy , Samkit Jain , Vincent Cavé , Sriram Aananthakrishnan , Jason M. Howard , Joshua B. Fryman

IPC: G06F9/52 , G06F9/38 , G06F9/30

Abstract: In one embodiment, a first processor core includes: a plurality of execution pipelines each to execute instructions of one or more threads; a plurality of pipeline barrier circuits coupled to the plurality of execution pipelines, each of the plurality of pipeline barrier circuits associated with one of the plurality of execution pipelines to maintain status information for a plurality of barrier groups, each of the plurality of barrier groups formed of at least two threads; and a core barrier circuit to control operation of the plurality of pipeline barrier circuits and to inform the plurality of pipeline barrier circuits when a first barrier has been reached by a first barrier group of the plurality of barrier groups. Other embodiments are described and claimed.

8.

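Invention publication
MULTI-DIMENSIONAL NETWORK SORTED ARRAY MERGING (Under examination, published)

Publication (announcement) No.: US20240045829A1

Publication (announcement) date: 2024-02-08

Application No.: US18131143

Application date: 2023-04-05

Applicant: Intel Corporation

Inventor: Robert Pawlowski , Sriram Aananthakrishnan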

IPC: G06F15/173

CPC classification number: G06F15/17375

Abstract: Techniques for multi-dimensional network sorted array merging. A first switch of a plurality of switches of an apparatus may receive a first element of a first array and a first element of a second array. The first switch may determine that the first element of the first array is less than the first element of the second array. The first switch may cause the first element of the first array to be stored as a first element of an output array.
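The compare-and-store step this abstract attributes to a switch is the same step that drives an ordinary two-way merge of sorted arrays. The following is a minimal software sketch of that step, assuming ascending input lists; the function name `merge_sorted` is hypothetical, and nothing here models the multi-dimensional switch network itself.

```python
def merge_sorted(first, second):
    """Merge two ascending lists by repeatedly emitting the smaller head element."""
    output = []
    i = j = 0
    while i < len(first) and j < len(second):
        # The step described in the abstract: if the element of the first
        # array is less than the element of the second, it becomes the next
        # element of the output array. Otherwise the second array's element does.
        if first[i] < second[j]:
            output.append(first[i])
            i += 1
        else:
            output.append(second[j])
            j += 1
    # One input is exhausted; the rest of the other is already in order.
    output.extend(first[i:])
    output.extend(second[j:])
    return output


merged = merge_sorted([1, 4, 9], [2, 3, 10])
```

With these inputs, 1 is less than 2, so 1 is stored as the first element of the output array, matching the case the abstract walks through.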

9.

Invention application
CACHE SUPPORT FOR INDIRECT LOADS AND INDIRECT STORES IN GRAPH APPLICATIONS (In force)

Publication (announcement) No.: US20220413855A1

Publication (announcement) date: 2022-12-29

Application No.: US17359305

Application date: 2021-06-25

Applicant: Intel Corporation

Inventor: Robert Pawlowski , Sriram Aananthakrishnan , Jason Howard , Joshua Fryman

IPC: G06F9/30 , G06F9/38 , G06F12/0875

Abstract: Techniques for operating on an indirect memory access instruction, where the instruction accesses a memory location via at least one indirect address. A pipeline processes the instruction and a memory operation engine generates a first access to the at least one indirect address and a second access to a target address determined by the at least one indirect address. A cache memory used with the pipeline and the memory operation engine caches pointers. In response to a cache hit when executing the indirect memory access instruction, operations dereference a pointer to obtain the at least one indirect address, not set a cache bit, and return data for the instruction without storing the data in the cache memory; and in response to a cache miss, operations set the cache bit, obtain, and store a cache line for a missed pointer, and return data without storing the data in the cache memory.

10.

Invention application
LARGE-SCALE MATRIX RESTRUCTURING AND MATRIX-SCALAR OPERATIONS (In force)

Publication (announcement) No.: US20220100508A1

Publication (announcement) date: 2022-03-31

Application No.: US17134251

Application date: 2020-12-25

Applicant: Intel Corporation

Inventor: Robert Pawlowski , Ankit More , Vincent Cave , Sriram Aananthakrishnan , Jason M. Howard , Joshua B. Fryman

IPC: G06F9/30 , G06F9/38

Abstract: Embodiments of apparatuses and methods for copying and operating on matrix elements are described. In embodiments, an apparatus includes a hardware instruction decoder to decode a single instruction and execution circuitry, coupled to hardware instruction decoder, to perform one or more operations corresponding to the single instruction. The single instruction has a first operand to reference a base address of a first representation of a source matrix and a second operand to reference a base address of second representation of a destination matrix. The one or more operations include copying elements of the source matrix to corresponding locations in the destination matrix and filling empty elements of the destination matrix with a single value.
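The copy-and-fill behavior described in this abstract can be pictured as converting a matrix from one representation to another. The sketch below assumes, purely for illustration, that the source representation is a list of (row, column, value) triples and the destination is a dense list of rows; the abstract does not name either representation, and the function name `restructure` and its parameters are hypothetical.

```python
def restructure(entries, rows, cols, fill=0):
    """Copy source elements into a dense matrix and fill the empty cells.

    entries: iterable of (row, col, value) triples (assumed source representation)
    fill:    the single value written to every destination cell that has no
             corresponding source element
    """
    # Start with every destination element set to the fill value.
    dense = [[fill] * cols for _ in range(rows)]
    # Copy each source element to its corresponding destination location.
    for row, col, value in entries:
        dense[row][col] = value
    return dense


result = restructure([(0, 1, 5), (1, 0, 7)], rows=2, cols=2, fill=-1)
```

The abstract describes this as the work of a single decoded instruction; the Python function only shows the resulting data layout, not the hardware mechanism.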
