Patent search ap:("Intel Corporation") AND inv:"Sreenivas Subramoney" Page 6

51.

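Granted invention patent
System, method, and apparatus for enhanced pointer identification and prefetching (In force)

Publication (announcement) number: US11693780B2

Publication (announcement) date: 2023-07-04

Application number: US17391962

Application date: 2021-08-02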

Applicant: Intel Corporation

Inventor: Sreenivas Subramoney , Stanislav Shwartsman , Anant Nori , Shankar Balachandran , Elad Shtiegmann , Vineeth Mekkat , Manjunath Shevgoor , Sourabh Alurkar

IPC: G06F12/0862

CPC classification number: G06F12/0862 , G06F2212/602

Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.
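The detection step in this abstract can be pictured with a small software model. This is only a sketch under stated assumptions: the patent describes hardware, and the class name, method names, and dictionary-based bookkeeping below are hypothetical illustrations, not taken from the patent.

```python
# Illustrative software model only. All names here are hypothetical.

class PointerLoadTracker:
    def __init__(self):
        self.value_to_pc = {}       # loaded value -> PC of the load that produced it
        self.pointer_loads = set()  # PCs identified as pointer loads

    def observe_load(self, pc, address, value):
        """Record an executed load and update the list of pointer loads."""
        # If this load's address equals a value returned by an earlier load,
        # that earlier load fetched a pointer.
        producer = self.value_to_pc.get(address)
        if producer is not None:
            self.pointer_loads.add(producer)
        self.value_to_pc[value] = pc

    def prefetch_targets(self, pc, prefetched_value):
        """Return further addresses to prefetch once data for `pc` is prefetched."""
        if pc in self.pointer_loads:
            # The prefetched data is itself an address worth prefetching.
            return [prefetched_value]
        return []


tracker = PointerLoadTracker()
tracker.observe_load(pc=0x10, address=0x1000, value=0x2000)  # loads a pointer
tracker.observe_load(pc=0x14, address=0x2000, value=7)       # dereferences it
```

After these two loads, the load at PC `0x10` is on the pointer-load list, so a later prefetch for that PC would also trigger a prefetch of the address contained in its data.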

52.

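Invention publication
DE-PRIORITIZING SPECULATIVE CODE LINES IN ON-CHIP CACHES (Under examination, published)

Publication (announcement) number: US20230185718A1

Publication (announcement) date: 2023-06-15

Application number: US17551172

Application date: 2021-12-14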

Applicant: Intel Corporation

Inventor: Anant Vithal Nori , Prathmesh Kallurkar , Niranjan Kumar Soundararajan , Sreenivas Subramoney , Lihu Rappoport , Hanna Alam , Adrian Moga , Ronak Singhal

IPC: G06F12/084

CPC classification number: G06F12/084 , G06F2212/62

Abstract: Methods and apparatus relating to de-prioritizing speculative code lines in on-chip caches are described. In an embodiment, logic circuitry determines whether a storage structure includes a reference to a code miss request prior to transmission of the code miss request to a shared cache. The logic circuitry causes de-prioritization of a code line, corresponding to the code miss request, in the shared cache in response to an absence of the reference in the storage structure. Other embodiments are also disclosed and claimed.

53.

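Granted invention patent
Spatially sparse neural network accelerator for multi-dimension visual analytics (In force)

Publication (announcement) number: US11620818B2

Publication (announcement) date: 2023-04-04

Application number: US17131121

Application date: 2020-12-22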

Applicant: Intel Corporation

Inventor: Kamlesh Pillai , Gurpreet Singh Kalsi , Sreenivas Subramoney , Prashant Laddha , Om Ji Omer

IPC: G06V10/94 , G06F7/544 , G06N3/04 , G06T1/60 , G06K9/62 , G06V20/64

Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.

54.

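Invention application
SPECULATIVE DECOMPRESSION WITHIN PROCESSOR CORE CACHES (In force)

Publication (announcement) number: US20220197643A1

Publication (announcement) date: 2022-06-23

Application number: US17133618

Application date: 2020-12-23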

Applicant: Intel Corporation

Inventor: Jayesh Gaur , Adarsh Chauhan , Vinodh Gopal , Vedvyas Shanbhogue , Sreenivas Subramoney , Wajdi Feghali

IPC: G06F9/30 , G06F12/0875

Abstract: Methods and apparatus relating to speculative decompression within processor core caches are described. In an embodiment, decode circuitry decodes a decompression instruction into a first micro operation and a second micro operation. The first micro operation causes one or more load operations to fetch data into a plurality of cachelines of a cache of a processor core. Decompression Engine (DE) circuitry decompresses the fetched data from the plurality of cachelines of the cache of the processor core in response to the second micro operation. The decompression instruction causes the DE circuitry to perform an out-of-order decompression of the plurality of cachelines. Other embodiments are also disclosed and claimed.

55.

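Granted invention patent
Instruction set architecture based and automatic load tracking for opportunistic re-steer of data-dependent flaky branches (In force)

Publication (announcement) number: US11321089B2

Publication (announcement) date: 2022-05-03

Application number: US16914338

Application date: 2020-06-27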

Applicant: Intel Corporation

Inventor: Saurabh Gupta , Niranjan Soundararajan , Ragavendra Natarajan , Sreenivas Subramoney

IPC: G06F9/30 , G06F9/38

Abstract: Methods and apparatuses relating to instruction set architecture (ISA) based and automatic load tracking hardware for opportunistic re-steer of data-dependent flaky branches are described. In one embodiment, a processor includes a pipeline circuit comprising a decoder to decode instructions into decoded instructions and an execution circuit to execute the decoded instructions, a branch predictor circuit to generate a predicted path for a branch instruction, and a branch re-steer circuit to, for the branch instruction dependent on a result from a load instruction, check if an instruction received by the pipeline circuit is the load instruction, and when the instruction received by the pipeline circuit is the load instruction, check for a write back of the result from the load instruction between a decode of the branch instruction with the decoder and an execution of the branch instruction with the execution circuit, and when the predicted path differs from a path based on the result from the load instruction, re-steer the branch instruction in the pipeline circuit to the path and cause execution of the branch instruction for the path based on the result from the load instruction.

56.

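Invention application
APPARATUSES, METHODS, AND SYSTEMS FOR A DUPLICATION RESISTANT ON-DIE IRREGULAR DATA PREFETCHER (In force)

Publication (announcement) number: US20210303468A1

Publication (announcement) date: 2021-09-30

Application number: US16833419

Application date: 2020-03-27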

Applicant: INTEL CORPORATION

Inventor: Prathmesh Kallurkar , Anant Vithal Nori , Sreenivas Subramoney

IPC: G06F12/0811 , G06F12/123 , G06F9/30 , G06F11/30

Abstract: Systems, methods, and apparatuses relating to circuitry to implement a duplication resistant on-die irregular data prefetcher are described. In one embodiment, a hardware processor includes a cache to store a plurality of cache lines of data, a processing element to execute instructions to generate memory requests, and a prefetch circuit to track a first set of cache lines, requested to be accessed by the memory requests, that repeat in a first number of executed instructions, track a second set of cache lines, requested to be accessed by the memory requests, that repeat in a second, larger number of executed instructions, detect a memory request from an instruction for a cache line from the cache, determine if the cache line is within the first set of cache lines or the second set of cache lines, update first correlation data for the cache line when the cache line is within the first set of cache lines, and update second correlation data for the cache line when the cache line is within the second set of cache lines.

57.

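Invention application
Techniques for Accelerating Neural Networks (In force)

Publication (announcement) number: US20210166114A1

Publication (announcement) date: 2021-06-03

Application number: US17172627

Application date: 2021-02-10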

Applicant: Intel Corporation

Inventor: Gurpreet S Kalsi , Ramachandra Chakenalli Nanjegowda , Kamlesh R Pillai , Sreenivas Subramoney

IPC: G06N3/063 , G06N3/04 , G06F17/16

Abstract: Embodiments are generally directed to techniques for accelerating neural networks. Many embodiments include a hardware accelerator for a bi-directional multi-layered GRU and LC neural network. Some embodiments are particularly directed to a hardware accelerator that enables offloading of the entire LC+GRU network to the hardware accelerator. Various embodiments include a hardware accelerator with a plurality of matrix vector units to perform GRU steps in parallel with LC steps. For example, at least a portion of computation by a first matrix vector unit of a GRU step in a neural network may overlap at least a portion of computation by a second matrix vector unit of an output feature vector for the neural network. Several embodiments include overlapping computation associated with a layer of a neural network with data transfer associated with another layer of the neural network.

58.

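Invention application
METHODS, APPARATUS, AND ARTICLES OF MANUFACTURE TO IMPROVE IN-MEMORY MULTIPLY AND ACCUMULATE OPERATIONS (In force)

Publication (announcement) number: US20210111722A1

Publication (announcement) date: 2021-04-15

Application number: US17131215

Application date: 2020-12-22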

Applicant: Intel Corporation

Inventor: Gurpreet Singh Kalsi , Akshay Krishna Ramanathan , Kamlesh Pillai , Sreenivas Subramoney , Srivatsa Rangachar Srinivasa , Anirud Thyagharajan , Om Ji Omer , Saurabh Jain

IPC: H03K19/17728 , H03K19/1776

Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
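The general idea of building a wider multiplication from a lookup table of four-bit products, followed by shifts and accumulation, can be sketched in software. The abstract does not specify operand widths or how elements are split, so the 8-bit operands, the nibble decomposition, and all names below are assumptions made for illustration; this is not the patented circuit.

```python
# Hypothetical software model. Names and operand widths are illustrative.

# 16x16 lookup table holding the product of every pair of four-bit numbers.
LUT = [[a * b for b in range(16)] for a in range(16)]


def lut_multiply_8bit(x, y):
    """Multiply two unsigned 8-bit integers using only LUT reads, shifts, and adds."""
    if not (0 <= x < 256 and 0 <= y < 256):
        raise ValueError("operands must be unsigned 8-bit values")
    acc = 0  # stands in for the accumulation storage
    for i in range(2):                       # nibble index of x (0 = low, 1 = high)
        a = (x >> (4 * i)) & 0xF
        for j in range(2):                   # nibble index of y
            b = (y >> (4 * j)) & 0xF
            partial = LUT[a][b]              # value at the row/column intersection
            acc += partial << (4 * (i + j))  # shift by the combined nibble position
    return acc
```

Writing x = a0 + 16·a1 and y = b0 + 16·b1, the product is the sum of the four nibble products a_i·b_j, each shifted left by 4·(i + j) bits, which is what the loop accumulates.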

59.

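Invention application
TILE-BASED SPARSITY AWARE DATAFLOW OPTIMIZATION FOR SPARSE DATA (In force)

Publication (announcement) number: US20210090328A1

Publication (announcement) date: 2021-03-25

Application number: US17114315

Application date: 2020-12-07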

Applicant: Intel Corporation

Inventor: Prashant Laddha , Anirud Thyagharajan , Om Ji Omer , Sreenivas Subramoney

IPC: G06T17/05 , G06T7/70 , G06N3/04 , G06F16/31

Abstract: Systems, apparatuses and methods provide technology for optimizing processing of sparse data, such as 3D pointcloud data sets. The technology may include generating a locality-aware rulebook based on an input unstructured sparse data set, such as a 3D pointcloud data set, the locality-aware rulebook storing spatial neighborhood information for active voxels in the input unstructured sparse data set, computing an average receptive field (ARF) value based on the locality-aware rulebook, and determining, from a plurality of tile size and loop order combinations, a tile size and loop order combination for processing the unstructured sparse data based on the computed ARF value. The technology may also include providing the locality-aware rulebook and the tile size and loop order combination to a compute engine such as a neural network, the compute engine to process the unstructured sparse data using the locality-aware rulebook and the tile size and loop order combination.

60.

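Invention application
PAGE ALLOCATION FOR CONTIGUITY-AWARE TRANSLATION LOOKASIDE BUFFERS (In force)

Publication (announcement) number: US20210089467A1

Publication (announcement) date: 2021-03-25

Application number: US17113801

Application date: 2020-12-07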

Applicant: Intel Corporation

Inventor: Aravinda Prasad , Sreenivas Subramoney

IPC: G06F12/1009 , G06F12/0808 , G06F12/126 , G06F9/50

Abstract: Systems, apparatuses and methods may provide for technology that allocates a physical page for a virtual memory address associated with a fault, determines a size and layout of an address space containing the virtual memory address, and conducts a soft reservation of a set of contiguous physical memory pages based on the size and the layout of the address space.
