-
Publication No.: US20180285115A1
Publication date: 2018-10-04
Application No.: US15477064
Filing date: 2017-04-01
Applicant: Intel Corporation
Inventor: Niranjan K. Soundararajan , Saurabh Gupta , Sreenivas Subramoney , Rahul Pal , Ragavendra Natarajan , Daniel Deng , Jared W. Stark , Ronak Singhal , Hong Wang
CPC classification number: G06F9/46 , G06F9/3848
Abstract: Embodiments of apparatuses, methods, and systems for misprediction-triggered local history-based branch prediction are described. In one embodiment, an apparatus includes a current pattern table and a local pattern table. The current pattern table has a plurality of entries, each entry in which to store a plurality of pattern lengths of a current pattern of one of a plurality of branch instructions. The local pattern table is to provide a first branch prediction based on the current pattern.
-
Publication No.: US10013352B2
Publication date: 2018-07-03
Application No.: US14498963
Filing date: 2014-09-26
Applicant: Intel Corporation
Inventor: Sreenivas Subramoney , Jayesh Gaur , Mukesh Agrawal , Mainak Chaudhuri
IPC: G06F12/12 , G06F12/0811 , G06F12/0813 , G06F12/0842 , G06F12/123
CPC classification number: G06F12/0811 , G06F12/0813 , G06F12/0842 , G06F12/123 , G06F2212/1024 , G06F2212/1056 , G06F2212/3042
Abstract: Embodiments described include systems, apparatuses, and methods using sectored dynamic random access memory (DRAM) cache. An exemplary apparatus may include at least one hardware processor core and a sectored dynamic random access memory (DRAM) cache coupled to the at least one hardware processor core.
-
Publication No.: US09251096B2
Publication date: 2016-02-02
Application No.: US14036673
Filing date: 2013-09-25
Applicant: Intel Corporation
Inventor: Sreenivas Subramoney , Jayesh Gaur , Alaa R Alameldeen
CPC classification number: G06F12/126 , G06F12/0895
Abstract: In an embodiment, a processor includes a cache data array including a plurality of physical ways, each physical way to store a baseline way and a victim way; a cache tag array including a plurality of tag groups, each tag group associated with a particular physical way and including a first tag associated with the baseline way stored in the particular physical way, and a second tag associated with the victim way stored in the particular physical way; and cache control logic to: select a first baseline way based on a replacement policy, select a first victim way based on an available capacity of a first physical way including the first victim way, and move a first data element from the first baseline way to the first victim way. Other embodiments are described and claimed.
-
Publication No.: US12190114B2
Publication date: 2025-01-07
Application No.: US17130016
Filing date: 2020-12-22
Applicant: Intel Corporation
IPC: G06F9/38
Abstract: In one embodiment, a processor includes a branch predictor to predict whether a branch instruction is to be taken and a branch target buffer (BTB) coupled to the branch predictor. The branch target buffer may be segmented into a first cache portion and a second cache portion, where, in response to an indication that the branch is to be taken, the BTB is to access an entry in one of the first cache portion and the second cache portion based at least in part on a type of the branch instruction, an occurrence frequency of the branch instruction, and spatial information regarding a distance between a target address of a target of the branch instruction and an address of the branch instruction. Other embodiments are described and claimed.
-
Publication No.: US12066945B2
Publication date: 2024-08-20
Application No.: US17130698
Filing date: 2020-12-22
Applicant: Intel Corporation
Inventor: Prathmesh Kallurkar , Anant Vithal Nori , Sreenivas Subramoney
IPC: G06F12/084 , G06F9/50 , G06F12/0811 , G06F12/0846 , G06F12/0871
CPC classification number: G06F12/084 , G06F9/5016 , G06F12/0811 , G06F12/0848 , G06F12/0871
Abstract: An embodiment of an integrated circuit may comprise a core, a first level core cache memory coupled to the core, a shared core cache memory coupled to the core, a first cache controller coupled to the core and communicatively coupled to the first level core cache memory, a second cache controller coupled to the core and communicatively coupled to the shared core cache memory, and circuitry coupled to the core and communicatively coupled to the first cache controller and the second cache controller to determine if a workload has a large code footprint, and, if so determined, partition N ways of the shared core cache memory into first and second chunks of ways with the first chunk of M ways reserved for code cache lines from the workload and the second chunk of N minus M ways reserved for data cache lines from the workload, where N and M are positive integer values and N minus M is greater than zero. Other embodiments are disclosed and claimed.
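The way split described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the claimed circuitry: the names `WayPartition`, `partition_ways`, and `allowed_ways` are hypothetical, giving the lowest-numbered ways to code is an arbitrary choice, and the large-code-footprint decision is passed in as a plain boolean because the abstract does not say how it is made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WayPartition:
    """Hypothetical description of how N ways are split between code and data."""
    n_ways: int
    code_ways: range  # first chunk: M ways reserved for code cache lines
    data_ways: range  # second chunk: N - M ways reserved for data cache lines


def partition_ways(n_ways: int, m_code_ways: int) -> WayPartition:
    """Split N ways into M code ways and N - M data ways.

    The abstract requires N and M to be positive and N - M > 0,
    which is the same as 0 < M < N.
    """
    if not 0 < m_code_ways < n_ways:
        raise ValueError("need 0 < M < N")
    # Assumption: code gets the lowest-numbered ways; any disjoint split would do.
    return WayPartition(n_ways, range(0, m_code_ways), range(m_code_ways, n_ways))


def allowed_ways(partition: WayPartition, is_code_line: bool,
                 large_code_footprint: bool) -> range:
    """Return the ways a new cache line may be placed in."""
    if not large_code_footprint:
        # Without the large-footprint determination the cache stays unpartitioned.
        return range(partition.n_ways)
    return partition.code_ways if is_code_line else partition.data_ways
```

For example, `allowed_ways(partition_ways(16, 4), True, True)` restricts a code line to ways 0 through 3, while a data line under the same partition may use ways 4 through 15.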
-
Publication No.: US12020033B2
Publication date: 2024-06-25
Application No.: US17133899
Filing date: 2020-12-24
Applicant: Intel Corporation
Inventor: Niranjan Kumar Soundararajan , Sreenivas Subramoney , Jayesh Gaur , S R Swamy Saranam Chongala
CPC classification number: G06F9/3836 , G06F9/223 , G06F9/3838
Abstract: Apparatus and method for memoizing repeat function calls are described herein. An apparatus embodiment includes: uop buffer circuitry to identify a function for memoization based on retiring micro-operations (uops) from a processing pipeline; memoization retirement circuitry to generate a signature of the function which includes input and output data of the function; a memoization data structure to store the signature; and predictor circuitry to detect an instance of the function to be executed by the processing pipeline and to responsively exclude a first subset of uops associated with the instance from execution when a confidence level associated with the function is above a threshold. One or more instructions that are data-dependent on execution of the instance are then provided with the output data of the function from the memoization data structure.
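A loose software analogy may help make the reuse mechanism in the abstract above concrete. The sketch below is not the hardware design: `ReuseTable`, its fields, and the counting policy are hypothetical, the signature is modeled as the function name plus its arguments and result, and the function is assumed to be pure (same inputs always give the same output).

```python
class ReuseTable:
    """Hypothetical table that skips repeat calls once confidence is high enough."""

    def __init__(self, threshold: int = 2):
        self.threshold = threshold
        self.entries = {}   # (function name, inputs) -> [output, confidence]
        self.executed = 0   # how many times the function body actually ran

    def call(self, func, *args):
        key = (func.__name__, args)
        entry = self.entries.get(key)
        # Skip execution only when confidence is above the threshold.
        if entry is not None and entry[1] > self.threshold:
            return entry[0]
        result = func(*args)
        self.executed += 1
        if entry is not None and entry[0] == result:
            entry[1] += 1                     # same output seen again: more confident
        else:
            self.entries[key] = [result, 0]   # new or changed signature: start over
        return result


def square(x):
    return x * x


table = ReuseTable(threshold=2)
results = [table.call(square, 7) for _ in range(6)]
```

With a threshold of 2, the first four calls run `square` (confidence rises from 0 to 3) and the last two are served from the table, so dependent code receives the stored output without re-executing the function.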
-
Publication No.: US11972126B2
Publication date: 2024-04-30
Application No.: US17472272
Filing date: 2021-09-10
Applicant: Intel Corporation
Inventor: David M. Durham , Michael D. LeMay , Sergej Deutsch , Joydeep Rakshit , Anant Vithal Nori , Jayesh Gaur , Sreenivas Subramoney
IPC: G06F3/06 , G06F12/02 , G06F12/1027
CPC classification number: G06F3/0631 , G06F3/0604 , G06F3/0659 , G06F3/0679 , G06F12/0238 , G06F12/1027
Abstract: Technologies disclosed herein provide one example of a system that includes processor circuitry to be communicatively coupled to a memory circuitry. The processor circuitry is to receive a memory access request corresponding to an application for access to an address range in a memory allocation of the memory circuitry and to locate a metadata region within the memory allocation. The processor circuitry is also to, in response to a determination that the address range includes at least a portion of the metadata region, obtain first metadata stored in the metadata region, use the first metadata to determine an alternate memory address in a relocation region, and read, at the alternate memory address, displaced data from the portion of the metadata region included in the address range of the memory allocation. The address range includes one or more bytes of an expected allocation region of the memory allocation.
-
Publication No.: US11949414B2
Publication date: 2024-04-02
Application No.: US17131215
Filing date: 2020-12-22
Applicant: Intel Corporation
Inventor: Gurpreet Singh Kalsi , Akshay Krishna Ramanathan , Kamlesh Pillai , Sreenivas Subramoney , Srivatsa Rangachar Srinivasa , Anirud Thyagharajan , Om Ji Omer , Saurabh Jain
IPC: H03K19/17728 , H03K19/1776
CPC classification number: H03K19/17728 , H03K19/1776
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
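The arithmetic behind a lookup-table multiply of this kind can be illustrated in software. The sketch below assumes unsigned operands split into four-bit elements (nibbles); `LUT`, `nibbles`, `lut_multiply`, and `lut_mac` are hypothetical names, and the loop structure is only a model of the lookup, shift, and accumulate steps, not of the in-memory subarray circuitry.

```python
# 16 x 16 table holding the product of every pair of four-bit numbers.
LUT = [[a * b for b in range(16)] for a in range(16)]


def nibbles(value: int, count: int) -> list[int]:
    """Split an unsigned value into `count` four-bit elements, lowest first."""
    return [(value >> (4 * i)) & 0xF for i in range(count)]


def lut_multiply(x: int, y: int, width_bits: int = 8) -> int:
    """Multiply two unsigned integers using only table lookups, shifts, and adds."""
    if width_bits <= 0 or width_bits % 4 != 0:
        raise ValueError("width_bits must be a positive multiple of 4")
    if not (0 <= x < (1 << width_bits) and 0 <= y < (1 << width_bits)):
        raise ValueError("operands must fit in width_bits")
    count = width_bits // 4
    acc = 0  # stands in for the accumulation storage
    for i, x_elem in enumerate(nibbles(x, count)):
        for j, y_elem in enumerate(nibbles(y, count)):
            partial = LUT[y_elem][x_elem]      # row/column intersection in the table
            acc += partial << (4 * (i + j))    # shift by the elements' positions
    return acc


def lut_mac(xs: list[int], ys: list[int]) -> int:
    """Multiply-accumulate over two equal-length vectors of unsigned values."""
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    return sum(lut_multiply(x, y) for x, y in zip(xs, ys))
```

The shift amount follows from writing each operand as a sum of nibbles weighted by powers of 16: the product of nibble `i` of one operand and nibble `j` of the other carries a weight of `16 ** (i + j)`, that is, a left shift of `4 * (i + j)` bits.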
-
Publication No.: US11847053B2
Publication date: 2023-12-19
Application No.: US16833419
Filing date: 2020-03-27
Applicant: INTEL CORPORATION
Inventor: Prathmesh Kallurkar , Anant Vithal Nori , Sreenivas Subramoney
IPC: G06F12/08 , G06F12/12 , G06F11/30 , G06F9/30 , G06F12/0811 , G06F12/123
CPC classification number: G06F12/0811 , G06F9/30047 , G06F11/3037 , G06F12/123 , G06F2212/1021
Abstract: Systems, methods, and apparatuses relating to circuitry to implement a duplication resistant on-die irregular data prefetcher are described. In one embodiment, a hardware processor includes a cache to store a plurality of cache lines of data, a processing element to execute instructions to generate memory requests, and a prefetch circuit to track a first set of cache lines, requested to be accessed by the memory requests, that repeat in a first number of executed instructions, track a second set of cache lines, requested to be accessed by the memory requests, that repeat in a second, larger number of executed instructions, detect a memory request from an instruction for a cache line from the cache, determine if the cache line is within the first set of cache lines or the second set of cache lines, update first correlation data for the cache line when the cache line is within the first set of cache lines, and update second correlation data for the cache line when the cache line is within the second set of cache lines.
-
Publication No.: US11783170B2
Publication date: 2023-10-10
Application No.: US18159555
Filing date: 2023-01-25
Applicant: INTEL CORPORATION
Inventor: Kamlesh Pillai , Gurpreet Singh Kalsi , Sreenivas Subramoney , Prashant Laddha , Om Ji Omer
CPC classification number: G06N3/063 , G06F7/5443 , G06F18/2136 , G06F18/253 , G06N3/04 , G06T1/60 , G06V10/955 , G06V20/64
Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.