-
61.
Publication No.: US20200327396A1
Publication Date: 2020-10-15
Application No.: US16913370
Filing Date: 2020-06-26
Applicant: Intel Corporation
Inventor: Anirud Thyagharajan , Prashant Laddha , Om Omer , Sreenivas Subramoney
Abstract: Exemplary embodiments maintain spatial locality of the data being processed by a sparse CNN by reordering the data. The reordering may be performed on individual data elements and on groups of co-located data elements referred to herein as "chunks". Thus, the data may be reordered into chunks, where each chunk contains data for spatially co-located data elements, and chunks may in turn be organized so that spatially adjacent chunks are stored together. The use of chunks reduces the need to re-fetch data during processing. Chunk sizes may be chosen based on the memory constraints of the processing logic (e.g., cache sizes).
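A minimal sketch of the chunk reordering idea in the abstract: sparse coordinates are sorted so that elements falling in the same tile are stored contiguously, and tiles are themselves ordered so neighbouring tiles stay adjacent. The chunk size and the row-major tile ordering are assumptions for illustration; the patent leaves them to the implementation.

```python
def reorder_into_chunks(points, chunk=4):
    """Sort sparse (y, x) coordinates so that elements in the same
    chunk-by-chunk tile are stored contiguously, and tiles themselves
    are ordered row-major to keep neighbouring tiles adjacent."""
    return sorted(points, key=lambda p: (p[0] // chunk, p[1] // chunk, p[0], p[1]))

pts = [(0, 9), (1, 1), (0, 0), (9, 0), (1, 8)]
ordered = reorder_into_chunks(pts, chunk=4)
# (0,0) and (1,1) share tile (0,0); (0,9) and (1,8) share tile (0,2)
```

Grouping by tile first, then by coordinate within the tile, is what keeps a chunk's data resident in cache while it is processed.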
-
62.
Publication No.: US10754655B2
Publication Date: 2020-08-25
Application No.: US16021838
Filing Date: 2018-06-28
Applicant: Intel Corporation
Inventor: Adarsh Chauhan , Hong Wang , Jayesh Gaur , Zeev Sperber , Sumeet Bandishte , Lihu Rappoport , Stanislav Shwartsman , Kamil Garifullin , Sreenivas Subramoney , Adi Yoaz
Abstract: A processing device includes a branch IP table and branch predication circuitry coupled to the branch IP table. The branch predication circuitry is to: determine a dynamic convergence point in a conditional branch of a set of instructions; store the dynamic convergence point in the branch IP table; fetch a first and second speculative path of the conditional branch; while determining which of the first speculative path and the second speculative path is a taken path of the conditional branch and determining whether a dynamic convergence point is fetched corresponding to the stored dynamic convergence point, stall scheduling of instructions of the first speculative path and the second speculative path; and in response to determining that one of the first speculative path and the second speculative path is the taken path and the fetched dynamic convergence point corresponds to the stored convergence point, resume scheduling of the instructions of the taken path.
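A toy model of the mechanism the abstract describes: both speculative paths are fetched but held back from scheduling, and once the branch resolves and the fetched convergence point matches the stored one, only the taken path is released. Class and method names are illustrative; real hardware does this per fetch cycle, not in one call.

```python
class DualPathFrontend:
    """Toy dual-path fetch model: fetch both sides of a branch, stall
    scheduling, then release only the taken path on resolution."""
    def __init__(self):
        self.branch_ip_table = {}   # branch IP -> stored convergence IP
        self.stalled = []           # fetched instructions awaiting scheduling

    def on_branch(self, branch_ip, convergence_ip, taken_path, not_taken_path):
        self.branch_ip_table[branch_ip] = convergence_ip
        # Fetch BOTH speculative paths, but hold them back from scheduling.
        self.stalled = ([("T", ip) for ip in taken_path] +
                        [("N", ip) for ip in not_taken_path])

    def resolve(self, branch_ip, taken, fetched_convergence_ip):
        # Resume scheduling only once the fetched convergence point
        # matches the one stored in the branch IP table.
        assert fetched_convergence_ip == self.branch_ip_table[branch_ip]
        tag = "T" if taken else "N"
        scheduled = [ip for (t, ip) in self.stalled if t == tag]
        self.stalled = []
        return scheduled

fe = DualPathFrontend()
fe.on_branch(0x10, 0x40, taken_path=[0x14, 0x18], not_taken_path=[0x20, 0x24])
out = fe.resolve(0x10, taken=True, fetched_convergence_ip=0x40)
```

Stalling (rather than discarding) the wrong path is what lets the frontend avoid a re-fetch once the convergence point is reached.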
-
63.
Publication No.: US20200226203A1
Publication Date: 2020-07-16
Application No.: US16833210
Filing Date: 2020-03-27
Applicant: Intel Corporation
Inventor: Biji George , Om Ji Omer , Dipan Kumar Mandal , Cormac Brick , Lance Hacking , Sreenivas Subramoney , Belliappa Kuttanna
IPC: G06F17/16
Abstract: A disclosed apparatus to multiply matrices includes a compute engine. The compute engine includes multipliers in a two-dimensional array that has a plurality of array locations defined by columns and rows. The apparatus also includes a plurality of adders arranged in columns. A broadcast interconnect between a cache and the multipliers broadcasts a first set of operand data elements to multipliers in the rows of the array. A unicast interconnect unicasts a second set of operands between a data buffer and the multipliers. The multipliers multiply the operands to generate a plurality of outputs, and the adders add the outputs generated by the multipliers.
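The dataflow in the abstract can be sketched as follows, under an assumed mapping: each element of a row of A is broadcast along one row of the multiplier array, the matching elements of B are unicast to individual array locations, and the column adders reduce the products. The mapping is one plausible reading, not the patent's exact microarchitecture.

```python
def array_matmul(A, B):
    """Model of a broadcast/unicast multiplier array: for each row a of A,
    a[k] is broadcast to row k of the array, B[k][c] is unicast to
    array location (k, c), and the column adders reduce the products."""
    inner, cols = len(B), len(B[0])
    out = []
    for a in A:  # one vector-matrix pass through the array per row of A
        # products[k][c] models the multiplier at array location (k, c)
        products = [[a[k] * B[k][c] for c in range(cols)] for k in range(inner)]
        # column adders sum down each column of the array
        out.append([sum(products[k][c] for k in range(inner)) for c in range(cols)])
    return out

C = array_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The broadcast saves cache bandwidth because each A element is read once and reused across a whole row of multipliers.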
-
64.
Publication No.: US10713053B2
Publication Date: 2020-07-14
Application No.: US16024808
Filing Date: 2018-06-30
Applicant: Intel Corporation
Inventor: Rahul Bera , Anant Vithal Nori , Sreenivas Subramoney , Hong Wang
IPC: G06F9/38 , G06F12/0875 , G06F12/0862 , G06F12/084
Abstract: An apparatus and method for adaptive spatial accelerated prefetching. For example, one embodiment of an apparatus comprises: execution circuitry to execute instructions and process data; a Level 2 (L2) cache to store at least a portion of the data; and a prefetcher to prefetch data from a memory subsystem to the L2 cache in anticipation of the data being needed by the execution circuitry to execute one or more of the instructions, the prefetcher comprising a buffer to store one or more prefetched memory pages or portions thereof, and signature data indicating detected patterns of access to the one or more prefetched memory pages; wherein the prefetcher is to prefetch one or more cache lines based on the signature data.
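A simplified sketch of signature-driven spatial prefetching: the prefetcher records which cache-line offsets were touched within a page, keys that pattern by the trigger offset, and replays it when the same trigger offset appears in a new page. Real signatures compress the access pattern (e.g., as bit vectors or delta histories); the literal offset list here is an assumption for clarity.

```python
from collections import defaultdict

class SpatialPrefetcher:
    """Toy signature-based spatial prefetcher: learn per-page access
    patterns, replay them on a matching trigger access in a new page."""
    def __init__(self):
        self.signatures = {}                  # trigger offset -> learned offsets
        self.page_accesses = defaultdict(list)

    def access(self, page, offset):
        hits = self.page_accesses[page]
        if not hits and offset in self.signatures:
            # First access to this page matches a learned trigger:
            # prefetch the rest of the remembered pattern.
            prefetches = [o for o in self.signatures[offset] if o != offset]
        else:
            prefetches = []
        hits.append(offset)
        return prefetches                     # offsets to prefetch in this page

    def retire_page(self, page):
        hits = self.page_accesses.pop(page)
        if hits:
            self.signatures[hits[0]] = hits   # key signature by trigger offset

pf = SpatialPrefetcher()
for off in (3, 4, 7):          # train on page 0
    pf.access(0, off)
pf.retire_page(0)
predicted = pf.access(1, 3)    # same trigger offset seen on a new page
```

Keying the signature by the trigger offset is what lets one learned page teach the prefetcher about every later page with a similar access footprint.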
-
65.
Publication No.: US10430198B2
Publication Date: 2019-10-01
Application No.: US15870595
Filing Date: 2018-01-12
Applicant: Intel Corporation
Inventor: Saurabh Gupta , Rahul Pal , Niranjan Soundararajan , Ragavendra Natarajan , Sreenivas Subramoney
IPC: G06F9/38
Abstract: One embodiment provides an apparatus. The apparatus includes a store direct dependent (SDD) branch prediction circuitry and an SDD management circuitry. The store direct dependent (SDD) branch prediction circuitry is to store an SDD branch table. The SDD branch table is to store at least one record. Each record includes a branch instruction pointer (IP) field, a load IP field, a store IP field, a comparison info field and at least one of a store value field and/or a predicted outcome field. The SDD management circuitry is to populate the SDD branch table at runtime and to override a baseline branch prediction associated with an incoming branch IP with an SDD branch prediction, if the SDD branch table contains a first record populated with the incoming branch IP and at least one of a store value and/or an SDD predicted outcome.
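The SDD idea can be sketched as follows: when a branch's outcome is a direct comparison against a recently stored value, the stored value and comparison info let the predictor compute the outcome instead of guessing. Table fields and comparison operators here are illustrative stand-ins for the record fields the abstract names.

```python
class SDDPredictor:
    """Toy store-direct-dependent branch predictor: if the branch directly
    compares a recently stored value, compute the outcome from that value
    and override the baseline prediction."""
    def __init__(self):
        self.table = {}   # branch IP -> (cmp_op, cmp_operand, store_value)

    def train(self, branch_ip, cmp_op, cmp_operand, store_value):
        # Populated at runtime, as the abstract's management circuitry does.
        self.table[branch_ip] = (cmp_op, cmp_operand, store_value)

    def predict(self, branch_ip, baseline):
        rec = self.table.get(branch_ip)
        if rec is None:
            return baseline            # no SDD record: keep baseline prediction
        op, operand, value = rec
        if op == "eq":
            return value == operand    # outcome follows directly from the store
        if op == "lt":
            return value < operand
        return baseline

p = SDDPredictor()
p.train(0x100, "lt", 10, 3)            # branch tests (stored value < 10)
pred = p.predict(0x100, baseline=False)
```

The override is safe precisely because the outcome is computed, not statistically predicted, whenever the store value is known.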
-
Publication No.: US10423422B2
Publication Date: 2019-09-24
Application No.: US15383832
Filing Date: 2016-12-19
Applicant: Intel Corporation
Inventor: Niranjan K. Soundararajan , Sreenivas Subramoney , Rahul Pal , Ragavendra Natarajan
IPC: G06F9/38
Abstract: A processor may include a baseline branch predictor and an empirical branch bias override circuit. The baseline branch predictor may receive a branch instruction associated with a given address identifier, and generate, based on a global branch history, an initial prediction of a branch direction for the instruction. The empirical branch bias override circuit may determine, dependent on a direction of an observed branch direction bias in executed branch instruction instances associated with the address identifier, whether the initial prediction should be overridden, may determine, in response to determining that the initial prediction should be overridden, a final prediction that matches the observed branch direction bias, or may determine, in response to determining that the initial prediction should not be overridden, a final prediction that matches the initial prediction. The predictor may update an entry in the global branch history reflecting the resolved branch direction for the instruction following its execution.
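A minimal sketch of the override logic: per-address outcome counts are kept, and once a branch shows a strong empirical bias, the biased direction replaces the baseline prediction. The bias threshold and minimum sample count are assumptions; the patent does not fix them.

```python
from collections import defaultdict

class BiasOverride:
    """Toy empirical branch-bias override: count observed outcomes per
    branch address and override the baseline once a strong bias emerges."""
    def __init__(self, threshold=0.9, min_samples=8):
        self.taken = defaultdict(int)
        self.total = defaultdict(int)
        self.threshold = threshold
        self.min_samples = min_samples

    def update(self, addr, taken):
        self.total[addr] += 1
        self.taken[addr] += int(taken)

    def predict(self, addr, baseline):
        n = self.total[addr]
        if n < self.min_samples:
            return baseline            # too few samples: trust the baseline
        ratio = self.taken[addr] / n
        if ratio >= self.threshold:
            return True                # strongly taken-biased: override
        if ratio <= 1 - self.threshold:
            return False               # strongly not-taken-biased: override
        return baseline

b = BiasOverride()
for _ in range(10):
    b.update(0x40, taken=True)
final = b.predict(0x40, baseline=False)
```

The point of the override is that a heavily biased branch needs no global-history correlation at all, freeing the baseline predictor's capacity for harder branches.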
-
Publication No.: US20190205135A1
Publication Date: 2019-07-04
Application No.: US15861370
Filing Date: 2018-01-03
Applicant: Intel Corporation
Inventor: Anant Vithal Nori , Sreenivas Subramoney , Shankar Balachandran , Hong Wang
CPC classification number: G06F9/30047 , G06F9/3814 , G06F13/1673 , G06F2213/0064
Abstract: Implementations of the disclosure implement timely and context triggered (TACT) prefetching that targets particular load IPs in a program contributing to a threshold amount of the long latency accesses. A processing device comprising an execution unit and a prefetcher circuit communicably coupled to the execution unit is provided. The prefetcher circuit is to detect a memory request for a target instruction pointer (IP) in a program to be executed by the execution unit. A trigger IP is identified to initiate a prefetch operation of memory data for the target IP. Thereupon, an association is determined between memory addresses of the trigger IP and the target IP. The association comprises a series of offsets representing a path between the trigger IP and an instance of the target IP in memory. Based on the association, an offset from the memory address of the trigger IP to prefetch the memory data is produced.
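A toy rendering of the trigger/target association: a trigger IP's access launches prefetches by walking a learned series of offsets from the trigger's address toward the target IP's data. The concrete offsets and the chain-walk are illustrative; the patent's association is learned from observed address streams.

```python
class TACTPrefetcher:
    """Toy timely-and-context-triggered prefetcher: an access by a trigger
    IP launches prefetches at a learned series of offsets from its address."""
    def __init__(self):
        self.assoc = {}   # trigger IP -> (target IP, [offsets])

    def learn(self, trigger_ip, target_ip, offsets):
        self.assoc[trigger_ip] = (target_ip, offsets)

    def on_access(self, ip, addr):
        if ip not in self.assoc:
            return []
        _, offsets = self.assoc[ip]
        # Walk the offset chain from the trigger's address; each step
        # yields one address to prefetch ahead of the target IP's demand.
        prefetches, a = [], addr
        for off in offsets:
            a += off
            prefetches.append(a)
        return prefetches

pf = TACTPrefetcher()
pf.learn(trigger_ip=0x10, target_ip=0x80, offsets=[64, 64, 128])
addrs = pf.on_access(0x10, addr=0x1000)
```

Anchoring the prefetch on an earlier trigger IP, rather than on the target IP itself, is what buys the timeliness in the name.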
-
Publication No.: US10318834B2
Publication Date: 2019-06-11
Application No.: US15582945
Filing Date: 2017-05-01
Applicant: Intel Corporation
Inventor: Gurpreet S. Kalsi , Om J. Omer , Biji George , Gopi Neela , Dipan Kumar Mandal , Sreenivas Subramoney
Abstract: One embodiment provides an image processing circuitry. The image processing circuitry includes a feature extraction circuitry and an optimization circuitry. The feature extraction circuitry is to determine a feature descriptor based, at least in part, on a feature point location and a corresponding scale. The optimization circuitry is to optimize an operation of the feature extraction circuitry. Each optimization is to accelerate the operation of the feature extraction circuitry, reduce its power consumption, and/or reduce the system memory bandwidth it uses.
-
69.
Publication No.: US10191689B2
Publication Date: 2019-01-29
Application No.: US15393998
Filing Date: 2016-12-29
Applicant: Intel Corporation
Inventor: Sriseshan Srikanth , Lavanya Subramanian , Sreenivas Subramoney
IPC: G06F3/06
Abstract: Systems for page management using local page information are disclosed. The system may include a processor, including a memory controller, and a memory, including a row buffer. The memory controller may include circuitry to determine that a page stored in the row buffer has been idle for a time exceeding a predetermined threshold, determine whether the page is exempt from idle page closures, and, based on a determination that the page is exempt, refrain from closing the page. Associated methods are also disclosed.
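The controller's decision reduces to a small policy check, sketched below. The time units and the exemption flag's origin (the abstract's "local page information") are assumptions for illustration.

```python
def should_close_page(idle_ns, threshold_ns, exempt):
    """Toy row-buffer policy from the abstract: close an open page once it
    has been idle past the threshold, unless it is exempt from idle
    closures (e.g., flagged by local page information)."""
    return idle_ns > threshold_ns and not exempt

close = should_close_page(idle_ns=500, threshold_ns=200, exempt=False)  # True
keep = should_close_page(idle_ns=500, threshold_ns=200, exempt=True)    # False
```

The exemption is the interesting part: pages known (locally) to be re-opened soon stay open past the idle timeout, avoiding a needless precharge/activate cycle.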
-
70.
Publication No.: US20180285268A1
Publication Date: 2018-10-04
Application No.: US15475197
Filing Date: 2017-03-31
Applicant: Intel Corporation
Inventor: Kunal Kishore Korgaonkar , Ishwar S. Bhati , Huichu Liu , Jayesh Gaur , Sasikanth Manipatruni , Sreenivas Subramoney , Tanay Karnik , Hong Wang , Ian A. Young
IPC: G06F12/0811 , G06F12/0808 , G06F12/1045 , G06F13/40
Abstract: In one embodiment, a processor comprises a processing core, a last level cache (LLC), and a mid-level cache. The mid-level cache is to determine that an idle indicator has been set, wherein the idle indicator is set based on an amount of activity at the LLC, and based on the determination that the idle indicator has been set, identify a first cache line to be evicted from a first set of cache lines of the mid-level cache and send a request to write the first cache line to the LLC.
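A minimal model of the idle-triggered writeback the abstract describes: when the LLC's idle indicator is set, the mid-level cache proactively picks a victim and requests a write to the LLC. The LRU-head victim choice and single-set layout are assumptions for the sketch.

```python
class MidLevelCache:
    """Toy model of idle-triggered early writeback: when the LLC idle
    indicator is set, evict a victim line and send it to the LLC."""
    def __init__(self):
        self.sets = {0: ["lineA", "lineB"]}   # set index -> LRU-ordered lines
        self.llc_writes = []                  # requests sent to the LLC

    def tick(self, idle_indicator):
        if not idle_indicator:
            return None                       # LLC busy: do nothing early
        lines = self.sets[0]
        if not lines:
            return None
        victim = lines.pop(0)                 # assume LRU head is the victim
        self.llc_writes.append(victim)        # request write of victim to LLC
        return victim

mlc = MidLevelCache()
v = mlc.tick(idle_indicator=True)
```

Draining dirty lines while the LLC is idle converts future on-demand eviction latency into background work.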
-