-
Publication No.: US20220197821A1
Publication date: 2022-06-23
Application No.: US17133414
Filing date: 2020-12-23
Applicant: Intel Corporation
Inventor: Wim Heirman, Ibrahim Hur
IPC: G06F12/1027, G06F12/0862, G06F12/0891, G06F9/30
Abstract: Techniques and mechanisms for providing information to determine whether a software prefetch instruction is to be executed. In an embodiment, one or more entries of a translation lookaside buffer (TLB) each include a respective value which indicates whether, according to one or more criteria, corresponding data has been sufficiently utilized. Insufficiently utilized data is indicated in a TLB entry with an identifier of an executed instruction to prefetch the corresponding data. An eviction of the TLB entry results in the creation of an entry in a registry of prefetch instructions. The entry in the registry includes the identifier of the executed prefetch instruction, and a value indicating a number of times that one or more future prefetch instructions are to be dropped. In another embodiment, execution of a subsequent prefetch instruction—which also corresponds to the identifier—is prevented based on the registry entry.
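A minimal software sketch of the registry behaviour this abstract describes, for illustration only. The class name `PrefetchRegistry`, the method names, and the `drops_per_entry` policy value are hypothetical and do not come from the patent, which describes hardware structures.

```python
from dataclasses import dataclass, field


@dataclass
class PrefetchRegistry:
    """Maps a prefetch-instruction identifier to a remaining drop count."""

    drops_per_entry: int = 4  # hypothetical policy value, not from the abstract
    remaining: dict = field(default_factory=dict)

    def on_tlb_eviction(self, prefetch_id, sufficiently_used):
        # An evicted TLB entry whose data was insufficiently utilized
        # creates a registry entry for the prefetch that brought it in.
        if prefetch_id is not None and not sufficiently_used:
            self.remaining[prefetch_id] = self.drops_per_entry

    def should_execute(self, prefetch_id):
        # Drop the prefetch while its registry entry still has drops left.
        left = self.remaining.get(prefetch_id, 0)
        if left == 0:
            return True
        if left == 1:
            del self.remaining[prefetch_id]
        else:
            self.remaining[prefetch_id] = left - 1
        return False
```

With `drops_per_entry=2`, the next two prefetches carrying a registered identifier are dropped and the third executes again.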
-
12.
Publication No.: US10942851B2
Publication date: 2021-03-09
Application No.: US16203891
Filing date: 2018-11-29
Applicant: Intel Corporation
Inventor: Wim Heirman, Stijn Eyerman, Kristof Du Bois, Ibrahim Hur, Joshua B. Fryman
IPC: G06F12/08, G06F12/0804
Abstract: In one embodiment, an apparatus includes a memory access circuit to receive memory access instructions and provide at least some of the memory access instructions to a memory subsystem for execution. The memory access circuit may have a conversion circuit to convert a first memory access instruction to a first subline memory access instruction, e.g., based at least in part on an access history for the first memory access instruction. Other embodiments are described and claimed.
-
13.
Publication No.: US20200233806A1
Publication date: 2020-07-23
Application No.: US16837833
Filing date: 2020-04-01
Applicant: Intel Corporation
Inventor: Wim Heirman, Ibrahim Hur, Ugonna Echeruo, Stijn Eyerman, Kristof Du Bois
IPC: G06F12/0862
Abstract: Apparatus, method, and system for enhancing data prefetching based on non-uniform memory access (NUMA) characteristics are described herein. An apparatus embodiment includes a system memory, a cache, and a prefetcher. The system memory includes multiple memory regions, at least some of which are associated with different NUMA characteristics (access latency, bandwidth, etc.) than others. Each region is associated with its own set of prefetch parameters that are set in accordance with its NUMA characteristics. The prefetcher monitors data accesses to the cache and generates one or more prefetch requests to fetch data from the system memory to the cache based on the monitored data accesses and the set of prefetch parameters associated with the memory region from which data is to be fetched. The set of prefetch parameters may include prefetch distance, training-to-stable threshold, and throttle threshold.
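The per-region parameter lookup described in this abstract can be sketched in software as follows. This is an illustrative model, not the patented design: the names `PrefetchParams`, `MemoryRegion`, `params_for`, and `prefetch_address` are hypothetical, and treating the prefetch distance as a count of 64-byte cache lines ahead of the accessed line is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefetchParams:
    prefetch_distance: int   # assumed unit: cache lines ahead of the access
    training_to_stable: int  # threshold named in the abstract
    throttle_threshold: int  # threshold named in the abstract


@dataclass(frozen=True)
class MemoryRegion:
    start: int  # inclusive start address
    end: int    # exclusive end address
    params: PrefetchParams


def params_for(regions, address):
    """Return the prefetch parameters of the region containing address."""
    for region in regions:
        if region.start <= address < region.end:
            return region.params
    raise LookupError(f"address {address:#x} is not in any region")


def prefetch_address(regions, address, line_size=64):
    """Address to prefetch after a monitored access, using region parameters."""
    distance = params_for(regions, address).prefetch_distance
    line_base = address - address % line_size
    return line_base + distance * line_size
```

A region with higher access latency would typically be given a larger `prefetch_distance`, so its prefetches are issued further ahead of demand accesses.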
-
14.
Publication No.: US20200004684A1
Publication date: 2020-01-02
Application No.: US16024527
Filing date: 2018-06-29
Applicant: Intel Corporation
Inventor: Wim Heirman, Ibrahim Hur, Ugonna Echeruo, Stijn Eyerman, Kristof Du Bois
IPC: G06F12/0862
Abstract: Apparatus, method, and system for enhancing data prefetching based on non-uniform memory access (NUMA) characteristics are described herein. An apparatus embodiment includes a system memory, a cache, and a prefetcher. The system memory includes multiple memory regions, at least some of which are associated with different NUMA characteristics (access latency, bandwidth, etc.) than others. Each region is associated with its own set of prefetch parameters that are set in accordance with its NUMA characteristics. The prefetcher monitors data accesses to the cache and generates one or more prefetch requests to fetch data from the system memory to the cache based on the monitored data accesses and the set of prefetch parameters associated with the memory region from which data is to be fetched. The set of prefetch parameters may include prefetch distance, training-to-stable threshold, and throttle threshold.
-
Publication No.: US12111772B2
Publication date: 2024-10-08
Application No.: US17133414
Filing date: 2020-12-23
Applicant: Intel Corporation
Inventor: Wim Heirman, Ibrahim Hur
IPC: G06F12/1027, G06F9/30, G06F12/0862, G06F12/0891
CPC classification number: G06F12/1027, G06F9/30047, G06F12/0862, G06F12/0891, G06F2212/6024
Abstract: Techniques and mechanisms for providing information to determine whether a software prefetch instruction is to be executed. In an embodiment, one or more entries of a translation lookaside buffer (TLB) each include a respective value which indicates whether, according to one or more criteria, corresponding data has been sufficiently utilized. Insufficiently utilized data is indicated in a TLB entry with an identifier of an executed instruction to prefetch the corresponding data. An eviction of the TLB entry results in the creation of an entry in a registry of prefetch instructions. The entry in the registry includes the identifier of the executed prefetch instruction, and a value indicating a number of times that one or more future prefetch instructions are to be dropped. In another embodiment, execution of a subsequent prefetch instruction—which also corresponds to the identifier—is prevented based on the registry entry.
-
Publication No.: US12050915B2
Publication date: 2024-07-30
Application No.: US17130592
Filing date: 2020-12-22
Applicant: Intel Corporation
Inventor: Wim Heirman, Stijn Eyerman, Ibrahim Hur
IPC: G06F9/30, G06F9/38, G06F11/34, G06F12/0811, G06F12/0862
CPC classification number: G06F9/3802, G06F9/30047, G06F9/3818, G06F11/3409, G06F12/0811, G06F12/0862, G06F9/383, G06F2212/452
Abstract: In an embodiment, a processor includes a fetch circuit to fetch instructions, the instructions including a code prefetch instruction; a decode circuit to decode the code prefetch instruction and provide the decoded code prefetch instruction to a memory circuit, the memory circuit to execute the decoded code prefetch instruction to prefetch a first set of code blocks into a first cache and to prefetch a second set of code blocks into a second cache. Other embodiments are described and claimed.
-
Publication No.: US11960922B2
Publication date: 2024-04-16
Application No.: US17030999
Filing date: 2020-09-24
Applicant: Intel Corporation
Inventor: Joshua B. Fryman, Jason M. Howard, Ibrahim Hur, Robert Pawlowski
IPC: G06F9/46, G06F9/30, G06F9/38, G06Q10/101
CPC classification number: G06F9/466, G06F9/3004, G06F9/30043, G06F9/3834, G06F2212/452, G06Q10/101
Abstract: In an embodiment, a processor comprises: an execution circuit to execute instructions; at least one cache memory coupled to the execution circuit; and a table storage element coupled to the at least one cache memory, the table storage element to store a plurality of entries each to store object metadata of an object used in a code sequence. The processor is to use the object metadata to provide user space multi-object transactional atomic operation of the code sequence. Other embodiments are described and claimed.
-
Publication No.: US20220283719A1
Publication date: 2022-09-08
Application No.: US17824413
Filing date: 2022-05-25
Applicant: Intel Corporation
Inventor: Stijn Eyerman, Wim Heirman, Ibrahim Hur
IPC: G06F3/06
Abstract: An apparatus to facilitate generating a memory bandwidth stack for visualizing memory bandwidth utilization is disclosed. The apparatus includes one or more processors to receive data corresponding to a memory cycle occurring during a total execution time of an application executed by the one or more processors; for the memory cycle, assign the memory cycle to a component of a bandwidth stack based on analysis of the data and in accordance with a prioritization scheme; for the component, determine a portion of the bandwidth stack to account to the component based at least in part on the assignment of the memory cycle to the component; and generate the bandwidth stack by at least representing the portion accounted to the component in the bandwidth stack.
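The accounting described in this abstract (assign each memory cycle to one component by priority, then report each component's share) can be sketched as below. The component names in `PRIORITY` and the function names are hypothetical; the abstract does not list the actual components or their ordering.

```python
from collections import Counter

# Hypothetical component names, highest priority first.
PRIORITY = ["read", "write", "refresh", "idle"]


def assign_component(active):
    """Pick the highest-priority component active in one memory cycle."""
    for name in PRIORITY:
        if name in active:
            return name
    return "idle"  # nothing active in this cycle


def bandwidth_stack(cycles):
    """Return each component's share of all cycles, in priority order."""
    counts = Counter(assign_component(active) for active in cycles)
    total = len(cycles)
    return {name: counts[name] / total for name in PRIORITY if counts[name]}
```

Because every cycle is assigned to exactly one component, the shares always sum to 1, which is what lets the result be drawn as a single stacked bar.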
-
Publication No.: US20220222397A1
Publication date: 2022-07-14
Application No.: US17711671
Filing date: 2022-04-01
Applicant: Intel Corporation
Inventor: Samkit Jain, Nicholas M. Pepperling, Izajasz Piotr Wrosz, Joshua B. Fryman, Ibrahim Hur
Abstract: A distributed simulation system is provided that includes a plurality of computing nodes interconnected via a network implementing a Message Passing Interface (MPI) protocol. Each computing node is to simulate hardware logic of a core of a graph processing system and to simulate a respective system memory portion of the graph processing system.
-
Publication No.: US20220100511A1
Publication date: 2022-03-31
Application No.: US17033770
Filing date: 2020-09-26
Applicant: Intel Corporation
Inventor: Wim Heirman, Stijn Eyerman, Ibrahim Hur
IPC: G06F9/30, G06F12/0804, G06F12/12
Abstract: Methods and apparatus relating to one or more delayed cache writeback instructions for improved data sharing in manycore processors are described. In an embodiment, a delayed cache writeback instruction causes a cache block in a modified state in a Level 1 (L1) cache of a first core of a plurality of cores of a multi-core processor to transition to a Modified write back (M.wb) state. The M.wb state causes the cache block to be written back to the last-level cache (LLC) upon eviction of the cache block from the L1 cache. Other embodiments are also disclosed and claimed.