Abstract:
Techniques for slicing memory of a hardware processor core by linear address are described. In certain examples, a hardware processor core includes memory circuitry having: a cache comprising a plurality of slices of memory, wherein each of a plurality of cache lines of memory are only stored in a single slice, and each slice stores a different range of address values compared to any other slice, wherein each of the plurality of slices of memory comprises: an incomplete load buffer to store a load address from the address generation circuit for a load request operation, broadcast to the plurality of slices of memory by the memory circuit from the execution circuit, in response to the load address being within a range of address values of that memory slice, a store address buffer to store a store address from the address generation circuit for a store request operation, broadcast to the plurality of slices of memory by the memory circuit from the execution circuit, in response to the store address being within a range of address values of that memory slice, a store data buffer to store data, including the data for the store request operation that is to be stored at the store address, for each store request operation broadcast to the plurality of slices of memory by the memory circuit from the execution circuit, and a store completion buffer to store the data for the store request operation in response to the store address being stored in the store address buffer of that memory slice, and, in response, clear the store address for the store request operation from the store address buffer and clear the data for the store request operation from the store data buffer.
Abstract:
Techniques for prefetching by a hardware processor are described. In certain examples, a hardware processor includes execution circuitry, cache memories, and prefetcher circuitry. The execution circuitry is to execute instructions to access data at a memory address. The cache memories include a first cache memory at a first cache level and a second cache memory at a second cache level. The prefetcher circuitry is to prefetch the data from a system memory to at least one of the plurality of cache memories, and it includes a first-level prefetcher to prefetch the data to the first cache memory, a second-level prefetcher to prefetch the data to the second cache memory, and a plurality of prefetch filters. One of the prefetch filters is to filter exclusively for the first-level prefetcher. Another of the prefetch filters is to maintain a history of demand and prefetch accesses to pages in the system memory and to use the history to provide training information to the second-level prefetcher.
Abstract:
Techniques for scheduling merged store operations are described. In an embodiment, an apparatus includes a data cache; a fill buffer; a store buffer to store first information associated with a first retired store operation and second information associated with a second retired store operation; a store coalescing buffer (SCB) to receive the first information from the store buffer, to store the first information in an SCB entry, to merge the second information from the store buffer into the entry, and to provide data associated with the entry for a write to the data cache or the fill buffer; and a global store scheduler (GSS) to schedule the write relative to an other write from an other SCB in compliance with one or more store ordering rules.
Abstract:
Method and apparatus to efficiently detect violations of data dependency relationships. A memory address associated with a computer instruction may be obtained. A current state of the memory address may be identified. The current state may include whether the memory address is associated with a read or a store instruction, and whether the memory address is associated with a set or a check. A previously accumulated state associated with the memory address may be retrieved from a data structure. The previously accumulated state may include whether the memory address was previously associated with a read or a store instruction, and whether the memory address was previously associated with a set or a check. If a transition from the previously accumulated state to the current state is invalid, a failure condition may be signaled.
Abstract:
In an embodiment, a data path circuit includes: a plurality of pipeline stages coupled between an input of the data path circuit and an output of the data path circuit; and a first selection circuit coupled between a first pipeline stage and a second pipeline stage, the first selection circuit having a first input to receive an input to the first pipeline stage and a second input to receive an output of the first pipeline stage and controllable to output one of the input to the first pipeline stage and the output of the first pipeline stage. A bypass controller coupled to the data path circuit may control the first selection circuit based at least in part on an operating frequency of the data path circuit. Other embodiments are described and claimed.
Abstract:
In an embodiment, a data path circuit includes: a plurality of pipeline stages coupled between an input of the data path circuit and an output of the data path circuit; and a first selection circuit coupled between a first pipeline stage and a second pipeline stage, the first selection circuit having a first input to receive an input to the first pipeline stage and a second input to receive an output of the first pipeline stage and controllable to output one of the input to the first pipeline stage and the output of the first pipeline stage. A bypass controller coupled to the data path circuit may control the first selection circuit based at least in part on an operating frequency of the data path circuit. Other embodiments are described and claimed.