Abstract:
Methods and apparatus to provide virtualized vector processing are described. In one embodiment, one or more operations corresponding to a virtual vector request are distributed to one or more processor cores for execution.
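A minimal sketch (in C++, with threads standing in for processor cores) of how a single virtual vector request might be split into per-core operations; the function names, chunking scheme, and core count are illustrative assumptions rather than details taken from the abstract:

#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kNumCores = 4;  // assumed number of participating cores

// Element-wise add over one chunk of the virtual vector.
void AddChunk(const float* a, const float* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Distribute one virtual vector add across kNumCores worker threads, each
// thread standing in for a core receiving part of the request.
// `out` must already be sized to a.size().
void VirtualVectorAdd(const std::vector<float>& a, const std::vector<float>& b,
                      std::vector<float>& out) {
  std::vector<std::thread> workers;
  const std::size_t chunk = (a.size() + kNumCores - 1) / kNumCores;
  for (std::size_t c = 0; c < kNumCores; ++c) {
    const std::size_t begin = c * chunk;
    if (begin >= a.size()) break;
    const std::size_t len = std::min(chunk, a.size() - begin);
    workers.emplace_back(AddChunk, a.data() + begin, b.data() + begin,
                         out.data() + begin, len);
  }
  for (auto& w : workers) w.join();
}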
Abstract:
In an embodiment, a processor includes: a plurality of first cores to independently execute instructions, each of the plurality of first cores including a plurality of counters to store performance information; at least one second core to perform memory operations; and a power controller to receive performance information from at least some of the plurality of counters, determine a workload type executed on the processor based at least in part on the performance information, and based on the workload type dynamically migrate one or more threads from one or more of the plurality of first cores to the at least one second core for execution during a next operation interval. Other embodiments are described and claimed.
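A hedged sketch of the decision step described above: per-core counters are read, the workload is classified (here by an assumed misses-per-kilo-instruction cutoff), and memory-bound threads are flagged for migration to the memory-optimized core for the next interval. The structure names, counter fields, and threshold are assumptions for illustration:

#include <cstdint>
#include <vector>

struct CoreCounters {
  std::uint64_t instructions_retired;
  std::uint64_t llc_misses;
  int thread_id;  // thread currently scheduled on this core
};

enum class WorkloadType { ComputeBound, MemoryBound };

struct MigrationPlan {
  std::vector<int> threads_to_memory_core;
};

// Classify each first core's workload and flag memory-bound threads for
// migration to the memory core during the next operation interval.
MigrationPlan PlanNextInterval(const std::vector<CoreCounters>& counters) {
  constexpr double kMpkiThreshold = 20.0;  // assumed memory-bound cutoff
  MigrationPlan plan;
  for (const auto& c : counters) {
    if (c.instructions_retired == 0) continue;
    const double mpki = 1000.0 * static_cast<double>(c.llc_misses) /
                        static_cast<double>(c.instructions_retired);
    const WorkloadType type = (mpki > kMpkiThreshold)
                                  ? WorkloadType::MemoryBound
                                  : WorkloadType::ComputeBound;
    if (type == WorkloadType::MemoryBound) {
      plan.threads_to_memory_core.push_back(c.thread_id);
    }
  }
  return plan;
}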
Abstract:
In an embodiment, a processor includes a plurality of cores to independently execute instructions, the cores including a plurality of counters to store performance information, and a power controller coupled to the plurality of cores, the power controller having a logic to receive performance information from at least some of the plurality of counters, determine a number of cores to be active and a performance state for the number of cores for a next operation interval, based at least in part on the performance information and model information, and cause the number of cores to be active during the next operation interval, the performance information associated with execution of a workload on one or more of the plurality of cores. Other embodiments are described and claimed.
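An illustrative sketch of the core-count and performance-state decision; the abstract only states that counter data and model information feed the choice, so the utilization metric, the per-core capacity figure, and the P-state cutoff below are assumptions:

#include <algorithm>
#include <cstddef>
#include <vector>

struct CoreSample {
  double utilization;  // 0.0 .. 1.0, derived from the core's counters
};

struct Decision {
  std::size_t active_cores;
  int performance_state;  // smaller value = higher frequency (P0, P1, ...)
};

Decision DecideNextInterval(const std::vector<CoreSample>& samples) {
  if (samples.empty()) return {1, 2};  // keep one core at a modest P-state

  // Aggregate demand across the package.
  double total = 0.0;
  for (const auto& s : samples) total += s.utilization;

  // Model information (assumed): one core comfortably serves ~0.7 utilization.
  const std::size_t needed = std::clamp<std::size_t>(
      static_cast<std::size_t>(total / 0.7) + 1, 1, samples.size());

  // Fewer, busier cores get a higher performance state; many lightly loaded
  // cores can share a lower one.
  const int pstate = (total / static_cast<double>(needed) > 0.8) ? 0 : 2;
  return {needed, pstate};
}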
Abstract:
An apparatus and method for determining whether to execute an atomic operation locally or remotely. For example, one embodiment of a processor comprises: a decoder to decode an atomic operation on a local core; prediction logic on the local core to estimate a cost associated with execution of the atomic operation on the local core and a cost associated with execution of the atomic operation on a remote core; the remote core to execute the atomic operation remotely if the prediction logic determines that the cost for execution on the local core is greater than the cost for execution on the remote core; and the local core to execute the atomic operation locally if the prediction logic determines that the cost for execution on the local core is less than the cost for execution on the remote core.
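A minimal cost-model sketch of the local-versus-remote decision; the inputs (line ownership, recent remote accesses) and the cost weights are assumptions standing in for whatever heuristics the prediction logic actually uses:

#include <algorithm>
#include <cstdint>

struct AtomicSiteInfo {
  bool line_owned_locally;           // cache line already held by the local core
  std::uint32_t recent_remote_hits;  // recent accesses seen from the remote core
};

enum class ExecSite { Local, Remote };

// Compare the estimated costs and pick the cheaper execution site.
ExecSite ChooseExecutionSite(const AtomicSiteInfo& info) {
  // Assumed weights: a locally owned line makes local execution cheap, while
  // heavy sharing by the remote core makes shipping the operation there
  // cheaper than bouncing the cache line back and forth.
  const std::uint32_t local_cost = info.line_owned_locally ? 10 : 60;
  const std::uint32_t remote_cost =
      40 + (info.line_owned_locally ? 20 : 0) -
      std::min<std::uint32_t>(info.recent_remote_hits, 30);
  return (local_cost > remote_cost) ? ExecSite::Remote : ExecSite::Local;
}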
Abstract:
A scatter/gather technique optimizes unstructured streaming memory accesses, providing off-chip bandwidth efficiency by accessing only useful data at a fine granularity, and off-loading memory access overhead by supporting address calculation, data shuffling, and format conversion.
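A software sketch of the gather and scatter halves of such an engine: only the indexed elements are touched, and address calculation and packing are handled inside the routine rather than by the requesting core. The function and type names are illustrative:

#include <cstddef>
#include <cstdint>
#include <vector>

// Gather: pack the sparse elements of `src` selected by `indices` into `dst`,
// so only useful data crosses the memory interface.
void Gather(const std::vector<float>& src,
            const std::vector<std::uint32_t>& indices,
            std::vector<float>& dst) {
  dst.resize(indices.size());
  for (std::size_t i = 0; i < indices.size(); ++i) {
    dst[i] = src[indices[i]];  // address calculation handled by the engine
  }
}

// Scatter: write packed elements back to their sparse destinations.
void Scatter(const std::vector<float>& src,
             const std::vector<std::uint32_t>& indices,
             std::vector<float>& dst) {
  for (std::size_t i = 0; i < indices.size(); ++i) {
    dst[indices[i]] = src[i];
  }
}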
Abstract:
Technologies for dynamic migration of home tile mappings are described. A cache controller can receive coherence messages from other processor cores on the die. The cache controller records the locations from which the coherence messages originate and determines the distances between the requested home tiles and those locations. The cache controller then determines whether the average distance between a particular home tile, whose identifier is stored in the home tile table, and the requesting locations exceeds a threshold. When the average distance exceeds the defined threshold, the cache controller migrates the particular home tile to another location.
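A hedged sketch of the distance bookkeeping: requester coordinates are recorded per home tile, the average Manhattan distance is compared against a threshold, and a migration target is proposed. The mesh coordinates, distance metric, threshold, and centroid policy are all assumptions:

#include <cstdlib>
#include <vector>

struct TileCoord {
  int x, y;
};

struct HomeTileEntry {
  TileCoord home;                     // current home tile location
  std::vector<TileCoord> requesters;  // origins of recent coherence messages
};

int Manhattan(TileCoord a, TileCoord b) {
  return std::abs(a.x - b.x) + std::abs(a.y - b.y);
}

// Returns true (and fills *new_home) when the average requester distance
// exceeds the threshold, signalling that the home tile should migrate.
bool ShouldMigrate(const HomeTileEntry& e, int threshold, TileCoord* new_home) {
  if (e.requesters.empty()) return false;
  long sum = 0, sx = 0, sy = 0;
  for (const auto& r : e.requesters) {
    sum += Manhattan(e.home, r);
    sx += r.x;
    sy += r.y;
  }
  const long n = static_cast<long>(e.requesters.size());
  const double avg = static_cast<double>(sum) / static_cast<double>(n);
  if (avg <= threshold) return false;
  // Assumed policy: move toward the centroid of recent requesters.
  new_home->x = static_cast<int>(sx / n);
  new_home->y = static_cast<int>(sy / n);
  return true;
}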
Abstract:
Instructions and logic provide pushing buffer copy and store functionality. Some embodiments include a first hardware thread or processing core, a second hardware thread or processing core, and a cache to store cache coherent data in a cache line for a shared memory address accessible by the second hardware thread or processing core. Responsive to decoding an instruction specifying a source data operand, said shared memory address as a destination operand, and one or more owners of said shared memory address, one or more execution units copy data from the source data operand to the cache coherent data in the cache line for said shared memory address accessible by said second hardware thread or processing core in the cache when said one or more owners include said second hardware thread or processing core.
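A rough software analogue of the push behavior, with threads standing in for hardware threads or cores: the producer copies data directly into a buffer destined for the consumer and then publishes it, so the consumer finds the data already in place rather than pulling it on demand. The buffer layout and flag protocol are assumptions, not the instruction's actual semantics:

#include <array>
#include <atomic>
#include <cstdio>
#include <thread>

struct SharedLine {
  std::array<int, 16> payload{};   // stands in for the cache line data
  std::atomic<bool> ready{false};  // publication flag
};

int main() {
  SharedLine line;

  std::thread consumer([&line] {
    while (!line.ready.load(std::memory_order_acquire)) {
      // spin until the producer has pushed the data
    }
    std::printf("consumer sees %d\n", line.payload[0]);
  });

  // Producer pushes the copy toward the consumer's data up front.
  line.payload.fill(42);
  line.ready.store(true, std::memory_order_release);

  consumer.join();
  return 0;
}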
Abstract:
A processing device comprises a processing device cache and a cache controller. The cache controller initiates a cache line eviction process and determines an object liveness value associated with a cache line in the processing device cache. The cache controller applies the object liveness value to a cache line eviction policy and evicts the cache line from the processing device cache based on the object liveness value and the cache line eviction policy.
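A sketch of folding an object liveness value into an eviction policy: within a set, lines whose objects appear dead are evicted first, with LRU as the tie-breaker. The liveness encoding and metadata fields are illustrative assumptions:

#include <cstddef>
#include <cstdint>
#include <vector>

struct CacheLineMeta {
  std::uint64_t tag;
  std::uint64_t last_access;  // timestamp for LRU tie-breaking
  std::uint8_t liveness;      // 0 = object known dead; higher = more likely live
};

// Pick the victim within one set: prefer the deadest object, break ties by LRU.
std::size_t PickVictim(const std::vector<CacheLineMeta>& set) {
  std::size_t victim = 0;
  for (std::size_t i = 1; i < set.size(); ++i) {
    const bool deader = set[i].liveness < set[victim].liveness;
    const bool same_liveness_older =
        set[i].liveness == set[victim].liveness &&
        set[i].last_access < set[victim].last_access;
    if (deader || same_liveness_older) victim = i;
  }
  return victim;
}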