Abstract:
A method and apparatus for exiting a low power state based on a prior prediction is disclosed. An integrated circuit (IC) includes a functional unit configured to, during operation, cycle between intervals of an active state and intervals of an idle state. The IC also includes a power management unit configured to place the functional unit in a low power state responsive to the functional unit entering the idle state. The power management unit is further configured to preemptively cause the functional unit to exit the low power state at a predetermined time after entering the low power state. The predetermined time is based on a prediction of idle state duration made prior to entering the low power state. The prediction may be generated by a prediction unit based on a history of durations of intervals in which the functional unit was in the idle state.
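A minimal C sketch of one way such a predictor could work, assuming a simple moving average over the most recent idle intervals (the abstract leaves the prediction algorithm open); all names here are hypothetical:

```c
#include <stdio.h>

#define HIST_LEN 8

/* Ring buffer of the most recent idle-interval durations (microseconds). */
static unsigned idle_hist[HIST_LEN];
static int hist_count, hist_head;

static void record_idle_duration(unsigned us)
{
    idle_hist[hist_head] = us;
    hist_head = (hist_head + 1) % HIST_LEN;
    if (hist_count < HIST_LEN)
        hist_count++;
}

/* Predict the next idle duration as the mean of the recorded history. */
static unsigned predict_idle_duration(void)
{
    unsigned long sum = 0;
    for (int i = 0; i < hist_count; i++)
        sum += idle_hist[i];
    return hist_count ? (unsigned)(sum / hist_count) : 0;
}

int main(void)
{
    unsigned samples[] = { 900, 1100, 1000, 950 };
    for (int i = 0; i < 4; i++)
        record_idle_duration(samples[i]);

    /* On entry to the low power state, arm a wake timer for the
     * predicted duration so the exit begins before the next request. */
    unsigned wake_after = predict_idle_duration();
    printf("arm wake timer: %u us after power-down\n", wake_after);
    return 0;
}
```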
Abstract:
Durations of active performance states of components of a processing system can be predicted based on one or more previous durations of an active state of the components. One or more entities in the processing system such as processor cores or caches can be configured based on the predicted durations of the active state of the components. Some embodiments configure a first component in a processing system based on a predicted duration of an active state of a second component of the processing system. The predicted duration is predicted based on one or more previous durations of an active state of the second component.
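A sketch of how a predicted active-state duration of one component could configure a second component, assuming a last-value predictor and a cache as the configured entity (both are illustrative assumptions, not details from the abstract):

```c
#include <stdio.h>

/* Last-value predictor: assume the next active interval resembles the
 * previous one (the abstract permits any history-based prediction). */
static unsigned predict_active_us(unsigned prev_active_us)
{
    return prev_active_us;
}

/* Configure a second component (here, a cache) based on the predicted
 * active-state duration of a first component (here, a core). */
static void configure_cache(unsigned predicted_active_us)
{
    if (predicted_active_us < 500)
        printf("cache: stay in low-leakage retention mode\n");
    else
        printf("cache: power up all ways for the active burst\n");
}

int main(void)
{
    unsigned prev = 1200;   /* last observed active interval, in us */
    configure_cache(predict_active_us(prev));
    return 0;
}
```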
Abstract:
A level of cache memory receives modified data from a higher level of cache memory. A set of cache lines with an index associated with the modified data is identified. The modified data is stored in the set in a cache line whose eviction priority is at least as high as the eviction priority, before the modified data is stored, of the unmodified cache line with the highest eviction priority among unmodified cache lines in the set.
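A sketch of this placement rule, assuming eviction priority is an LRU-style rank (higher value = evicted sooner): the victim is the clean line with the highest rank, and the incoming dirty line inherits that rank, which satisfies the "at least as high" condition.

```c
#include <stdio.h>

#define WAYS 4

struct line {
    int tag;
    int dirty;      /* 1 = modified */
    int evict_prio; /* higher value = evicted sooner */
};

/* Place incoming modified data in the clean line whose eviction
 * priority is highest; the dirty line keeps that priority, which is
 * at least as high as any clean line's priority before the store. */
static void insert_modified(struct line set[WAYS], int tag)
{
    int victim = -1;
    for (int w = 0; w < WAYS; w++)
        if (!set[w].dirty &&
            (victim < 0 || set[w].evict_prio > set[victim].evict_prio))
            victim = w;

    if (victim >= 0) {
        set[victim].tag = tag;
        set[victim].dirty = 1; /* priority unchanged: >= old clean max */
    }
}

int main(void)
{
    struct line set[WAYS] = {
        { 10, 0, 2 }, { 11, 1, 3 }, { 12, 0, 1 }, { 13, 0, 0 }
    };
    insert_modified(set, 42);
    for (int w = 0; w < WAYS; w++)
        printf("way %d: tag %d dirty %d prio %d\n",
               w, set[w].tag, set[w].dirty, set[w].evict_prio);
    return 0;
}
```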
Abstract:
A system and method for efficiently powering down banks in a cache memory to reduce power consumption. A computing system includes a cache array and a corresponding cache controller. The cache array includes multiple banks, each comprising multiple cache sets. In response to a request to power down a first bank of the multiple banks in the cache array, the cache controller selects a cache line of a given type in the first bank and determines whether a respective locality of reference for the selected cache line exceeds a threshold. If the threshold is exceeded, then the selected cache line is migrated to a second bank in the cache array. If the threshold is not exceeded, then the selected cache line is written back to lower-level memory.
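A sketch of the migrate-or-writeback decision, assuming a recent hit count as the locality-of-reference metric and a fixed threshold (both assumptions; the abstract does not define the metric):

```c
#include <stdio.h>

#define LINES 4
#define LOCALITY_THRESHOLD 3

struct cache_line {
    int addr;
    int recent_hits; /* proxy for locality of reference (assumed metric) */
};

/* On a request to power down a bank: migrate hot lines to a second
 * bank, write the rest back to lower-level memory. */
static void power_down_bank(struct cache_line bank[LINES])
{
    for (int i = 0; i < LINES; i++) {
        if (bank[i].recent_hits > LOCALITY_THRESHOLD)
            printf("line 0x%x: migrate to a powered bank\n", bank[i].addr);
        else
            printf("line 0x%x: write back to lower-level memory\n",
                   bank[i].addr);
    }
}

int main(void)
{
    struct cache_line bank[LINES] = {
        { 0x100, 5 }, { 0x140, 1 }, { 0x180, 7 }, { 0x1c0, 0 }
    };
    power_down_bank(bank);
    return 0;
}
```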
Abstract:
A cache memory receives a request to perform a write operation. The request specifies an address. A first determination is made that the cache memory does not include a cache line corresponding to the address. A second determination is made that the address is between a previous value of a stack pointer and a current value of the stack pointer. A third determination is made that a write history indicator is set to a specified value. The write operation is performed in the cache memory without waiting for a cache fill corresponding to the address to be performed, in response to the first, second, and third determinations.
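A sketch of the three-way check that lets the write proceed without a fill. The abstract only says the address is "between" the two stack pointer values; the downward-growing stack window below is my assumption:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Decide whether a write miss may skip the cache fill: the address must
 * lie in the newly grown stack region and the write-history indicator
 * must be set. Assuming a downward-growing stack, that region is
 * [sp_now, sp_prev). */
static bool can_skip_fill(uint64_t addr, uint64_t sp_prev, uint64_t sp_now,
                          bool write_history_set)
{
    bool in_new_stack_region = addr >= sp_now && addr < sp_prev;
    return in_new_stack_region && write_history_set;
}

int main(void)
{
    uint64_t sp_prev = 0x7fff0000, sp_now = 0x7ffef000;
    uint64_t addr = 0x7ffef800; /* miss inside the new stack region */
    printf("skip fill: %d\n", can_skip_fill(addr, sp_prev, sp_now, true));
    return 0;
}
```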
Abstract:
A system and method for efficiently limiting storage space for data with particular properties in a cache memory. A computing system includes a cache and one or more sources of memory requests. In response to receiving a request to allocate data of a first type, a cache controller allocates the data in the cache responsive to determining that a limit on the amount of data of the first type permitted in the cache has not been reached. The controller maintains the amount and location information of the data of the first type stored in the cache. Additionally, the cache may be partitioned, with each partition designated for storing data of a given type. Allocation of data of the first type is dependent at least upon the availability of a first partition and a limit on the amount of data of the first type in a second partition.
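A sketch of the partitioned allocation path: the first-type data prefers its designated partition and may spill into a second partition only while the amount of first-type data there stays under a limit. The concrete limit and capacities are illustrative assumptions:

```c
#include <stdbool.h>
#include <stdio.h>

#define SPILL_LIMIT 8  /* max first-type lines in partition 2 (assumed) */

struct partition {
    int used, capacity;
    int first_type_lines; /* amount of first-type data held here */
};

/* Allocate first-type data: use the designated first partition when it
 * has room; otherwise spill to the second partition, but only while
 * its first-type occupancy is under the limit. */
static bool allocate_first_type(struct partition *p1, struct partition *p2)
{
    if (p1->used < p1->capacity) {
        p1->used++; p1->first_type_lines++;
        return true;
    }
    if (p2->used < p2->capacity && p2->first_type_lines < SPILL_LIMIT) {
        p2->used++; p2->first_type_lines++;
        return true;
    }
    return false; /* limit reached: the request is not allocated */
}

int main(void)
{
    struct partition p1 = { 0, 2, 0 }, p2 = { 0, 16, 0 };
    for (int i = 0; i < 4; i++)
        printf("alloc %d: %d\n", i, allocate_first_type(&p1, &p2));
    return 0;
}
```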
Abstract:
A processor system presented here has a plurality of execution cores and a plurality of stack caches, wherein each of the stack caches is associated with a different one of the execution cores. A method of managing stack data for the processor system is presented here. The method maintains a stack cache manager for the plurality of execution cores. The stack cache manager includes entries for stack data accessed by the plurality of execution cores. The method processes, for a requesting execution core of the plurality of execution cores, a virtual address for requested stack data. The method continues by accessing the stack cache manager to search for an entry of the stack cache manager that includes the virtual address for requested stack data, and using information in the entry to retrieve the requested stack data.
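A sketch of the shared stack cache manager as a lookup table from the virtual address of stack data to the core whose stack cache holds it; the entry layout and names are hypothetical:

```c
#include <stdint.h>
#include <stdio.h>

#define ENTRIES 4

/* One entry per block of stack data: records which core's stack cache
 * currently holds that data. */
struct scm_entry {
    uint64_t vaddr;
    int owner_core;
    int valid;
};

static struct scm_entry manager[ENTRIES] = {
    { 0x7fff1000, 0, 1 },
    { 0x7ffe2000, 2, 1 },
};

/* Search the manager for the virtual address of requested stack data;
 * the entry tells the requesting core which stack cache to read. */
static struct scm_entry *scm_lookup(uint64_t vaddr)
{
    for (int i = 0; i < ENTRIES; i++)
        if (manager[i].valid && manager[i].vaddr == vaddr)
            return &manager[i];
    return NULL;
}

int main(void)
{
    struct scm_entry *e = scm_lookup(0x7ffe2000);
    if (e)
        printf("fetch stack data from core %d's stack cache\n",
               e->owner_core);
    return 0;
}
```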
Abstract:
An apparatus includes a memory controller that includes logic to receive a first memory request having a first request type and a second memory request having a second request type. The apparatus also includes a scheduling unit that includes logic to schedule an order of the first and second memory requests for execution based upon a first parameter value and a second parameter value. The first parameter value corresponds to a utility and energy cost for the first memory request and the second parameter value corresponds to a utility and energy cost for the second memory request.
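The abstract does not define how the utility and energy cost of a request are combined into its parameter value; one assumed combination is a utility-per-energy ratio, sketched below with hypothetical values:

```c
#include <stdio.h>

struct mem_request {
    const char *name;
    double utility;     /* benefit of servicing the request */
    double energy_cost; /* energy required to service it */
};

/* One possible parameter value: schedule the request with the higher
 * utility-per-energy ratio first. */
static double score(const struct mem_request *r)
{
    return r->utility / r->energy_cost;
}

int main(void)
{
    struct mem_request a = { "read A",  8.0, 2.0 };
    struct mem_request b = { "write B", 6.0, 3.0 };
    const struct mem_request *first = score(&a) >= score(&b) ? &a : &b;
    printf("schedule %s first\n", first->name);
    return 0;
}
```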
Abstract:
A processor distributes memory timing parameters and data among different memory modules based upon memory access patterns. The memory access patterns indicate different types, or classes, of data for an executing workload, with each class associated with different memory access characteristics, such as different row buffer hit rate levels, different frequencies of access, different criticalities, and the like. The processor assigns each memory module to a data class and sets the memory timing parameters for each memory module according to the module's assigned data class, thereby tailoring the memory timing parameters for efficient access of the corresponding data.
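A sketch of per-module timing assignment by data class: a module assigned frequently accessed, row-buffer-friendly data gets tighter timings than one holding cold data. The two classes and all cycle counts are illustrative, not values from the abstract:

```c
#include <stdio.h>

/* DRAM timing parameters, in memory-clock cycles. */
struct timing { int tCL, tRCD, tRP; };

enum data_class { HOT_HIGH_HIT_RATE, COLD_LOW_HIT_RATE };

/* Pick timings for a module according to its assigned data class. */
static struct timing timings_for(enum data_class c)
{
    if (c == HOT_HIGH_HIT_RATE)
        return (struct timing){ 14, 14, 14 }; /* aggressive timings */
    return (struct timing){ 18, 18, 18 };     /* conservative timings */
}

int main(void)
{
    enum data_class module_class[2] = { HOT_HIGH_HIT_RATE,
                                        COLD_LOW_HIT_RATE };
    for (int m = 0; m < 2; m++) {
        struct timing t = timings_for(module_class[m]);
        printf("module %d: tCL=%d tRCD=%d tRP=%d\n",
               m, t.tCL, t.tRCD, t.tRP);
    }
    return 0;
}
```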
Abstract:
A processing system includes a plurality of processor cores formed in a first layer of an integrated circuit device and a plurality of partitions of memory formed in one or more second layers of the integrated circuit device. The one or more second layers are deployed in a stacked configuration with the first layer. Each of the partitions is associated with a subset of the processor cores that have overlapping footprints with the partition. The processing system also includes first memory paths between the processor cores and their corresponding subsets of partitions. The processing system further includes second memory paths between the processor cores and the partitions.
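A sketch of path selection in such a stack, under the simplifying assumption that each core's footprint overlaps exactly one partition (the abstract allows subsets of cores per partition):

```c
#include <stdio.h>

/* Assumed one-to-one mapping from a core to the partition stacked
 * directly over its footprint. */
static int partition_under(int core) { return core; }

/* Route a memory access: use the first (direct) path when the target
 * partition overlaps the core's footprint, otherwise the second path. */
static const char *pick_path(int core, int partition)
{
    return partition == partition_under(core)
        ? "first path (partition stacked over the core)"
        : "second path (non-overlapping partition)";
}

int main(void)
{
    printf("core 0 -> partition 0: %s\n", pick_path(0, 0));
    printf("core 0 -> partition 3: %s\n", pick_path(0, 3));
    return 0;
}
```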