Abstract:
Examples herein involve storing register contents in the event of a system failure. In examples herein, an association is made between a register and a location in persistent memory based on an execution of a process. Upon detection of the failure, the content of the register is stored to the persistent memory location based on the association.
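A minimal C sketch of this associate-then-flush scheme follows, assuming the persistent memory is exposed as an ordinary byte-addressable region (e.g., an mmap of a DAX device, simulated here by a plain array). The register names, the association table, and the save_registers_on_failure() routine are illustrative assumptions, not details taken from the abstract:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_TRACKED_REGS 3

/* Association between a register and a persistent-memory location,
 * established while the process executes. */
struct reg_assoc {
    const char *reg_name;   /* architectural register being tracked    */
    uint64_t    value;      /* stand-in for the live register content  */
    size_t      pm_offset;  /* destination offset in persistent memory */
};

static uint8_t persistent_region[4096];  /* stand-in for a real PM mapping */

static struct reg_assoc table[NUM_TRACKED_REGS] = {
    { "r1", 0, 0x00 },
    { "r2", 0, 0x08 },
    { "r3", 0, 0x10 },
};

/* Invoked when the failure (e.g., imminent power loss) is detected:
 * flush each tracked register to its associated PM location. */
static void save_registers_on_failure(void)
{
    for (int i = 0; i < NUM_TRACKED_REGS; i++) {
        memcpy(persistent_region + table[i].pm_offset,
               &table[i].value, sizeof(uint64_t));
        /* On real hardware a cache-line flush and fence would follow. */
    }
}

int main(void)
{
    table[0].value = 0xdeadbeefULL;   /* simulate live register state */
    save_registers_on_failure();      /* simulate failure detection   */
    uint64_t restored;
    memcpy(&restored, persistent_region, sizeof(restored));
    printf("r1 checkpointed as 0x%llx\n", (unsigned long long)restored);
    return 0;
}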
Abstract:
In one example, a central processing unit (CPU) with dynamic thread mapping includes a set of multiple cores, each with a set of multiple threads. For each of the multiple threads, a set of registers monitors, among in-flight memory requests, the number of loads from and stores to at least a first memory interface and a second memory interface by the respective thread. The second memory interface has a greater latency than the first memory interface. The CPU further has logic to map and migrate threads to respective CPU cores such that the number of cores accessing only one of the first and second memory interfaces is maximized.
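As a rough illustration of such a mapping policy, the C sketch below classifies threads by their per-interface counters and greedily packs single-interface threads together so that as many cores as possible touch only one interface. The core and thread counts, the counter values, and the greedy grouping are assumptions made for the example, not the patented algorithm itself:

#include <stdio.h>

#define NUM_CORES        4
#define THREADS_PER_CORE 2
#define NUM_THREADS      (NUM_CORES * THREADS_PER_CORE)

struct thread_counters {
    unsigned near_accesses;  /* loads+stores via first (lower-latency) interface  */
    unsigned far_accesses;   /* loads+stores via second (higher-latency) interface */
};

int main(void)
{
    struct thread_counters t[NUM_THREADS] = {
        {100, 0}, {90, 0}, {0, 80}, {0, 70},
        {60, 5},  {50, 0}, {0, 40}, {3, 30},
    };
    int core_of[NUM_THREADS];

    /* Greedy grouping: near-only threads fill slots from the bottom,
     * far-only threads from the top, mixed threads take what remains,
     * maximizing the number of single-interface cores. */
    int lo = 0, hi = NUM_THREADS - 1;
    for (int i = 0; i < NUM_THREADS; i++) {
        if (t[i].far_accesses == 0)
            core_of[i] = (lo++) / THREADS_PER_CORE;
        else if (t[i].near_accesses == 0)
            core_of[i] = (hi--) / THREADS_PER_CORE;
    }
    for (int i = 0; i < NUM_THREADS; i++)  /* mixed threads fill the gap */
        if (t[i].far_accesses && t[i].near_accesses)
            core_of[i] = (lo++) / THREADS_PER_CORE;

    for (int i = 0; i < NUM_THREADS; i++)
        printf("thread %d -> core %d\n", i, core_of[i]);
    return 0;
}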
Abstract:
A technique includes, in response to a cache miss occurring in a given processing node of a plurality of processing nodes, using a directory-based coherence system for the plurality of processing nodes to regulate snooping of an address that is associated with the cache miss. Whether the directory-based coherence system includes the address in a snooping domain is based at least in part on a number of cache misses associated with the address.
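One way to sketch this miss-count-driven regulation in C is a per-address directory entry carrying a miss counter that promotes hot addresses into the snooping domain. The threshold value, the direct-mapped directory, and the on_cache_miss() entry point are assumptions for illustration, not details from the abstract:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DIR_ENTRIES      256
#define SNOOP_THRESHOLD    4   /* misses before an address joins the snoop domain */

struct dir_entry {
    uint64_t addr;        /* cache-line address tracked by this entry */
    unsigned miss_count;  /* misses observed for this address         */
    bool     in_snoop_domain;
};

static struct dir_entry directory[DIR_ENTRIES];

/* Called on a cache miss from any processing node: bump the counter
 * and promote frequently missed addresses into the snooping domain. */
static bool on_cache_miss(uint64_t addr)
{
    struct dir_entry *e = &directory[(addr >> 6) % DIR_ENTRIES];
    if (e->addr != addr) {            /* simple direct-mapped directory */
        e->addr = addr;
        e->miss_count = 0;
        e->in_snoop_domain = false;
    }
    if (++e->miss_count >= SNOOP_THRESHOLD)
        e->in_snoop_domain = true;    /* hot address: snoop it */
    return e->in_snoop_domain;
}

int main(void)
{
    uint64_t hot = 0x1000;
    for (int i = 0; i < 5; i++)
        printf("miss %d: snoop=%d\n", i + 1, on_cache_miss(hot));
    return 0;
}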
Abstract:
In an example, an apparatus is described that includes a memory array. The memory array includes a volatile memory, a first non-volatile memory, and a second non-volatile memory. The memory array further includes a cache manager that controls access by a computer system to the memory array. For instance, the cache manager may carry out memory operations, including read operations, write operations, and cache evictions, in conjunction with at least one of the volatile memory, the first non-volatile memory, or the second non-volatile memory.
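A minimal C sketch of a cache manager fronting these three media follows. The direct-mapped volatile cache, the write-back eviction to the first non-volatile memory, and the tier sizes are illustrative assumptions; the abstract does not specify a placement or eviction policy:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BLOCKS_VOL  4    /* volatile cache blocks               */
#define BLOCKS_NVM 16    /* addressable blocks in each NVM tier */

struct vol_block { bool valid, dirty; unsigned addr; uint8_t data; };

static struct vol_block vol[BLOCKS_VOL];  /* volatile memory              */
static uint8_t nvm1[BLOCKS_NVM];          /* first non-volatile memory    */
static uint8_t nvm2[BLOCKS_NVM];          /* second non-volatile memory   */
static bool in_nvm1[BLOCKS_NVM];          /* which tier holds each block  */

/* Cache eviction: write a dirty volatile block back to the first
 * non-volatile memory before its slot is reused. */
static void evict(struct vol_block *b)
{
    if (b->valid && b->dirty) {
        nvm1[b->addr] = b->data;
        in_nvm1[b->addr] = true;
    }
    b->valid = false;
}

/* Read: serve from volatile memory on a hit; on a miss, fetch from
 * whichever non-volatile tier currently holds the block. */
static uint8_t cm_read(unsigned addr)
{
    struct vol_block *b = &vol[addr % BLOCKS_VOL];
    if (b->valid && b->addr == addr)
        return b->data;
    evict(b);
    b->data  = in_nvm1[addr] ? nvm1[addr] : nvm2[addr];
    b->addr  = addr;
    b->valid = true;
    b->dirty = false;
    return b->data;
}

/* Write: allocate in volatile memory; the data reaches non-volatile
 * memory later via eviction. */
static void cm_write(unsigned addr, uint8_t value)
{
    struct vol_block *b = &vol[addr % BLOCKS_VOL];
    if (!(b->valid && b->addr == addr))
        evict(b);
    b->addr = addr; b->data = value;
    b->valid = true; b->dirty = true;
}

int main(void)
{
    cm_write(3, 42);                       /* lands in volatile memory */
    cm_write(7, 99);                       /* same set: evicts block 3 */
    printf("read(3) = %u\n", cm_read(3));  /* refilled from first NVM  */
    printf("read(7) = %u\n", cm_read(7));
    return 0;
}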
Abstract:
Examples disclosed herein relate to using a memory-side accelerator to calculate updated deep learning parameters. A globally addressable memory includes deep learning parameters. The deep learning parameters are partitioned, where each partition is associated with a memory-side accelerator. A memory-side accelerator is configured to receive calculated gradient updates associated with its partition and to calculate an update to the deep learning parameters associated with the partition.
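The C sketch below shows the per-partition update an accelerator might perform, assuming a plain SGD rule (param -= lr * grad); the abstract does not specify the update formula, and the partition count, sizes, and accelerator_update() helper are illustrative:

#include <stdio.h>

#define NUM_PARAMS      8
#define NUM_PARTITIONS  2
#define PART_SIZE       (NUM_PARAMS / NUM_PARTITIONS)

static float params[NUM_PARAMS];   /* globally addressable parameter store */

/* Work done by the memory-side accelerator owning one partition:
 * receive the calculated gradients for its slice of the parameters
 * and apply the update in place, close to the memory. */
static void accelerator_update(int part, const float *grads, float lr)
{
    float *slice = params + part * PART_SIZE;
    for (int i = 0; i < PART_SIZE; i++)
        slice[i] -= lr * grads[i];   /* assumed SGD update rule */
}

int main(void)
{
    float grads0[PART_SIZE] = {0.1f, 0.2f, 0.3f, 0.4f};
    float grads1[PART_SIZE] = {0.5f, 0.6f, 0.7f, 0.8f};
    accelerator_update(0, grads0, 0.01f);  /* accelerator for partition 0 */
    accelerator_update(1, grads1, 0.01f);  /* accelerator for partition 1 */
    for (int i = 0; i < NUM_PARAMS; i++)
        printf("param[%d] = %f\n", i, params[i]);
    return 0;
}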