Abstract:
A method for migrating a workload includes: receiving workloads generated from a plurality of applications running in a plurality of server nodes of a rack system; monitoring latency requirements for the workloads and detecting a violation of the latency requirement for a workload; collecting system utilization information of the rack system; calculating rewards for migrating the workload to other server nodes in the rack system; determining a target server node among the plurality of server nodes that maximizes the reward; and performing migration of the workload to the target server node.
Abstract:
An electronic system includes: a processor configured to access operation data; a local cache memory, coupled to the processor, configured to store a limited amount of the operation data; a memory controller, coupled to the local cache memory, configured to maintain a flow of the operation data; and a memory subsystem, coupled to the memory controller, including: a first tier memory configured to store the operation data, with critical timing, by a fast control bus, and a second tier memory configured to store the operation data with non-critical timing, by a reduced performance control bus.
Abstract:
A dedupe module is provided. The dedupe module includes: a host interface; a dedupe engine to receive a data request from a host system via the host interface; a memory controller; a plurality of memory modules, each memory module being coupled to the memory controller; and a read cache for caching data from the memory controller for use by the dedupe engine.
Abstract:
An electronic system includes: a second memory module; a first memory module coupled to the second memory module; and a multicast controller for managing a cache on the first memory module for the second memory module.
Abstract:
A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
Abstract:
A method of retrieving data stored in a memory associated with a dedupe module is provided. The method includes: identifying a logical address of the data; identifying a physical line ID of the data in accordance with the logical address by looking up at least a portion of the logical address in a translation table; locating a respective physical line, the respective physical line corresponding to the PLID; and retrieving the data from the respective physical line, the retrieving including copying a respective hash cylinder to the read cache, the respective hash cylinder including: a respective hash bucket, the respective hash bucket including the respective physical line; and a respective reference counter bucket, the respective reference counter bucket including a respective reference counter associated with the respective physical line.
Abstract:
A deduplication memory module, which is configured to internally perform memory deduplication, includes a hash table memory for storing multiple blocks of data in a hash table array including hash tables, each of the hash tables including physical buckets and a plurality of virtual buckets each including some of the physical buckets, each of the physical buckets including ways, an address lookup table memory (ALUTM) including a plurality of pointers indicating a location of each of the stored blocks of data in a corresponding one of the physical buckets, and a buffer memory for storing unique blocks of data not stored in the hash table memory when the hash table array is full, a processor, and memory, wherein the memory has stored thereon instructions that, when executed by the processor, cause the memory module to exchange data with an external system.
Abstract:
A memory (1205) is disclosed. The memory (1205) can includes a stack of dynamic Random Access Memory (DRAM) cores (1210, 1215, 1220, 1225) in a three-dimensional stacked memory architecture (1230). Each of the DRAM cores (1210, 1215, 1220, 1225) can include a plurality of banks (205-1, 205-2, 205-3, 205-4) to store data. The memory (1205) can also include logic layer (1235) which can include an interface (1305) to connect the memory (1205) with a processor (120). The logic layer (1235) can also include a refresh engine (115) that can be used to refresh one of the plurality of banks (205-1, 205-2, 205-3, 205-4) and a Smart Refresh Component (305) that can advise the refresh engine (115) which bank to refresh using an out-of-order per-bank refresh. The Smart Refresh Component (305) can use a logic (415) to identify a farthest bank in the pending transactions in the transaction queue (430) at the time of refresh.
Abstract:
A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
Abstract:
A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.