Abstract:
Various embodiments relating to performing multiple computations are provided. In one embodiment, a computing system includes a computation device and an off-chip storage device configured to store a plurality of stream elements and associated tags. The computation device includes an on-chip storage device configured to store a plurality of independently addressable resident elements, and a plurality of parallel processing units. Each parallel processing unit may be configured to receive one or more stream elements and associated tags from the off-chip storage device and to select one or more resident elements from a subset of resident elements driven in parallel from the on-chip storage device. A selected resident element may be indicated by an associated tag as matching a stream element. Each parallel processing unit may be configured to perform one or more computations using the one or more stream elements and the one or more selected resident elements.
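The tag-matching step can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each tag is simply the index of the matching resident element and that the computation is a multiplication, as in a sparse matrix-vector product. The names `StreamElement` and `run_units` are illustrative only, and the parallel processing units are simulated sequentially, one stream element per unit per batch.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StreamElement:
    value: float
    tag: int  # hypothetical encoding: index of the matching resident element


def run_units(stream: Sequence[StreamElement], resident: Sequence[float], num_units: int) -> List[float]:
    """Process the stream in batches of num_units, one element per (simulated) parallel unit."""
    results: List[float] = []
    for start in range(0, len(stream), num_units):
        batch = stream[start:start + num_units]
        # Drive only the resident elements this batch's tags refer to (the subset driven in parallel).
        driven = {elem.tag: resident[elem.tag] for elem in batch}
        for elem in batch:
            # Each unit selects the resident element its tag marks as matching, then computes.
            results.append(elem.value * driven[elem.tag])
    return results


resident = [1.0, 2.0, 3.0]  # on-chip, independently addressable
stream = [StreamElement(10.0, 2), StreamElement(5.0, 0), StreamElement(1.0, 1)]
print(run_units(stream, resident, num_units=2))  # [30.0, 5.0, 2.0]
```

Keeping only the tagged subset in `driven` mirrors the idea that a unit chooses among a few resident elements presented to it rather than addressing the whole on-chip store.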
Abstract:
Various embodiments relating to encoding a sparse matrix into a data structure format that may be efficiently processed via parallel processing of a computing system are provided. In one embodiment, a sparse matrix may be received. A set of designated rows of the sparse matrix may be traversed until all non-zero elements in the sparse matrix have been placed in a first array. Each time a row in the set is traversed, the next non-zero element in that row may be placed in the first array. If all non-zero elements for a given row of the set of designated rows have been placed in the first array, the given row may be replaced in the set of designated rows with the next unprocessed row of the sparse matrix. The data structure in which the sparse matrix is encoded may be output. The data structure may include the first array.
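The row-interleaving procedure can be sketched in Python as follows. This is a minimal sketch under several assumptions not stated in the abstract: `num_slots` (a hypothetical name) is the size of the set of designated rows, the set is traversed round-robin, each entry of the first array records its row and column alongside the value, and rows with no non-zero elements are skipped when a slot is refilled.

```python
from typing import List, Optional, Sequence, Tuple

Entry = Tuple[float, int, int]  # (value, row, column); storing indices is an assumption


def encode(matrix: Sequence[Sequence[float]], num_slots: int) -> List[Entry]:
    """Interleave non-zeros from a sliding set of designated rows into one array."""
    rows = [[(v, r, c) for c, v in enumerate(row) if v != 0] for r, row in enumerate(matrix)]
    # Unprocessed rows, in order; empty rows are skipped (hypothetical choice).
    pending = iter([r for r in range(len(rows)) if rows[r]])
    slots: List[Optional[int]] = [next(pending, None) for _ in range(num_slots)]
    cursor = [0] * len(rows)  # index of the next unplaced non-zero in each row
    first_array: List[Entry] = []
    while any(s is not None for s in slots):
        for i, r in enumerate(slots):
            if r is None:
                continue  # no unprocessed rows were left to fill this slot
            first_array.append(rows[r][cursor[r]])  # next non-zero of this row
            cursor[r] += 1
            if cursor[r] == len(rows[r]):
                # Row exhausted: replace it with the next unprocessed row.
                slots[i] = next(pending, None)
    return first_array


m = [[1, 0, 2],
     [0, 3, 0],
     [4, 5, 6]]
print([v for v, _, _ in encode(m, num_slots=2)])  # [1, 3, 2, 4, 5, 6]
```

Because a short row is replaced as soon as it runs out, each slot stays busy, which is one plausible reason such a layout suits parallel lanes of unequal row lengths.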
Abstract:
A memory region can durably self-identify as being faulty when read. Information that would have been assigned to the faulty memory region can be assigned to another region of the same size in memory using a replacement encoding technique. For phase change memory, at least two fault states can be provided for durably self-identifying a faulty memory region: one state in the highest resistance range and the other in the lowest resistance range. Replacement cells can be used to shift or reassign data when a self-identifying memory fault is present. A memory controller and a memory module, alone or in combination, may manage replacement cell use and may drive a newly discovered faulty cell to a fault state if that cell is not already in one.
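A minimal Python sketch of the replacement idea follows. It assumes a hypothetical six-level cell model in which level 0 (lowest resistance) and level 5 (highest resistance) are reserved fault states and levels 1 to 4 store 2-bit values; the replacement encoding is modeled as shifting data past self-identified faulty cells into spare cells. None of these names or levels come from the source.

```python
from typing import List, Sequence

FAULT_LOW = 0   # hypothetical: lowest resistance range reserved as a fault state
FAULT_HIGH = 5  # hypothetical: highest resistance range reserved as a fault state
# Levels 1..4 store 2-bit data values 0..3.


def is_faulty(level: int) -> bool:
    return level in (FAULT_LOW, FAULT_HIGH)


def mark_faulty(cells: List[int], index: int) -> None:
    """Drive a newly discovered bad cell into a fault state unless it already reads as one."""
    if not is_faulty(cells[index]):
        cells[index] = FAULT_HIGH


def write(cells: List[int], data: Sequence[int]) -> None:
    """Shift data past self-identified faulty cells, consuming spare (replacement) cells."""
    good = [i for i, level in enumerate(cells) if not is_faulty(level)]
    if len(good) < len(data):
        raise ValueError("not enough non-faulty cells")
    for i, value in zip(good, data):
        cells[i] = value + 1  # data value v is stored as level v + 1


def read(cells: Sequence[int], count: int) -> List[int]:
    """Skip any cell that reads as a fault state; no separate fault map is needed."""
    return [level - 1 for level in cells if not is_faulty(level)][:count]


cells = [1] * 6        # 4 data cells + 2 replacement cells
mark_faulty(cells, 1)  # newly discovered fault: driven to FAULT_HIGH
cells[3] = FAULT_LOW   # a cell that already fails in the low-resistance state
write(cells, [3, 0, 2, 1])
print(read(cells, 4))  # [3, 0, 2, 1]
```

The point of the sketch is that the read path needs no external fault table: the faulty cells announce themselves through their reserved levels.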
Abstract:
The subject disclosure is directed towards a technology that timely pre-fetches content to a computing device based upon a prediction that a user will be requesting access to the content. Features comprising temporal features, spatial features, spatiotemporal features and/or other features associated with content are provided to a model trained at least in part with historical access data. The model returns information from which a determination of whether to pre-fetch the content is made.
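The abstract does not specify the model, so the following Python sketch uses a deliberately simple stand-in: a Laplace-smoothed access frequency per context, learned from historical access records. The context key pairs a temporal feature (hour of day) with a spatial one (a location label), which together act as a spatiotemporal feature; the feature choice, the 0.6 threshold, and the names `train` and `should_prefetch` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Context = Tuple[int, str]  # hypothetical features: (hour of day, location label)


def train(history: Iterable[Tuple[int, str, bool]]) -> Dict[Context, Tuple[int, int]]:
    """Count (accesses, observations) per context from historical access records."""
    counts: Dict[Context, List[int]] = defaultdict(lambda: [0, 0])
    for hour, location, accessed in history:
        entry = counts[(hour, location)]
        entry[0] += int(accessed)
        entry[1] += 1
    return {ctx: (a, n) for ctx, (a, n) in counts.items()}


def should_prefetch(model: Dict[Context, Tuple[int, int]], hour: int, location: str,
                    threshold: float = 0.6) -> bool:
    """Pre-fetch when the smoothed probability of access in this context meets the threshold."""
    accessed, seen = model.get((hour, location), (0, 0))
    p_access = (accessed + 1) / (seen + 2)  # unseen contexts default to 0.5
    return p_access >= threshold


history = [(8, "home", True)] * 4 + [(8, "home", False)] + [(14, "office", False)] * 3
model = train(history)
print(should_prefetch(model, 8, "home"))     # True  (5/7, about 0.71)
print(should_prefetch(model, 14, "office"))  # False (1/5 = 0.2)
```

A trained classifier could replace the frequency table without changing the shape of the decision: features in, a score out, and a threshold deciding whether to pre-fetch.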
Abstract:
A system and method are provided for enhancing approximate computing by a computer system. In one example, an interface is provided comprising a variable-identifier module and a bit-priority module. The variable-identifier module is configured to identify one or more variables of data that are to be processed by the computer system with approximate precision. Approximate precision is a precision level at which a hardware device does not guarantee full data-correctness for the one or more variables. The bit-priority module is configured to assign bit-priorities to the one or more variables. The bit-priorities include relative levels of importance among bits of each of the one or more variables. The relative levels of importance include at least high-priority bits and low-priority bits.
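The two modules can be sketched in Python as a registry of approximate variables plus a per-variable split into high-priority and low-priority bits. This is a minimal sketch under stated assumptions: the high-priority bits are taken to be the most significant ones, and `approximate_store` is a hypothetical model of hardware that may corrupt low-priority bits but never high-priority ones. All names here are illustrative.

```python
import random
from dataclasses import dataclass
from typing import Dict


@dataclass
class BitPriority:
    width: int      # total bits in the variable
    high_bits: int  # most-significant bits marked high priority (assumption)

    def low_mask(self) -> int:
        # Mask covering the low-priority (least-significant) bits.
        return (1 << (self.width - self.high_bits)) - 1


approximate_vars: Dict[str, BitPriority] = {}  # variable-identifier registry (hypothetical)


def mark_approximate(name: str, width: int, high_bits: int) -> BitPriority:
    """Identify a variable as approximate and assign its bit-priorities."""
    if not 0 <= high_bits <= width:
        raise ValueError("high_bits must be between 0 and width")
    prio = BitPriority(width, high_bits)
    approximate_vars[name] = prio
    return prio


def approximate_store(value: int, prio: BitPriority, rng: random.Random) -> int:
    """Model of an approximate memory: may flip low-priority bits, never high-priority ones."""
    noise = rng.getrandbits(prio.width) & prio.low_mask()
    return value ^ noise


prio = mark_approximate("pixel", width=16, high_bits=8)
stored = approximate_store(0xABCD, prio, random.Random(1))
print(hex(stored >> 8))  # 0xab: the high-priority byte is preserved
```

With 8 of 16 bits protected, any corruption is confined to the low byte, so the absolute error is bounded by 255.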
Abstract:
The techniques discussed herein identify failed segments of memory in a memory region. The techniques may then manage the failed segments by logically clustering them at an outlying portion of the memory region using a remapping process. The remapping process may include creating and storing remapping metadata that defines segment remapping entries for the memory region. The failure clustering thereby logically eliminates or reduces memory fragmentation, so that a system can allocate larger portions of contiguous memory for object storage.
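One way to build such remapping metadata is sketched in Python below. It assumes, hypothetically, that the outlying portion is the high end of the region and that each remapping entry swaps a failed segment in the low portion with a healthy segment in the high portion; segments without an entry map to themselves. The function names are illustrative, not taken from the source.

```python
from typing import Dict, Iterable


def cluster_failures(num_segments: int, failed: Iterable[int]) -> Dict[int, int]:
    """Build swap-style remapping entries that push failed segments to the logical end."""
    failed_set = set(failed)
    boundary = num_segments - len(failed_set)  # logical segments >= boundary hold the failures
    low_failed = sorted(s for s in failed_set if s < boundary)
    high_healthy = sorted(s for s in range(boundary, num_segments) if s not in failed_set)
    remap: Dict[int, int] = {}
    for bad, good in zip(low_failed, high_healthy):  # the two lists always have equal length
        remap[bad] = good
        remap[good] = bad
    return remap


def physical(logical: int, remap: Dict[int, int]) -> int:
    """Translate a logical segment index using the remapping metadata."""
    return remap.get(logical, logical)


remap = cluster_failures(8, failed=[1, 4])
print([physical(i, remap) for i in range(8)])  # [0, 6, 2, 3, 7, 5, 1, 4]
```

After remapping, logical segments 0 to 5 are all healthy and contiguous, so an allocator can hand out that whole range, while the two failed segments sit at logical 6 and 7.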
Abstract:
The present technology relaxes the precision (or full data-correctness guarantee) requirements of memory operations, such as writing or reading, in multi-level cell (MLC) memories so that an application may write and read a digital data value as an approximate value. Types of MLC memory include Flash MLC and MLC Phase Change Memory (PCM), as well as other resistive technologies. Many software applications may not need the accuracy or precision typically used to store and read data values. For example, an application that renders an image on a relatively low-resolution display may not need an accurate data value for each pixel. By relaxing the precision or correctness requirements in a memory operation, MLC memories may achieve increased performance, lifetime, density, and/or energy efficiency.
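The abstract does not say how precision is relaxed. One mechanism often used to model the trade-off in MLC writes is iterative program-and-verify, where a looser verify tolerance lets the write stop after fewer pulses. The Python sketch below assumes that mechanism with a hypothetical cell model (each pulse closes half of the remaining gap plus a small bounded noise); it is an illustration, not the method of the source.

```python
import random
from typing import Tuple


def program_cell(target: float, tolerance: float, rng: random.Random,
                 max_pulses: int = 32) -> Tuple[float, int]:
    """Iterative program-and-verify: pulse until the analog level is within tolerance of target."""
    level = 0.0
    pulses = 0
    while abs(target - level) > tolerance and pulses < max_pulses:
        # Hypothetical model: each pulse closes half the remaining gap, plus small write noise.
        level += 0.5 * (target - level) + rng.uniform(-0.002, 0.002)
        pulses += 1
    return level, pulses


precise = program_cell(0.75, tolerance=0.01, rng=random.Random(7))
approx = program_cell(0.75, tolerance=0.05, rng=random.Random(7))
print(precise[1], approx[1])  # the approximate write never needs more pulses than the precise one
```

With the same random seed both writes follow the same trajectory, and any level within 0.01 of the target is also within 0.05, so the approximate write stops no later than the precise one. Fewer pulses is the kind of gain in speed, energy, and wear that the abstract attributes to relaxed precision.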
Abstract:
A multi-core processor with a shared physical memory is described. In an embodiment, a sending core sends a memory write request to a destination core so that the request may be acted upon by the destination core as if it originated from the destination core. In an example, a data structure is configured in the shared physical memory and mapped to be accessible to the sending and destination cores. In an example, the shared data structure is used as a message channel between the sending and destination cores to carry data using the memory write request. In an embodiment, a notification mechanism is enabled using the shared physical memory in order to notify the destination core of events by updating a notification data structure. In an example, the notification mechanism triggers a notification process at the destination core to inform a receiving process of a notification.
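The message channel and notification word can be sketched in Python as a single-producer, single-consumer ring buffer. This is a minimal sketch under stated assumptions: the ring stands in for the data structure mapped into both cores, a counter stands in for the notification data structure, both "cores" are simulated in one thread, and the hardware forwarding of write requests to the destination core is not modeled. The class and method names are hypothetical.

```python
from typing import List, Tuple


class MessageChannel:
    """Single-producer/single-consumer ring buffer standing in for a structure mapped into both cores."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[bytes] = [b""] * capacity
        self.head = 0    # next slot to read; advanced only by the destination core
        self.tail = 0    # next slot to write; advanced only by the sending core
        self.notify = 0  # notification word; bumped by the sender on each message

    def send(self, payload: bytes) -> bool:
        if self.tail - self.head == len(self.slots):
            return False  # channel full; the sender must retry later
        self.slots[self.tail % len(self.slots)] = payload
        self.tail += 1
        self.notify += 1  # update the notification structure so the receiver sees an event
        return True

    def receive_all(self, last_seen: int) -> Tuple[List[bytes], int]:
        if self.notify == last_seen:
            return [], last_seen  # no new notification since the last check
        messages: List[bytes] = []
        while self.head < self.tail:
            messages.append(self.slots[self.head % len(self.slots)])
            self.head += 1
        return messages, self.notify


channel = MessageChannel(capacity=2)
channel.send(b"hello")
channel.send(b"world")
print(channel.send(b"!"))  # False: the ring is full
msgs, seen = channel.receive_all(last_seen=0)
print(msgs, seen)          # [b'hello', b'world'] 2
```

Giving each index a single writer is what lets a ring like this work without locks on real shared memory, although real cores would also need memory barriers that this sketch omits.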