System, method, and apparatus for enhanced pointer identification and prefetching

    公开(公告)号:US11693780B2

    公开(公告)日:2023-07-04

    申请号:US17391962

    申请日:2021-08-02

    CPC classification number: G06F12/0862 G06F2212/602

    Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.

    SPECULATIVE DECOMPRESSION WITHIN PROCESSOR CORE CACHES

    公开(公告)号:US20220197643A1

    公开(公告)日:2022-06-23

    申请号:US17133618

    申请日:2020-12-23

    Abstract: Methods and apparatus relating to speculative decompression within processor core caches are described. In an embodiment, decode circuitry decodes a decompression instruction into a first micro operation and a second micro operation. The first micro operation causes one or more load operations to fetch data into a plurality of cachelines of a cache of a processor core. Decompression Engine (DE) circuitry decompresses the fetched data from the plurality of cachelines of the cache of the processor core in response to the second micro operation. The decompression instruction causes the DE circuitry to perform an out-of-order decompression of the plurality of cachelines. Other embodiments are also disclosed and claimed.

    Instruction set architecture based and automatic load tracking for opportunistic re-steer of data-dependent flaky branches

    公开(公告)号:US11321089B2

    公开(公告)日:2022-05-03

    申请号:US16914338

    申请日:2020-06-27

    Abstract: Methods and apparatuses relating to instruction set architecture (ISA) based and automatic load tracking hardware for opportunistic re-steer of data-dependent flaky branches are described. In one embodiment, a processor includes a pipeline circuit comprising a decoder to decode instructions into decoded instructions and an execution circuit to execute the decoded instructions, a branch predictor circuit to generate a predicted path for a branch instruction, and a branch re-steer circuit to, for the branch instruction dependent on a result from a load instruction, check if an instruction received by the pipeline circuit is the load instruction, and when the instruction received by the pipeline circuit is the load instruction, check for a write back of the result from the load instruction between a decode of the branch instruction with the decoder and an execution of the branch instruction with the execution circuit, and when the predicted path differs from a path based on the result from the load instruction, re-steer the branch instruction in the pipeline circuit to the path and cause execution of the branch instruction for the path based on the result from the load instruction.

    APPARATUSES, METHODS, AND SYSTEMS FOR A DUPLICATION RESISTANT ON-DIE IRREGULAR DATA PREFETCHER

    公开(公告)号:US20210303468A1

    公开(公告)日:2021-09-30

    申请号:US16833419

    申请日:2020-03-27

    Abstract: Systems, methods, and apparatuses relating to circuitry to implement a duplication resistant on-die irregular data prefetcher are described. In one embodiment, a hardware processor includes a cache to store a plurality of cache lines of data, a processing element to execute instructions to generate memory requests, and a prefetch circuit to track a first set of cache lines, requested to be accessed by the memory requests, that repeat in a first number of executed instructions, track a second set of cache lines, requested to be accessed by the memory requests, that repeat in a second, larger number of executed instructions, detect a memory request from an instruction for a cache line from the cache, determine if the cache line is within the first set of cache lines or the second set of cache lines, update first correlation data for the cache line when the cache line is within the first set of cache lines, and update second correlation data for the cache line when the cache line is within the second set of cache lines.

    Techniques for Accelerating Neural Networks

    公开(公告)号:US20210166114A1

    公开(公告)日:2021-06-03

    申请号:US17172627

    申请日:2021-02-10

    Abstract: Embodiments are generally directed to techniques for accelerating neural networks. Many embodiments include a hardware accelerator for a bi-directional multi-layered GRU and LC neural network. Some embodiments are particularly directed to a hardware accelerator that enables offloading of the entire LC+GRU network to the hardware accelerator. Various embodiments include a hardware accelerator with a plurality of matrix vector units to perform GRU steps in parallel with LC steps. For example, at least a portion of computation by a first matrix vector unit of a GRU step in a neural network may overlap at least a portion of computation by a second matrix vector unit of an output feature vector for the neural network. Several embodiments include overlapping computation associated with a layer of a neural network with data transfer associated with another of the neural network.

    TILE-BASED SPARSITY AWARE DATAFLOW OPTIMIZATION FOR SPARSE DATA

    公开(公告)号:US20210090328A1

    公开(公告)日:2021-03-25

    申请号:US17114315

    申请日:2020-12-07

    Abstract: Systems, apparatuses and methods provide technology for optimizing processing of sparse data, such as 3D pointcloud data sets. The technology may include generating a locality-aware rulebook based on an input unstructured sparse data set, such as a 3D pointcloud data set, the locality-aware rulebook storing spatial neighborhood information for active voxels in the input unstructured sparse data set, computing an average receptive field (ARF) value based on the locality aware rulebook, and determining, from a plurality of tile size and loop order combinations, a tile size and loop order combination for processing the unstructured sparse data based on the computed ARF value. The technology may also include providing the locality-aware rulebook and the tile size and loop order combination to a compute engine such as a neural network, the compute engine to process the unstructured sparse data using the locality aware rulebook and the tile size and loop order combination.

Patent Agency Ranking