Apparatuses, methods, and systems for dual spatial pattern prefetcher

    Publication Number: US11874773B2

    Publication Date: 2024-01-16

    Application Number: US16729344

    Application Date: 2019-12-28

    CPC classification number: G06F12/0862 G06F2212/602

    Abstract: Systems, methods, and apparatuses relating to a dual spatial pattern prefetcher are described. In one embodiment, a prefetch circuit prefetches a cache line into a cache from a memory by tracking page and cache line accesses to the cache for a single access signature. For each page of a plurality of pages, the circuit generates a spatial bit pattern for that page's cache line accesses, shifted to the page's first cache line access. Pages whose accesses produce the same spatial bit pattern are collapsed into a single spatial bit pattern for the access signature, yielding a plurality of single spatial bit patterns. The circuit performs a logical OR over these patterns to create a first modulated bit pattern for the signature and a logical AND to create a second modulated bit pattern. On receiving a prefetch request for the signature, it performs the prefetch using the first modulated bit pattern when a threshold is not exceeded and the second modulated bit pattern when the threshold is exceeded.
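
    The modulation step described above lends itself to a compact illustration: per-page spatial bit patterns are folded into an OR-aggregated (coverage-biased) pattern and an AND-aggregated (accuracy-biased) pattern per access signature, and a threshold picks between them at prefetch time. Below is a minimal C++ sketch, assuming 64 cache lines per page and a hypothetical trained-page count as the threshold metric; the patent does not pin down either detail.

```cpp
#include <cstdint>
#include <unordered_map>
#include <iostream>

// Hypothetical sketch of the OR/AND "modulation" described in the abstract.
// A spatial bit pattern marks which cache lines of a page were accessed,
// rotated so bit 0 is the first-accessed (trigger) line of that page.
using Pattern = uint64_t; // one bit per cache line in a 4 KiB page (64 lines)

struct SignatureEntry {
    Pattern orPattern  = 0;           // coverage-biased: union of patterns
    Pattern andPattern = ~Pattern{0}; // accuracy-biased: intersection
    unsigned count = 0;               // pages trained for this signature
};

class DualPatternTable {
    std::unordered_map<uint32_t, SignatureEntry> table_;
public:
    // Fold one completed page's pattern into the signature's two aggregates.
    void train(uint32_t signature, Pattern pagePattern) {
        auto& e = table_[signature];
        e.orPattern  |= pagePattern;  // logical OR across pages
        e.andPattern &= pagePattern;  // logical AND across pages
        ++e.count;
    }

    // Select which modulated pattern drives the prefetch. The threshold
    // metric here (trained-page count) is an illustrative assumption.
    Pattern predict(uint32_t signature, unsigned threshold) const {
        auto it = table_.find(signature);
        if (it == table_.end()) return 0;
        const auto& e = it->second;
        return (e.count > threshold) ? e.andPattern : e.orPattern;
    }
};

int main() {
    DualPatternTable t;
    t.train(0xABCD, 0b1011);  // page 1: lines 0, 1, 3 accessed after trigger
    t.train(0xABCD, 0b1001);  // page 2: lines 0, 3 accessed
    // Threshold not exceeded -> OR pattern (0xb); exceeded -> AND (0x9).
    std::cout << std::hex << t.predict(0xABCD, 5) << ' '
              << t.predict(0xABCD, 1) << '\n';
}
```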

    REGION AWARE DELTA PREFETCHER
    Invention Publication

    Publication Number: US20230205699A1

    Publication Date: 2023-06-29

    Application Number: US17561831

    Application Date: 2021-12-24

    CPC classification number: G06F12/0862 G06F12/0811 G06F12/0877 G06F9/3816

    Abstract: An apparatus includes memory circuitry including a first data structure and prefetch circuitry that is coupled to the memory circuitry. The prefetch circuitry is to store, in the first data structure, a first subregion entry corresponding to a first subregion of a memory region allocated to a program. The first subregion entry is to include a plurality of delta values. A first delta value of the plurality of delta values represents a first distance between two cache lines associated with consecutive memory accesses within a second subregion of the memory region. The prefetch circuitry is further to detect a first memory access of a first cache line in the first subregion, identify prefetch candidates based on the first cache line and the plurality of delta values, and issue at least one prefetch request based on at least two of the prefetch candidates to be prefetched into a cache.
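
    As a rough illustration of the abstract's candidate generation, the sketch below replays a subregion entry's recorded deltas from a triggering cache line. The entry layout, the names, and the signed delta encoding are assumptions for the sketch, not the patent's actual structures.

```cpp
#include <cstdint>
#include <vector>
#include <iostream>

// Hypothetical subregion entry: deltas are distances, in cache lines,
// between consecutive accesses observed in another subregion.
struct SubregionEntry {
    std::vector<int> deltas; // e.g. {+1, +3, -2}, learned during training
};

// Generate prefetch candidates by applying each recorded delta, in order,
// starting from the cache line whose access triggered the lookup.
std::vector<uint64_t> prefetchCandidates(uint64_t triggerLineAddr,
                                         const SubregionEntry& entry) {
    std::vector<uint64_t> candidates;
    uint64_t line = triggerLineAddr;
    for (int d : entry.deltas) {
        line += d;                 // replay the access stride
        candidates.push_back(line);
    }
    return candidates;
}

int main() {
    SubregionEntry e{{+1, +3, -2}};
    for (uint64_t c : prefetchCandidates(100, e))
        std::cout << c << '\n';    // 101, 104, 102
}
```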

    CACHING BASED ON BRANCH INSTRUCTIONS IN A PROCESSOR

    Publication Number: US20230103206A1

    Publication Date: 2023-03-30

    Application Number: US17448806

    Application Date: 2021-09-24

    Abstract: In an embodiment, a processor may include an execution circuit to execute a plurality of instructions, a cache, and a decode circuit. The decode circuit may be to: detect a branch instruction in a program, the branch instruction to cause execution to follow either a first path or a second path in the program; and in response to a determination that the branch instruction is a hard to predict (HTP) branch, cause first and second sets of instructions to be stored in the cache, where the first set of instructions is included in the first path, and where the second set of instructions is included in the second path. Other embodiments are described and claimed.
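
    The decode-time policy can be sketched simply: a branch classified as hard to predict causes instructions from both its paths to be installed in the cache, so neither outcome pays a miss. The HTP heuristic below (a misprediction-rate threshold) and all names are illustrative assumptions; the patent leaves the classification mechanism open.

```cpp
#include <cstdint>
#include <unordered_map>
#include <iostream>

// Illustrative stand-ins for decode-stage state.
struct Branch {
    uint64_t pc;
    uint64_t takenTarget;     // first instruction of the taken path
    uint64_t fallthrough;     // first instruction of the not-taken path
    double   mispredictRate;  // stand-in for the processor's HTP metric
};

class InstructionCache {
    std::unordered_map<uint64_t, bool> lines_;
public:
    void install(uint64_t addr) { lines_[addr] = true; }
    bool contains(uint64_t addr) const { return lines_.count(addr) != 0; }
};

// On decoding a branch: if it is hard to predict, cache BOTH paths;
// otherwise cache only the predicted path (here, taken, for simplicity).
void onDecode(const Branch& b, InstructionCache& icache,
              double htpThreshold = 0.2) {
    if (b.mispredictRate > htpThreshold) {
        icache.install(b.takenTarget);   // first set: taken path
        icache.install(b.fallthrough);   // second set: not-taken path
    } else {
        icache.install(b.takenTarget);
    }
}

int main() {
    InstructionCache ic;
    onDecode({0x400, 0x480, 0x408, 0.35}, ic);
    std::cout << ic.contains(0x480) << ic.contains(0x408) << '\n'; // "11"
}
```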

    DYNAMIC SHARED CACHE PARTITION FOR WORKLOAD WITH LARGE CODE FOOTPRINT

    Publication Number: US20220197794A1

    Publication Date: 2022-06-23

    Application Number: US17130698

    Application Date: 2020-12-22

    Abstract: An embodiment of an integrated circuit may comprise a core, a first level core cache memory coupled to the core, a shared core cache memory coupled to the core, a first cache controller coupled to the core and communicatively coupled to the first level core cache memory, and a second cache controller coupled to the core and communicatively coupled to the shared core cache memory. Circuitry coupled to the core and communicatively coupled to both cache controllers determines whether a workload has a large code footprint and, if so, partitions the N ways of the shared core cache memory into two chunks: a first chunk of M ways reserved for code cache lines from the workload and a second chunk of N minus M ways reserved for data cache lines from the workload, where N and M are positive integer values and N minus M is greater than zero. Other embodiments are disclosed and claimed.
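
    The partitioning decision reduces to choosing M code ways out of N when the code footprint is large, keeping N minus M positive for data. A minimal sketch follows, assuming a simple half-split choice of M; the actual footprint detection and way-selection policy are not specified by the abstract.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical result of the way-partitioning decision: M ways reserved
// for code cache lines, N - M ways reserved for data cache lines.
struct WayPartition {
    int codeWays; // M
    int dataWays; // N - M
};

WayPartition partitionWays(int totalWays, bool largeCodeFootprint) {
    assert(totalWays >= 2);
    if (!largeCodeFootprint)
        return {0, totalWays};     // no partition: all ways shared
    int m = totalWays / 2;         // illustrative choice of M
    return {m, totalWays - m};     // guarantees N - M > 0
}

int main() {
    WayPartition p = partitionWays(12, true);
    std::cout << "code ways: " << p.codeWays
              << ", data ways: " << p.dataWays << '\n'; // 6, 6
}
```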

    APPLICATION PROGRAMMING INTERFACE FOR FINE GRAINED LOW LATENCY DECOMPRESSION WITHIN PROCESSOR CORE

    Publication Number: US20220197659A1

    Publication Date: 2022-06-23

    Application Number: US17133622

    Application Date: 2020-12-23

    Abstract: Methods and apparatus relating to an Application Programming Interface (API) for fine grained low latency decompression within a processor core are described. In an embodiment, a decompression API receives an input handle to a data object, where the data object includes compressed data and metadata. Decompression Engine (DE) circuitry decompresses the compressed data to generate uncompressed data in response to invocation of a decompression instruction by the decompression API. The metadata comprises a first operand to indicate a location of the compressed data, a second operand to indicate a size of the compressed data, a third operand to indicate a location to which the data decompressed by the DE circuitry is to be stored, and a fourth operand to indicate a size of the decompressed data. Other embodiments are also disclosed and claimed.
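
    The four-operand metadata layout maps naturally onto a small descriptor struct. The sketch below assumes hypothetical names and substitutes a plain byte copy for the actual DE circuitry, purely to show how the four operands relate to one another.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <vector>

// Hypothetical descriptor mirroring the abstract's four metadata operands.
struct DecompressMetadata {
    const uint8_t* src;     // operand 1: location of compressed data
    size_t         srcSize; // operand 2: size of compressed data
    uint8_t*       dst;     // operand 3: where decompressed data is stored
    size_t         dstSize; // operand 4: size of the decompressed data
};

// Stand-in for the decompression API that would invoke the DE instruction;
// memcpy is a placeholder for real decompression.
bool decompress(const DecompressMetadata& md) {
    if (md.src == nullptr || md.dst == nullptr) return false;
    if (md.dstSize < md.srcSize) return false; // output buffer too small
    std::memcpy(md.dst, md.src, md.srcSize);
    return true;
}

int main() {
    std::vector<uint8_t> in{1, 2, 3, 4}, out(4);
    DecompressMetadata md{in.data(), in.size(), out.data(), out.size()};
    std::cout << (decompress(md) ? "ok" : "fail") << '\n';
}
```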
