-
Publication No.: US20250110878A1
Publication Date: 2025-04-03
Application No.: US18374969
Filing Date: 2023-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Travis Henry Boraten, Jagadish B. Kotra, David Andrew Werner
IPC: G06F12/0815
Abstract: Selectively bypassing cache directory lookups for processing-in-memory instructions is described. In one example, a system maintains information describing whether a memory address is clean or dirty, where a dirty status indicates that data at the memory address has been modified in a cache and thus differs from the copy held in system memory. A processing-in-memory request involving the memory address is assigned a cache directory bypass bit based on the status of the memory address. The cache directory bypass bit for a processing-in-memory request controls whether a cache directory lookup is performed after the processing-in-memory request is issued by a processor core and before the processing-in-memory request is executed by a processing-in-memory component.
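As a rough illustration of the idea in this abstract, the sketch below derives a bypass bit from per-address clean/dirty tracking; the structure and names (DirtyTracker, PimRequest) are hypothetical and not the patented design.

```cpp
// Minimal sketch: deriving a cache-directory bypass bit for a PIM request
// from per-address clean/dirty tracking.
#include <cstdint>
#include <unordered_set>

struct PimRequest {
    uint64_t addr;
    bool bypass_dir_lookup;  // set true when the address is known clean
};

class DirtyTracker {
    std::unordered_set<uint64_t> dirty_lines_;  // addresses modified in some cache
public:
    void mark_dirty(uint64_t addr) { dirty_lines_.insert(addr); }
    void mark_clean(uint64_t addr) { dirty_lines_.erase(addr); }
    bool is_dirty(uint64_t addr) const { return dirty_lines_.count(addr) != 0; }
};

// A clean address has no newer copy in any cache, so the directory lookup
// between core issue and PIM execution can be skipped.
PimRequest make_pim_request(const DirtyTracker& t, uint64_t addr) {
    return PimRequest{addr, /*bypass_dir_lookup=*/!t.is_dirty(addr)};
}
```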
-
Publication No.: US20250006232A1
Publication Date: 2025-01-02
Application No.: US18346110
Filing Date: 2023-06-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Ioannis Papadopoulos, Vignesh Adhinarayanan, Ashwin Aji, Jagadish B. Kotra
Abstract: An apparatus and method for creating less computationally intensive nodes for a neural network are described. An integrated circuit includes a host processor and multiple memory channels, each with multiple memory array banks. Each of the memory array banks includes components of a processing-in-memory (PIM) accelerator and a scatter-and-gather circuit used to dynamically perform quantization and dequantization operations, offloading these operations from the host processor. The host processor executes a data model that represents a neural network. The memory array banks store a single copy of a particular data value in a single precision, so they avoid storing replications of the same data value at different precisions for use by a neural network node. The memory array banks dynamically perform quantization and dequantization operations on one or more of the weight values, input data values, and activation output values of the neural network.
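The single-copy idea can be sketched roughly as follows; the per-tensor scale and the quantize/dequantize helpers are assumptions for illustration, not the patented scatter-and-gather circuit.

```cpp
// Minimal sketch of the single-copy idea: values live in memory once at full
// precision and are quantized/dequantized on the fly as a PIM-side step.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

int8_t quantize(float v, float scale) {
    return static_cast<int8_t>(std::clamp(std::lround(v / scale), -128L, 127L));
}

float dequantize(int8_t q, float scale) { return q * scale; }

int main() {
    std::vector<float> weights = {0.12f, -0.07f, 0.9f};  // single full-precision copy
    const float scale = 0.01f;                           // assumed per-tensor scale
    for (float w : weights) {
        int8_t q = quantize(w, scale);       // produced near memory when a low-precision
        float back = dequantize(q, scale);   // node needs it; never stored a second time
        (void)back;
    }
}
```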
-
Publication No.: US20240202116A1
Publication Date: 2024-06-20
Application No.: US18068930
Filing Date: 2022-12-20
Applicant: Advanced Micro Devices, Inc.
Inventor: Jagadish B. Kotra, John Kalamatianos, Paul James Moyer, Nicholas Dean Lance, Sriram Srinivasan, Patrick James Shyvers, William Louie Walker
IPC: G06F12/0802
CPC classification number: G06F12/0802, G06F2212/1016, G06F2212/1028, G06F2212/1044
Abstract: An entry of a last level cache shadow tag array is used to track pending last level cache misses to private data in a previous level cache (e.g., an L2 cache) that are also misses to an exclusive last level cache (e.g., an L3 cache) and to the last level cache shadow tag array. Accordingly, last level cache miss status holding registers need not be expended to track cache misses to private data that are already being tracked by a previous level cache miss status holding register. Additionally or alternatively, up to a threshold number of last level cache pending misses to the same shared data from different processor cores are tracked in the last level cache shadow tag array, and any additional last level cache pending misses are tracked in a last level cache miss status holding register.
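A rough decision sketch of where a pending miss might be tracked is shown below; the interface and the shared-data threshold value are assumptions, not the patented control logic.

```cpp
// Minimal sketch: choosing where a pending last-level-cache miss gets tracked.
enum class Tracker { ShadowTagEntry, LlcMshr };

constexpr int kSharedTrackingLimit = 4;  // assumed per-line threshold for shared data

// 'private_miss'        : data is private to one core and already tracked by an
//                         L2 miss-status-holding register.
// 'shadow_shared_count' : pending misses to this shared line already recorded in
//                         the LLC shadow tag array.
Tracker choose_tracker(bool private_miss, int shadow_shared_count) {
    if (private_miss) return Tracker::ShadowTagEntry;   // no LLC MSHR spent
    if (shadow_shared_count < kSharedTrackingLimit)
        return Tracker::ShadowTagEntry;                 // still under the threshold
    return Tracker::LlcMshr;                            // overflow goes to MSHRs
}
```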
-
Publication No.: US20240193097A1
Publication Date: 2024-06-13
Application No.: US18064155
Filing Date: 2022-12-09
Applicant: Advanced Micro Devices, Inc.
Inventor: Jagadish B. Kotra, John Kalamatianos
IPC: G06F12/1045, G06F12/0897
CPC classification number: G06F12/1045, G06F12/0897
Abstract: Address translation is performed to translate a virtual address targeted by a memory request (e.g., a load or store request for data or an instruction) to a physical address. This translation is performed using an address translation buffer, e.g., a translation lookaside buffer (TLB). One or more actions are taken to reduce data access latencies for memory requests in the event of a TLB miss, where the virtual address to physical address translation is not in the TLB. Examples of actions performed in various implementations in response to a TLB miss include bypassing the level 1 (L1) and level 2 (L2) caches in the memory system, and speculatively sending the memory request to the L2 cache while checking whether the memory request is satisfied by the L1 cache.
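The two example actions can be sketched as follows; the hierarchy interface is hypothetical and stubbed out purely for illustration.

```cpp
// Minimal sketch of the two reactions to a TLB miss described above:
// skip the L1/L2 caches entirely, or probe the L1 while speculatively
// sending the request toward the L2.
#include <cstdint>

struct Request { uint64_t vaddr = 0; };

struct MemoryHierarchy {
    bool tlb_hit(uint64_t)              { return false; }  // stub: every lookup misses
    void send_to_memory(const Request&) {}                 // bypasses L1 and L2
    void send_to_l2(const Request&)     {}                 // speculative issue
    bool l1_satisfies(const Request&)   { return false; }
    void cancel_l2(const Request&)      {}
};

void handle_request(MemoryHierarchy& mh, const Request& req, bool bypass_caches) {
    if (mh.tlb_hit(req.vaddr)) return;      // TLB hit: normal lookup path, not shown
    if (bypass_caches) {
        mh.send_to_memory(req);             // action 1: L1/L2 bypass on a TLB miss
    } else {
        mh.send_to_l2(req);                 // action 2: speculative L2 issue...
        if (mh.l1_satisfies(req)) {
            mh.cancel_l2(req);              // ...dropped if the L1 already has the data
        }
    }
}
```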
-
Publication No.: US11586441B2
Publication Date: 2023-02-21
Application No.: US17125730
Filing Date: 2020-12-17
Applicant: Advanced Micro Devices, Inc.
Inventor: John Kalamatianos, Jagadish B. Kotra
IPC: G06F9/38, G06F12/0897, G06F12/0875, G06F9/30
Abstract: Systems, apparatuses, and methods for virtualizing a micro-operation cache are disclosed. A processor includes at least a micro-operation cache, a conventional cache subsystem, a decode unit, and control logic. The decode unit decodes instructions into micro-operations which are then stored in the micro-operation cache. The micro-operation cache has limited capacity for storing micro-operations. When new micro-operations are decoded from pending instructions, existing micro-operations are evicted from the micro-operation cache to make room for the new micro-operations. Rather than being discarded, micro-operations evicted from the micro-operation cache are stored in the conventional cache subsystem. This prevents the original instruction from having to be decoded again on subsequent executions. When the control logic determines that micro-operations for one or more fetched instructions are stored in either the micro-operation cache or the conventional cache subsystem, the control logic causes the decode unit to transition to a reduced-power state.
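A rough sketch of the spill-on-evict idea follows; the map-based structures stand in for the real micro-operation cache and conventional cache subsystem and are not the patented implementation.

```cpp
// Minimal sketch: micro-ops evicted from the micro-op cache are spilled into the
// conventional cache instead of discarded, so re-decoding is avoided and the
// decode unit stays in a reduced-power state whenever either structure hits.
#include <cstdint>
#include <unordered_map>
#include <vector>

using MicroOps = std::vector<uint32_t>;

std::unordered_map<uint64_t, MicroOps> uop_cache;  // small dedicated structure
std::unordered_map<uint64_t, MicroOps> l2_spill;   // conventional cache subsystem

// Stand-in for waking the decode unit out of its reduced-power state.
MicroOps decode(uint64_t /*pc*/) { return {}; }

void evict_from_uop_cache(uint64_t pc) {
    auto it = uop_cache.find(pc);
    if (it == uop_cache.end()) return;
    l2_spill[pc] = std::move(it->second);  // spill rather than discard
    uop_cache.erase(it);
}

MicroOps fetch_micro_ops(uint64_t pc) {
    if (auto it = uop_cache.find(pc); it != uop_cache.end()) return it->second;
    if (auto it = l2_spill.find(pc); it != l2_spill.end()) return it->second;
    return decode(pc);  // only an absence in both structures wakes the decoder
}
```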
-
Publication No.: US20220188208A1
Publication Date: 2022-06-16
Application No.: US17118404
Filing Date: 2020-12-10
Applicant: Advanced Micro Devices, Inc.
Inventor: Anthony Gutierrez, Yasuko Eckert, Sergey Blagodurov, Jagadish B. Kotra
IPC: G06F11/30, G06F1/20, G06F12/0815, G11C11/406, G06F9/48, G06F9/30
Abstract: A method may include, in response to a change in an operating parameter of a processing unit, modifying a signal pathway to a processing circuit component of the processing unit, and communicating with the processing circuit component via the signal pathway.
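One way to picture this, with assumed specifics (temperature as the operating parameter, two candidate routes), is the sketch below; it is illustrative only.

```cpp
// Minimal sketch: reselect the signal pathway to a component when an operating
// parameter changes, then communicate over the selected pathway.
#include <cstdint>

enum class Pathway { Primary, Alternate };

struct Component {
    Pathway route = Pathway::Primary;
    void send(uint32_t /*payload*/) { /* drive the currently selected pathway */ }
};

void on_parameter_change(Component& c, int temperature_c) {
    // Assumed policy: above a threshold, move traffic to the alternate route.
    c.route = (temperature_c > 90) ? Pathway::Alternate : Pathway::Primary;
}

int main() {
    Component c;
    on_parameter_change(c, 95);  // operating parameter changed
    c.send(0xABCD);              // communication uses the re-selected pathway
}
```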
-
Publication No.: US20250110655A1
Publication Date: 2025-04-03
Application No.: US18477272
Filing Date: 2023-09-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Jagadish B. Kotra, Divya Madapusi Srinivas Prasad
Abstract: Efficient memory operation using a destructive read memory array is described. In accordance with the described techniques, a system may include a memory configured to store data of a first logic state in a ferroelectric capacitor when an electric polarization of the ferroelectric capacitor is in a first direction. The system may also include a controller configured to erase the data from the memory by commanding the electric polarization of the ferroelectric capacitor to a second direction, opposite the first direction, and skipping a subsequent write operation of a null value to the memory.
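A single-cell model of the erase-without-null-write idea might look like the following; the cell model is an assumption for illustration.

```cpp
// Minimal sketch: erase by driving the ferroelectric polarization to the opposite
// direction, skipping the usual follow-up write of a null value.
#include <cstdio>

enum class Polarization { First, Second };  // the first direction encodes the stored '1'

struct FeCell {
    Polarization p = Polarization::Second;
    void write_one() { p = Polarization::First; }
    bool read() const { return p == Polarization::First; }
};

void erase(FeCell& cell) {
    cell.p = Polarization::Second;  // command the opposite polarization: data is gone
    // No subsequent write of a null value is issued; that step is skipped.
}

int main() {
    FeCell cell;
    cell.write_one();
    erase(cell);
    std::printf("after erase: %d\n", cell.read());  // prints 0
}
```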
-
Publication No.: US12189953B2
Publication Date: 2025-01-07
Application No.: US17956417
Filing Date: 2022-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Jagadish B. Kotra, John Kalamatianos
Abstract: Methods, devices, and systems for retrieving information based on cache miss prediction are described. It is predicted, based on a history of cache misses at a private cache, that a cache lookup for the information will miss a shared victim cache. A speculative memory request is enabled based on the prediction that the cache lookup for the information will miss the shared victim cache. The information is fetched based on the enabled speculative memory request.
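A rough sketch of a history-based predictor gating the speculative request is shown below; the predictor structure and depth are assumptions, not the patented mechanism.

```cpp
// Minimal sketch: a history of private-cache misses gates whether a speculative
// memory request is sent in parallel with the shared victim cache lookup.
#include <cstddef>
#include <deque>

class VictimMissPredictor {
    std::deque<bool> history_;  // recent private-cache misses: did they also miss the victim?
    static constexpr std::size_t kDepth = 16;
public:
    void record_private_miss(bool missed_victim_too) {
        history_.push_back(missed_victim_too);
        if (history_.size() > kDepth) history_.pop_front();
    }
    // Predict a victim-cache miss when most recent misses also missed the victim.
    bool predict_victim_miss() const {
        std::size_t misses = 0;
        for (bool m : history_) misses += m;
        return history_.size() >= 4 && misses * 2 > history_.size();
    }
};

bool should_issue_speculative_memory_request(const VictimMissPredictor& p) {
    return p.predict_victim_miss();  // fetch from memory early when a miss is likely
}
```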
-
Publication No.: US12099723B2
Publication Date: 2024-09-24
Application No.: US17956614
Filing Date: 2022-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Jagadish B. Kotra, Marko Scrbak
IPC: G06F3/06
CPC classification number: G06F3/0613, G06F3/0659, G06F3/0679
Abstract: A method is provided for operating a memory having a plurality of banks accessible in parallel, each bank including a plurality of grains accessible in parallel. The method includes: based on a memory access request that specifies a memory address, identifying a set that stores data for the memory access request, wherein the set is spread across multiple grains of the plurality of grains; and performing operations to satisfy the memory access request, using entries of the set stored across the multiple grains of the plurality of grains.
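An address-mapping sketch of the bank/grain/set idea follows; the field widths, counts, and entry-to-grain spreading are assumed values for illustration.

```cpp
// Minimal sketch: locate the bank and the set entries that are spread across
// multiple grains within that bank, so they can be accessed in parallel.
#include <cstdint>
#include <cstdio>

constexpr unsigned kBanks = 8, kGrainsPerBank = 4, kSetsPerGrain = 1024;

struct Location { unsigned bank, grain, set; };

// Entries of one set live at the same set index in several grains, so a single
// request can be serviced by accessing those grains in parallel.
Location map_address(uint64_t addr, unsigned entry_index) {
    unsigned bank  = (addr >> 6) % kBanks;          // assumed 64 B line granularity
    unsigned set   = (addr >> 9) % kSetsPerGrain;   // assumed set-index bits
    unsigned grain = entry_index % kGrainsPerBank;  // spread set entries over grains
    return {bank, grain, set};
}

int main() {
    for (unsigned e = 0; e < kGrainsPerBank; ++e) {
        Location loc = map_address(0xDEADBEEF, e);
        std::printf("entry %u -> bank %u grain %u set %u\n", e, loc.bank, loc.grain, loc.set);
    }
}
```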
-
Publication No.: US12019566B2
Publication Date: 2024-06-25
Application No.: US16938364
Filing Date: 2020-07-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Sergey Blagodurov, Johnathan Alsop, Jagadish B. Kotra, Marko Scrbak, Ganesh Dasika
IPC: G06F13/16, G06F9/30, H04L45/122
CPC classification number: G06F13/1642, G06F9/3004, G06F9/30098, G06F13/1663, H04L45/122
Abstract: Arbitrating atomic memory operations, including: receiving, by a media controller, a plurality of atomic memory operations; determining, by an atomics controller associated with the media controller, based on one or more arbitration rules, an ordering for issuing the plurality of atomic memory operations; and issuing the plurality of atomic memory operations to a memory module according to the ordering.
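One assumed arbitration rule (group same-address atomics together, oldest first) can be sketched as follows; the actual rules used by the patented atomics controller may differ.

```cpp
// Minimal sketch: order pending atomic memory operations before issuing them
// to the memory module, per one assumed arbitration rule.
#include <algorithm>
#include <cstdint>
#include <vector>

struct AtomicOp {
    uint64_t addr;     // target memory address
    uint64_t arrival;  // receive timestamp at the media controller
};

std::vector<AtomicOp> arbitrate(std::vector<AtomicOp> pending) {
    std::sort(pending.begin(), pending.end(), [](const AtomicOp& a, const AtomicOp& b) {
        if (a.addr != b.addr) return a.addr < b.addr;  // batch same-address atomics
        return a.arrival < b.arrival;                  // keep per-address arrival order
    });
    return pending;  // issue to the memory module in this order
}
```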