-
Publication No.: US11960399B2
Publication Date: 2024-04-16
Application No.: US17558034
Application Date: 2021-12-21
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Akhil Arunkumar, Tarun Nakra, Maxim V. Kazakov, Milind N. Nemlekar
IPC: G06F12/0811, G06F12/0853, G06F13/16
CPC classification number: G06F12/0811, G06F12/0853, G06F13/1642, G06F13/1668
Abstract: Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.
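The deferred-probe idea can be illustrated with a toy Python model. This is a sketch for illustration only, not the claimed implementation: the class name `ShadowTagDirectory`, the use of invalidate-only probes, and the choice to update the shadow state eagerly while the probes wait are all assumptions.

```python
from collections import defaultdict


class ShadowTagDirectory:
    """Toy model: shadow tags record which private caches hold each line;
    probes caused by writes are queued per core instead of being sent."""

    def __init__(self):
        self.sharers = defaultdict(set)     # line address -> cores whose private cache holds it
        self.pending = defaultdict(list)    # core -> probes withheld from its private cache
        self.delivered = defaultdict(list)  # core -> probes actually transmitted

    def record_fill(self, core, line):
        self.sharers[line].add(core)

    def write(self, writer, line):
        # One probe per other private cache holding the line, withheld for now.
        for core in sorted(self.sharers[line] - {writer}):
            self.pending[core].append(("invalidate", line))
        # Assumption: shadow state is updated eagerly even though probes wait.
        self.sharers[line] = {writer}

    def scope_acquire(self, core):
        # The core's scope acquire releases every probe held for its cache.
        released = self.pending.pop(core, [])
        self.delivered[core].extend(released)
        return released
```

In this model a write by core 0 to a line shared by cores 1 and 2 produces two withheld probes; core 1 only receives its probe when it calls `scope_acquire(1)`, while core 2's probe stays queued.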
-
Publication No.: US20240029336A1
Publication Date: 2024-01-25
Application No.: US18480466
Application Date: 2023-10-03
Applicant: Advanced Micro Devices, Inc.
Inventor: Milind N. Nemlekar, Maxim V. Kazakov, Prerit Dak
CPC classification number: G06T15/005, G06F9/545, G06T15/80
Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.
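The dispatch-and-completion protocol in this abstract can be sketched as a sequential Python simulation. It is an illustrative model under stated assumptions: the round-robin assignment policy, the `Chiplet` class, and the event-log format are hypothetical, and real chiplets would execute concurrently rather than one after another.

```python
from dataclasses import dataclass, field


@dataclass
class Chiplet:
    cid: int
    assigned: list = field(default_factory=list)    # workgroup IDs for the current packet
    done_peers: set = field(default_factory=set)    # chiplets known to have finished


def run_packets(packets, num_chiplets):
    """packets: list of workgroup counts, one per kernel dispatch packet.
    Returns an event log; execution is modeled sequentially for clarity."""
    chiplets = [Chiplet(i) for i in range(num_chiplets)]
    log = []
    for pid, num_wgs in enumerate(packets):
        for c in chiplets:
            c.assigned.clear()
            c.done_peers.clear()
        # Round-robin assignment (hypothetical policy; the abstract names none).
        for wg in range(num_wgs):
            chiplets[wg % num_chiplets].assigned.append(wg)
        for c in chiplets:
            for wg in c.assigned:
                log.append(("exec", pid, c.cid, wg))
            # This chiplet finished its share: notify the others (and record itself).
            for other in chiplets:
                other.done_peers.add(c.cid)
        # Once every chiplet has heard from every chiplet, the packet is complete.
        if all(len(c.done_peers) == num_chiplets for c in chiplets):
            log.append(("complete", pid))   # notify the client, then move on
    return log
```

With two chiplets and a five-workgroup packet, chiplet 0 runs workgroups 0, 2 and 4 and chiplet 1 runs 1 and 3; the `("complete", 0)` event is logged before any workgroup of the next packet starts.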
-
Publication No.: US20230004871A1
Publication Date: 2023-01-05
Application No.: US17364787
Application Date: 2021-06-30
Applicant: Advanced Micro Devices, Inc.
Abstract: Methods, systems, and devices for pipeline fusion of a plurality of kernels. In some implementations, a first batch of a first kernel is executed on a first processing device to generate a first output of the first kernel based on an input. A first batch of a second kernel is executed on a second processing device to generate a first output of the second kernel based on the first output of the first kernel. A second batch of the first kernel is executed on the first processing device to generate a second output of the first kernel based on the input. The execution of the second batch of the first kernel overlaps at least partially in time with executing the first batch of the second kernel.
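The overlap between the two kernels can be shown as a discrete-time schedule. This minimal Python sketch assumes each kernel is a plain function and each time step is one batch; "device A" and "device B" are labels in the schedule, not real devices, and the code computes results sequentially while recording which batches would run concurrently.

```python
def pipeline_fuse(batches, k1, k2):
    """Run k1 on device A and k2 on device B, offset by one step so that
    k1(batch i + 1) overlaps in time with k2(batch i)."""
    n = len(batches)
    k1_out = [None] * n
    k2_out = [None] * n
    schedule = []                    # one dict per time step: device -> (kernel, batch)
    for step in range(n + 1):
        slot = {}
        if step < n:
            k1_out[step] = k1(batches[step])
            slot["A"] = ("k1", step)
        if step >= 1:
            i = step - 1
            k2_out[i] = k2(k1_out[i])    # consumes k1's output from the previous step
            slot["B"] = ("k2", i)
        schedule.append(slot)
    return k2_out, schedule
```

For three batches the schedule has four steps; at step 1, device A runs the second batch of `k1` while device B runs the first batch of `k2`, which is the partial overlap the abstract describes.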
-
Publication No.: US20220101179A1
Publication Date: 2022-03-31
Application No.: US17032778
Application Date: 2020-09-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Maxim V. Kazakov
Abstract: Techniques are disclosed for communicating between a machine learning accelerator and one or more processing cores. The techniques include obtaining data at the machine learning accelerator via an input/output die; processing the data at the machine learning accelerator to generate machine learning processing results; and exporting the machine learning processing results via the input/output die, wherein the input/output die is coupled to one or more processor chiplets via one or more processor ports, and wherein the input/output die is coupled to the machine learning accelerator via an accelerator port.
-
Publication No.: US20220101110A1
Publication Date: 2022-03-31
Application No.: US17032971
Application Date: 2020-09-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Swapnil P. Sakharshete, Maxim V. Kazakov
Abstract: Techniques are disclosed for performing machine learning operations. The techniques include fetching weights for a first layer in a first format; performing matrix multiplication of the weights fetched in the first format with values provided by a prior layer in a forwards training pass; fetching the weights for the first layer in a second format different from the first format; and performing matrix multiplication for a backwards pass, the matrix multiplication including multiplication of the weights fetched in the second format with values corresponding to values provided as the result of the forwards training pass for the first layer.
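A small pure-Python sketch of using one weight matrix in two layouts. The abstract only says the second format differs from the first; here it is assumed to be the transposed layout, since the input gradient of a fully connected layer is dX = dY · Wᵀ. The function names are hypothetical.

```python
def matmul(a, b):
    # a: m x k, b: k x n, both as lists of rows
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def fetch_first_format(w):
    # "First format": weights as stored, in_features x out_features.
    return [row[:] for row in w]


def fetch_second_format(w):
    # "Second format", assumed here to be the transposed layout: out x in.
    return [list(col) for col in zip(*w)]


def forward(x, w):
    # Forward pass: Y = X . W
    return matmul(x, fetch_first_format(w))


def backward_input_grad(dy, w):
    # Backward pass: dX = dY . W^T, where dy is the gradient w.r.t. the layer output.
    return matmul(dy, fetch_second_format(w))
```

Fetching the transposed layout directly lets both passes run as ordinary row-by-column products, without a separate transpose step in between.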
-
Publication No.: US10991146B2
Publication Date: 2021-04-27
Application No.: US16723232
Application Date: 2019-12-20
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Maxim V. Kazakov, Mark Fowler
Abstract: A processor receives a request to access one or more levels of a partially resident texture (PRT) resource. The levels represent a texture at different levels of detail (LOD) and the request includes normalized coordinates indicating a location in the texture. The processor accesses a texture descriptor that includes dimensions of a first level of the levels and one or more offsets between a reference level and one or more second levels that are associated with one or more residency maps that indicate texels that are resident in the PRT resource. The processor translates the normalized coordinates to texel coordinates in the one or more residency maps based on the offset and accesses, in response to the request, the one or more residency maps based on the texel coordinates to determine whether texture data indicated by the normalized coordinates is resident in the PRT resource.
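The coordinate translation can be sketched in Python under several assumptions not stated in the abstract: each mip level has its own residency map with one entry per tile of `tile_w` × `tile_h` texels, the "offset" is a level offset added to a reference level, and coordinates of exactly 1.0 clamp to the last entry. `TextureDescriptor` and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextureDescriptor:
    base_width: int      # dimensions of the first level (level 0)
    base_height: int
    tile_w: int          # texels covered by one residency-map entry (assumed)
    tile_h: int


def level_dims(desc, level):
    return max(1, desc.base_width >> level), max(1, desc.base_height >> level)


def residency_texel(desc, u, v, ref_level, offset):
    """Translate normalized (u, v) into a texel of the residency map for
    level ref_level + offset."""
    level = ref_level + offset
    w, h = level_dims(desc, level)
    map_w = -(-w // desc.tile_w)     # ceiling division
    map_h = -(-h // desc.tile_h)
    x = min(int(u * map_w), map_w - 1)   # clamp so u == 1.0 hits the last entry
    y = min(int(v * map_h), map_h - 1)
    return level, x, y


def is_resident(desc, residency_maps, u, v, ref_level, offset):
    # residency_maps: level -> 2D list of booleans indexed [y][x]
    level, x, y = residency_texel(desc, u, v, ref_level, offset)
    return residency_maps[level][y][x]
```

For a 1024 × 1024 texture with 128-texel tiles, level 0 has an 8 × 8 map and level 1 a 4 × 4 map, so the same (u, v) lands on different map texels depending on the level offset.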
-
Publication No.: US12165016B2
Publication Date: 2024-12-10
Application No.: US17032778
Application Date: 2020-09-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Maxim V. Kazakov
Abstract: Techniques are disclosed for communicating between a machine learning accelerator and one or more processing cores. The techniques include obtaining data at the machine learning accelerator via an input/output die; processing the data at the machine learning accelerator to generate machine learning processing results; and exporting the machine learning processing results via the input/output die, wherein the input/output die is coupled to one or more processor chiplets via one or more processor ports, and wherein the input/output die is coupled to the machine learning accelerator via an accelerator port.
-
Publication No.: US12072952B2
Publication Date: 2024-08-27
Application No.: US17214779
Application Date: 2021-03-26
Applicant: Advanced Micro Devices, Inc.
IPC: G06F17/16, G06F7/523, G06F7/544, H03K19/173
CPC classification number: G06F17/16, G06F7/523, G06F7/5443, H03K19/1737
Abstract: A processing device is provided which comprises memory configured to store data and a processor. The processor comprises a plurality of multiply-accumulate units (MACs) configured to perform matrix multiplication of elements of a first matrix and elements of a second matrix. The processor also comprises a plurality of logic devices configured to sum values of bits of product exponent values of the elements of the first matrix and the second matrix and to determine keep bit values for the product exponent values to be kept for matrix multiplication. The processor also comprises a plurality of multiplexor arrays, each configured to receive bits of the elements of the first matrix and the second matrix and the keep bit values, and to provide data for selecting which elements of the first matrix and the second matrix are provided to the MACs for matrix multiplication.
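A software sketch of exponent-based product selection. The exponent of a product is the sum of its operands' exponents, which `math.frexp` exposes. The keep criterion used here (keep products whose exponent is within `window` of the largest in the dot product) is an assumption for illustration; the abstract does not say how keep bits are derived, and the hardware performs the selection with multiplexor arrays rather than Python filtering.

```python
import math


def product_exponent(a, b):
    # math.frexp(x) returns (m, e) with x == m * 2**e; zero products are skipped.
    if a == 0 or b == 0:
        return None
    return math.frexp(a)[1] + math.frexp(b)[1]


def keep_bits(row, col, window):
    """Keep bit = product exponent within `window` of the largest (assumed rule)."""
    exps = [product_exponent(a, b) for a, b in zip(row, col)]
    valid = [e for e in exps if e is not None]
    if not valid:
        return [False] * len(exps)
    top = max(valid)
    return [e is not None and top - e <= window for e in exps]


def approx_dot(row, col, window):
    # Only pairs with a set keep bit are forwarded to the multiply-accumulate step.
    keep = keep_bits(row, col, window)
    return sum(a * b for a, b, k in zip(row, col, keep) if k)


def approx_matmul(A, B, window):
    cols = list(zip(*B))
    return [[approx_dot(r, c, window) for c in cols] for r in A]
```

Dropping products whose exponents are far below the largest one trades a small accuracy loss for fewer MAC operations; `window` controls that trade-off.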
-
Publication No.: US11657119B2
Publication Date: 2023-05-23
Application No.: US16557911
Application Date: 2019-08-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Swapnil P. Sakharshete, Samuel Lawrence Wasmundt, Maxim V. Kazakov, Vineet Goel
Abstract: A processing device is provided which includes memory configured to store data and a processor configured to determine, based on convolutional parameters associated with an image, a virtual general matrix-matrix multiplication (GEMM) space of a virtual GEMM space output matrix and generate, in the virtual GEMM space output matrix, a convolution result by matrix multiplying the data corresponding to a virtual GEMM space input matrix with the data corresponding to a virtual GEMM space filter matrix. The processing device also includes convolutional mapping hardware configured to map, based on the convolutional parameters, positions of the virtual GEMM space input matrix to positions of an image space of the image.
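The mapping from a virtual GEMM input matrix to image space resembles the well-known implicit im2col formulation, sketched below in pure Python. This is a generic illustration of that technique, not the patented mapping hardware: the layouts `image[c][y][x]` and `filters[k][c][r][s]`, the row/column ordering of the virtual matrices, and all function names are assumptions. The virtual input matrix is never materialized; each of its entries is looked up in the image on demand.

```python
def conv_out_size(size, ksize, stride, pad):
    return (size + 2 * pad - ksize) // stride + 1


def map_virtual_input(row, col, params):
    """Map (row, col) of the virtual GEMM input matrix to image space.
    Rows index output pixels; columns index (channel, filter_y, filter_x).
    Returns (c, y, x), or None when the position lies in zero padding."""
    C, H, W, R, S, stride, pad = params
    out_w = conv_out_size(W, S, stride, pad)
    oh, ow = divmod(row, out_w)
    c, rs = divmod(col, R * S)
    r, s = divmod(rs, S)
    y = oh * stride + r - pad
    x = ow * stride + s - pad
    if 0 <= y < H and 0 <= x < W:
        return c, y, x
    return None


def virtual_filter(filters, col, k, R, S):
    # Virtual GEMM filter matrix: rows (c, r, s), columns output channel k.
    c, rs = divmod(col, R * S)
    r, s = divmod(rs, S)
    return filters[k][c][r][s]


def implicit_gemm_conv(image, filters, stride=1, pad=0):
    """image[c][y][x], filters[k][c][r][s]; returns out[pixel][k]."""
    C, H, W = len(image), len(image[0]), len(image[0][0])
    K, R, S = len(filters), len(filters[0][0]), len(filters[0][0][0])
    params = (C, H, W, R, S, stride, pad)
    rows = conv_out_size(H, R, stride, pad) * conv_out_size(W, S, stride, pad)
    inner = C * R * S
    out = [[0] * K for _ in range(rows)]
    for m in range(rows):
        for k in range(K):
            for col in range(inner):
                pos = map_virtual_input(m, col, params)
                if pos is not None:           # padding contributes zero
                    c, y, x = pos
                    out[m][k] += image[c][y][x] * virtual_filter(filters, col, k, R, S)
    return out
```

With a 3 × 3 single-channel image and a 2 × 2 filter, the virtual output matrix has one row per output pixel (four rows) and one column per output channel.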
-
Publication No.: US20220318021A1
Publication Date: 2022-10-06
Application No.: US17219775
Application Date: 2021-03-31
Applicant: Advanced Micro Devices, Inc.
Inventor: Maxim V. Kazakov
Abstract: Techniques are provided for executing wavefronts. The techniques include at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including identifying that no instructions are available for which sufficient processing resources exist for execution together within the processing lane; and in response to the second identifying, executing an instruction independently of any other instruction.
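The issue-time decision can be sketched as a greedy scheduler in Python. Assumptions for illustration: every instruction is ready and independent, each needs a fixed number of resource units, a "set" is limited to a pair, and the oldest instruction always issues. `issue_schedule` is a hypothetical name.

```python
def issue_schedule(instrs, lane_capacity):
    """instrs: list of (name, resource_units) in program order, all assumed
    ready and independent. Returns the issue groups as tuples of names."""
    pending = list(instrs)
    groups = []
    while pending:
        first_name, first_cost = pending[0]
        partner = None
        # Look for another instruction that fits in the lane alongside the oldest.
        for idx in range(1, len(pending)):
            if first_cost + pending[idx][1] <= lane_capacity:
                partner = idx
                break
        if partner is not None:
            groups.append((first_name, pending[partner][0]))   # executed together
            del pending[partner]        # delete the higher index first
        else:
            groups.append((first_name,))                        # executed independently
        del pending[0]
    return groups
```

With a capacity of 2 units, a 1-unit instruction can pair with another 1-unit instruction, but a 2-unit instruction always issues alone.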