-
31.
Publication No.: US12265484B2
Publication date: 2025-04-01
Application No.: US17467104
Filing date: 2021-09-03
Applicant: Advanced Micro Devices, Inc.
Inventor: Maxim V. Kazakov
IPC: G06F13/16 , G06F9/30 , G06F9/38 , G06F12/0875
Abstract: An accelerated processing device is provided which comprises a plurality of compute units each including a plurality of SIMD units, and each SIMD unit comprises a register file. The accelerated processing device also comprises a local data share (LDS) in communication with each of the SIMD units. The accelerated processing device also comprises a first portion of cache memory in communication with each of the SIMD units, and a second portion of cache memory shared by the compute units. The compute units are configured to execute a program in which a storage portion of at least one of the register file of a SIMD unit, the first portion of cache memory and the LDS is reserved as part of another of the register file, the first portion of cache memory and the LDS.
-
Publication No.: US12197533B2
Publication date: 2025-01-14
Application No.: US17214784
Filing date: 2021-03-26
Applicant: Advanced Micro Devices, Inc.
Abstract: A processing device is provided which comprises memory configured to store data and a processor configured to receive a portion of data of a first matrix comprising a first plurality of elements and receive a portion of data of a second matrix comprising a second plurality of elements. The processor is also configured to determine values for a third matrix by dropping a number of products from products of pairs of elements of the first and second matrices based on approximating the products of the pairs of elements as a sum of the exponents of the pairs of elements and performing matrix multiplication on remaining products of the pairs of elements of the first and second matrices.
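The idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `keep` parameter, and the choice to drop the pairs with the smallest exponent sums are all assumptions made for illustration. The abstract only states that products are dropped based on approximating each product by the sum of the exponents of its two elements.

```python
import math

def exponent(x):
    """Binary exponent of x as reported by math.frexp."""
    return math.frexp(x)[1]

def approx_matmul(a, b, keep):
    """Multiply a (n x p) by b (p x m), keeping only `keep` products per output element.

    Hypothetical policy: the products whose exponent sums are largest are kept,
    the rest are dropped without ever being computed.
    """
    n, inner, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Exponent sum is a cheap proxy for the magnitude of a[i][k] * b[k][j].
            ranked = [
                (exponent(a[i][k]) + exponent(b[k][j]), k)
                for k in range(inner)
                if a[i][k] != 0 and b[k][j] != 0
            ]
            ranked.sort(reverse=True)
            # Exact multiplication is performed only on the remaining pairs.
            c[i][j] = sum(a[i][k] * b[k][j] for _, k in ranked[:keep])
    return c
```

Calling `approx_matmul([[8.0, 0.5]], [[4.0], [0.25]], keep=1)` keeps only the `8.0 * 4.0` product, since its exponent sum is the larger of the two.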
-
Publication No.: US11790590B2
Publication date: 2023-10-17
Application No.: US17218421
Filing date: 2021-03-31
Applicant: Advanced Micro Devices, Inc.
Inventor: Milind N. Nemlekar , Maxim V. Kazakov , Prerit Dak
CPC classification number: G06T15/005 , G06F9/545 , G06T15/80
Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.
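The sequence of steps in this abstract can be sketched as a small sequential simulation in Python. This is an illustration only: the round-robin assignment, the `run_dispatch` name, and the event-log format are assumptions, and real chiplets would execute concurrently rather than in a loop.

```python
def run_dispatch(packets, num_chiplets):
    """Simulate chiplets working through kernel dispatch packets in order.

    packets: list of (packet_id, [workgroup ids]).
    Returns a log of events as tuples.
    """
    log = []
    for packet_id, workgroups in packets:
        # Assumed policy: workgroups are assigned to chiplets round-robin.
        assigned = [workgroups[c::num_chiplets] for c in range(num_chiplets)]
        # known_done[c] holds the chiplets that chiplet c knows have finished.
        known_done = [set() for _ in range(num_chiplets)]
        for c in range(num_chiplets):
            for wg in assigned[c]:
                log.append(("exec", packet_id, c, wg))
            # On finishing its share, chiplet c notifies every chiplet.
            for peer in range(num_chiplets):
                known_done[peer].add(c)
        # Once all chiplets are known to be done, the client is notified
        # and the loop proceeds to the subsequent packet.
        if all(len(done) == num_chiplets for done in known_done):
            log.append(("client_notified", packet_id))
    return log
```

A chiplet that receives no workgroups for a packet simply reports completion immediately in this model.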
-
Publication No.: US20230289191A1
Publication date: 2023-09-14
Application No.: US18128642
Filing date: 2023-03-30
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Sateesh Lagudu , Allen H. Rush , Michael Mantor , Arun Vaidyanathan Ananthanarayan , Prasad Nagabhushanamgari , Maxim V. Kazakov
CPC classification number: G06F9/3887 , G06F13/28 , G06F13/4027
Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.
-
Publication No.: US20230195628A1
Publication date: 2023-06-22
Application No.: US17558034
Filing date: 2021-12-21
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Akhil Arunkumar , Tarun Nakra , Maxim V. Kazakov , Milind N. Nemlekar
IPC: G06F12/0811 , G06F12/0853 , G06F13/16
CPC classification number: G06F12/0811 , G06F12/0853 , G06F13/1642 , G06F13/1668
Abstract: Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.
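The deferral mechanism in this abstract can be modeled with a toy Python class. It is a sketch under assumptions, not the claimed design: the class and method names are hypothetical, probes are modeled as simple invalidations, and the writing core is assumed not to probe its own cache.

```python
from collections import defaultdict

class ShadowTagDirectory:
    """Toy model of shadow tags that hold back probes until a scope acquire."""

    def __init__(self):
        self.tags = defaultdict(set)      # core -> cachelines tracked for its private cache
        self.pending = defaultdict(list)  # core -> probes generated but not yet transmitted

    def record_fill(self, core, line):
        """Note that `core` now holds `line` in its private cache."""
        self.tags[core].add(line)

    def write(self, writer, line):
        """Generate probes for other cores holding `line`, but do not send them."""
        for core, lines in self.tags.items():
            if core != writer and line in lines:
                self.pending[core].append(line)

    def scope_acquire(self, core):
        """Release the held probes for `core`; here each probe invalidates the line."""
        released = self.pending.pop(core, [])
        for line in released:
            self.tags[core].discard(line)
        return released
```

In this model a write by core 0 to a line cached by core 1 leaves core 1's copy untouched until core 1 calls `scope_acquire`, at which point the queued probe is delivered.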
-
Publication No.: US11656877B2
Publication date: 2023-05-23
Application No.: US17219775
Filing date: 2021-03-31
Applicant: Advanced Micro Devices, Inc.
Inventor: Maxim V. Kazakov
CPC classification number: G06F9/3885 , G06F9/30152 , G06F9/3851 , G06F9/3869
Abstract: Techniques are provided for executing wavefronts. The techniques include at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including identifying that no instructions are available for which sufficient processing resources exist for execution together within the processing lane; and in response to the second identifying, executing an instruction independently of any other instruction.
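The issue decision in this abstract can be sketched as follows. The sketch makes several simplifying assumptions that are not in the source: processing resources are reduced to a single numeric cost per instruction, at most two instructions are issued together, and data dependencies are ignored. The name `issue_schedule` and the `lane_capacity` parameter are hypothetical.

```python
def issue_schedule(instrs, lane_capacity):
    """Group instructions into issue slots for one processing lane.

    instrs: list of (name, resource_cost) in program order.
    Returns a list of tuples; each tuple holds the names issued at one issue time.
    """
    pending = list(instrs)
    slots = []
    while pending:
        first_name, first_cost = pending.pop(0)
        # Look for a partner that fits in the lane alongside the first instruction.
        partner = next(
            (i for i, (_, cost) in enumerate(pending)
             if first_cost + cost <= lane_capacity),
            None,
        )
        if partner is not None:
            # Sufficient resources exist: execute the two instructions together.
            second_name, _ = pending.pop(partner)
            slots.append((first_name, second_name))
        else:
            # No suitable partner: execute the instruction on its own.
            slots.append((first_name,))
    return slots
```

With `lane_capacity=3`, instructions costing 2 and 1 share a slot, while two instructions costing 3 each are issued separately.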
-
Publication No.: US11635967B2
Publication date: 2023-04-25
Application No.: US17032307
Filing date: 2020-09-25
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Sateesh Lagudu , Allen H. Rush , Michael Mantor , Arun Vaidyanathan Ananthanarayan , Prasad Nagabhushanamgari , Maxim V. Kazakov
Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.
-
Publication No.: US20230004385A1
Publication date: 2023-01-05
Application No.: US17364780
Filing date: 2021-06-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Maxim V. Kazakov
Abstract: A processing device is provided which comprises a plurality of compute units configured to process data; a plurality of arithmetic logic units, instantiated separately from the plurality of compute units and configured to store the data at the arithmetic logic units and perform calculations using the data; and an interconnect network connecting the arithmetic logic units and configured to provide the arithmetic logic units with shared access to the data for communication between the arithmetic logic units. The interconnect network is also configured to provide the compute units with shared access to the data for communication between the compute units.
-
Publication No.: US20220413858A1
Publication date: 2022-12-29
Application No.: US17361118
Filing date: 2021-06-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Maxim V. Kazakov
IPC: G06F9/30 , G06F12/0891
Abstract: A processing device is provided which comprises memory, a plurality of registers and a processor. The processor is configured to execute a plurality of portions of a program, allocate a number of the registers per portion of the program such that a number of remaining registers are available as a register cache, and transfer data between the number of registers, which are allocated per portion of the program, and the register cache. The processor loads data to the allocated registers to execute a portion of the program, stores data, resulting from execution of the portion, in the register cache, reloads the data in the allocated registers and executes another portion of the program using the data reloaded to the allocated registers. A called function uses the number of allocated registers, which is less than an architectural limit of registers allocated per portion of the program.
-
Publication No.: US20220309126A1
Publication date: 2022-09-29
Application No.: US17214784
Filing date: 2021-03-26
Applicant: Advanced Micro Devices, Inc.
Abstract: A processing device is provided which comprises memory configured to store data and a processor configured to receive a portion of data of a first matrix comprising a first plurality of elements and receive a portion of data of a second matrix comprising a second plurality of elements. The processor is also configured to determine values for a third matrix by dropping a number of products from products of pairs of elements of the first and second matrices based on approximating the products of the pairs of elements as a sum of the exponents of the pairs of elements and performing matrix multiplication on remaining products of the pairs of elements of the first and second matrices.