DIRECT-CONNECTED MACHINE LEARNING ACCELERATOR

    Publication number: US20250086515A1

    Publication date: 2025-03-13

    Application number: US18954763

    Application date: 2024-11-21

    Inventor: Maxim V. Kazakov

    Abstract: Techniques are disclosed for communicating between a machine learning accelerator and one or more processing cores. The techniques include obtaining data at the machine learning accelerator via an input/output die; processing the data at the machine learning accelerator to generate machine learning processing results; and exporting the machine learning processing results via the input/output die, wherein the input/output die is coupled to one or more processor chiplets via one or more processor ports, and wherein the input/output die is coupled to the machine learning accelerator via an accelerator port.
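The topology described above can be sketched as follows. This is a minimal illustrative model, not the patented implementation: the port names and the `route` helper are hypothetical, standing in for an I/O die that couples processor chiplets via processor ports and the machine learning accelerator via an accelerator port.

```python
# Hypothetical sketch (not the patented design): an I/O die modeled as a
# mapping from port names to attached units; all traffic between chiplets
# and the ML accelerator flows through it.
def route(io_die, message):
    """io_die: dict mapping port name -> attached unit.
    message: (source_port, destination_port, payload)."""
    src, dst, data = message
    # Both endpoints must hang off the same I/O die.
    assert src in io_die and dst in io_die
    return f"{io_die[src]} -> {io_die[dst]}: {data}"
```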

    Techniques for reducing serialization in divergent control flow

    Publication number: US12014208B2

    Publication date: 2024-06-18

    Application number: US16023897

    Application date: 2018-06-29

    Abstract: Techniques for executing shader programs with divergent control flow on a single instruction multiple data (“SIMD”) processor are disclosed. These techniques include detecting entry into a divergent section of a shader program and, for the work-items that enter the divergent section, placing a task entry into a task queue associated with the target of each work-item. The target is the destination, in code, of any particular work-item, and is also referred to as a code segment herein. The task queues store task entries for code segments generated by different (or the same) wavefronts. A command processor examines the task queues and schedules wavefronts for execution by grouping together tasks in the same task queue into wavefronts and launching those wavefronts. By grouping tasks from different wavefronts together for execution in the same wavefront, serialization of execution is greatly reduced or eliminated.
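The regrouping scheme described above can be sketched in a few lines. This is a simplified host-side model under assumed names (`regroup`, `WAVE_SIZE` are illustrative, and a real SIMD width would be 32 or 64): work-items are queued per branch target, then same-target tasks are packed into uniform wavefronts.

```python
# Hypothetical sketch of the task-queue regrouping: work-items diverging
# to different code segments ("targets") are enqueued per target, and the
# scheduler packs same-target tasks into wavefronts of WAVE_SIZE lanes so
# each launched wavefront executes uniformly.
from collections import defaultdict

WAVE_SIZE = 4  # illustrative; real SIMD widths are e.g. 32 or 64

def regroup(work_items):
    """work_items: list of (work_item_id, target_segment) pairs."""
    queues = defaultdict(list)          # one task queue per code segment
    for item_id, target in work_items:  # divergent entry: enqueue by target
        queues[target].append(item_id)
    wavefronts = []
    for target, tasks in queues.items():  # group same-target tasks together
        for i in range(0, len(tasks), WAVE_SIZE):
            wavefronts.append((target, tasks[i:i + WAVE_SIZE]))
    return wavefronts
```

Because every launched wavefront contains only work-items headed to the same code segment, no lane is masked off for the divergent section, which is the serialization reduction the abstract describes.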

    WAVEFRONT SELECTION AND EXECUTION
    Invention publication

    Publication number: US20230266975A1

    Publication date: 2023-08-24

    Application number: US18309536

    Application date: 2023-04-28

    Inventor: Maxim V. Kazakov

    CPC classification number: G06F9/3885 G06F9/3869 G06F9/3851 G06F9/30152

    Abstract: Techniques are provided for executing wavefronts. The techniques include at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including identifying that no instructions are available for which sufficient processing resources exist for execution together within the processing lane; and in response to the second identifying, executing an instruction independently of any other instruction.
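The two issue-time cases in the abstract can be sketched as a single decision function. This is a toy model under assumed names (`select_issue` and the abstract "resource cost" units are hypothetical): pair two pending instructions when the lane's resources cover both, otherwise issue one alone.

```python
# Hypothetical sketch of the issue-time decision: at each issue slot, try
# to find two pending instructions whose combined resource cost fits the
# processing lane ("first identifying"); if none, fall back to issuing a
# single instruction independently ("second identifying").
def select_issue(pending, lane_resources):
    """pending: list of (name, resource_cost); returns names to issue."""
    for i, (a, cost_a) in enumerate(pending):
        for b, cost_b in pending[i + 1:]:
            if cost_a + cost_b <= lane_resources:
                return [a, b]            # execute the pair together
    return [pending[0][0]] if pending else []  # execute one independently
```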

    PROCESSING DEVICE AND METHOD OF SHARING STORAGE BETWEEN CACHE MEMORY, LOCAL DATA STORAGE AND REGISTER FILES

    Publication number: US20230069890A1

    Publication date: 2023-03-09

    Application number: US17467104

    Application date: 2021-09-03

    Inventor: Maxim V. Kazakov

    Abstract: An accelerated processing device is provided which comprises a plurality of compute units, each including a plurality of SIMD units, and each SIMD unit comprises a register file. The accelerated processing device also comprises local data storage (LDS) in communication with each of the SIMD units. The accelerated processing device also comprises a first portion of cache memory in communication with each of the SIMD units and a second portion of cache memory shared by the compute units. The compute units are configured to execute a program in which a storage portion of at least one of the register file of a SIMD unit, the first portion of cache memory, and the LDS is reserved as part of another of the register file, the first portion of cache memory, and the LDS.
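The reservation idea above can be shown with a trivial sketch. The function name, sizes, and the specific choice of carving register-file space into LDS are all hypothetical; the abstract allows any of the three stores (register file, first cache portion, LDS) to donate capacity to another.

```python
# Hypothetical sketch of per-program storage sharing: reserve a portion of
# one backing store (here the register file) so it can be used as part of
# another (here extra LDS capacity).
def partition(total_regfile, reserved_for_lds):
    assert reserved_for_lds <= total_regfile
    return {"register_file": total_regfile - reserved_for_lds,
            "lds_extra": reserved_for_lds}
```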

    MULTI-ACCELERATOR COMPUTE DISPATCH
    Invention application

    Publication number: US20220319089A1

    Publication date: 2022-10-06

    Application number: US17218421

    Application date: 2021-03-31

    Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.
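The dispatch-and-notify flow above can be sketched as a small host-side simulation. The round-robin assignment, the `dispatch` helper, and the event strings are all assumptions for illustration; the abstract does not specify how workgroups are divided among chiplets.

```python
# Hypothetical sketch of the chiplet completion protocol: workgroups of a
# kernel dispatch packet are assigned (here round-robin) to chiplets; each
# chiplet notifies its peers when its share completes, and once every
# chiplet is done the client is notified and the next packet can proceed.
def dispatch(num_workgroups, num_chiplets):
    assignments = {c: [] for c in range(num_chiplets)}
    for wg in range(num_workgroups):          # assign workgroups to chiplets
        assignments[wg % num_chiplets].append(wg)
    done, events = set(), []
    for chiplet in assignments:               # each chiplet runs its share
        done.add(chiplet)
        events.append(f"chiplet {chiplet} notified peers")
    if len(done) == num_chiplets:             # all shares complete
        events.append("client notified; advance to next dispatch packet")
    return assignments, events
```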

    DATA COMPRESSOR FOR APPROXIMATION OF MATRICES FOR MATRIX MULTIPLY OPERATIONS

    Publication number: US20220309125A1

    Publication date: 2022-09-29

    Application number: US17214779

    Application date: 2021-03-26

    Abstract: A processing device is provided which comprises memory configured to store data and a processor. The processor comprises a plurality of multiplier-accumulators (MACs) configured to perform matrix multiplication of elements of a first matrix and elements of a second matrix. The processor also comprises a plurality of logic devices configured to sum the exponent values of products of elements of the first matrix and the second matrix and to determine keep bit values indicating which product exponent values are to be kept for the matrix multiplication. The processor also comprises a plurality of multiplexor arrays, each configured to receive bits of the elements of the first matrix and the second matrix along with the keep bit values and to provide data for selecting which elements of the first matrix and the second matrix are provided to the MACs for matrix multiplication.
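The exponent-based selection can be illustrated for a single dot product. This is a software approximation of the hardware idea, under assumed names (`approx_dot`, `keep`): a product's magnitude is estimated from the sum of its operands' floating-point exponents, and only the largest-exponent products are multiplied and accumulated.

```python
# Hypothetical sketch of the approximation: estimate each product's size
# from the sum of its operands' exponents, set a "keep bit" for the K
# largest, and multiply-accumulate only the kept products.
import math

def approx_dot(a, b, keep):
    exps = [math.frexp(x)[1] + math.frexp(y)[1]   # sum of operand exponents
            for x, y in zip(a, b)]
    order = sorted(range(len(a)), key=lambda i: exps[i], reverse=True)
    kept = set(order[:keep])                      # "keep bit" per product
    return sum(a[i] * b[i] for i in range(len(a)) if i in kept)
```

Dropping the small-exponent products skips their MAC operations entirely, which is the compression the title refers to.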

    Matrix multiplier with submatrix sequencing

    Publication number: US11093580B2

    Publication date: 2021-08-17

    Application number: US16176449

    Application date: 2018-10-31

    Abstract: A processor sequences the application of submatrices at a matrix multiplier to reduce the number of input changes at an input register of the matrix multiplier. The matrix multiplier is configured to perform a matrix multiplication for a relatively small matrix. To multiply two larger matrices, the GPU decomposes the larger matrices into smaller submatrices and stores the submatrices at input registers of the matrix multiplier in a sequence, thereby calculating each column of a result matrix. The GPU sequences the storage of the submatrices at the input registers to maintain input data at one of the input registers over multiple calculation cycles of the matrix multiplier, thereby reducing power consumption at the GPU.
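The sequencing can be shown with a tile-schedule sketch. The loop nest and the choice to hold the B-operand register fixed are assumptions for illustration (the abstract only says one input register is kept stable across cycles); `sequence_tiles` is a hypothetical name.

```python
# Hypothetical sketch of submatrix sequencing for C = A @ B: iterate tiles
# so the B tile stays in its input register across consecutive cycles
# while A tiles stream through, and count how often the B register's
# contents actually change.
def sequence_tiles(m_tiles, n_tiles, k_tiles):
    schedule = []
    for n in range(n_tiles):          # one result column at a time
        for k in range(k_tiles):      # B tile B[k][n] held fixed...
            for m in range(m_tiles):  # ...while A tiles stream through
                schedule.append((("A", m, k), ("B", k, n)))
    changes = sum(1 for i in range(1, len(schedule))
                  if schedule[i][1] != schedule[i - 1][1])
    return schedule, changes          # B register changes n_tiles*k_tiles - 1 times
```

Fewer input-register transitions means fewer toggling bits per cycle, which is where the power saving comes from.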

    Texture residency checks using compression metadata

    Publication number: US10783694B2

    Publication date: 2020-09-22

    Application number: US15687108

    Application date: 2017-08-25

    Abstract: A pipeline is configured to access a memory that stores a texture block and metadata that encodes compression parameters of the texture block and a residency status of the texture block. A processor requests access to the metadata in conjunction with requesting data in the texture block to perform a shading operation. The pipeline selectively returns the data in the texture block to the processor depending on whether the metadata indicates that the texture block is resident in the memory. A cache can also be included to store a copy of the metadata that encodes the compression parameters of the texture block. The residency status and the metadata stored in the cache can be modified in response to requests to access the metadata stored in the cache.
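The residency gate can be sketched in a few lines. This is a deliberately simplified model (the `fetch_texel` name and the metadata dictionary layout are hypothetical): the per-block metadata is consulted first, and texel data is returned only when the block is marked resident.

```python
# Hypothetical sketch of a metadata-gated texture fetch: metadata carries
# both the block's compression parameters and its residency status, and
# the pipeline returns data only for resident blocks.
def fetch_texel(block_id, metadata, memory):
    meta = metadata[block_id]             # compression params + residency
    if not meta["resident"]:
        return None, "not resident"       # shader observes a residency miss
    return memory[block_id], meta["compression"]
```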

    Wave creation control with dynamic resource allocation

    Publication number: US10558499B2

    Publication date: 2020-02-11

    Application number: US15794593

    Application date: 2017-10-26

    Abstract: Footprints, or resource allocations, of waves within resources that are shared by processor cores in a multithreaded processor are measured concurrently with the waves executing on the processor cores. The footprints are averaged over a time interval. A number of waves are spawned and dispatched for execution in the multithreaded processor based on the average footprint. In some cases, the waves are spawned at a rate that is determined based on the average value of the footprints of waves within the resources. The rate of spawning waves is modified in response to a change in the average value of the footprints of the waves within the resources.
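The control rule above reduces to a small calculation. The exact policy is an assumption (the abstract only says the spawn rate is derived from, and modified with, the average footprint); `spawn_rate` and the inverse-proportional form are illustrative.

```python
# Hypothetical sketch of footprint-based wave creation control: average the
# measured per-wave footprints over an interval, then size the number of
# in-flight waves to what the shared resource capacity can hold.
def spawn_rate(footprints, shared_capacity):
    avg = sum(footprints) / len(footprints)     # average over the interval
    return max(1, int(shared_capacity // avg))  # waves to keep in flight
```

A rising average footprint lowers the rate and a falling one raises it, matching the modification-in-response-to-change behavior the abstract describes.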
