Patent search: ap:("Advanced Micro Devices, Inc.") AND inv:"Joseph Lee Greathouse"

1.

Patent application
Super-Temporal Cache Replacement Policy [Status: in force]

Publication number: US20240411692A1

Publication date: 2024-12-12

Application number: US18332112

Application date: 2023-06-09

Applicant: Advanced Micro Devices, Inc.

Inventors: Gabriel Hsiuwei Loh, Joseph Lee Greathouse, William Louie Walker, Paul James Moyer

IPC: G06F12/0802

Abstract: Cache replacement policies are described. In accordance with the described techniques, a request for data is received and a cache replacement policy controls how a controller responds to the request. The cache replacement policy assigns each cacheline a priority value, which indicates whether the cacheline should be preserved relative to other cachelines, in response to the request being a cache miss that necessitates eviction of at least one cacheline. The cache replacement policy decrements priority values until at least one cacheline achieves a minimum priority value, at which point a cacheline is evicted. The cache replacement policy designates certain cachelines as protected, either via a separate protected indicator or via the cacheline's priority value, which causes unprotected cachelines to be selected for eviction while favoring preservation of protected cachelines in the cache.
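The following minimal Python sketch models the policy described in the abstract for a single cache set. It rests on several assumptions that do not come from the patent: a 2-bit priority where higher means "keep longer", a hit restoring full priority, aging that decrements every line, and a fallback that lets protected lines be evicted when every line in the set is protected. The names `CacheSet`, `Line`, `MAX_PRIORITY`, and `MIN_PRIORITY` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_PRIORITY = 3  # hypothetical 2-bit priority counter; higher means "keep longer"
MIN_PRIORITY = 0


@dataclass
class Line:
    tag: int
    priority: int = MAX_PRIORITY
    protected: bool = False


class CacheSet:
    """One set of a set-associative cache under an assumed priority/protection policy."""

    def __init__(self, ways: int) -> None:
        self.ways = ways
        self.lines: List[Line] = []

    def access(self, tag: int, protect: bool = False) -> Optional[int]:
        """Look up `tag`; return the evicted tag if a miss forced an eviction, else None."""
        for ln in self.lines:
            if ln.tag == tag:  # hit: restore full priority (assumption)
                ln.priority = MAX_PRIORITY
                ln.protected = ln.protected or protect
                return None
        victim_tag = None
        if len(self.lines) == self.ways:  # miss in a full set: must evict
            victim_tag = self._evict()
        self.lines.append(Line(tag, MAX_PRIORITY, protect))
        return victim_tag

    def _evict(self) -> int:
        # Only unprotected lines are candidates, unless every line is protected.
        candidates = [ln for ln in self.lines if not ln.protected] or self.lines
        # Decrement all priorities (saturating) until some candidate reaches the minimum.
        while min(ln.priority for ln in candidates) > MIN_PRIORITY:
            for ln in self.lines:
                ln.priority = max(MIN_PRIORITY, ln.priority - 1)
        victim = next(ln for ln in candidates if ln.priority == MIN_PRIORITY)
        self.lines = [ln for ln in self.lines if ln is not victim]
        return victim.tag
```

Take a two-way set that receives tag 1 (protected), then tag 2, then tag 3. The third access evicts tag 2 rather than tag 1, because only unprotected lines are eviction candidates.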

2.

Granted patent
Register compaction with early release [Status: in force]

Publication number: US12033238B2

Publication date: 2024-07-09

Application number: US17030852

Application date: 2020-09-24

Applicant: Advanced Micro Devices, Inc.

Inventors: Brian D. Emberling, Joseph Lee Greathouse, Anthony Thomas Gutierrez

IPC: G06T1/20, G06T1/60

CPC classification number: G06T1/60, G06T1/20

Abstract: Systems, apparatuses, and methods for implementing register compaction with early release are disclosed. A processor includes at least a command processor, a plurality of compute units, a plurality of registers, and a control unit. Registers are statically allocated to wavefronts by the control unit when wavefronts are launched by the command processor on the compute units. In response to determining that a first set of registers, previously allocated to a first wavefront, are no longer needed, the first wavefront executes an instruction to release the first set of registers. The control unit detects the executed instruction and releases the first set of registers to the available pool of registers to potentially be used by other wavefronts. Then, the control unit can allocate the first set of registers to a second wavefront for use by threads of the second wavefront while the first wavefront is still active.
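Below is a hypothetical Python model of the allocate/early-release flow in the abstract. `RegisterFile`, `launch`, `release`, and `finish` are illustrative names, not the patent's interfaces. Registers are modeled as integer IDs drawn from a free pool. The compaction step named in the title is not modeled.

```python
from typing import Dict, List, Set


class RegisterFile:
    """Hypothetical control-unit model: static allocation at launch, early release on demand."""

    def __init__(self, num_registers: int) -> None:
        self.free: List[int] = list(range(num_registers))
        self.owned: Dict[str, Set[int]] = {}

    def launch(self, wavefront: str, count: int) -> List[int]:
        # Static allocation when the command processor launches a wavefront.
        if count > len(self.free):
            raise RuntimeError(f"not enough free registers for {wavefront}")
        regs, self.free = self.free[:count], self.free[count:]
        self.owned[wavefront] = set(regs)
        return regs

    def release(self, wavefront: str, regs: List[int]) -> None:
        # Models the release instruction: registers return to the pool
        # while the releasing wavefront remains active.
        owned = self.owned[wavefront]
        if not set(regs) <= owned:
            raise ValueError("a wavefront can only release registers it owns")
        owned.difference_update(regs)
        self.free.extend(regs)

    def finish(self, wavefront: str) -> None:
        # Normal end-of-wavefront deallocation of whatever is still owned.
        self.free.extend(sorted(self.owned.pop(wavefront)))
```

In the following usage, `wf0` initially holds all eight registers. After it releases its upper four, `wf1` can launch with those four while `wf0` is still active:

    rf = RegisterFile(8)
    a = rf.launch("wf0", 8)
    rf.release("wf0", a[4:])
    b = rf.launch("wf1", 4)   # b == [4, 5, 6, 7]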

3.

Patent application
ALLREDUCE ENHANCED DIRECT MEMORY ACCESS FUNCTIONALITY [Status: in force]

Publication number: US20210406209A1

Publication date: 2021-12-30

Application number: US17032195

Application date: 2020-09-25

Applicant: Advanced Micro Devices, Inc.

Inventors: Abhinav Vishnu, Joseph Lee Greathouse

IPC: G06F13/28

Abstract: Systems, apparatuses, and methods for performing an allreduce operation on an enhanced direct memory access (DMA) engine are disclosed. A system implements a machine learning application which includes a first kernel and a second kernel. The first kernel corresponds to a first portion of a machine learning model while the second kernel corresponds to a second portion of the machine learning model. The first kernel is invoked on a plurality of compute units and the second kernel is converted into commands executable by an enhanced DMA engine to perform a collective communication operation. The first kernel is executed on the plurality of compute units in parallel with the enhanced DMA engine executing the commands for performing the collective communication operation. As a result, the allreduce operation may be executed on the enhanced DMA engine in parallel with the compute units.
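The Python sketch below shows only the overlap structure from the abstract. One worker stands in for the enhanced DMA engine running an allreduce, and another stands in for the compute units running the first kernel. The functions `allreduce_sum`, `compute_kernel`, and `step` are hypothetical. The allreduce is a plain sum-and-copy, chosen to show the semantics (every rank ends with the elementwise sum), not any particular collective algorithm. CPython threads will not actually run this arithmetic in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def allreduce_sum(buffers: List[List[float]]) -> None:
    """Stand-in for the DMA engine's collective: every rank ends with the elementwise sum."""
    total = [sum(vals) for vals in zip(*buffers)]
    for buf in buffers:
        buf[:] = total


def compute_kernel(x: List[float]) -> List[float]:
    """Stand-in for the first kernel running on the compute units."""
    return [2.0 * v for v in x]


def step(gradients: List[List[float]], activations: List[float]) -> List[float]:
    # One worker models the enhanced DMA engine; the other models the compute units.
    with ThreadPoolExecutor(max_workers=2) as pool:
        dma = pool.submit(allreduce_sum, gradients)
        compute = pool.submit(compute_kernel, activations)
        out = compute.result()
        dma.result()  # wait for the collective before the gradients are used
    return out
```

Calling `step([[1.0, 2.0], [3.0, 4.0]], [1.0, 5.0])` returns `[2.0, 10.0]` and leaves both gradient buffers equal to `[4.0, 6.0]`.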

4.

Granted patent
Enforcing central processing unit quality of service guarantees when servicing accelerator requests [Status: in force]

Publication number: US11275613B2

Publication date: 2022-03-15

Application number: US15954382

Application date: 2018-04-16

Applicant: Advanced Micro Devices, Inc.

Inventors: Arkaprava Basu, Joseph Lee Greathouse

IPC: G06F9/46, G06F9/48

Abstract: Systems, apparatuses, and methods for enforcing processor quality of service guarantees when servicing system service requests (SSRs) are disclosed. A system includes a first processor executing an operating system and a second processor executing an application which generates SSRs for the first processor to service. The first processor monitors the number of cycles spent servicing SSRs over a previous time interval, and if this number of cycles is above a threshold, the first processor starts delaying the servicing of subsequent SSRs. In one implementation, if the previous delay was non-zero, the first processor increases the delay used in the servicing of subsequent SSRs. If the number of cycles is less than or equal to the threshold, then the first processor services SSRs without delay. As the delay is increased, the second processor begins to stall and its SSR generation rate falls, reducing the load on the first processor.
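Here is a minimal Python sketch of the per-interval throttling rule the abstract describes. The abstract says only that the delay "increases". Doubling up to a cap is an assumption, as are the parameter names `threshold_cycles`, `initial_delay`, and `max_delay` and their defaults.

```python
from dataclasses import dataclass


@dataclass
class SsrThrottle:
    threshold_cycles: int       # hypothetical per-interval budget for SSR servicing
    initial_delay: int = 1000   # hypothetical first delay, in cycles
    max_delay: int = 1_000_000  # hypothetical upper bound on the delay
    delay: int = 0              # delay currently applied to each SSR

    def end_of_interval(self, cycles_spent_on_ssrs: int) -> int:
        """Update and return the delay to apply to SSRs in the next interval."""
        if cycles_spent_on_ssrs > self.threshold_cycles:
            if self.delay == 0:
                self.delay = self.initial_delay  # start delaying
            else:
                self.delay = min(self.delay * 2, self.max_delay)  # assumed doubling
        else:
            self.delay = 0  # at or under budget: service SSRs without delay
        return self.delay
```

With `threshold_cycles=50_000`, four intervals that spend 80k, 90k, 70k, and 20k cycles on SSRs produce delays of 1000, 2000, 4000, and 0.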

5.

Granted patent
High-performance sparse triangular solve on graphics processing units [Status: in force]

Publication number: US10691772B2

Publication date: 2020-06-23

Application number: US15958265

Application date: 2018-04-20

Applicant: Advanced Micro Devices, Inc.

Inventor: Joseph Lee Greathouse

IPC: G06F17/16, G06F9/48

Abstract: A method includes storing a sparse triangular matrix as a compressed sparse row (CSR) dataset. For each factor of a plurality of factors in a first vector, a value of the factor is calculated by identifying for the factor a set of one or more antecedent factors in the first vector, where the value of the factor is dependent on each of the one or more antecedent factors. In response to a completion array indicating that all of the one or more antecedent factor values are solved, the value of the factor is calculated based on one or more elements in a row of the matrix and a product value corresponding to the row. In the completion array, a first completion flag for the factor is asserted, indicating that the factor is solved.
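The following sequential Python sketch solves L x = b for a lower-triangular CSR matrix, gating each row on a completion array as the abstract describes. It makes several assumptions. The "product value" for row i is taken to be b[i]. The CSR arrays use the conventional names `row_ptr`, `col_idx`, and `vals`. Rows are visited in an arbitrary order to mimic independent GPU threads, and a row whose antecedents are unsolved is simply retried on the next pass where a GPU thread would spin. Every row must contain a non-zero diagonal entry.

```python
from typing import List


def sptrsv_csr(row_ptr: List[int], col_idx: List[int], vals: List[float],
               b: List[float]) -> List[float]:
    """Solve L x = b for lower-triangular L in CSR, gating each row on a completion array."""
    n = len(b)
    x = [0.0] * n
    done = [False] * n                   # completion array: one flag per unknown
    pending = list(reversed(range(n)))   # arbitrary order, as if rows ran independently
    while pending:
        still_pending = []
        for i in pending:
            start, end = row_ptr[i], row_ptr[i + 1]
            antecedents = [col_idx[k] for k in range(start, end) if col_idx[k] != i]
            if not all(done[j] for j in antecedents):
                still_pending.append(i)  # an antecedent is unsolved; retry later
                continue
            acc, diag = b[i], None
            for k in range(start, end):
                j = col_idx[k]
                if j == i:
                    diag = vals[k]
                else:
                    acc -= vals[k] * x[j]
            x[i] = acc / diag
            done[i] = True               # assert this unknown's completion flag
        if len(still_pending) == len(pending):
            raise ValueError("no progress: matrix is not lower triangular")
        pending = still_pending
    return x
```

For L = [[2, 0, 0], [1, 1, 0], [0, 3, 4]] and b = [2, 3, 10], the call `sptrsv_csr([0, 1, 3, 5], [0, 0, 1, 1, 2], [2.0, 1.0, 1.0, 3.0, 4.0], [2.0, 3.0, 10.0])` returns `[1.0, 2.0, 1.0]`.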

6.

Patent application
DISTRIBUTED MULTI-INPUT MULTI-OUTPUT CONTROL THEORETIC METHOD TO MANAGE HETEROGENEOUS SYSTEMS [Status: pending (published)]

Publication number: US20190317461A1

Publication date: 2019-10-17

Application number: US15950172

Application date: 2018-04-11

Applicant: Advanced Micro Devices, Inc.

Inventors: Raghavendra Pradyumna Pothukuchi, Joseph Lee Greathouse, Leonardo De Paula Rosa Piga

IPC: G05B15/02

Abstract: A processing unit includes a plurality of subsystem control modules. Each subsystem control module includes a set of one or more inputs that receives a set of one or more external signals and a set of one or more monitored outputs from a hardware subsystem corresponding to the subsystem control module, and a set of configuration outputs for controlling one or more configuration settings of the hardware subsystem. The subsystem control module determines the one or more configuration settings based on the set of monitored outputs and on one or more targets derived from the set of external signals.
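The Python sketch below traces the signal flow of one subsystem control module: external signals are turned into a target, the monitored output is compared against that target, and the result drives a configuration output. For brevity it is single-input, single-output and uses a plain integral control law. The patent describes a multi-input, multi-output control-theoretic design, which this does not reproduce. The `budget_share` signal, the gain, and the clamping limits are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SubsystemControlModule:
    """Hypothetical single-knob controller for one hardware subsystem."""
    gain: float         # assumed integral gain
    setting: float      # current configuration output (e.g. a frequency level)
    min_setting: float
    max_setting: float

    def target_from(self, external: Dict[str, float]) -> float:
        # Assumption: the external signal carries this subsystem's share of a global budget.
        return external["budget_share"]

    def step(self, external: Dict[str, float], monitored_output: float) -> float:
        error = self.target_from(external) - monitored_output
        self.setting += self.gain * error  # integral action on the tracking error
        self.setting = max(self.min_setting, min(self.max_setting, self.setting))
        return self.setting
```

Against a toy plant whose output is 10 times the setting, a target of 20 makes the setting converge toward 2. With `gain=0.05`, each step halves the remaining error.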

7.

Patent application
REGISTER COMPACTION WITH EARLY RELEASE [Status: in force]

Publication number: US20220092725A1

Publication date: 2022-03-24

Application number: US17030852

Application date: 2020-09-24

Applicant: Advanced Micro Devices, Inc.

Inventors: Brian D. Emberling, Joseph Lee Greathouse, Anthony Thomas Gutierrez

IPC: G06T1/60, G06T1/20

Abstract: Systems, apparatuses, and methods for implementing register compaction with early release are disclosed. A processor includes at least a command processor, a plurality of compute units, a plurality of registers, and a control unit. Registers are statically allocated to wavefronts by the control unit when wavefronts are launched by the command processor on the compute units. In response to determining that a first set of registers, previously allocated to a first wavefront, are no longer needed, the first wavefront executes an instruction to release the first set of registers. The control unit detects the executed instruction and releases the first set of registers to the available pool of registers to potentially be used by other wavefronts. Then, the control unit can allocate the first set of registers to a second wavefront for use by threads of the second wavefront while the first wavefront is still active.

8.

Patent application
FAMILY OF LOSSY SPARSE LOAD SIMD INSTRUCTIONS [Status: pending (published)]

Publication number: US20200159529A1

Publication date: 2020-05-21

Application number: US16194981

Application date: 2018-11-19

Applicant: Advanced Micro Devices, Inc.

Inventors: Sanchari Sen, Derrick Allen Aguren, Joseph Lee Greathouse

IPC: G06F9/30, G06F17/16, G06F9/38, G06N3/04

Abstract: Systems, apparatuses, and methods for implementing a family of lossy sparse load single instruction, multiple data (SIMD) instructions are disclosed. A lossy sparse load unit (LSLU) loads a plurality of values from one or more input vector operands and determines how many non-zero values are included in one or more input vector operands of a given instruction. If the one or more input vector operands have fewer than a threshold number of non-zero values, then the LSLU causes an instruction for processing the one or more input vector operands to be skipped. In this case, processing the one or more input vector operands is deemed redundant. If the one or more input vector operands have at least the threshold number of non-zero values, then the LSLU causes an instruction for processing the input vector operand(s) to be executed.
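Here is a minimal Python sketch of the skip decision, applied to a multiply-accumulate. It assumes the sparsity check is made on the first operand (for example, post-ReLU activations). It also assumes that a skipped instruction leaves the accumulator unchanged, which is what makes the scheme lossy. `count_nonzeros` and `lossy_sparse_dot_accumulate` are hypothetical names.

```python
from typing import List, Tuple


def count_nonzeros(*vectors: List[float]) -> int:
    """Count non-zero lanes across the given operands."""
    return sum(1 for vec in vectors for v in vec if v != 0.0)


def lossy_sparse_dot_accumulate(acc: float, a: List[float], b: List[float],
                                threshold: int) -> Tuple[float, bool]:
    """Return (new_acc, executed); skip the multiply-accumulate when `a` is too sparse."""
    if count_nonzeros(a) < threshold:
        return acc, False  # skipped: the small contribution is dropped (lossy)
    return acc + sum(x * y for x, y in zip(a, b)), True
```

With `threshold=2`, the operand `[0.0, 0.0, 0.0, 0.5]` is skipped, so its exact contribution of 2.0 against `[1.0, 2.0, 3.0, 4.0]` is lost. The operand `[1.0, 0.0, 2.0, 0.0]` has two non-zero lanes, so the instruction executes.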

9.

Patent application
ENFORCING CENTRAL PROCESSING UNIT QUALITY OF SERVICE GUARANTEES WHEN SERVICING ACCELERATOR REQUESTS [Status: pending (published)]

Publication number: US20190317807A1

Publication date: 2019-10-17

Application number: US15954382

Application date: 2018-04-16

Applicant: Advanced Micro Devices, Inc.

Inventors: Arkaprava Basu, Joseph Lee Greathouse

IPC: G06F9/48, G06F9/46

Abstract: Systems, apparatuses, and methods for enforcing processor quality of service guarantees when servicing system service requests (SSRs) are disclosed. A system includes a first processor executing an operating system and a second processor executing an application which generates SSRs for the first processor to service. The first processor monitors the number of cycles spent servicing SSRs over a previous time interval, and if this number of cycles is above a threshold, the first processor starts delaying the servicing of subsequent SSRs. In one implementation, if the previous delay was non-zero, the first processor increases the delay used in the servicing of subsequent SSRs. If the number of cycles is less than or equal to the threshold, then the first processor services SSRs without delay. As the delay is increased, the second processor begins to stall and its SSR generation rate falls, reducing the load on the first processor.

10.

Granted patent
User-level hardware branch records [Status: in force]

Publication number: US09372773B2

Publication date: 2016-06-21

Application number: US13916417

Application date: 2013-06-12

Applicant: Advanced Micro Devices, Inc.

Inventors: Joseph Lee Greathouse, Anton Chernoff

IPC: G06F9/44, G06F11/30, G06F9/30

CPC classification number: G06F11/30, G06F9/3005, G06F11/3003, G06F11/3471, G06F11/3476, G06F11/3648, G06F2201/865

Abstract: A processor, a method and a computer-readable medium for recording branch addresses are provided. The processor comprises hardware registers and first and second circuitry. The first circuitry is configured to store a first address associated with a branch instruction in the hardware registers. The first circuitry is further configured to store a second address that indicates where the processor execution is redirected to as a result of the branch instruction in the hardware registers. The second circuitry is configured to, in response to a second instruction, retrieve a value of at least one of the registers. The second instruction can be a user-level instruction.
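The following hypothetical Python model mirrors the abstract's two circuits. The "first circuitry" records each branch's own address together with the address execution was redirected to. The "second circuitry" returns a stored record in response to a read, which the abstract says may be a user-level instruction. The fixed depth, the oldest-entry-first overwrite, and the index-0-is-newest convention are assumptions, as are the names `BranchRecordRegisters`, `on_branch`, and `read`.

```python
from collections import deque
from typing import Deque, Tuple


class BranchRecordRegisters:
    """Hypothetical model: a fixed number of (branch, target) register pairs."""

    def __init__(self, depth: int = 4) -> None:
        # When full, appending drops the oldest record (assumed overwrite policy).
        self.records: Deque[Tuple[int, int]] = deque(maxlen=depth)

    def on_branch(self, branch_addr: int, target_addr: int) -> None:
        # "First circuitry": store the branch address and the redirect target.
        self.records.append((branch_addr, target_addr))

    def read(self, index: int) -> Tuple[int, int]:
        # "Second circuitry": value returned to a read instruction; index 0 is the newest.
        return self.records[-1 - index]
```

With `depth=2`, recording three branches keeps only the last two, and `read(0)` returns the most recent `(branch, target)` pair.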

