-
Publication number: US20240411692A1
Publication date: 2024-12-12
Application number: US18332112
Filing date: 2023-06-09
Applicant: Advanced Micro Devices, Inc.
Inventor: Gabriel Hsiuwei Loh, Joseph Lee Greathouse, William Louie Walker, Paul James Moyer
IPC: G06F12/0802
Abstract: Cache replacement policies are described. In accordance with the described techniques, a request for data is received and a cache replacement policy controls how a controller responds to the request. The cache replacement policy assigns each cacheline a priority value, which indicates whether the cacheline should be preserved relative to other cachelines, in response to the request being a cache miss that necessitates eviction of at least one cacheline. The cache replacement policy decrements priority values until at least one cacheline achieves a minimum priority value, at which point a cacheline is evicted. The cache replacement policy designates certain cachelines as protected, either via a separate protected indicator or via the cacheline's priority value, which causes unprotected cachelines to be selected for eviction while favoring preservation of protected cachelines in the cache.
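The victim selection described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the claimed implementation: the names `Line`, `choose_victim`, and `MIN_PRIORITY` are hypothetical, the minimum priority is assumed to be 0, and the sketch assumes the separate-indicator variant in which only unprotected lines are aged (protected lines are considered only if every line in the set is protected).

```python
from dataclasses import dataclass

MIN_PRIORITY = 0  # assumed minimum priority value


@dataclass
class Line:
    tag: str
    priority: int
    protected: bool = False  # separate "protected" indicator (one of the two variants)


def choose_victim(lines):
    """Return the index of the cacheline to evict from a full set."""
    # Favor unprotected lines; fall back to every line if all are protected.
    candidates = [i for i, l in enumerate(lines) if not l.protected]
    if not candidates:
        candidates = list(range(len(lines)))
    # Age the candidates until at least one reaches the minimum priority.
    while all(lines[i].priority > MIN_PRIORITY for i in candidates):
        for i in candidates:
            lines[i].priority -= 1
    # Evict the first candidate that has reached the minimum.
    return next(i for i in candidates if lines[i].priority <= MIN_PRIORITY)


cache_set = [Line("a", 2, protected=True), Line("b", 3), Line("c", 1)]
victim = choose_victim(cache_set)
```

In this example the protected line "a" keeps its priority, while "b" and "c" are aged once and "c" is chosen as the victim.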
-
Publication number: US12033238B2
Publication date: 2024-07-09
Application number: US17030852
Filing date: 2020-09-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Brian D. Emberling, Joseph Lee Greathouse, Anthony Thomas Gutierrez
Abstract: Systems, apparatuses, and methods for implementing register compaction with early release are disclosed. A processor includes at least a command processor, a plurality of compute units, a plurality of registers, and a control unit. Registers are statically allocated to wavefronts by the control unit when wavefronts are launched by the command processor on the compute units. In response to determining that a first set of registers, previously allocated to a first wavefront, are no longer needed, the first wavefront executes an instruction to release the first set of registers. The control unit detects the executed instruction and releases the first set of registers to the available pool of registers to potentially be used by other wavefronts. Then, the control unit can allocate the first set of registers to a second wavefront for use by threads of the second wavefront while the first wavefront is still active.
-
Publication number: US20210406209A1
Publication date: 2021-12-30
Application number: US17032195
Filing date: 2020-09-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Abhinav Vishnu, Joseph Lee Greathouse
IPC: G06F13/28
Abstract: Systems, apparatuses, and methods for performing an allreduce operation on an enhanced direct memory access (DMA) engine are disclosed. A system implements a machine learning application which includes a first kernel and a second kernel. The first kernel corresponds to a first portion of a machine learning model while the second kernel corresponds to a second portion of the machine learning model. The first kernel is invoked on a plurality of compute units and the second kernel is converted into commands executable by an enhanced DMA engine to perform a collective communication operation. The first kernel is executed on the plurality of compute units in parallel with the enhanced DMA engine executing the commands for performing the collective communication operation. As a result, the allreduce operation may be executed in parallel on the enhanced DMA engine to the compute units.
-
Publication number: US11275613B2
Publication date: 2022-03-15
Application number: US15954382
Filing date: 2018-04-16
Applicant: Advanced Micro Devices, Inc.
Inventor: Arkaprava Basu, Joseph Lee Greathouse
Abstract: Systems, apparatuses, and methods for enforcing processor quality of service guarantees when servicing system service requests (SSRs) are disclosed. A system includes a first processor executing an operating system and a second processor executing an application which generates SSRs for the first processor to service. The first processor monitors the number of cycles spent servicing SSRs over a previous time interval, and if this number of cycles is above a threshold, the first processor starts delaying the servicing of subsequent SSRs. In one implementation, if the previous delay was non-zero, the first processor increases the delay used in the servicing of subsequent SSRs. If the number of cycles is less than or equal to the threshold, then the first processor services SSRs without delay. As the delay is increased, the second processor begins to stall and its SSR generation rate falls, reducing the load on the first processor.
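The delay rule in this abstract can be sketched as a small per-interval function. This is an illustrative reading only: `next_delay`, `initial_delay`, and `step` are hypothetical names, and the abstract does not say how large the first delay is or by how much it grows, so both are assumed parameters here.

```python
def next_delay(cycles_used, threshold, prev_delay, initial_delay=1, step=1):
    """Return the delay to apply to SSRs in the next interval.

    cycles_used: cycles spent servicing SSRs in the previous interval.
    initial_delay and step are assumed values, not taken from the abstract.
    """
    if cycles_used <= threshold:
        return 0                    # within budget: service SSRs without delay
    if prev_delay > 0:
        return prev_delay + step    # still over budget: increase the delay
    return initial_delay            # first interval over budget: start delaying


# Three intervals over budget, then one under budget.
delay = 0
history = []
for cycles in (150, 180, 160, 90):
    delay = next_delay(cycles, threshold=100, prev_delay=delay)
    history.append(delay)
```

With these assumed defaults the delay grows while the first processor stays over its budget and resets once the load drops back to the threshold or below.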
-
Publication number: US10691772B2
Publication date: 2020-06-23
Application number: US15958265
Filing date: 2018-04-20
Applicant: Advanced Micro Devices, Inc.
Inventor: Joseph Lee Greathouse
Abstract: A method includes storing a sparse triangular matrix as a compressed sparse row (CSR) dataset. For each factor of a plurality of factors in a first vector, a value of the factor is calculated by identifying for the factor a set of one or more antecedent factors in the first vector, where the value of the factor is dependent on each of the one or more antecedent factors. In response to a completion array indicating that all of the one or more antecedent factor values are solved, the value of the factor is calculated based on one or more elements in a row of the matrix and a product value corresponding to the row. In the completion array, a first completion flag for the factor is asserted, indicating that the factor is solved.
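A sequential sketch of a triangular solve driven by a completion array is given below. It is an interpretation, not the patented method: the abstract's "factors" are read as the unknowns `x`, the "product value corresponding to the row" as the right-hand side entry `b[i]`, and the function name `sptrsv_csr` is hypothetical. The abstract appears to target parallel execution, where each worker would wait on the completion flags; here a simple retry loop stands in for that waiting. Every row is assumed to store a non-zero diagonal entry.

```python
def sptrsv_csr(values, col_idx, row_ptr, b):
    """Solve T x = b for a sparse triangular matrix T stored in CSR form."""
    n = len(b)
    x = [0.0] * n
    done = [False] * n              # completion array: done[i] means x[i] is solved
    pending = list(range(n))
    while pending:
        still_pending = []
        for i in pending:
            start, end = row_ptr[i], row_ptr[i + 1]
            # Antecedents of x[i]: off-diagonal non-zeros in row i.
            deps = [k for k in range(start, end) if col_idx[k] != i]
            if all(done[col_idx[k]] for k in deps):
                diag = next(values[k] for k in range(start, end) if col_idx[k] == i)
                acc = b[i] - sum(values[k] * x[col_idx[k]] for k in deps)
                x[i] = acc / diag
                done[i] = True      # assert the completion flag for this factor
            else:
                still_pending.append(i)
        if len(still_pending) == len(pending):
            raise ValueError("no progress: matrix is not triangular")
        pending = still_pending
    return x


# Lower-triangular example: [[2, 0, 0], [1, 1, 0], [0, 3, 4]]
values = [2.0, 1.0, 1.0, 3.0, 4.0]
col_idx = [0, 0, 1, 1, 2]
row_ptr = [0, 1, 3, 5]
x = sptrsv_csr(values, col_idx, row_ptr, [2.0, 3.0, 11.0])
```

Because readiness is decided only by the completion flags, the same loop handles upper-triangular input as well; it simply needs more than one pass.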
-
Publication number: US20190317461A1
Publication date: 2019-10-17
Application number: US15950172
Filing date: 2018-04-11
Applicant: Advanced Micro Devices, Inc.
IPC: G05B15/02
Abstract: A processing unit includes a plurality of subsystem control modules. Each subsystem control module includes a set of one or more inputs that receives a set of one or more external signals and a set of one or more monitored outputs from a hardware subsystem corresponding to the subsystem control module, and a set of configuration outputs for controlling one or more configuration settings of the hardware subsystem. The subsystem control module determines the one or more configuration settings based on the set of monitored outputs and on one or more targets derived from the set of external signals.
-
Publication number: US20220092725A1
Publication date: 2022-03-24
Application number: US17030852
Filing date: 2020-09-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Brian D. Emberling, Joseph Lee Greathouse, Anthony Thomas Gutierrez
Abstract: Systems, apparatuses, and methods for implementing register compaction with early release are disclosed. A processor includes at least a command processor, a plurality of compute units, a plurality of registers, and a control unit. Registers are statically allocated to wavefronts by the control unit when wavefronts are launched by the command processor on the compute units. In response to determining that a first set of registers, previously allocated to a first wavefront, are no longer needed, the first wavefront executes an instruction to release the first set of registers. The control unit detects the executed instruction and releases the first set of registers to the available pool of registers to potentially be used by other wavefronts. Then, the control unit can allocate the first set of registers to a second wavefront for use by threads of the second wavefront while the first wavefront is still active.
-
Publication number: US20200159529A1
Publication date: 2020-05-21
Application number: US16194981
Filing date: 2018-11-19
Applicant: Advanced Micro Devices, Inc.
Inventor: Sanchari Sen, Derrick Allen Aguren, Joseph Lee Greathouse
Abstract: Systems, apparatuses, and methods for implementing a family of lossy sparse load single instruction, multiple data (SIMD) instructions are disclosed. A lossy sparse load unit (LSLU) loads a plurality of values from one or more input vector operands and determines how many non-zero values are included in one or more input vector operands of a given instruction. If the one or more input vector operands have less than a threshold number of non-zero values, then the LSLU causes an instruction for processing the one or more input vector operands to be skipped. In this case, the processing of the instruction of the one or more input vector operands is deemed to be redundant. If the one or more input vector operands have greater than or equal to the threshold number of non-zero values, then the LSLU causes an instruction for processing the input vector operand(s) to be executed.
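The skip decision in this abstract can be modeled in software as a threshold test on the number of non-zero operand values. The sketch below is only a scalar model of what the abstract describes as SIMD hardware behavior; `should_execute` and `lossy_mac` are hypothetical names, and counting non-zeros across all operands together is an assumption.

```python
def should_execute(operands, threshold):
    """Return True if the operands hold at least `threshold` non-zero values."""
    nonzero = sum(1 for vec in operands for v in vec if v != 0)
    return nonzero >= threshold


def lossy_mac(acc, a, b, threshold):
    """Multiply-accumulate that is skipped when the operands are too sparse."""
    if not should_execute([a, b], threshold):
        return acc  # treated as redundant: the accumulator is left unchanged
    return acc + sum(x * y for x, y in zip(a, b))


dense = lossy_mac(10, [1, 2, 0, 0], [3, 4, 0, 0], threshold=3)   # executed
sparse = lossy_mac(10, [0, 0, 0, 5], [0, 0, 0, 0], threshold=3)  # skipped
```

The result is "lossy" in that a skipped instruction may discard a small non-zero contribution, which is the trade the threshold controls.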
-
Publication number: US20190317807A1
Publication date: 2019-10-17
Application number: US15954382
Filing date: 2018-04-16
Applicant: Advanced Micro Devices, Inc.
Inventor: Arkaprava Basu, Joseph Lee Greathouse
Abstract: Systems, apparatuses, and methods for enforcing processor quality of service guarantees when servicing system service requests (SSRs) are disclosed. A system includes a first processor executing an operating system and a second processor executing an application which generates SSRs for the first processor to service. The first processor monitors the number of cycles spent servicing SSRs over a previous time interval, and if this number of cycles is above a threshold, the first processor starts delaying the servicing of subsequent SSRs. In one implementation, if the previous delay was non-zero, the first processor increases the delay used in the servicing of subsequent SSRs. If the number of cycles is less than or equal to the threshold, then the first processor services SSRs without delay. As the delay is increased, the second processor begins to stall and its SSR generation rate falls, reducing the load on the first processor.
-
Publication number: US09372773B2
Publication date: 2016-06-21
Application number: US13916417
Filing date: 2013-06-12
Applicant: Advanced Micro Devices, Inc.
Inventor: Joseph Lee Greathouse, Anton Chernoff
CPC classification number: G06F11/30, G06F9/3005, G06F11/3003, G06F11/3471, G06F11/3476, G06F11/3648, G06F2201/865
Abstract: A processor, a method and a computer-readable medium for recording branch addresses are provided. The processor comprises hardware registers and first and second circuitry. The first circuitry is configured to store a first address associated with a branch instruction in the hardware registers. The first circuitry is further configured to store a second address that indicates where the processor execution is redirected to as a result of the branch instruction in the hardware registers. The second circuitry is configured to, in response to a second instruction, retrieve a value of at least one of the registers. The second instruction can be a user-level instruction.