Patent search ap:("Apple Inc.") AND inv:"Kulin N. Kothari" Page 1

1.

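Granted invention patent
Load/store ordering violation management (Active)

Publication (announcement) number: US10983801B2

Publication (announcement) date: 2021-04-20

Application number: US16562675

Application date: 2019-09-06

Applicant: Apple Inc.

Inventor: Kulin N. Kothari, Mridul Agarwal

IPC: G06F9/38, G06F9/30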

Abstract: A processor includes a load/store unit that includes one or more load pipelines and one or more store pipelines. Load operations may be issued into the load pipelines out of order with respect to older store operations. If a load operation is executed out of order with an older store operation that writes one or more bytes read by the load operation, and if the store operation is issued shortly after the load operation, such that the load operation is still in the load pipeline when the store operation is issued, some cases of flushing may be converted to replays by detecting the ordering violation while the load operation is still in the load pipeline.
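
The replay-versus-flush decision in this abstract can be illustrated with a minimal Python sketch. It is not the patented circuit: the `Load` and `Store` records, the `age` field, and the `resolve` function are hypothetical names introduced here. The sketch also assumes that every violation caught while the load is still in the pipeline becomes a replay, whereas the abstract only says that some cases are converted.

```python
from dataclasses import dataclass


@dataclass
class Load:
    age: int           # program-order position; smaller means older (assumption)
    addr: int          # first byte read
    size: int          # number of bytes read
    in_pipeline: bool  # True while the load is still in the load pipeline


@dataclass
class Store:
    age: int
    addr: int
    size: int


def overlaps(load: Load, store: Store) -> bool:
    """True if the store writes at least one byte that the load reads."""
    return load.addr < store.addr + store.size and store.addr < load.addr + load.size


def resolve(load: Load, store: Store) -> str:
    """Classify an issuing store against a load that was issued earlier."""
    violation = store.age < load.age and overlaps(load, store)
    if not violation:
        return "none"
    # Assumption: a load still in the pipeline can be replayed; otherwise flush.
    return "replay" if load.in_pipeline else "flush"
```

For example, `resolve(Load(5, 0x100, 4, True), Store(3, 0x102, 1))` classifies the case as a replay, while the same pair with `in_pipeline=False` falls back to a flush.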

2.

Granted invention patent
Load/store dependency predictor optimization for replayed loads (Active)

Publication (announcement) number: US10437595B1

Publication (announcement) date: 2019-10-08

Application number: US15070435

Application date: 2016-03-15

Applicant: Apple Inc.

Inventor: Pradeep Kanapathipillai, Stephan G. Meier, Gerard R. Williams, III, Mridul Agarwal, Kulin N. Kothari

IPC: G06F9/38, G06F9/30

Abstract: Systems, apparatuses, and methods for optimizing a load-store dependency predictor (LSDP). When a younger load instruction is issued before an older store instruction and the younger load is dependent on the older store, the LSDP is trained on this ordering violation. A replay/flush indicator is stored in a corresponding entry in the LSDP to indicate whether the ordering violation resulted in a flush or replay. On subsequent executions, a dependency may be enforced for the load-store pair if a confidence counter is above a threshold, with the threshold varying based on the status of the replay/flush indicator. If a given load matches on multiple entries in the LSDP, and if at least one of the entries has a flush indicator, then the given load may be marked as a multimatch case and forced to wait to issue until all older stores have issued.
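
The threshold selection and the multimatch rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented design: the entry layout, the function names, and both threshold values are hypothetical. The abstract does not say which indicator gets the lower threshold, so the choice of a lower threshold for flush entries is a guess made here.

```python
from dataclasses import dataclass

# Hypothetical values; the abstract only says the threshold varies with the indicator.
REPLAY_THRESHOLD = 3
FLUSH_THRESHOLD = 1  # assumption: flushes cost more, so enforce sooner


@dataclass
class LsdpEntry:
    load_pc: int
    store_pc: int
    confidence: int
    was_flush: bool  # replay/flush indicator recorded at training time


def enforce(entry: LsdpEntry) -> bool:
    """Enforce the dependency when confidence is above the selected threshold."""
    threshold = FLUSH_THRESHOLD if entry.was_flush else REPLAY_THRESHOLD
    return entry.confidence > threshold


def issue_policy(load_pc: int, table: list) -> tuple:
    """Return (policy, store PCs the load must wait for)."""
    matches = [e for e in table if e.load_pc == load_pc]
    # Multimatch case: several entries hit and at least one recorded a flush.
    if len(matches) > 1 and any(e.was_flush for e in matches):
        return ("wait_for_all_older_stores", [])
    deps = [e.store_pc for e in matches if enforce(e)]
    return ("wait_for_stores", deps) if deps else ("issue_freely", [])
```

With equal confidence counters, an entry trained by a flush enforces its dependency earlier than one trained by a replay, which is the only behavior this sketch is meant to show.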

3.

Invention application
Processing of Data Synchronization Barrier Instructions (Active)

Publication (announcement) number: US20250147767A1

Publication (announcement) date: 2025-05-08

Application number: US19009790

Application date: 2025-01-03

Applicant: Apple Inc.

Inventor: Madhu Sudan Hari, Mridul Agarwal, Kulin N. Kothari, John D. Pape, Niket K. Choudhary

IPC: G06F9/38, G06F9/30, G06F9/52, G06F12/1027

Abstract: A system may include multiple processors. One of the processors may receive an indication of a data synchronization barrier (DSB) instruction in another processor that follows a translation lookaside buffer invalidate (TLBI) instruction to invalidate an entry of a translation lookaside buffer. The processor may determine whether instructions are pending in the processor for which the virtual addresses used for memory accesses have been translated to physical addresses before receiving the DSB indication. If there are such pending instructions, the processor may provide, after these instructions retire, an indication to the other processor as a response to the DSB indication.

4.

Invention publication
Conditional Instructions Distribution and Execution (Under examination, published)

Publication (announcement) number: US20230244495A1

Publication (announcement) date: 2023-08-03

Application number: US17590722

Application date: 2022-02-01

Applicant: Apple Inc.

Inventor: Ethan R. Schuchman, Niket K. Choudhary, Kulin N. Kothari, Haoyan Jia, Ian D. Kountanis, Douglas C. Holman, Wei-Han Lien, Pruthivi Vuyyuru

IPC: G06F9/38, G06F9/30

CPC classification number: G06F9/3844, G06F9/3861, G06F9/30145, G06F9/30058, G06F9/30079

Abstract: A processor may include an instruction distribution circuit and a plurality of execution pipelines. The instruction distribution circuit may distribute a conditional instruction to a first execution pipeline for execution when the conditional instruction is associated with a prediction of a high confidence level, or to a second execution pipeline for execution when the conditional instruction is associated with a prediction of a low confidence level. The second execution pipeline, not the first execution pipeline, may directly instruct the processor to obtain an instruction from a target address for execution, when the conditional instruction is mispredicted. Thus, when the conditional instruction is distributed to the first execution pipeline for execution and determined to be mispredicted, the first execution pipeline may cause the conditional instruction to be re-executed in the second execution pipeline to cause the instruction from the correct target address to be obtained for execution.

5.

Invention application
ZERO CYCLE LOAD BYPASS (Active)

Publication (announcement) number: US20210173654A1

Publication (announcement) date: 2021-06-10

Application number: US16705023

Application date: 2019-12-05

Applicant: Apple Inc.

Inventor: Deepankar Duggal, Kulin N. Kothari, Conrado Blasco, Muawya M. Al-Otoom

IPC: G06F9/38, G06F9/30

Abstract: Systems, apparatuses, and methods for implementing zero cycle load bypass operations are described. A system includes a processor with at least a decode unit, control logic, mapper, and free list. When a load operation is detected, the control logic determines if the load operation qualifies to be converted to a zero cycle load bypass operation. Conditions for qualifying include the load operation being in the same decode group as an older store operation to the same address. Qualifying load operations are converted to zero cycle load bypass operations. A lookup of the free list is prevented for a zero cycle load bypass operation and a destination operand of the load is renamed with a same physical register identifier used for a source operand of the store. Also, the data of the store is bypassed to the load.
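
The renaming step described here can be sketched as a toy Python model. The `Mapper` class, the operation tuples, and the symbolic address keys such as `("sp", 8)` are all hypothetical. In particular, comparing base-register and offset pairs is only an assumption about how "the same address" might be recognized at decode time.

```python
class Mapper:
    """Toy register renamer with a free list of physical registers."""

    def __init__(self, num_physical: int):
        self.free_list = list(range(num_physical))
        self.rename_map = {}  # architectural register -> physical register

    def allocate(self, arch_reg: str) -> int:
        """Normal path: take a physical register from the free list."""
        phys = self.free_list.pop(0)
        self.rename_map[arch_reg] = phys
        return phys

    def rename_group(self, group: list) -> list:
        """Rename one decode group of (kind, reg, addr_key) ops in program order.

        Returns (kind, physical register, bypassed) for each op.
        """
        stores_by_addr = {}
        renamed = []
        for kind, reg, addr_key in group:
            if kind == "store":
                phys = self.rename_map[reg]  # store source is already mapped
                stores_by_addr[addr_key] = phys
                renamed.append(("store", phys, False))
            elif addr_key in stores_by_addr:
                # Zero cycle load bypass: reuse the store's source register
                # and skip the free-list lookup entirely.
                phys = stores_by_addr[addr_key]
                self.rename_map[reg] = phys
                renamed.append(("load", phys, True))
            else:
                renamed.append(("load", self.allocate(reg), False))
        return renamed
```

A load that follows a store to the same address key in the same group ends up mapped to the store's source register, and the free list is left untouched for that load.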

6.

Granted invention patent
Program counter zero-cycle loads (Active)

Publication (announcement) number: US12288070B1

Publication (announcement) date: 2025-04-29

Application number: US17931070

Application date: 2022-09-09

Applicant: Apple Inc.

Inventor: Muawya M. Al-Otoom, Conrado Blasco, Deepankar Duggal, Ethan R. Schuchman, Ian D. Kountanis, Kulin N. Kothari, Nikhil Gupta

IPC: G06F9/30, G06F9/38

Abstract: An apparatus includes a processor core that includes an instruction decode circuit and a control circuit. The instruction decode circuit is configured to decode instructions, including a plurality of store instructions used to store information in a memory hierarchy. The control circuit is configured, after a particular store instruction is decoded, to preserve store information related to the particular store instruction, including a first program counter value for the particular store instruction. In response to decoding a subsequent load instruction with a corresponding second program counter value, the control circuit is configured to determine, using the first and second program counter values, whether a dependency has been established between the subsequent load instruction and the particular store instruction. In response to a determination that the dependency has been established, the control circuit is configured to use the preserved store information to perform the subsequent load instruction.

7.

Granted invention patent
Out of order store commit (Active)

Publication (announcement) number: US10228951B1

Publication (announcement) date: 2019-03-12

Application number: US14831661

Application date: 2015-08-20

Applicant: Apple Inc.

Inventor: Kulin N. Kothari, Mridul Agarwal, Pradeep Kanapathipillai

IPC: G06F9/38, G06F9/30

Abstract: Systems, apparatuses, and methods for committing store instructions out of order from a store queue are described. A processor may store a first store instruction and a second store instruction in the store queue, wherein the first store instruction is older than the second store instruction. In response to determining the second store instruction is ready to commit to the memory hierarchy, the processor may allow the second store instruction to commit before the first store instruction, in response to determining that all store instructions in the store queue older than the second store instruction are non-speculative. However, if it is determined that at least one store instruction in the store queue older than the second store instruction is speculative, the processor may prevent the second store instruction from committing to the memory hierarchy before the first store instruction.
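
The commit condition stated in this abstract reduces to a short predicate. The Python sketch below assumes a store queue held as a list ordered oldest first; the `StoreEntry` fields and the `may_commit` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoreEntry:
    speculative: bool  # True while the store could still be squashed
    ready: bool        # True once the store is ready to commit


def may_commit(queue: list, index: int) -> bool:
    """True if queue[index] may commit ahead of older stores.

    The queue is ordered oldest first, so queue[:index] holds the older stores.
    """
    if not queue[index].ready:
        return False
    return all(not older.speculative for older in queue[:index])
```

A ready younger store may pass an older store that is not yet ready, as long as none of the older stores is still speculative.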

8.

Granted invention patent
Hierarchical store queue circuit (Active)

Publication (announcement) number: US12298915B1

Publication (announcement) date: 2025-05-13

Application number: US18359755

Application date: 2023-07-26

Applicant: Apple Inc.

Inventor: Nikhil Gupta, Gideon N. Levinsky, Kulin N. Kothari, Mridul Agarwal, Pankaj Lnu

IPC: G06F12/123, G06F9/30

Abstract: An apparatus includes a cache memory circuit, and a hierarchical store queue circuit that further includes a primary queue and a secondary queue. The hierarchical store queue circuit may be configured to write incoming store requests to the primary queue in response to the primary queue currently having capacity, and to write incoming store requests to the secondary queue in response to the primary queue currently not having capacity. The hierarchical store queue circuit may be further configured to commit store requests to the cache memory circuit from the primary queue but not from the secondary queue. In response to a determination that the primary queue currently has capacity, the hierarchical store queue circuit may perform a transfer of at least one store request from the secondary queue to the primary queue.
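
The two-level queue behavior can be modeled in a few lines of Python. This is a software analogy under assumptions, not the circuit itself: the class and method names are hypothetical, store requests are modeled as `(address, data)` tuples, and the cache is a plain dictionary.

```python
from collections import deque


class HierarchicalStoreQueue:
    """Toy model: only the primary queue commits to the cache."""

    def __init__(self, primary_capacity: int):
        self.primary_capacity = primary_capacity
        self.primary = deque()
        self.secondary = deque()

    def _primary_has_capacity(self) -> bool:
        return len(self.primary) < self.primary_capacity

    def enqueue(self, store: tuple) -> None:
        """Write to the primary queue if it has capacity, else to the secondary."""
        if self._primary_has_capacity():
            self.primary.append(store)
        else:
            self.secondary.append(store)

    def _refill(self) -> None:
        """Transfer stores from secondary to primary while capacity exists."""
        while self.secondary and self._primary_has_capacity():
            self.primary.append(self.secondary.popleft())

    def commit_one(self, cache: dict):
        """Commit the oldest primary-queue store to the cache, then refill."""
        if not self.primary:
            return None
        addr, data = self.primary.popleft()
        cache[addr] = data
        self._refill()
        return (addr, data)
```

Refilling immediately after each commit keeps the secondary queue empty whenever the primary queue has room, so stores stay in arrival order in this model. Whether the real circuit preserves order this way is not stated in the abstract.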

9.

Granted invention patent
Early load execution via constant address and stride prediction (Active)

Publication (announcement) number: US11829763B2

Publication (announcement) date: 2023-11-28

Application number: US16539684

Application date: 2019-08-13

Applicant: Apple Inc.

Inventor: Yuan C. Chou, Viney Gautam, Wei-Han Lien, Kulin N. Kothari, Mridul Agarwal

IPC: G06F9/345, G06F9/38, G06F9/30, G06F9/50, G06F12/0802

CPC classification number: G06F9/3455, G06F9/30043, G06F9/3861, G06F9/5005, G06F12/0802

Abstract: A system and method for efficiently reducing the latency of load operations. In various embodiments, logic of a processor accesses a prediction table after fetching instructions. For a prediction table hit, the logic executes a load instruction with a retrieved predicted address from the prediction table. For a prediction table miss, when the logic determines the address of the load instruction and hits in a learning table, the logic updates a level of confidence indication to indicate a higher level of confidence when a stored address matches the determined address. When the logic determines the level of confidence indication stored in a given table entry of the learning table meets a threshold, the logic allocates, in the prediction table, information stored in the given entry. Therefore, the predicted address is available during the next lookup of the prediction table.
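
The interplay of the learning table and the prediction table can be sketched in Python for the constant-address case only; stride prediction is left out. The class name, the table layouts, and the threshold value are hypothetical. Allocating a learning-table entry on its first miss and resetting confidence on a mismatch are also assumptions, since the abstract does not describe those cases.

```python
CONFIDENCE_THRESHOLD = 2  # hypothetical value


class LoadAddressPredictor:
    """Toy constant-address predictor indexed by load program counter."""

    def __init__(self):
        self.prediction = {}  # pc -> predicted address
        self.learning = {}    # pc -> [last address seen, confidence]

    def predict(self, pc: int):
        """Prediction-table lookup; None means a miss."""
        return self.prediction.get(pc)

    def train(self, pc: int, actual_addr: int) -> None:
        """Called on a prediction-table miss once the load address is known."""
        entry = self.learning.get(pc)
        if entry is None:
            self.learning[pc] = [actual_addr, 0]  # assumption: allocate on miss
            return
        if entry[0] == actual_addr:
            entry[1] += 1                         # stored address matched
        else:
            entry[0], entry[1] = actual_addr, 0   # assumption: reset on mismatch
        if entry[1] >= CONFIDENCE_THRESHOLD:
            # Promote the learned address so the next lookup hits.
            self.prediction[pc] = entry[0]
```

In this model a load at a given program counter must produce the same address on three consecutive executions before `predict` starts returning it.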

10.

Granted invention patent
Zero cycle load bypass in a decode group (Active)

Publication (announcement) number: US11416254B2

Publication (announcement) date: 2022-08-16

Application number: US16705023

Application date: 2019-12-05

Applicant: Apple Inc.

Inventor: Deepankar Duggal, Kulin N. Kothari, Conrado Blasco, Muawya M. Al-Otoom

IPC: G06F9/30, G06F9/38

Abstract: Systems, apparatuses, and methods for implementing zero cycle load bypass operations are described. A system includes a processor with at least a decode unit, control logic, mapper, and free list. When a load operation is detected, the control logic determines if the load operation qualifies to be converted to a zero cycle load bypass operation. Conditions for qualifying include the load operation being in the same decode group as an older store operation to the same address. Qualifying load operations are converted to zero cycle load bypass operations. A lookup of the free list is prevented for a zero cycle load bypass operation and a destination operand of the load is renamed with a same physical register identifier used for a source operand of the store. Also, the data of the store is bypassed to the load.
