-
公开(公告)号:US20240289282A1
公开(公告)日:2024-08-29
申请号:US18173500
申请日:2023-02-23
Applicant: Apple Inc.
Inventor: Jonathan M. Redshaw , Winnie W. Yeung , Benjiman L. Goodman , David K. Li , Zelin Zhang , Yoong Chert Foo
IPC: G06F9/30
CPC classification number: G06F9/30079 , G06F9/30047 , G06F9/30145
Abstract: Techniques are disclosed relating to eviction control for cache lines that store register data. In some embodiments, memory hierarchy circuitry is configured to provide memory backing for register operand data in one or more cache circuits. Lock circuitry may control a first set of lock indicators for a set of registers for a first thread, including to assert one or more lock indicators for registers that are indicated, by decode circuitry, as being utilized by decoded instructions of the first thread. The lock circuitry may preserve register operand data in the one or more cache circuits, including to prevent eviction of a given cache line from a cache circuit based on an asserted lock indicator. The lock circuitry may clear the first set of lock indicators in response to a reset event. Disclosed techniques may advantageously retain relevant register information in the cache with limited control circuit area.
-
公开(公告)号:US20240272940A1
公开(公告)日:2024-08-15
申请号:US18450978
申请日:2023-08-16
Applicant: Apple Inc.
Inventor: Benjamin Bowman , Ali Rabbani Rankouhi , Jonathan M. Redshaw , Steven Fishwick
CPC classification number: G06F9/4881 , G06F9/3887 , G06F9/485
Abstract: Disclosed techniques relate to scheduling sets of graphics work with dependencies. In some embodiments, a first set of graphics work depends on a second set of graphics work. Control circuitry may, in response to a release signal that indicates the second set reaching a first processing point, initiate processing of the first set. Control circuitry may, in response to reaching a kick gate point, stall processing of the first set. Control circuitry may, in response to an end signal for the second set, resume processing of the first set.
-
公开(公告)号:US20240054014A1
公开(公告)日:2024-02-15
申请号:US18493993
申请日:2023-10-25
Applicant: Apple Inc.
Inventor: Max J. Batley , Jonathan M. Redshaw , Ji Rao , Ali Rabbani Rankouhi
CPC classification number: G06F9/4881 , G06F9/3877 , G06F9/505 , G06F9/544 , G06F9/545 , G06F13/4282 , G06F13/36 , G06F13/37 , G06F13/4022 , G06F9/5027
Abstract: Techniques are disclosed relating to a shared control bus for communicating between primary control circuitry and multiple distributed graphics processor units. In some embodiments, a set of multiple graphics processor units including at least first and second graphics processors on different semiconductor substrates that are packaged in a multi-chip module, where the first and second graphics processors are coupled to access graphics data via respective memory interfaces. The shared workload distribution bus may include: one or more interfaces between respective graphics processors on the same semiconductor substrate and at least one cross-substrate interface between the different semiconductor substrates. Workload distribution circuitry may transmit, via the shared workload distribution bus, control data that specifies graphics work distribution to the multiple graphics processor units. Packet control circuitry may modify packets from at least one of the one or more interfaces for transmission via the cross-substrate interface.
-
公开(公告)号:US11675722B2
公开(公告)日:2023-06-13
申请号:US17337805
申请日:2021-06-03
Applicant: Apple Inc.
Inventor: Sergio Kolor , Sergio V. Tota , Tzach Zemer , Sagi Lahav , Jonathan M. Redshaw , Per H. Hammarlund , Eran Tamari , James Vash , Gaurav Garg , Lior Zimet , Harshavardhan Kaushikkar , Steven Fishwick , Steven R. Hutsell , Shawn M. Fukami
IPC: G06F13/40 , G06F15/173
CPC classification number: G06F13/4027 , G06F13/4022 , G06F15/17375 , G06F15/17381
Abstract: In an embodiment, a system on a chip (SOC) comprises a semiconductor die on which circuitry is formed, wherein the circuitry comprises a plurality of agents and a plurality of network switches coupled to the plurality of agents. The plurality of network switches are interconnected to form a plurality of physical and logically independent networks. A first network of the plurality of physically and logically independent networks is constructed according to a first topology and a second network of the plurality of physically and logically independent networks is constructed according to a second topology that is different from the first topology. For example, the first topology may a ring topology and the second topology may be a mesh topology. In an embodiment, coherency may be enforced on the first network and the second network may be a relaxed order network.
-
15.
公开(公告)号:US11436784B2
公开(公告)日:2022-09-06
申请号:US17103406
申请日:2020-11-24
Applicant: Apple Inc.
Inventor: Ali Rabbani Rankouhi , Christopher A. Burns , Justin A. Hensley , Luca Iuliano , Jonathan M. Redshaw
IPC: G06T15/06 , G06T15/00 , G06F9/48 , G06F9/50 , G06T1/20 , G06F16/22 , G06F30/31 , G06T17/10 , G16H40/67 , G06Q10/10 , G06Q50/04
Abstract: Disclosed techniques relate to primitive testing associated with ray intersection processing for ray tracing. In some embodiments, shader circuitry executes a first SIMD group that includes a ray intersect instruction for a set of rays. Ray intersect circuitry traverses, in response to the ray intersect instruction, multiple nodes in a spatially organized acceleration data structure (ADS). In response to reaching a node of the ADS that indicates one or more primitives, the apparatus forms a second SIMD group that executes one or more instructions to determine whether a set of rays that have reached the node intersect the one or more primitives. The shader circuitry may execute the first SIMD group to shade one or more primitives that are indicated as intersected based on results of execution of the second SIMD group. Thus, disclosed techniques may use both dedicated ray intersect circuitry and dynamically formed SIMD groups executed by shader processors to detect ray intersection.
-
公开(公告)号:US11373360B2
公开(公告)日:2022-06-28
申请号:US17103317
申请日:2020-11-24
Applicant: Apple Inc.
Inventor: Ali Rabbani Rankouhi , Christopher A. Burns , Justin A. Hensley , Luca Iuliano , Jonathan M. Redshaw
IPC: G06T15/06 , G06T1/20 , G06T15/00 , G06F9/48 , G06F9/50 , G06F16/22 , G06F30/31 , G06T17/10 , G16H40/67 , G06Q10/10 , G06Q50/04
Abstract: Disclosed techniques relate to grouping rays during traversal of a spatially-organized acceleration data structure (e.g., a bounding volume hierarchy) for ray intersection processing. The grouping may provide temporal locality for accesses to bounding region data. In some embodiments, ray intersect circuitry is configured to group rays based on the node of the data structure that they target next. The ray intersect circuitry may select one or more groups of rays for issuance each clock cycle, e.g., to bounding region test circuitry.
-
公开(公告)号:US11367242B2
公开(公告)日:2022-06-21
申请号:US17103433
申请日:2020-11-24
Applicant: Apple Inc.
Inventor: Ali Rabbani Rankouhi , Christopher A. Burns , Justin A. Hensley , Luca Iuliano , Jonathan M. Redshaw
Abstract: Disclosed techniques relate to ray intersection processing for ray tracing. In some embodiments, ray intersection circuitry traverses a spatially organized acceleration data structure and includes bounding region circuitry configured to test, in parallel, whether a ray intersects multiple different bounding regions indicated by a node of the data structure. Shader circuitry may execute a ray intersect instruction to invoke traversal by the ray intersect circuitry and the traversal may generate intersection results. The shader circuitry may shade intersected primitives based on the intersection results. Disclosed techniques that share processing between intersection circuitry and shader processors may improve performance, reduce power consumption, or both, relative to traditional techniques.
-
公开(公告)号:US20250094357A1
公开(公告)日:2025-03-20
申请号:US18962158
申请日:2024-11-27
Applicant: Apple Inc.
Inventor: Jonathan M. Redshaw , Winnie W. Yeung , Benjiman L. Goodman , David K. Li , Zelin Zhang , Yoong Chert Foo
IPC: G06F12/126 , G06F12/0811 , G06F12/0891
Abstract: Techniques are disclosed relating to eviction control for cache lines that store register data. In some embodiments, memory hierarchy circuitry is configured to provide memory backing for register operand data in one or more cache circuits. Lock circuitry may control a first set of lock indicators for a set of registers for a first thread, including to assert one or more lock indicators for registers that are indicated, by decode circuitry, as being utilized by decoded instructions of the first thread. The lock circuitry may preserve register operand data in the one or more cache circuits, including to prevent eviction of a given cache line from a cache circuit based on an asserted lock indicator. The lock circuitry may clear the first set of lock indicators in response to a reset event. Disclosed techniques may advantageously retain relevant register information in the cache with limited control circuit area.
-
公开(公告)号:US11847489B2
公开(公告)日:2023-12-19
申请号:US17158943
申请日:2021-01-26
Applicant: Apple Inc.
Inventor: Max J. Batley , Jonathan M. Redshaw , Ji Rao , Ali Rabbani Rankouhi
CPC classification number: G06F9/4881 , G06F9/3877 , G06F9/505 , G06F9/544 , G06F9/545 , G06F13/36 , G06F13/37 , G06F13/4022 , G06F13/4282 , G06F9/5027
Abstract: Techniques are disclosed relating to a shared control bus for communicating between primary control circuitry and multiple distributed graphics processor units. In some embodiments, a set of multiple processor units includes first and second graphics processors, where the first and second graphics processors are coupled to access graphics data via respective memory interfaces. A shared workload distribution bus is used to transmit control data that specifies graphics work distribution to the multiple graphics processing units. The shared workload distribution bus may be arranged in a chain topology, e.g., to connect the workload distribution circuitry to the first graphics processor and connect the first graphics processor to the second graphics processor such that the workload distribution circuitry communicates with the second graphics processor via the shared workload distribution bus connection to the first graphics processor. Disclosed techniques may facilitate graphics work distribution for a scalable number of processors.
-
公开(公告)号:US20230048951A1
公开(公告)日:2023-02-16
申请号:US17399808
申请日:2021-08-11
Applicant: Apple Inc.
Inventor: Steven Fishwick , Fergus W. MacGarry , Jonathan M. Redshaw , David A. Gotwalt , Ali Rabbani Rankouhi , Benjamin Bowman
Abstract: Disclosed embodiments relate to controlling sets of graphics work (e.g., kicks) assigned to graphics processor circuitry. In some embodiments, tracking slot circuitry implements entries for multiple tracking slots. Slot manager circuitry may store, using an entry of the tracking slot circuitry, software-specified information for a set of graphics work, where the information includes: type of work, dependencies on other sets of graphics work, and location of data for the set of graphics work. The slot manager circuitry may prefetch, from the location and prior to allocating shader core resources for the set of graphics work, configuration register data for the set of graphics work. Control circuitry may program configuration registers for the set of graphics work using the prefetched data and initiate processing of the set of graphics work by the graphics processor circuitry according to the dependencies. Disclosed techniques may reduce kick-to-kick transition time, in some embodiments.
-
-
-
-
-
-
-
-
-