-
公开(公告)号:US10909652B2
公开(公告)日:2021-02-02
申请号:US16355303
申请日:2019-03-15
Applicant: Intel Corporation
Inventor: Altug Koker , Lance Cheney , Eric Finley , Varghese George , Sanjeev Jahagirdar , Josh Mastronarde , Naveen Matam , Iqbal Rajwani , Lakshminarayanan Striramassarma , Melaku Teshome , Vikranth Vemulapalli , Binoj Xavier
Abstract: A disaggregated processor package can be configured to accept interchangeable chiplets. Interchangeability is enabled by specifying a standard physical interconnect for chiplets that can enable the chiplet to interface with a fabric or bridge interconnect. Chiplets from different IP designers can conform to the common interconnect, enabling such chiplets to be interchangeable during assembly. The fabric and bridge interconnects logic on the chiplet can then be configured to confirm with the actual interconnect layout of the on-board logic of the chiplet. Additionally, data from chiplets can be transmitted across an inter-chiplet fabric using encapsulation, such that the actual data being transferred is opaque to the fabric, further enable interchangeability of the individual chiplets. With such an interchangeable design, higher or lower density memory can be inserted into memory chiplet slots, while compute or graphics chiplets with a higher or lower core count can be inserted into logic chiplet slots.
-
公开(公告)号:US10803548B2
公开(公告)日:2020-10-13
申请号:US16355377
申请日:2019-03-15
Applicant: Intel Corporation
Inventor: Naveen Matam , Lance Cheney , Eric Finley , Varghese George , Sanjeev Jahagirdar , Altug Koker , Josh Mastronarde , Iqbal Rajwani , Lakshminarayanan Striramassarma , Melaku Teshome , Vikranth Vemulapalli , Binoj Xavier
Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
-
公开(公告)号:US20200183849A1
公开(公告)日:2020-06-11
申请号:US16702073
申请日:2019-12-03
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Joydeep Ray , David Puffer , Prasoonkumar Surti , Lakshminarayanan Striramassarma , Vasanth Ranganathan , Kiran C. Veernapu , Balaji Vembu , Pattabhiraman K
IPC: G06F12/0877 , G06T1/60 , G06F12/0868 , G06F12/0846 , G06F12/0806 , G06F12/0855 , G06F12/0802 , G06F12/126
Abstract: In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20160284046A1
公开(公告)日:2016-09-29
申请号:US15083689
申请日:2016-03-29
Applicant: Intel Corporation
Inventor: Altug Koker , Lakshminarayanan Striramassarma , Akif Ali
CPC classification number: G06T1/20 , G06F15/08 , G06T1/60 , G06T11/001 , G06T15/005 , G06T15/80 , H04L49/109
Abstract: In on embodiment, a hybrid fabric interconnects multiple graphics processor cores within a processor. The hybrid fabric interconnect includes multiple data channels, including programmable virtual data channels. The virtual data channels carry multiple traffic classes of packet-based messages. The virtual data channels and multiple traffic classes may be assigned one of multiple priorities. The virtual data channels may be arbitrated independently. The hybrid fabric is scalable and can support multiple topologies, including multiple stacked integrated circuit topologies.
Abstract translation: 在实施例中,混合架构互连处理器内的多个图形处理器核心。 混合网络互连包括多个数据信道,包括可编程虚拟数据信道。 虚拟数据信道承载多个基于分组的消息的业务类别。 可以将虚拟数据信道和多个业务类别分配为多个优先级之一。 虚拟数据通道可以独立地进行仲裁。 混合架构是可扩展的,可以支持多种拓扑结构,包括多个堆叠集成电路拓扑。
-
公开(公告)号:US12293431B2
公开(公告)日:2025-05-06
申请号:US18310688
申请日:2023-05-02
Applicant: Intel Corporation
Inventor: Joydeep Ray , Scott Janus , Varghese George , Subramaniam Maiyuran , Altug Koker , Abhishek Appu , Prasoonkumar Surti , Vasanth Ranganathan , Valentin Andrei , Ashutosh Garg , Yoav Harel , Arthur Hunter, Jr. , SungYe Kim , Mike Macpherson , Elmoustapha Ould-Ahmed-Vall , William Sadler , Lakshminarayanan Striramassarma , Vikranth Vemulapalli
IPC: G06T1/20 , G06F7/544 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/0806 , G06F15/80 , G06F17/16 , G06N3/048 , G06N3/08 , G06N3/084
Abstract: Embodiments described herein include, software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. Embodiment described herein provided techniques to detect zero value elements within a vector or a set of packed data elements output by a processing resource and generate metadata to indicate a location of the zero value elements within the plurality of data elements.
-
公开(公告)号:US20250103548A1
公开(公告)日:2025-03-27
申请号:US18948174
申请日:2024-11-14
Applicant: Intel Corporation
Inventor: Altug Koker , Joydeep Ray , Ben Ashbaugh , Jonathan Pearce , Abhishek Appu , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Elmoustapha Ould-Ahmed-Vall , Aravindh Anantaraman , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Yoav Harel , Arthur Hunter, JR. , Brent Insko , Scott Janus , Pattabhiraman K , Mike Macpherson , Subramaniam Maiyuran , Marian Alin Petre , Murali Ramadoss , Shailesh Shah , Kamal Sinha , Prasoonkumar Surti , Vikranth Vemulapalli
IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06N3/08 , G06T1/20 , G06T1/60 , G06T15/06 , H03M7/46
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
-
公开(公告)号:US20250103547A1
公开(公告)日:2025-03-27
申请号:US18906859
申请日:2024-10-04
Applicant: INTEL CORPORATION
Inventor: Prasoonkumar Surti , Subramaniam Maiyuran , Valentin Andrei , Abhishek Appu , Varghese George , Altug Koker , Mike Macpherson , Elmoustapha Ould-Ahmed-Vall , Vasanth Ranganathan , Joydeep Ray , Lakshminarayanan Striramassarma , SungYe Kim
IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06N3/08 , G06T1/20 , G06T1/60 , H03M7/46
Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides techniques to optimize training and inference on a systolic array when using sparse data. One embodiment provides techniques to use decompression information when performing sparse compute operations. One embodiment enables the disaggregation of special function compute arrays via a shared reg file. One embodiment enables packed data compress and expand operations on a GPGPU. One embodiment provides techniques to exploit block sparsity within the cache hierarchy of a GPGPU.
-
公开(公告)号:US20250103511A1
公开(公告)日:2025-03-27
申请号:US18905667
申请日:2024-10-03
Applicant: Intel Corporation
Inventor: Altug Koker , Joydeep Ray , Elmoustapha Ould-Ahmed-Vall , Abhishek Appu , Aravindh Anantaraman , Valentin Andrei , Durgaprasad Bilagi , Varghese George , Brent Insko , Sanjeev Jahagirdar , Scott Janus , Pattabhiraman K , SungYe Kim , Subramaniam Maiyuran , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Xinmin Tian
IPC: G06F12/123 , G06F12/0875 , G06F12/0891 , G06T1/60
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received.
-
公开(公告)号:US12182062B1
公开(公告)日:2024-12-31
申请号:US17961833
申请日:2022-10-07
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Aravindh Anantaraman , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Mike Macpherson , Subramaniam Maiyuran , Joydeep Ray , Lakshminarayanan Striramassarma , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter , Prasoonkumar Surti , David Puffer , James Valerio , Ankur N. Shah
IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06N3/08 , G06T15/06
Abstract: Methods and apparatus relating to techniques for multi-tile memory management. In an example, a graphics processor includes an interposer, a first chiplet coupled with the interposer, the first chiplet including a graphics processing resource and an interconnect network coupled with the graphics processing resource, cache circuitry coupled with the graphics processing resource via the interconnect network, and a second chiplet coupled with the first chiplet via the interposer, the second chiplet including a memory-side cache and a memory controller coupled with the memory-side cache. The memory controller is configured to enable access to a high-bandwidth memory (HBM) device, the memory-side cache is configured to cache data associated with a memory access performed via the memory controller, and the cache circuitry is logically positioned between the graphics processing resource and a chiplet interface.
-
公开(公告)号:US20240385975A1
公开(公告)日:2024-11-21
申请号:US18671095
申请日:2024-05-22
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , Elmoustapha Ould-Ahmed-Vall , Michael Macpherson , Aravindh V. Anantaraman , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Varghese George , Abhishek Appu , Prasoonkumar Surti
Abstract: An apparatus to facilitate efficient data sharing for graphics data processing operations is disclosed. The apparatus includes a processing resource to generate a stream of instructions, an L1 cache communicably coupled to the processing resource and comprising an on-page detector circuit to determine that a set of memory requests in the stream of instructions access a same memory page; and set a marker in a first request of the set of memory requests; and arbitration circuitry communicably coupled to the L1 cache, the arbitration circuitry to route the set of memory requests to memory comprising the memory page and to, in response to receiving the first request with the marker set, remain with the processing resource to process the set of memory requests.
-
-
-
-
-
-
-
-
-