-
公开(公告)号:US20220121421A1
公开(公告)日:2022-04-21
申请号:US17431034
申请日:2020-03-14
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Aravindh Anantaraman , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Mike Macpherson , Subramaniam Maiyuran , Joydeep Ray , Lakshminarayana Striramassarma , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter , Prasoonkumar Surti , David Puffer , James Valerio , Ankur N. Shah
IPC: G06F7/575 , G06F7/544 , G06F9/30 , G06F9/38 , G06F12/128 , G06F12/0875 , G06F12/0866 , G06F12/0895 , G06F12/02
Abstract: Methods and apparatus relating to techniques for multi-tile memory management. In an example, an apparatus comprises a cache memory, a high-bandwidth memory, a shader core communicatively coupled to the cache memory and comprising a processing element to decompress a first data element extracted from an in-memory database in the cache memory and having a first bit length to generate a second data element having a second bit length, greater than the first bit length, and an arithmetic logic unit (ALU) to compare the data element to a target value provided in a query of the in-memory database. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US11288191B1
公开(公告)日:2022-03-29
申请号:US17132147
申请日:2020-12-23
Applicant: Intel Corporation
Inventor: Hema Chand Nalluri , Aditya Navale , Altug Koker , Brandon Fliflet , Jeffery S. Boles , James Valerio , Vasanth Ranganathan , Anirban Kundu , Pattabhiraman K
IPC: G06F12/0802
Abstract: An apparatus to facilitate memory flushing is disclosed. The apparatus comprises a cache memory, one or more processing resources, tracker hardware to dispatch workloads for execution at the processing resources and to monitor the workloads to track completion of the execution, range based flush (RBF) hardware to process RBF commands and generate a flush indication to flush data from the cache memory and a flush controller to receive the flush indication and perform a flush operation to discard data from the cache memory at an address range provided in the flush indication.
-
43.
公开(公告)号:US10796472B2
公开(公告)日:2020-10-06
申请号:US16024821
申请日:2018-06-30
Applicant: Intel Corporation
Inventor: Michael Apodaca , Ankur Shah , Ben Ashbaugh , Brandon Fliflet , Hema Nalluri , Pattabhiraman K , Peter Doyle , Joseph Koston , James Valerio , Murali Ramadoss , Altug Koker , Aditya Navale , Prasoonkumar Surti , Balaji Vembu
IPC: G06T15/00
Abstract: Apparatus and method for simultaneous command streamers. For example, one embodiment of an apparatus comprises: a plurality of work element queues to store work elements for a plurality of thread contexts, each work element associated with a context descriptor identifying a context storage region in memory; a plurality of command streamers, each command streamer associated with one of the plurality of work element queues, the command streamers to independently submit instructions for execution as specified by the work elements; a thread dispatcher to evaluate the thread contexts including priority values, to tag each instruction with an execution identifier (ID), and to responsively dispatch each instruction including the execution ID in accordance with the thread context; and a plurality of graphics functional units to independently execute each instruction dispatched by the thread dispatcher and to associate each instruction with a thread context based on its execution ID.
-
44.
公开(公告)号:US20190286563A1
公开(公告)日:2019-09-19
申请号:US15922809
申请日:2018-03-15
Applicant: Intel Corporation
Inventor: Bharath Narasimha Swamy , Joydeep Ray , Rama Kishan Malladi , James Valerio , Abhishek Appu
IPC: G06F12/0855
Abstract: Apparatus and method for improved cache utilization and efficiency on a many-core processor. An apparatus comprising: a plurality of execution units to generate cache access requests responsive to executing instructions; a pending request queue to store pending cache access requests generated by the execution units; pending queue management circuitry to compare a current cache access request with entries in the pending request queue to determine whether the current cache access request can be merged with an entry in the pending request queue and, if so, to merge the current cache access request with the entry.
-
公开(公告)号:US20250036412A1
公开(公告)日:2025-01-30
申请号:US18358308
申请日:2023-07-25
Applicant: Intel Corporation
Inventor: Supratim Pal , Jiasheng Chen , Christopher Spencer , Jorge E. Parra Osorio , Kevin Hurd , Guei-Yuan Lueh , Pradeep K. Golconda , Fangwen Fu , Wei Xiong , Hongzheng Li , James Valerio , Mukundan Swaminathan , Nicholas Murphy , Shuai Mu , Clifford Gibson , Buqi Cheng
Abstract: Described herein is a graphics processor comprising a memory interface and a graphics processing cluster coupled with the memory interface. The graphics processing cluster includes a plurality of processing resources. A processing resource of the plurality of processing resources includes a source crossbar communicatively coupled with a register file, the source crossbar to reorder data elements of a source operand and a format conversion pipeline to convert a plurality of input data elements specified by the source operand from a first format of a plurality of datatype formats to a second format of the plurality of datatype formats, the plurality of datatype formats including integer and floating-point formats.
-
公开(公告)号:US20250036361A1
公开(公告)日:2025-01-30
申请号:US18358304
申请日:2023-07-25
Applicant: Intel Corporation
Inventor: Supratim Pal , Jiasheng Chen , Kevin Hurd , Jorge E. Parra Osorio , Christopher Spencer , Guei-Yuan Lueh , Pradeep K. Golconda , Fangwen Fu , Wei Xiong , Hongzheng Li , James Valerio , Mukundan Swaminathan , Nicholas Murphy , Shuai Mu , Clifford Gibson , Buqi Cheng
IPC: G06F7/483
Abstract: Described herein is a graphics processor comprising a memory interface and a graphics processing cluster coupled with the memory interface. The graphics processing cluster includes a multi-lane parallel floating-point unit and a multi-lane parallel integer unit. The multi-lane parallel integer unit includes an integer pipeline including a plurality of parallel integer logic units configured to perform integer compute operations on a plurality of input data elements and a format conversion pipeline including a plurality of parallel format conversion units configured to convert a plurality of input data elements from a first one of a plurality of datatype formats to a second one of the plurality of datatype formats, the plurality of datatype formats including integer and floating-point formats.
-
公开(公告)号:US20250004981A1
公开(公告)日:2025-01-02
申请号:US18793247
申请日:2024-08-02
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Aravindh Anantaraman , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Mike Macpherson , Subramaniam Maiyuran , Joydeep Ray , Lakshminarayana Striramassarma , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter , Prasoonkumar Surti , David Puffer , James Valerio , Ankur N. Shah
IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06N3/08 , G06T1/20 , G06T1/60 , G06T15/06 , H03M7/46
Abstract: Methods and apparatus relating to techniques for multi-tile memory management. In an example, a graphics processor includes an interposer, a first chiplet coupled with the interposer, the first chiplet including a graphics processing resource and an interconnect network coupled with the graphics processing resource, cache circuitry coupled with the graphics processing resource via the interconnect network, and a second chiplet coupled with the first chiplet via the interposer, the second chiplet including a memory-side cache and a memory controller coupled with the memory-side cache. The memory controller is configured to enable access to a high-bandwidth memory (HBM) device, the memory-side cache is configured to cache data associated with a memory access performed via the memory controller, and the cache circuitry is logically positioned between the graphics processing resource and a chiplet interface.
-
公开(公告)号:US12182062B1
公开(公告)日:2024-12-31
申请号:US17961833
申请日:2022-10-07
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Aravindh Anantaraman , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Mike Macpherson , Subramaniam Maiyuran , Joydeep Ray , Lakshminarayanan Striramassarma , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter , Prasoonkumar Surti , David Puffer , James Valerio , Ankur N. Shah
IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06N3/08 , G06T15/06
Abstract: Methods and apparatus relating to techniques for multi-tile memory management. In an example, a graphics processor includes an interposer, a first chiplet coupled with the interposer, the first chiplet including a graphics processing resource and an interconnect network coupled with the graphics processing resource, cache circuitry coupled with the graphics processing resource via the interconnect network, and a second chiplet coupled with the first chiplet via the interposer, the second chiplet including a memory-side cache and a memory controller coupled with the memory-side cache. The memory controller is configured to enable access to a high-bandwidth memory (HBM) device, the memory-side cache is configured to cache data associated with a memory access performed via the memory controller, and the cache circuitry is logically positioned between the graphics processing resource and a chiplet interface.
-
公开(公告)号:US20240330001A1
公开(公告)日:2024-10-03
申请号:US18620217
申请日:2024-03-28
Applicant: Intel Corporation
Inventor: John Wiegert , Joydeep Ray , Timothy Bauer , James Valerio
CPC classification number: G06F9/3887 , G06F9/355 , G06F15/7839 , G06F9/30036 , G06F9/30043
Abstract: Embodiments described herein provide a technique to decompose 64-bit per-lane virtual addresses to access a plurality of data elements on behalf of a multi-lane parallel processing execution resource of a graphics or compute accelerator. The 64-bit per-lane addresses are decomposed into a base address and a plurality of per-lane offsets for transmission to memory access circuitry. The memory access circuitry then combines the base address and the per-lane offsets to reconstruct the per-lane addresses.
-
公开(公告)号:US20240086064A1
公开(公告)日:2024-03-14
申请号:US17944500
申请日:2022-09-14
Applicant: Intel Corporation
Inventor: John Wiegert , Joydeep Ray , Timothy Bauer , James Valerio
CPC classification number: G06F3/0604 , G06F3/0644 , G06F3/0673 , G06T1/20 , G06T1/60
Abstract: Embodiments described herein enable the offload of address calculations required to access a data element within an array of data elements from primary compute resources of a graphics processor to the memory access circuitry of the graphics processor. The memory access circuitry is configured to receive a message to access a data element of an array of data elements in the memory, the message to include an index of the data element in the array of data elements, calculate a byte address for the data element based in part on the index of the data element in the array of data elements, and submit a memory access request to the memory to access the data element at the byte address.
-
-
-
-
-
-
-
-
-