-
公开(公告)号:US11023235B2
公开(公告)日:2021-06-01
申请号:US15858947
申请日:2017-12-29
Applicant: Intel Corporation
Inventor: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman , Eyal Hadas
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to systems and methods to zero a tile register pair. In one example, a processor includes decode circuitry to decode a matrix pair zeroing instruction having fields for an opcode and an identifier to identify a destination matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded matrix pair zeroing instruction to zero every element of a left matrix and a right matrix of the identified destination matrix.
-
42.
公开(公告)号:US10896043B2
公开(公告)日:2021-01-19
申请号:US16146854
申请日:2018-09-28
Applicant: Intel Corporation
Inventor: Bret Toll , Alexander F. Heinecke , Christopher J. Hughes , Ronen Zohar , Michael Espig , Dan Baum , Raanan Sade , Robert Valentine , Mark J. Charney , Elmoustapha Ould-Ahmed-Vall
IPC: G06F17/16 , G06F12/02 , G06F9/30 , G06F12/06 , G06F9/38 , G06T1/20 , G06F12/0897 , G06F12/0875 , G06F9/345
Abstract: Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure of Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified Structure of Arrays (SOA) destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, the SOA destination matrices together contain K segregated groups, each containing N same-typed elements, decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.
-
公开(公告)号:US10719323B2
公开(公告)日:2020-07-21
申请号:US16144902
申请日:2018-09-27
Applicant: Intel Corporation
Inventor: Dan Baum , Michael Espig , James Guilford , Wajdi K. Feghali , Raanan Sade , Christopher J. Hughes , Robert Valentine , Bret Toll , Elmoustapha Ould-Ahmed-Vall , Mark J. Charney , Vinodh Gopal , Ronen Zohar , Alexander F. Heinecke
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
-
公开(公告)号:US20190042540A1
公开(公告)日:2019-02-07
申请号:US15859268
申请日:2017-12-29
Applicant: Intel Corporation
Inventor: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman
Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address, and execution circuitry to execute the decoded instruction to store configuration information about usage of storage for two-dimensional data structures at the memory address.
-
公开(公告)号:US10163468B2
公开(公告)日:2018-12-25
申请号:US15855618
申请日:2017-12-27
Applicant: Intel Corporation
Inventor: Glenn Hinton , Bret Toll , Ronak Singhal
Abstract: A processor includes N-bit registers and a decode unit to receive a multiple register memory access instruction. The multiple register memory access instruction is to indicate a memory location and a register. The processor includes a memory access unit coupled with the decode unit and with the N-bit registers. The memory access unit is to perform a multiple register memory access operation in response to the multiple register memory access instruction. The operation is to involve N-bit data, in each of the N-bit registers comprising the indicated register. The operation is also to involve different corresponding N-bit portions of an M×N-bit line of memory corresponding to the indicated memory location. A total number of bits of the N-bit data in the N-bit registers to be involved in the multiple register memory access operation is to amount to at least half of the M×N-bits of the line of memory.
-
公开(公告)号:US10153011B2
公开(公告)日:2018-12-11
申请号:US15855585
申请日:2017-12-27
Applicant: Intel Corporation
Inventor: Glenn Hinton , Bret Toll , Ronak Singhal
Abstract: A processor includes N-bit registers and a decode unit to receive a multiple register memory access instruction. The multiple register memory access instruction is to indicate a memory location and a register. The processor includes a memory access unit coupled with the decode unit and with the N-bit registers. The memory access unit is to perform a multiple register memory access operation in response to the multiple register memory access instruction. The operation is to involve N-bit data, in each of the N-bit registers comprising the indicated register. The operation is also to involve different corresponding N-bit portions of an M×N-bit line of memory corresponding to the indicated memory location. A total number of bits of the N-bit data in the N-bit registers to be involved in the multiple register memory access operation is to amount to at least half of the M×N-bits of the line of memory.
-
公开(公告)号:US12293186B2
公开(公告)日:2025-05-06
申请号:US18386407
申请日:2023-11-02
Applicant: Intel Corporation
Inventor: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman
Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
-
公开(公告)号:US12265826B2
公开(公告)日:2025-04-01
申请号:US18399014
申请日:2023-12-28
Applicant: Intel Corporation
Inventor: Bret Toll , Christopher J. Hughes , Dan Baum , Elmoustapha Ould-Ahmed-Vall , Raanan Sade , Robert Valentine , Mark J. Charney , Alexander F. Heinecke
IPC: G06F9/30
Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
-
公开(公告)号:US11748130B2
公开(公告)日:2023-09-05
申请号:US16456300
申请日:2019-06-28
Applicant: INTEL CORPORATION
Inventor: Rajesh Sankaran , Bret Toll , William Rash , Subramaniam Maiyuran , Gang Chen , Varghese George
IPC: G06F9/455 , G06F12/1009 , G06T1/20
CPC classification number: G06F9/45558 , G06F12/1009 , G06T1/20 , G06F2009/4557 , G06F2009/45583 , G06F2009/45591
Abstract: Graphics processing systems and methods are described. A graphics processing apparatus may comprise one or more graphics processing engines, a memory, a memory management unit (MMU) including a GPU second level page table and GPU dirty bit tracking, and a provisioning agent to receive a request from a virtual machine monitor (VMM) to provision a subcluster of graphics processing apparatuses, the subcluster including a plurality of graphics processing engines from a plurality of graphics processing apparatuses connected using a scale-up fabric, provision the scale-up fabric to route data within the subcluster of graphics processing apparatuses, and provision a plurality of resources on the graphics processing apparatus for the subcluster based on the request from the VMM.
-
公开(公告)号:US20230229446A1
公开(公告)日:2023-07-20
申请号:US18186710
申请日:2023-03-20
Applicant: Intel Corporation
Inventor: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman
IPC: G06F9/30
CPC classification number: G06F9/30145 , G06F9/30043
Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.
-
-
-
-
-
-
-
-
-