-
Publication number: US20210318874A1
Publication date: 2021-10-14
Application number: US17240882
Application date: 2021-04-26
Applicant: INTEL CORPORATION
Inventor: Bret TOLL , Christopher J. HUGHES , Dan BAUM , Elmoustapha OULD-AHMED-VALL , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Alexander F. HEINECKE
IPC: G06F9/30
Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
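As a rough software analogy for the move described in this abstract, the sketch below copies a 1D vector into a selected group of tile elements (a row, a column, or a rectangular sub-tile) and reads such a group back out as a vector. All function names and the list-of-lists tile representation are assumptions made for illustration; the abstract does not define any software interface.

```python
from typing import List, Sequence, Tuple

Tile = List[List[int]]
Position = Tuple[int, int]


def row_positions(row: int, n_cols: int) -> List[Position]:
    """Positions of one full row (hypothetical helper)."""
    return [(row, c) for c in range(n_cols)]


def column_positions(col: int, n_rows: int) -> List[Position]:
    """Positions of one full column (hypothetical helper)."""
    return [(r, col) for r in range(n_rows)]


def subtile_positions(row0: int, col0: int, height: int, width: int) -> List[Position]:
    """Positions of a rectangular sub-tile, enumerated row by row."""
    return [(row0 + r, col0 + c) for r in range(height) for c in range(width)]


def move_from_1d(tile: Tile, vector: Sequence[int], positions: Sequence[Position]) -> None:
    """Write vector elements into the selected group of the 2D tile, in place."""
    if len(vector) < len(positions):
        raise ValueError("vector is shorter than the selected group")
    for value, (r, c) in zip(vector, positions):
        tile[r][c] = value


def move_to_1d(tile: Tile, positions: Sequence[Position]) -> List[int]:
    """Read the selected group of the 2D tile out as a 1D vector."""
    return [tile[r][c] for r, c in positions]


# Usage: fill column 1 of a 3x4 tile, then read row 1 back as a vector.
tile = [[0] * 4 for _ in range(3)]
move_from_1d(tile, [1, 2, 3], column_positions(1, 3))
row_vector = move_to_1d(tile, row_positions(1, 4))
```

Describing the group as an explicit list of positions keeps the same two move functions usable for rows, columns, and sub-tiles alike.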
-
Publication number: US20210216315A1
Publication date: 2021-07-15
Application number: US17152160
Application date: 2021-01-19
Applicant: INTEL CORPORATION
Inventor: Bret TOLL , Alexander F. HEINECKE , Christopher J. HUGHES , Ronen ZOHAR , Michael ESPIG , Dan BAUM , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Elmoustapha OULD-AHMED-VALL
Abstract: Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure of Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified Structure of Arrays (SOA) destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, the SOA destination matrices together contain K segregated groups, each containing N same-typed elements, decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.
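The AOS-to-SOA unpacking described in this abstract can be illustrated in software roughly as follows. This is a minimal sketch that assumes the source is a flat list in which element j of structure i sits at index i * stride + j; the function name, the flat layout, and the default stride are illustrative assumptions, not details taken from the patent.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def unpack_aos_to_soa(aos: Sequence[T], n: int, k: int,
                      stride: Optional[int] = None) -> List[List[T]]:
    """Split N structures of K elements into K groups of N same-typed elements.

    Assumed layout: element j of structure i is stored at aos[i * stride + j].
    A stride larger than k models padding between consecutive structures.
    """
    if stride is None:
        stride = k  # tightly packed structures
    if stride < k:
        raise ValueError("stride must be at least k")
    needed = (n - 1) * stride + k if n > 0 else 0
    if len(aos) < needed:
        raise ValueError("source is too short for n structures")
    return [[aos[i * stride + j] for i in range(n)] for j in range(k)]


# Usage: two (x, y, z) structures become three groups of two elements each.
xs, ys, zs = unpack_aos_to_soa(["x0", "y0", "z0", "x1", "y1", "z1"], n=2, k=3)
```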
-
Publication number: US20210096822A1
Publication date: 2021-04-01
Application number: US17121155
Application date: 2020-12-14
Applicant: INTEL CORPORATION
Inventor: Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Simon RUBANOVICH , Amit GRADSTEIN , Zeev SPERBER , Bret TOLL , Jesus CORBAL , Christopher J. HUGHES , Alexander F. HEINECKE , Elmoustapha OULD-AHMED-VALL
IPC: G06F7/78 , G06F9/30 , G06F15/173 , G06F9/38
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transpose rectangular tiles. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix, decode circuitry to decode the fetched rectangular matrix transpose instruction, and execution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix.
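In software terms, the operation in this abstract amounts to transposing two rectangular source matrices into two destination matrices in one step. The sketch below shows that behavior on plain Python lists; the function names are hypothetical, and it says nothing about how the hardware actually carries out the transpose.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def transpose(src: Matrix) -> Matrix:
    """Turn each row of a rectangular matrix into the corresponding column."""
    return [list(column) for column in zip(*src)]


def transpose_pair(src1: Matrix, src2: Matrix) -> Tuple[Matrix, Matrix]:
    """Transpose two independent source matrices into two destinations."""
    return transpose(src1), transpose(src2)


# Usage: a 2x3 source and a 1x2 source become 3x2 and 2x1 destinations.
dst1, dst2 = transpose_pair([[1, 2, 3], [4, 5, 6]], [[7, 8]])
```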
-
Publication number: US20240045690A1
Publication date: 2024-02-08
Application number: US18460497
Application date: 2023-09-01
Applicant: Intel Corporation
Inventor: Dan BAUM , Michael ESPIG , James GUILFORD , Wajdi K. FEGHALI , Raanan SADE , Christopher J. HUGHES , Robert VALENTINE , Bret TOLL , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY , Vinodh GOPAL , Ronen ZOHAR , Alexander F. HEINECKE
CPC classification number: G06F9/30178 , G06F9/30145 , G06F9/30036 , G06F9/3013 , G06F9/3802
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
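The first compression option named in this abstract (packing non-zero elements together and recording their positions in a header) can be sketched as below. The header is modeled here as a list of (row, column) pairs and a matching decompress routine is included for round-tripping; both choices are illustrative assumptions, since the abstract does not specify the header encoding. The second option (representing elements with fewer bits) is not shown.

```python
from typing import List, Tuple

Matrix = List[List[int]]
Header = List[Tuple[int, int]]


def compress(matrix: Matrix) -> Tuple[Header, List[int]]:
    """Pack non-zero elements together; the header records each one's position."""
    header: Header = []
    packed: List[int] = []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value != 0:
                header.append((r, c))
                packed.append(value)
    return header, packed


def decompress(header: Header, packed: List[int], rows: int, cols: int) -> Matrix:
    """Rebuild the full matrix, placing zeros wherever the header has no entry."""
    matrix = [[0] * cols for _ in range(rows)]
    for (r, c), value in zip(header, packed):
        matrix[r][c] = value
    return matrix


# Usage: a sparse 2x3 matrix round-trips through the packed form.
source = [[0, 5, 0], [7, 0, 0]]
header, packed = compress(source)
restored = decompress(header, packed, rows=2, cols=3)
```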
-
Publication number: US20210216323A1
Publication date: 2021-07-15
Application number: US17216635
Application date: 2021-03-29
Applicant: Intel Corporation
Inventor: Raanan SADE , Robert VALENTINE , Bret TOLL , Christopher J. HUGHES , Alexander F. HEINECKE , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY
IPC: G06F9/30
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.
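One possible reading of the row-interleaved transform in this abstract is sketched below. It assumes the source rows are split into groups of J rows, so that each column of a group forms a J-element sub-column, and that each sub-column is written in row-major order into a block that is K columns wide and ceil(J / K) rows tall, zero-padded if J is not a multiple of K. The function name, the grouping, and the padding are assumptions; the column-major variant mentioned in the abstract is not shown.

```python
from typing import List

Matrix = List[List[int]]


def to_row_interleaved(src: Matrix, j: int, k: int) -> Matrix:
    """Interleave each J-element sub-column into a K-wide block (row-major).

    Assumed mapping: element e of the sub-column in row group g and column c
    goes to dest[g * block_rows + e // k][c * k + e % k].
    """
    rows = len(src)
    cols = len(src[0]) if src else 0
    if j <= 0 or k <= 0 or rows % j != 0:
        raise ValueError("row count must be a positive multiple of j, and k > 0")
    block_rows = -(-j // k)  # ceil(j / k): enough rows to hold J elements
    groups = rows // j
    dest = [[0] * (cols * k) for _ in range(groups * block_rows)]
    for g in range(groups):
        for c in range(cols):
            for e in range(j):
                dest[g * block_rows + e // k][c * k + e % k] = src[g * j + e][c]
    return dest


# Usage: with J = K = 2, vertically adjacent pairs end up side by side in one row.
interleaved = to_row_interleaved([[1, 2], [3, 4], [5, 6], [7, 8]], j=2, k=2)
```

With J = K, each destination row holds the J elements of one sub-column contiguously, which is the simplest case of the layout.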
-
Publication number: US20200073635A1
Publication date: 2020-03-05
Application number: US16613529
Application date: 2017-06-29
Applicant: Intel Corporation
Inventor: Venkateswara R. MADDURI , Elmoustapha OULD-AHMED-VALL , Robert VALENTINE , Jesus CORBAL , Mark J. CHARNEY , Carl MURRAY , Milind GIRKAR , Bret TOLL
IPC: G06F7/499
Abstract: Embodiments of systems, apparatuses, and methods for vector-packed fractional multiplication of signed words with rounding, saturation, and high-result selection in a processor are described. For example, execution circuitry executes a decoded instruction to perform a fractional multiplication operation for each of a plurality of pairs of packed data elements to yield a plurality of output values, round each of the plurality of output values, detect whether any of the plurality of output values reflect an overflow or underflow, for any of the plurality of output values that reflect an overflow or underflow, saturate the output value, and store the plurality of output values into a corresponding plurality of positions of the packed data destination operand.
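The multiply, round, and saturate sequence in this abstract can be illustrated with a small fixed-point sketch. It assumes 16-bit signed words in Q15 format (value = integer / 2^15) and round-to-nearest implemented by adding half of the discarded range before the shift; the Q15 format, the rounding mode, and the function names are assumptions, as the abstract does not state them.

```python
from typing import List, Sequence

INT16_MIN = -(1 << 15)
INT16_MAX = (1 << 15) - 1


def fractional_mul_q15(a: int, b: int) -> int:
    """Multiply two Q15 words, round, keep the high result, and saturate."""
    product = a * b                          # full-precision Q30 product
    rounded = (product + (1 << 14)) >> 15    # round, then select the high part
    # Saturate if the rounded value no longer fits in a signed 16-bit word.
    return max(INT16_MIN, min(INT16_MAX, rounded))


def packed_fractional_mul(src1: Sequence[int], src2: Sequence[int]) -> List[int]:
    """Apply the fractional multiply to each pair of packed elements."""
    if len(src1) != len(src2):
        raise ValueError("packed operands must have the same element count")
    return [fractional_mul_q15(a, b) for a, b in zip(src1, src2)]


# Usage: 0.5 * 0.5 gives 0.25 (8192 in Q15), while (-1.0) * (-1.0) would be
# +1.0, which is not representable in Q15 and saturates to 32767.
result = packed_fractional_mul([16384, -32768], [16384, -32768])
```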
-
Publication number: US20190042257A1
Publication date: 2019-02-07
Application number: US16144902
Application date: 2018-09-27
Applicant: Intel Corporation
Inventor: Dan BAUM , Michael ESPIG , James GUILFORD , Wajdi K. FEGHALI , Raanan SADE , Christopher J. HUGHES , Robert VALENTINE , Bret TOLL , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY , Vinodh GOPAL , Ronen ZOHAR , Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
-
Publication number: US20190042202A1
Publication date: 2019-02-07
Application number: US16144889
Application date: 2018-09-27
Applicant: Intel Corporation
Inventor: Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Simon RUBANOVICH , Amit GRADSTEIN , Zeev SPERBER , Bret TOLL , Jesus CORBAL , Christopher J. HUGHES , Alexander F. HEINECKE , Elmoustapha OULD-AHMED-VALL
IPC: G06F7/78 , G06F9/30 , G06F9/38 , G06F15/173
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transpose rectangular tiles. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix, decode circuitry to decode the fetched rectangular matrix transpose instruction, and execution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix.
-
Publication number: US20240061700A1
Publication date: 2024-02-22
Application number: US18239489
Application date: 2023-08-29
Applicant: INTEL CORPORATION
Inventor: Rajesh SANKARAN , Bret TOLL , William RASH , Subramaniam MAIYURAN , Gang CHEN , Varghese GEORGE
IPC: G06F9/455 , G06F12/1009 , G06T1/20
CPC classification number: G06F9/45558 , G06F12/1009 , G06T1/20 , G06F2009/4557 , G06F2009/45583 , G06F2009/45591
Abstract: Graphics processing systems and methods are described. A graphics processing apparatus may comprise one or more graphics processing engines, a memory, a memory management unit (MMU) including a GPU second level page table and GPU dirty bit tracking, and a provisioning agent to receive a request from a virtual machine monitor (VMM) to provision a subcluster of graphics processing apparatuses, the subcluster including a plurality of graphics processing engines from a plurality of graphics processing apparatuses connected using a scale-up fabric, provision the scale-up fabric to route data within the subcluster of graphics processing apparatuses, and provision a plurality of resources on the graphics processing apparatus for the subcluster based on the request from the VMM.
-
Publication number: US20220171627A1
Publication date: 2022-06-02
Application number: US17672253
Application date: 2022-02-15
Applicant: Intel Corporation
Inventor: Dan BAUM , Michael ESPIG , James GUILFORD , Wajdi K. FEGHALI , Raanan SADE , Christopher J. HUGHES , Robert VALENTINE , Bret TOLL , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY , Vinodh GOPAL , Ronen ZOHAR , Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.