-
51.
Publication No.: US20190163474A1
Publication date: 2019-05-30
Application No.: US15824339
Application date: 2017-11-28
Applicant: Intel Corporation
Inventor: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal
IPC: G06F9/30
Abstract: An embodiment of the invention is a processor including execution circuitry to, in response to a decoded instruction, convert a half-precision floating-point value to a single-precision floating-point value and store the single-precision floating-point value in each of the plurality of element locations of a destination register. The processor also includes a decoder and the destination register. The decoder is to decode an instruction to generate the decoded instruction.
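As a rough illustration of the behavior this abstract describes, the Python sketch below converts one half-precision value to single precision and writes it to every element of a destination. The function name `broadcast_fp16_to_fp32`, the default element count, and the use of a raw 16-bit pattern as input are assumptions made for illustration, not details taken from the filing.

```python
import struct
from typing import List


def broadcast_fp16_to_fp32(half_bits: int, num_elements: int = 16) -> List[float]:
    """Convert one IEEE 754 binary16 bit pattern and replicate the result.

    Hypothetical model: the destination is a plain list of num_elements values.
    """
    # Reinterpret the 16-bit pattern as a half-precision float ('e' format).
    (value,) = struct.unpack("<e", struct.pack("<H", half_bits & 0xFFFF))
    # Round-trip through binary32 ('f'). Every binary16 value is exactly
    # representable in binary32, so this step loses no information.
    (single,) = struct.unpack("<f", struct.pack("<f", value))
    # Store the same single-precision value in each element location.
    return [single] * num_elements
```

For example, the pattern `0x3C00` encodes 1.0 in half precision, so `broadcast_fp16_to_fp32(0x3C00, 4)` yields four copies of 1.0.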
-
52.
Publication No.: US20190102198A1
Publication date: 2019-04-04
Application No.: US15721616
Application date: 2017-09-29
Applicant: Intel Corporation
Inventor: Venkateswara R. Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark Charney
IPC: G06F9/30
Abstract: Embodiments of systems, apparatuses, and methods for multiplication and accumulation of signed data values in a processor are described. For example, execution circuitry executes a decoded instruction to multiply selected signed data values from a plurality of packed data element positions in first and second packed data source operands to generate a plurality of first signed result values, sum the plurality of first signed result values to generate one or more second signed result values, accumulate the one or more signed result values with one or more data values from a destination operand to generate one or more third signed result values, and store the one or more third signed result values in one or more packed data element positions in the destination operand.
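The multiply, sum, and accumulate sequence in this abstract can be sketched in Python as follows. This is a minimal model under stated assumptions: the name `signed_multiply_accumulate`, the grouping of two products per destination element, and the use of unbounded Python integers (no saturation or wraparound) are all hypothetical choices, since the abstract does not fix element widths or overflow behavior.

```python
from typing import List


def signed_multiply_accumulate(
    src1: List[int], src2: List[int], dest: List[int], group: int = 2
) -> List[int]:
    """Multiply paired signed elements, sum each group, add to the destination.

    Hypothetical model: `group` products feed each destination element.
    """
    if len(src1) != len(src2) or len(src1) != group * len(dest):
        raise ValueError("operand lengths do not match the grouping")
    result = []
    for i, acc in enumerate(dest):
        lo = i * group
        # First result values: products of the selected signed elements.
        products = [a * b for a, b in zip(src1[lo:lo + group], src2[lo:lo + group])]
        # Second result value: the sum of those products.
        total = sum(products)
        # Third result value: the sum accumulated with the destination value.
        result.append(acc + total)
    return result
```

With `src1 = [1, -2, 3, 4]`, `src2 = [5, 6, -7, 8]`, and `dest = [10, -10]`, the first element becomes 10 + (5 - 12) = 3 and the second becomes -10 + (-21 + 32) = 1.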
-
53.
Publication No.: US20190102185A1
Publication date: 2019-04-04
Application No.: US15721599
Application date: 2017-09-29
Applicant: Intel Corporation
Inventor: Venkateswara R. Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark Charney
Abstract: Embodiments of systems, apparatuses, and methods for multiplication, negation, and accumulation of data values in a processor are described. For example, execution circuitry executes a decoded instruction to multiply selected data values from a plurality of packed data element positions in first and second packed data source operands to generate a plurality of first result values, sum the plurality of first result values to generate one or more second result values, negate the one or more second result values to generate one or more third result values, accumulate the one or more third result values with one or more data values from the destination operand to generate one or more fourth result values, and store the one or more third result values in one or more packed data element positions in the destination operand.
-
54.
Publication No.: US12099838B2
Publication date: 2024-09-24
Application No.: US17132464
Application date: 2020-12-23
Applicant: Intel Corporation
Inventor: Deepti Aggarwal, Michael Espig, Chekib Nouira, Robert Valentine, Mark Charney
CPC classification number: G06F9/3001, G06F9/3802, G06F9/3818, G06F17/16, G06F17/18
Abstract: In an embodiment, a processor includes: a fetch circuit to fetch instructions, the instructions including a sum of squared differences (SSD) instruction; a decode circuit to decode the SSD instruction; and an execution circuit to, during an execution of the decoded SSD instruction, generate an SSD output vector based on a plurality of input vectors, the SSD output vector including a plurality of squared differences values. Other embodiments are described and claimed.
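A minimal Python sketch of the squared-differences output described in this abstract is given below. It assumes exactly two input vectors of equal length and an elementwise result; the function name `squared_differences` is hypothetical, and the abstract does not say how the instruction groups or reduces the values.

```python
from typing import List


def squared_differences(a: List[int], b: List[int]) -> List[int]:
    """Return the elementwise squared differences of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    # Each output element is (a[i] - b[i]) squared.
    return [(x - y) ** 2 for x, y in zip(a, b)]
```

For inputs `[1, 5, 3]` and `[4, 1, 3]` the differences are -3, 4, and 0, so the output vector is `[9, 16, 0]`. Summing that vector would give the scalar sum of squared differences, 25.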
-
55.
Publication No.: US12086595B2
Publication date: 2024-09-10
Application No.: US17214853
Application date: 2021-03-27
Applicant: Intel Corporation
Inventor: Menachem Adelman, Robert Valentine, Amit Gradstein, Daniel Towner, Mark Charney
IPC: G06F9/30
CPC classification number: G06F9/3016, G06F9/30025, G06F9/30098
Abstract: Systems, methods, and apparatuses relating to interleaving data values. An embodiment includes decoding circuitry to decode a single instruction, the instruction having one or more fields to specify an opcode, one or more fields to specify a location of a first source operand, one or more fields to specify a location of a second source operand, one or more fields to specify a location of a destination operand, and one or more fields to specify an index value to be used to index a row in the first source operand, wherein the opcode is to indicate execution circuitry is to downconvert data elements of the indexed row of the first source operand, interleave the downconverted elements with data elements of the second source operand, and store the interleaved elements in the destination operand; and execution circuitry to execute the decoded instruction according to the opcode.
-
56.
Publication No.: US11966742B2
Publication date: 2024-04-23
Application No.: US18311810
Application date: 2023-05-03
Applicant: Intel Corporation
Inventor: Eliezer Weissmann, Mark Charney, Michael Mishaeli, Robert Valentine, Itai Ravid, Jason W. Brandt, Gilbert Neiger, Baruch Chaikin, Efraim Rotem
CPC classification number: G06F9/3851, G06F9/30043, G06F9/30076, G06F9/30101, G06F9/3836, G06F9/3842
Abstract: Systems, methods, and apparatuses relating to instructions to reset software thread runtime property histories in a hardware processor are described. In one embodiment, a hardware processor includes a hardware guide scheduler comprising a plurality of software thread runtime property histories; a decoder to decode a single instruction into a decoded single instruction, the single instruction having a field that identifies a model-specific register; and an execution circuit to execute the decoded single instruction to check that an enable bit of the model-specific register is set, and when the enable bit is set, to reset the plurality of software thread runtime property histories of the hardware guide scheduler.
-
57.
Publication No.: US20240045677A1
Publication date: 2024-02-08
Application No.: US17958378
Application date: 2022-10-01
Applicant: Intel Corporation
Inventor: Alexander Heinecke, Menachem Adelman, Mark Charney, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber, Robert Valentine
IPC: G06F9/30
CPC classification number: G06F9/30025, G06F9/3016
Abstract: Techniques for converting FP16 or FP32 data elements to FP8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision floating-point data or single-precision floating-point data from the identified source to packed FP8 data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision floating-point data or single-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.
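The half-precision case of this conversion can be sketched in Python as shown below. The sketch assumes that bfloat8 uses a 1-bit sign, 5-bit exponent, 2-bit mantissa layout (so it shares its upper bits with binary16) and that rounding is round-to-nearest-even with overflow to infinity. Both points are assumptions, as are the function names `fp16_bits_to_bf8` and `convert_packed_fp16_to_bf8`; the abstract does not specify the rounding or saturation behavior.

```python
import struct
from typing import List


def fp16_bits_to_bf8(bits: int) -> int:
    """Narrow a binary16 bit pattern to an assumed 8-bit E5M2 pattern."""
    bits &= 0xFFFF
    # NaN: exponent all ones and a nonzero mantissa. Keep it a NaN by
    # forcing a mantissa bit that survives the narrowing.
    if (bits & 0x7C00) == 0x7C00 and (bits & 0x03FF) != 0:
        return (bits >> 8) | 0x02
    # Round to nearest, ties to even, on the 8 bits being discarded.
    lsb = (bits >> 8) & 1
    return (bits + 0x7F + lsb) >> 8


def convert_packed_fp16_to_bf8(values: List[float]) -> List[int]:
    """Convert each value (first rounded to binary16) to an 8-bit pattern."""
    out = []
    for v in values:
        # Obtain the binary16 bit pattern of the source element.
        (half_bits,) = struct.unpack("<H", struct.pack("<e", v))
        out.append(fp16_bits_to_bf8(half_bits))
    return out
```

Under these assumptions 1.125 sits exactly halfway between the representable values 1.0 and 1.25 and rounds to the even pattern for 1.0, while 1.375 rounds up to 1.5.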
-
58.
Publication No.: US11809867B2
Publication date: 2023-11-07
Application No.: US17027230
Application date: 2020-09-21
Applicant: Intel Corporation
Inventor: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Mark Charney, Robert Valentine, Binwei Yang
CPC classification number: G06F9/3001, G06F7/00, G06F9/30014, G06F9/3016, G06F9/30036
Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements. For example one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed byte data elements; a second source register to store a second plurality of packed byte data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to concurrently multiply each of the packed byte data elements of the first plurality with a corresponding packed byte data element of the second plurality to generate a plurality of products; adder circuitry to add specified sets of the products to generate temporary results for each set of products; zero-extension or sign-extension circuitry to zero-extend or sign-extend the temporary result for each set to generate an extended temporary result for each set; accumulation circuitry to combine each of the extended temporary results with a selected packed data value stored in a third source register to generate a plurality of final results; and a destination register to store the plurality of final results as a plurality of packed data elements in specified data element positions.
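The data flow in this abstract (multiply bytes, add sets of products, extend, accumulate with a third source) can be modeled roughly in Python as follows. The sketch assumes signed bytes, sets of four products per result, and 32-bit results that wrap on overflow. These choices, and the names `to_signed` and `byte_dot_accumulate`, are illustrative assumptions rather than details from the patent.

```python
from typing import List


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def byte_dot_accumulate(
    src1: List[int], src2: List[int], src3: List[int], group: int = 4
) -> List[int]:
    """Multiply byte pairs, sum each set of products, add to src3 elements."""
    if len(src1) != len(src2) or len(src1) != group * len(src3):
        raise ValueError("operand lengths do not match the grouping")
    results = []
    for i, acc in enumerate(src3):
        lo = i * group
        # Multiply each packed byte with its corresponding byte.
        products = [
            to_signed(a, 8) * to_signed(b, 8)
            for a, b in zip(src1[lo:lo + group], src2[lo:lo + group])
        ]
        # Temporary result for this set; a Python int already behaves
        # like a sign-extended value.
        temp = sum(products)
        # Combine with the third-source value, wrapping to 32 bits.
        results.append(to_signed(to_signed(acc, 32) + temp, 32))
    return results
```

For instance, with `src1 = [1, 2, 3, 4, 0xFF, 0, 0, 0]`, `src2 = [1, 1, 1, 1, 2, 0, 0, 0]`, and `src3 = [10, 0]`, the first result is 10 + 10 = 20 and the second is 0 + (-1 × 2) = -2.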
-
59.
Publication No.: US11573799B2
Publication date: 2023-02-07
Application No.: US17226986
Application date: 2021-04-09
Applicant: Intel Corporation
Inventor: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Mark Charney, Robert Valentine, Jesus Corbal, Binwei Yang
IPC: G06F9/30
Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements. For example one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed doubleword data elements; a second source register to store a second plurality of packed doubleword data elements; and execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply a first doubleword data element from the first source register with a second doubleword data element from the second source register to generate a first quadword product and to concurrently multiply a third doubleword data element from the first source register with a fourth doubleword data element from the second source register to generate a second quadword product; and a destination register to store the first quadword product and the second quadword product as first and second packed quadword data elements.
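A small Python sketch of the two doubleword multiplications described in this abstract is shown below. It assumes signed 32-bit inputs and that the two multiplied pairs sit at element positions 0 and 2 of each source. The abstract does not state which positions are selected, so the `positions` parameter and the names `to_signed` and `multiply_dwords_to_qwords` are hypothetical.

```python
from typing import List, Tuple


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def multiply_dwords_to_qwords(
    src1: List[int], src2: List[int], positions: Tuple[int, int] = (0, 2)
) -> List[int]:
    """Multiply two selected doubleword pairs and return two quadword products."""
    # The product of two signed 32-bit values always fits in a signed 64-bit
    # quadword, so no truncation step is needed here.
    return [to_signed(src1[p], 32) * to_signed(src2[p], 32) for p in positions]
```

With `src1 = [3, 99, 0xFFFFFFFF, 99]` and `src2 = [4, 99, 5, 99]`, the selected pairs are (3, 4) and (-1, 5), giving the packed quadword results `[12, -5]`.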
-
60.
Publication No.: US10802826B2
Publication date: 2020-10-13
Application No.: US15721412
Application date: 2017-09-29
Applicant: Intel Corporation
Inventor: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Mark Charney, Robert Valentine, Binwei Yang
IPC: G06F9/30
Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements. For example one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed byte data elements; a second source register to store a second plurality of packed byte data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to concurrently multiply each of the packed byte data elements of the first plurality with a corresponding packed byte data element of the second plurality to generate a plurality of products; adder circuitry to add specified sets of the products to generate temporary results for each set of products; zero-extension or sign-extension circuitry to zero-extend or sign-extend the temporary result for each set to generate an extended temporary result for each set; accumulation circuitry to combine each of the extended temporary results with a selected packed data value stored in a third source register to generate a plurality of final results; and a destination register to store the plurality of final results as a plurality of packed data elements in specified data element positions.
-