-
11.
公开(公告)号:US20190102190A1
公开(公告)日:2019-04-04
申请号:US15721145
申请日:2017-09-29
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI , ELMOUSTAPHA OULD-AHMED-VALL , MARK CHARNEY , ROBERT VALENTINE , JESUS CORBAL , BINWEI YANG
IPC: G06F9/30
Abstract: An apparatus and method for performing a transform on complex data. For example, one embodiment of a processor comprises: multiplier circuitry to multiply packed real N-bit data elements in the first source register with packed real M-bit data elements in the second source register and to multiply packed imaginary N-bit data elements in the first source register with packed imaginary M-bit data elements in the second source register to generate at least four real products, adder circuitry to subtract a first selected real product from a second selected real product to generate a first temporary result and to subtract a third selected real product from a fourth selected real product to generate a second temporary result, the adder circuitry to add the first temporary result to a first packed N-bit data element from the third source register to generate a first pre-scaled result, to subtract the first temporary result from the first packed N-bit data element to generate a second pre-scaled result, to add the second temporary result to a second packed N-bit data element from the third source register to generate a third pre-scaled result, and to subtract the second temporary result from the second packed N-bit data element to generate a fourth pre-scaled result; scaling circuitry to scale the first, second, third and fourth pre-scaled results to a specified bit width to generate first, second, third, and fourth final results; and a destination register to store the first, second, third, and fourth final results in specified data element positions.
-
公开(公告)号:US20220326946A1
公开(公告)日:2022-10-13
申请号:US17589428
申请日:2022-01-31
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI , ELMOUSTAPHA OULD-AHMED-VALL , MARK CHARNEY , ROBERT VALENTINE , JESUS CORBAL , BINWEI YANG
Abstract: An apparatus and method for performing a transform on complex data. For example, one embodiment of a processor comprises: multiplier circuitry to multiply packed real N-bit data elements in the first source register with packed real M-bit data elements in the second source register and to multiply packed imaginary N-bit data elements in the first source register with packed imaginary M-bit data elements in the second source register to generate at least four real products, adder circuitry to subtract a first selected real product from a second selected real product to generate a first temporary result and to subtract a third selected real product from a fourth selected real product to generate a second temporary result, the adder circuitry to add the first temporary result to a first packed N-bit data element from the third source register to generate a first pre-scaled result, to subtract the first temporary result from the first packed N-bit data element to generate a second pre-scaled result, to add the second temporary result to a second packed N-bit data element from the third source register to generate a third pre-scaled result, and to subtract the second temporary result from the second packed N-bit data element to generate a fourth pre-scaled result; scaling circuitry to scale the first, second, third and fourth pre-scaled results to a specified bit width to generate first, second, third, and fourth final results; and a destination register to store the first, second, third, and fourth final results in specified data element positions.
-
公开(公告)号:US20220171624A1
公开(公告)日:2022-06-02
申请号:US17672504
申请日:2022-02-15
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI , ELMOUSTAPHA OULD-AHMED-VALL , JESUS CORBAL , MARK CHARNEY , ROBERT VALENTINE , BINWEI YANG
IPC: G06F9/30
Abstract: An apparatus and method for multiplying packed real and imaginary components of complex numbers are described. A processor embodiment includes: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; and execution circuitry to execute the decoded instruction. The execution circuitry includes: multiplier circuitry to select real and imaginary data elements in the first source register and second source, multiply each selected imaginary data element in the first source register with a selected real data element in the second source register, and multiply each selected real data element in the first source register with a selected imaginary data element in the second source register to generate a plurality of imaginary products; adder circuitry to add a first subset of the plurality of imaginary products and subtract a second subset of the plurality of imaginary products to generate a first temporary result, and to add a third subset of the plurality of imaginary products and subtract a fourth subset of the plurality of imaginary products to generate a second temporary result; and accumulation circuitry to combine the first temporary result with first data from a destination register to generate a first final result, combine the second temporary result with second data from the destination register to generate a second final result, and store the first final result and second final result back in the destination register.
-
14.
公开(公告)号:US20210004227A1
公开(公告)日:2021-01-07
申请号:US17027230
申请日:2020-09-21
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI , ELMOUSTAPHA OULD-AHMED-VALL , JESUS CORBAL , MARK CHARNEY , ROBERT VALENTINE , BINWEI YANG
Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements. For example one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed byte data elements; a second source register to store a second plurality of packed byte data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to concurrently multiply each of the packed byte data elements of the first plurality with a corresponding packed byte data element of the second plurality to generate a plurality of products; adder circuitry to add specified sets of the products to generate temporary results for each set of products; zero-extension or sign-extension circuitry to zero-extend or sign-extend the temporary result for each set to generate an extended temporary result for each set; accumulation circuitry to combine each of the extended temporary results with a selected packed data value stored in a third source register to generate a plurality of final results; and a destination register to store the plurality of final results as a plurality of packed data elements in specified data element positions.
-
公开(公告)号:US20190196822A1
公开(公告)日:2019-06-27
申请号:US15851145
申请日:2017-12-21
Applicant: Intel Corporation
Inventor: ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , MARK CHARNEY , VENKATESWARA MADDURI
IPC: G06F9/30
CPC classification number: G06F9/30032 , G06F9/3001 , G06F9/30036 , G06F9/30098 , G06F9/30145
Abstract: An apparatus and method for performing left-shifting operations on packed quadword data. For example, one embodiment of a processor comprises: a decoder to decode a left-shift instruction to generate a decoded left-shift instruction; a first source register to store a plurality of packed quadword data elements, each of the packed quadword data elements including a sign bit; execution circuitry to execute the decoded left-shift instruction, the execution circuitry comprising shift circuitry with sign preservation logic to left-shift first and second packed quadword data elements from first and second packed quadword data element locations, respectively, in the first source register by an amount specified in an immediate value or in a control value in a second source register, the left-shifting to generate first and second left-shifted quadwords, the shift circuitry to write zeroes into bit positions exposed by the left-shifting of the packed quadword data elements; the sign preservation logic to maintain a copy of the sign bit while the shift circuitry performs the left-shift operations; the execution circuitry to cause selection of 16 most significant bits of the first and second left-shifted quadwords, including the sign bit, to be written to 16 least significant bit regions of first and second quadword data element locations, respectively, of a destination register, writing the sign bit to the most significant bit position of each 16 least significant bit region.
-
公开(公告)号:US20190196812A1
公开(公告)日:2019-06-27
申请号:US15850412
申请日:2017-12-21
Applicant: Intel Corporation
Inventor: ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , MARK CHARNEY , JESUS CORBAL , VENKATESWARA MADDURI
IPC: G06F9/30
Abstract: An apparatus and method for performing signed multiplication of packed signed/unsigned doublewords and accumulation with a quadword. For example, one embodiment of a processor comprises: a first source register to store a first plurality of packed doubleword data elements; a second source register to store a second plurality of packed doubleword data elements; a third source register to store a plurality of packed quadword data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply first and second packed doubleword data elements from the first source register with third and fourth packed doubleword data elements from the second source register, respectively, to generate first and second temporary quadword products, the multiplier circuitry to select the first, second, third, and fourth doubleword data elements based on the opcode of the instruction; accumulation circuitry to combine the first temporary quadword product with a first packed quadword value read from the third source register to generate a first accumulated quadword result and to combine the second temporary quadword product with a second packed quadword value read from the third source register to generate a second accumulated quadword result; a destination register or the third source register to store the first accumulated quadword result in a first quadword data element position and to store the second accumulated quadword result in a second quadword data element position.
-
17.
公开(公告)号:US20190102195A1
公开(公告)日:2019-04-04
申请号:US15721471
申请日:2017-09-29
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI , ELMOUSTAPHA OULD-AHMED-VALL , JESUS CORBAL , MARK CHARNEY , ROBERT VALENTINE , BINWEI YANG
IPC: G06F9/30
Abstract: An apparatus and method for performing a transform on complex data. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; a third source register to store a third plurality of packed real and imaginary data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to select real and imaginary data elements in the first and second source registers to multiply based on an immediate of the first instruction, the multiplier circuitry to multiply first packed data elements from the first source register with second packed data elements from the second source register in accordance with the immediate to generate a plurality of real and imaginary products, adder circuitry to select real and imaginary data elements in the third source register based on the immediate, the adder circuitry to add and subtract selected real and imaginary values from the real and imaginary products to generate first real and imaginary results; scaling, rounding, and/or saturation circuitry to scale, round, and/or saturate the first real and imaginary results to generate final real and imaginary data elements; and a destination register to store the final real and imaginary data elements in specified data element positions.
-
公开(公告)号:US20190102193A1
公开(公告)日:2019-04-04
申请号:US15721448
申请日:2017-09-29
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI , ELMOUSTAPHA OULD-AHMED-VALL , JESUS CORBAL , MARK CHARNEY , ROBERT VALENTINE , BINWEI YANG
IPC: G06F9/30
Abstract: An apparatus and method for multiplying packed real and imaginary components of complex numbers. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; and execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to select real and imaginary data elements in the first source register and second source register to multiply, the multiplier circuitry to multiply each selected imaginary data element in the first source register with a selected real data element in the second source register, and to multiply each selected real data element in the first source register with a selected imaginary data element in the second source register to generate a plurality of imaginary products, adder circuitry to add a first subset of the plurality of imaginary products and subtract a second subset of the plurality of imaginary products to generate a first temporary result and to add a third subset of the plurality of imaginary products and subtract a fourth subset of the plurality of imaginary products to generate a second temporary result, accumulation circuitry to combine the first temporary result with first data from a destination register to generate a first final result and to combine the second temporary result with second data from the destination register to generate a second final result and to store the first final result and second final result back in the destination register.
-
19.
公开(公告)号:US20190102183A1
公开(公告)日:2019-04-04
申请号:US15721459
申请日:2017-09-29
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI , ELMOUSTAPHA OULD-AHMED-VALL , JESUS CORBAL , MARK CHARNEY , ROBERT VALENTINE , BINWEI YANG
IPC: G06F9/30
Abstract: An apparatus and method for multiplying packed real and imaginary components of complex numbers. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to select real and imaginary data elements in the first source register and second source register to multiply, the multiplier circuitry to multiply each selected imaginary data element in the first source register with a selected real data element in the second source register, and to multiply each selected real data element in the first source register with a selected imaginary data element in the second source register to generate a plurality of imaginary products, adder circuitry to add a first subset of the plurality of imaginary products to generate a first temporary result and to add a second subset of the plurality of imaginary products to generate a second temporary result; accumulation circuitry to combine the first temporary result with first data from a destination register to generate a first final result and to combine the second temporary result with second data from the destination register to generate a second final result and to store the first final result and second final result back in the destination register.
-
-
-
-
-
-
-
-