-
1.
Publication No.: US20200174788A1
Publication Date: 2020-06-04
Application No.: US16672203
Filing Date: 2019-11-01
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, MARK CHARNEY, ROBERT VALENTINE, JESUS CORBAL, BINWEI YANG
IPC: G06F9/30
Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed doubleword data elements; a second source register to store a second plurality of packed doubleword data elements; and execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply a first doubleword data element from the first source register with a second doubleword data element from the second source register to generate a first quadword product and to concurrently multiply a third doubleword data element from the first source register with a fourth doubleword data element from the second source register to generate a second quadword product; and a destination register to store the first quadword product and the second quadword product as first and second packed quadword data elements.
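The data flow in this abstract can be modelled in a few lines of Python. This is only an illustrative sketch, not the patented circuit: the function name `dual_mul_dwords`, the choice of lanes 0 and 2, and the signed interpretation of the doublewords are all assumptions, since the abstract does not specify them. The two multiplications run one after the other here, whereas the hardware performs them concurrently.

```python
MASK64 = (1 << 64) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dual_mul_dwords(src1, src2, lanes=(0, 2)):
    """Multiply two doubleword pairs and return two packed quadwords.

    src1, src2: lists of 32-bit patterns (packed doublewords).
    lanes: which doubleword positions are multiplied (assumed, not from the abstract).
    Each 32x32 product fits in 64 bits, so no rounding or saturation is needed.
    """
    return [
        (to_signed(src1[i], 32) * to_signed(src2[i], 32)) & MASK64
        for i in lanes
    ]


# Lane 0 holds 3 * 5; lane 2 holds (-1) * 2 in two's complement.
products = dual_mul_dwords([3, 0, 0xFFFFFFFF, 0], [5, 0, 2, 0])
```

The results are returned as 64-bit bit patterns so that negative products appear in their two's-complement form, as they would in a destination register.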
-
2.
Publication No.: US20190196821A1
Publication Date: 2019-06-27
Application No.: US15850949
Filing Date: 2017-12-21
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, MARK CHARNEY
IPC: G06F9/30
CPC classification number: G06F9/30032, G06F9/3001, G06F9/30036, G06F9/30098, G06F9/30145
Abstract: An apparatus and method for performing right-shifting operations on packed quadword data. For example, one embodiment of a processor comprises: a decoder to decode a right-shift instruction to generate a decoded right-shift instruction; a first source register to store a plurality of packed quadword data elements, each of the packed quadword data elements including a sign bit; execution circuitry to execute the decoded right-shift instruction, the execution circuitry comprising shift circuitry with sign preservation logic to right-shift first and second packed quadword data elements from first and second packed quadword data element locations, respectively, in the first source register by an amount specified in an immediate value or in a control value in a second source register, the right-shifting to generate first and second right-shifted quadwords, the sign preservation logic to shift in the sign bit to any bit positions exposed by the right-shifting of the first and second quadwords; the execution circuitry to cause selection of 16 most significant bits of the first and second right-shifted quadwords, including the sign bit, to be written to 16 least significant bit regions of first and second quadword data element locations, respectively, of a destination register.
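As a rough illustration of the shift-and-extract behaviour described above, the following Python sketch performs an arithmetic right shift on two quadwords and copies the 16 most significant bits of each result into the low 16 bits of the destination element. The function names, the 0..63 shift range, and the zeroing of the upper 48 destination bits are assumptions; the abstract does not state what happens to the remaining destination bits.

```python
MASK64 = (1 << 64) - 1


def sra64(q, count):
    """Arithmetic right shift of a 64-bit pattern: the sign bit fills vacated positions."""
    signed = q - (1 << 64) if q >> 63 else q
    return (signed >> count) & MASK64


def shift_right_extract_words(src, count):
    """Right-shift the first two packed quadwords and extract their top 16 bits.

    src: list of 64-bit patterns; count: shift amount, assumed to be in 0..63.
    Returns two destination quadwords whose low 16 bits hold bits 63:48 of the
    shifted values (upper destination bits assumed zero).
    """
    if not 0 <= count <= 63:
        raise ValueError("count assumed to be in 0..63")
    return [sra64(q, count) >> 48 for q in src[:2]]


result = shift_right_extract_words([0x8000000000000000, 0x123456789ABCDEF0], 4)
```

For the negative first element, the sign bit is replicated into the four exposed positions before the upper 16 bits are selected.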
-
3.
Publication No.: US20190196820A1
Publication Date: 2019-06-27
Application No.: US15850765
Filing Date: 2017-12-21
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, MARK CHARNEY
IPC: G06F9/30
CPC classification number: G06F9/30032, G06F9/3001, G06F9/30036, G06F9/30098, G06F9/30145
Abstract: An apparatus and method for performing right-shifting operations on packed quadword data. For example, one embodiment of a processor comprises: a decoder to decode a right-shift instruction to generate a decoded right-shift instruction; a first source register to store a plurality of packed quadword data elements, each of the packed quadword data elements including a sign bit; execution circuitry to execute the decoded right-shift instruction, the execution circuitry comprising shift circuitry with sign preservation logic to right-shift first and second packed quadword data elements from first and second packed quadword data element locations, respectively, in the first source register by an amount specified in an immediate value or in a control value in a second source register, the right-shifting to generate first and second right-shifted quadwords, the sign preservation logic to shift in the sign bit to any bit positions exposed by the right-shifting of the first and second quadwords; the execution circuitry to cause selection of 32 most significant bits of the first and second right-shifted quadwords, including the sign bit, to be written to 32 least significant bit regions of first and second quadword data element locations, respectively, of a destination register.
-
4.
Publication No.: US20190102174A1
Publication Date: 2019-04-04
Application No.: US15721225
Filing Date: 2017-09-29
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, MARK CHARNEY, ROBERT VALENTINE, JESUS CORBAL
IPC: G06F9/30
Abstract: An apparatus and method for performing dual concurrent multiplications, subtraction/addition, and accumulation of packed data elements. For example, one embodiment of a processor comprises: a decoder to decode an instruction to generate a decoded instruction; a first source register to store first and second packed data elements; a second source register to store third and fourth packed data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply the first and third packed data elements to generate a first temporary product and to concurrently multiply the second and fourth packed data elements to generate a second temporary product, the first through fourth packed data elements all being a first width; circuitry to negate the first temporary product to generate a negated first product; adder circuitry to add the first negated product to a first accumulated packed data element from a third source register to generate a first result, the first result being a second width which is at least twice as large as the first width; the adder circuitry to concurrently add the second temporary product to a second accumulated packed data element to generate a second result of the second width; the first and second results to be stored in specified first and second data element positions within a destination register.
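The arithmetic in this abstract reduces to two multiply-accumulate steps, one of which negates its product first. The sketch below models that under stated assumptions: signed 16-bit inputs, 32-bit accumulators, and wraparound (non-saturating) addition. None of these widths or the function name `dual_mul_negate_accumulate` come from the abstract, which only requires the result width to be at least twice the input width.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dual_mul_negate_accumulate(src1, src2, acc, in_bits=16, out_bits=32):
    """Compute [acc1 - a*c, acc2 + b*d] with wraparound at out_bits.

    src1 = [a, b] and src2 = [c, d] are in_bits-wide patterns;
    acc = [acc1, acc2] are out_bits-wide patterns from the third source.
    """
    a, b = (to_signed(x, in_bits) for x in src1)
    c, d = (to_signed(x, in_bits) for x in src2)
    out_mask = (1 << out_bits) - 1
    first = (acc[0] + -(a * c)) & out_mask   # negated first product, then accumulate
    second = (acc[1] + b * d) & out_mask     # second product accumulated as-is
    return [first, second]


result = dual_mul_negate_accumulate([3, 4], [5, 6], [100, 100])
```

Masking with `out_mask` keeps the results as bit patterns of the wider element, so a negative result shows up in two's-complement form.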
-
5.
Publication No.: US20190102168A1
Publication Date: 2019-04-04
Application No.: US15721412
Filing Date: 2017-09-29
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, JESUS CORBAL, MARK CHARNEY, ROBERT VALENTINE, BINWEI YANG
IPC: G06F9/30
CPC classification number: G06F9/3001, G06F7/00, G06F9/30014, G06F9/30036, G06F9/3016
Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed byte data elements; a second source register to store a second plurality of packed byte data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to concurrently multiply each of the packed byte data elements of the first plurality with a corresponding packed byte data element of the second plurality to generate a plurality of products; adder circuitry to add specified sets of the products to generate temporary results for each set of products; zero-extension or sign-extension circuitry to zero-extend or sign-extend the temporary result for each set to generate an extended temporary result for each set; accumulation circuitry to combine each of the extended temporary results with a selected packed data value stored in a third source register to generate a plurality of final results; and a destination register to store the plurality of final results as a plurality of packed data elements in specified data element positions.
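A small Python model may help make the multiply, group-sum, extend, and accumulate sequence concrete. Everything specific in it is an assumption for illustration: the name `byte_dot_accumulate`, groups of four adjacent products, 32-bit accumulators, and wraparound addition. The `signed` flag stands in for the choice between sign-extension and zero-extension mentioned in the abstract.

```python
MASK32 = (1 << 32) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def byte_dot_accumulate(src1, src2, acc, group=4, signed=True):
    """Multiply bytes pairwise, sum each group of products, and accumulate.

    src1, src2: lists of 8-bit patterns; acc: list of 32-bit patterns, one per group.
    signed=True treats bytes as signed (sign-extended sums);
    signed=False treats them as unsigned (zero-extended sums).
    """
    def lane(x):
        return to_signed(x, 8) if signed else x & 0xFF

    products = [lane(a) * lane(b) for a, b in zip(src1, src2)]
    results = []
    for k, acc_dw in enumerate(acc):
        temp = sum(products[k * group:(k + 1) * group])
        # Python integers are unbounded, so temp already carries its sign;
        # masking after the add models extension plus 32-bit wraparound.
        results.append((acc_dw + temp) & MASK32)
    return results


a = [1, 2, 3, 4, 0xFF, 0, 0, 0]
b = [1, 1, 1, 1, 2, 0, 0, 0]
signed_result = byte_dot_accumulate(a, b, [10, 0], signed=True)
unsigned_result = byte_dot_accumulate(a, b, [10, 0], signed=False)
```

The byte 0xFF contributes -2 to the second group when treated as signed and 510 when treated as unsigned, which is the practical difference between the two extension modes.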
-
6.
Publication No.: US20210294604A1
Publication Date: 2021-09-23
Application No.: US17226986
Filing Date: 2021-04-09
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, MARK CHARNEY, ROBERT VALENTINE, JESUS CORBAL, BINWEI YANG
IPC: G06F9/30
Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed doubleword data elements; a second source register to store a second plurality of packed doubleword data elements; and execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply a first doubleword data element from the first source register with a second doubleword data element from the second source register to generate a first quadword product and to concurrently multiply a third doubleword data element from the first source register with a fourth doubleword data element from the second source register to generate a second quadword product; and a destination register to store the first quadword product and the second quadword product as first and second packed quadword data elements.
-
7.
Publication No.: US20190196828A1
Publication Date: 2019-06-27
Application No.: US15850248
Filing Date: 2017-12-21
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI, CARL MURRAY, ELMOUSTAPHA OULD-AHMED-VALL, MARK CHARNEY, ROBERT VALENTINE, JESUS CORBAL, MILIND GIRKAR, BRET TOLL
CPC classification number: G06F9/30145, G06F9/30101, G06F17/16
Abstract: An apparatus and method for performing signed fractional multiplication of packed data elements. For example, one embodiment of a processor comprises: a decoder to decode an instruction; a first source register to store a first plurality of packed signed word data elements; a second source register to store a second plurality of packed signed word data elements; a control register to store a rounding control value to indicate a rounding mode; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to concurrently multiply each of the packed signed word data elements of the first plurality with a corresponding packed signed word data element of the second plurality to generate a plurality of signed doubleword products; conversion circuitry to convert the plurality of signed doubleword products to a plurality of fractional signed words, the conversion circuitry including rounding circuitry to round the signed doubleword products in accordance with the rounding mode indicated by the rounding control value to generate the plurality of fractional signed words; and a destination register to store the plurality of fractional signed words as packed signed word fractional data elements in specified data element positions within the destination register.
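Signed fractional multiplication of 16-bit words is commonly modelled as Q15 fixed-point arithmetic, and the sketch below takes that view. It is an assumption-laden illustration rather than the patented design: the Q15 interpretation, the two rounding modes `"nearest"` and `"truncate"`, the saturation of the single overflow case (-1 times -1), and the function names are all hypothetical, since the abstract names neither the fixed-point format nor the available rounding modes.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def q15_mul(a, b, rounding="nearest"):
    """Multiply two Q15 words and convert the Q30 doubleword product back to Q15."""
    product = to_signed(a, 16) * to_signed(b, 16)   # signed doubleword product (Q30)
    if rounding == "nearest":
        product += 1 << 14                          # add half an output LSB before shifting
    elif rounding != "truncate":
        raise ValueError("unknown rounding mode: " + rounding)
    result = product >> 15                          # drop the 15 extra fraction bits
    result = max(-0x8000, min(0x7FFF, result))      # assumed saturation for -1 * -1
    return result & 0xFFFF


def packed_q15_mul(src1, src2, rounding="nearest"):
    """Apply q15_mul to each pair of corresponding packed words."""
    return [q15_mul(a, b, rounding) for a, b in zip(src1, src2)]


# 0x4000 is 0.5 in Q15, so 0.5 * 0.5 yields 0x2000 (0.25).
quarter = q15_mul(0x4000, 0x4000)
```

The `rounding` argument plays the role of the rounding control value; in hardware it would be read from the control register rather than passed per call.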
-
8.
Publication No.: US20190196819A1
Publication Date: 2019-06-27
Application No.: US15850716
Filing Date: 2017-12-21
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, MARK CHARNEY
IPC: G06F9/30
CPC classification number: G06F9/30032, G06F9/3001, G06F9/30036, G06F9/30098, G06F9/30145
Abstract: An apparatus and method for performing left-shifting operations on packed quadword data. For example, one embodiment of a processor comprises: a decoder to decode a left-shift instruction to generate a decoded left-shift instruction; a first source register to store a plurality of packed quadword data elements, each of the packed quadword data elements including a sign bit; execution circuitry to execute the decoded left-shift instruction, the execution circuitry comprising shift circuitry with sign preservation logic to left-shift first and second packed quadword data elements from first and second packed quadword data element locations, respectively, in the first source register by an amount specified in an immediate value or in a control value in a second source register, the left-shifting to generate first and second left-shifted quadwords, the shift circuitry to write zeroes into bit positions exposed by the left-shifting of the packed quadword data elements; the sign preservation logic to maintain a copy of the sign bit while the shift circuitry performs the left-shift operations; the execution circuitry to cause selection of 32 most significant bits of the first and second left-shifted quadwords, including the sign bit, to be written to 32 least significant bit regions of first and second quadword data element locations, respectively, of a destination register, writing the sign bit to the most significant bit position of each 32 least significant bit region.
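The left-shift variant differs from a plain logical shift in that the original sign bit is saved and written back after the extraction. A minimal Python sketch of that behaviour follows; the function name, the 0..63 shift range, and the zeroing of the upper 32 destination bits are assumptions not stated in the abstract.

```python
MASK64 = (1 << 64) - 1


def shift_left_extract_dwords(src, count):
    """Left-shift the first two packed quadwords and extract their top 32 bits.

    src: list of 64-bit patterns; count: shift amount, assumed to be in 0..63.
    The original sign bit of each quadword is preserved and placed in bit 31
    of the extracted doubleword (upper destination bits assumed zero).
    """
    if not 0 <= count <= 63:
        raise ValueError("count assumed to be in 0..63")
    out = []
    for q in src[:2]:
        sign = q >> 63                       # copy of the sign bit kept aside
        shifted = (q << count) & MASK64      # zeroes enter from the right
        upper = shifted >> 32                # 32 most significant bits
        out.append((upper & 0x7FFFFFFF) | (sign << 31))
    return out


result = shift_left_extract_dwords([0x0000000180000000, 0x8000000000000001], 31)
```

In the first element the shift moves a 1 into bit 63, but the stored result keeps the original (positive) sign, which is the effect the sign preservation logic is described as providing.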
-
9.
Publication No.: US20190196813A1
Publication Date: 2019-06-27
Application No.: US15850499
Filing Date: 2017-12-21
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, MARK CHARNEY, JESUS CORBAL
IPC: G06F9/30
Abstract: An apparatus and method for performing multiplication, summation, negation, sign extension, and accumulation with packed bytes. For example, one embodiment of a processor comprises: a decoder to decode an instruction to generate a decoded instruction, the instruction including an opcode, and a plurality of operands identifying a plurality of packed data source registers and a packed data destination register; a first source register to store a first plurality of packed signed bytes; a second source register to store a second plurality of packed signed bytes; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply each packed signed byte from the first source register with a corresponding packed signed byte from the second source register to generate a plurality of temporary products; adder circuitry to add a plurality of sets of the temporary products to generate a plurality of temporary sums; negation and extension circuitry to negate and extend each of the temporary sums to doubleword sums; and accumulation circuitry to add each of the doubleword sums to a doubleword from a third source register to generate final doubleword results; and a packed data destination register to store the final doubleword results in specified data element locations.
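In effect, each destination doubleword receives the accumulator minus a small dot product of signed bytes. The Python sketch below models that under assumptions of its own: sets of four adjacent products, wraparound 32-bit accumulation, and the name `negated_byte_dot_accumulate` are illustrative choices, as the abstract does not specify the set size or overflow behaviour.

```python
MASK32 = (1 << 32) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def negated_byte_dot_accumulate(src1, src2, acc, group=4):
    """Return acc[k] - sum(products in set k) for each doubleword, modulo 2**32.

    src1, src2: lists of 8-bit patterns read as signed bytes;
    acc: list of 32-bit patterns from the third source, one per set.
    """
    products = [to_signed(a, 8) * to_signed(b, 8) for a, b in zip(src1, src2)]
    out = []
    for k, acc_dw in enumerate(acc):
        temp_sum = sum(products[k * group:(k + 1) * group])
        negated = -temp_sum                       # negation; sign extension is implicit
        out.append((acc_dw + negated) & MASK32)   # accumulate with 32-bit wraparound
    return out


# Products are 1, 2, 3 and -4 (0xFF is -1), so the sum is 2 and the result is 10 - 2.
result = negated_byte_dot_accumulate([1, 2, 3, 4], [1, 1, 1, 0xFF], [10])
```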
-
10.
Publication No.: US20190196787A1
Publication Date: 2019-06-27
Application No.: US15850682
Filing Date: 2017-12-21
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI, ELMOUSTAPHA OULD-AHMED-VALL, MARK CHARNEY, ROBERT VALENTINE, JESUS CORBAL
CPC classification number: G06F7/5095, G06F9/30101, G06F9/30145
Abstract: An apparatus and method for performing sum of absolute differences with accumulation. For example, one embodiment of a processor comprises: a decoder to decode an instruction to generate a decoded instruction; a first source register to store a first plurality of packed bytes; a second source register to store a second plurality of packed bytes; execution circuitry to execute the decoded instruction, the execution circuitry comprising: adder circuitry to determine a difference between each byte in the first source register and a corresponding byte in the second source register, absolute value circuitry to determine an absolute value of each difference, the adder circuitry to add pairs of the absolute values to generate a plurality of temporary results, and extension circuitry to extend the temporary results to temporary words; and accumulator circuitry to add each temporary word to a word from a third source register to generate a plurality of accumulated words; and a destination register to store the accumulated words as packed words.
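The sum-of-absolute-differences flow can be sketched in Python as follows. This is a simplified model under stated assumptions: bytes are treated as unsigned, the "pairs" are taken to be adjacent byte positions, and word accumulation wraps modulo 2**16. The function name `sad_accumulate` is hypothetical and none of these details are confirmed by the abstract.

```python
def sad_accumulate(src1, src2, acc):
    """Accumulate pairwise sums of absolute byte differences into packed words.

    src1, src2: lists of 8-bit patterns (assumed unsigned);
    acc: list of 16-bit patterns from the third source, one per byte pair.
    """
    abs_diffs = [abs((a & 0xFF) - (b & 0xFF)) for a, b in zip(src1, src2)]
    out = []
    for k, acc_word in enumerate(acc):
        temp = abs_diffs[2 * k] + abs_diffs[2 * k + 1]   # add one pair of absolute values
        out.append((acc_word + temp) & 0xFFFF)           # extend to a word and accumulate
    return out


# Absolute differences are 10, 100, 255, 255; pair sums are 110 and 510.
result = sad_accumulate([10, 200, 0, 255], [20, 100, 255, 0], [1, 0xFFFF])
```

The second accumulator starts at 0xFFFF to show the assumed wraparound; a saturating design would instead clamp that element at 0xFFFF.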
-