-
Publication No.: DE112016004351T5
Publication date: 2018-06-07
Application No.: DE112016004351
Application date: 2016-08-24
Applicant: INTEL CORP
Inventor: MISHRA ASIT K , GROCHOWSKI EDWARD T , PEARCE JONATHAN D , MARR DEBORAH T , COHEN EHUD , OULD-AHMED-VALL ELMOUSTAPHA , CORBAL SAN ADRIAN JESUS , VALENTINE ROBERT , CHARNEY MARK J , HUGHES CHRISTOPHER J , GIRKAR MILIND B
Abstract: A processor includes a decode unit to decode an instruction that indicates a first source packed data operand including at least four data elements, indicates a second source packed data operand including at least four data elements, and indicates one or more destination storage locations. In response to the instruction, the execution unit stores at least one result mask operand in the destination storage location(s). The at least one result mask operand includes a different mask element for each corresponding data element, at the same relative position, in one of the first and second source packed data operands. Each mask element indicates whether the corresponding data element in that one of the source packed data operands equals any of the data elements in the other source packed data operand.
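The mask semantics in this abstract can be illustrated with a scalar sketch. The function name `any_equal_mask` and the use of Python lists for packed operands are assumptions for illustration only; the abstract does not name the instruction or fix an element width.

```python
from typing import List, Sequence


def any_equal_mask(src_a: Sequence[int], src_b: Sequence[int]) -> List[int]:
    """Hypothetical model: one mask element per element of src_a.

    A mask element is 1 when the element of src_a at the same position
    equals ANY element of src_b, and 0 otherwise.
    """
    return [1 if any(a == b for b in src_b) else 0 for a in src_a]


# Elements 3 and 4 of the first operand occur somewhere in the second operand.
mask = any_equal_mask([3, 7, 9, 4], [4, 4, 3, 0])
print(mask)  # [1, 0, 0, 1]
```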
-
112.
Publication No.: SG11201704466QA
Publication date: 2017-07-28
Application No.: SG11201704466Q
Application date: 2015-12-14
Applicant: INTEL CORP
Inventor: VALENTINE ROBERT , HUGHES CHRISTOPHER J , CHARNEY MARK J , SPERBER ZEEV , GRADSTEIN AMIT , RUBANOVICH SIMON , GEBIL YURI , OULD-AHMED-VALL ELMOUSTAPHA
Abstract: Instructions and logic provide SIMD vector packed tuple cross-comparison functionality. Some processor embodiments include first and second registers with a variable plurality of data fields, each of the data fields to store an element of a first data type. The processor executes a SIMD instruction for vector packed tuple cross-comparison in some embodiments, which for each data field of a portion of data fields in a tuple of the first register, compares its corresponding element with every element of a corresponding portion of data fields in a tuple of the second register and sets a mask bit corresponding to each element of the second register portion, in a bit-mask corresponding to each unmasked element of the corresponding first register portion, according to the corresponding comparison. In some embodiments bit-masks are shifted by corresponding elements in data fields of a third register. The comparison type is indicated by an immediate operand.
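As a rough illustration of the cross-comparison described above, the following sketch produces one bit-mask per element of the first register, with one bit per element of the matching tuple in the second register. The immediate encoding in `COMPARISONS`, the function name, and the choice to emit 0 for masked-off elements are assumptions, not details taken from the abstract; the optional shift by a third register is left out.

```python
import operator
from typing import List, Optional, Sequence

# Hypothetical immediate encoding; the abstract only says an immediate selects the comparison.
COMPARISONS = {0: operator.eq, 1: operator.lt, 2: operator.le, 4: operator.ne}


def tuple_cross_compare(a: Sequence[int], b: Sequence[int], tuple_size: int,
                        imm: int = 0,
                        write_mask: Optional[Sequence[int]] = None) -> List[int]:
    """For each unmasked a[i], compare it with every element of the same tuple in b.

    Bit j of the result for a[i] is set when cmp(a[i], b[tuple_base + j]) holds.
    Masked-off elements yield 0 here (an assumption).
    """
    cmp = COMPARISONS[imm]
    out = []
    for i, x in enumerate(a):
        if write_mask is not None and not write_mask[i]:
            out.append(0)
            continue
        base = (i // tuple_size) * tuple_size  # start of the tuple containing element i
        bits = 0
        for j in range(tuple_size):
            if cmp(x, b[base + j]):
                bits |= 1 << j
        out.append(bits)
    return out


result = tuple_cross_compare([1, 2, 3, 4], [2, 1, 4, 4], tuple_size=2)
print(result)  # [2, 1, 0, 3]
```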
-
Publication No.: GB2514885B
Publication date: 2015-10-28
Application No.: GB201404575
Application date: 2014-03-14
Applicant: INTEL CORP
Inventor: OULD-AHMED-VALL ELMOUSTAPHA , VALENTINE ROBERT
IPC: G06F9/30
-
Publication No.: GB2507655B
Publication date: 2015-06-24
Application No.: GB201318167
Application date: 2013-10-14
Applicant: INTEL CORP
Inventor: ULIEL TAL , OULD-AHMED-VALL ELMOUSTAPHA , VALENTINE ROBERT
IPC: G06F9/30
Abstract: Instructions and logic provide vector compress and rotate functionality. A processor may include a mask register, a decoder, and an execution unit. The mask register may include a data field, wherein the data field corresponds to an element location in a vector. The decoder may be coupled to the mask register. The decoder may decode an instruction to obtain a decoded instruction. The decoded instruction may specify a vector source, the mask register, a vector destination, and a vector destination offset location. The execution unit is coupled to the decoder. The execution unit may read an unmasked value in the data field; copy a vector element from the vector source to a location adjacent to the element; change the unmasked value to a masked value; determine that the vector destination is full; store a vector destination operand associated with the vector destination in a memory; and re-execute the instruction using the masked value and the vector destination offset location.
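The sequence of steps in this abstract can be approximated by the sketch below. It models re-execution as a continuing loop, treats a mask value of 1 as "unmasked", and uses a Python list as the memory that receives full destination vectors; all names and these conventions are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def compress_rotate(src: Sequence[int], mask: Sequence[int], dest: List[int],
                    offset: int, memory: List[int]) -> Tuple[List[int], int]:
    """Hypothetical model of compress-and-rotate.

    Copies each unmasked source element to the next free destination slot
    (starting at `offset`), marks it as masked, and whenever the destination
    fills up, stores it to `memory` and wraps the offset back to 0.
    Returns the updated mask and the next destination offset.
    """
    mask = list(mask)
    n = len(dest)
    for i, bit in enumerate(mask):
        if not bit:
            continue
        dest[offset] = src[i]   # copy next to the previously written element
        mask[i] = 0             # unmasked -> masked, so a re-execution skips it
        offset += 1
        if offset == n:         # destination full: flush and start over
            memory.extend(dest)
            offset = 0
    return mask, offset


dest = [1, 2, 0, 0]             # two slots already filled by an earlier call
memory: List[int] = []
new_mask, next_offset = compress_rotate([10, 11, 12, 13], [1, 0, 1, 1], dest, 2, memory)
print(memory, dest, next_offset)  # [1, 2, 10, 12] [13, 2, 10, 12] 1
```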
-
Publication No.: GB2513970A
Publication date: 2014-11-12
Application No.: GB201403976
Application date: 2014-03-06
Applicant: INTEL CORP
Inventor: VALENTINE ROBERT , OULD-AHMED-VALL ELMOUSTAPHA
IPC: G06F9/30
Abstract: A processor comprises a plurality of packed data registers 207 and an execution unit 209 coupled to the registers that, in response to a limited range vector memory access instruction 203, accesses memory locations in only a restricted or limited range 220 of memory 210. The limited range vector memory access instruction indicates source packed memory indices 213, which include a plurality of 8-bit and/or 16-bit memory indices, and these indices specify the memory locations to be accessed. The registers might also include source packed data (to be stored in the limited memory in response to scatter operations) or destination storage locations for packed data (to load data in response to the load operations). The instruction might also indicate a source packed data operation mask 216 to prevent some of the bits from being stored or loaded.
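A load (gather) variant of such an instruction might behave like the sketch below, here with 8-bit indices so that only 256 locations above a base address are reachable. The function name, the `base` parameter, and the choice to leave masked-off destination elements unchanged are assumptions; the abstract does not specify them.

```python
from typing import List, Sequence


def limited_range_gather(memory: Sequence[int], base: int, indices: Sequence[int],
                         op_mask: Sequence[int], dest: Sequence[int]) -> List[int]:
    """Hypothetical gather restricted to memory[base : base + 256].

    Each index must fit in 8 bits. Elements whose operation-mask bit is 0
    are not loaded and keep their previous destination value (an assumption).
    """
    out = list(dest)
    for i, (idx, keep) in enumerate(zip(indices, op_mask)):
        if not 0 <= idx <= 0xFF:
            raise ValueError(f"index {idx} does not fit in 8 bits")
        if keep:
            out[i] = memory[base + idx]
    return out


memory = list(range(100, 400))  # memory[k] == 100 + k
loaded = limited_range_gather(memory, 16, [0, 255, 3, 7], [1, 1, 0, 1], [-1] * 4)
print(loaded)  # [116, 371, -1, 123]
```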
-
116.
Publication No.: GB2513467A
Publication date: 2014-10-29
Application No.: GB201403993
Application date: 2014-03-06
Applicant: INTEL CORP
Inventor: HUGHES CHRISTOPHER J , CHARNEY MARK J , CORBAL JESUS , GIRKAR MILIND B , OULD-AHMED-VALL ELMOUSTAPHA , TOLL BRET L , VALENTINE ROBERT
IPC: G06F9/30
Abstract: A zero mask before trailing (i.e. least significant) zero (KZBTZ) instruction is decoded and executed to find a least significant zero bit position in a first input mask 301 and to set an output mask 302 to have the values of the first input mask, but with all bit positions closer to the most significant bit position than the least significant zero bit position in the first input mask set to zero. In some embodiments, a second input mask 303 is used to determine which bit positions of the first input mask are considered in the least significant zero bit position calculation, depending upon there being a 1 in a corresponding bit position in the second input mask. The masks are writemask operands and may be stored in write mask registers or general purpose registers.
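The KZBTZ behavior described above can be sketched on plain integers. The function signature, the default 16-bit width, and the result when no zero bit is found (the input is returned unchanged) are assumptions that the abstract does not settle.

```python
from typing import Optional


def kzbtz(src: int, width: int = 16, consider: Optional[int] = None) -> int:
    """Hypothetical model of KZBTZ.

    Finds the least significant position where `src` has a 0 (only positions
    with a 1 in `consider` are examined) and clears every bit of `src` above
    that position. If no such position exists, `src` is returned unchanged
    (an assumption).
    """
    if consider is None:
        consider = (1 << width) - 1          # examine every position
    for pos in range(width):
        if (consider >> pos) & 1 and not (src >> pos) & 1:
            return src & ((1 << pos) - 1)    # keep only bits below the zero
    return src & ((1 << width) - 1)


print(bin(kzbtz(0b1011_0111, width=8)))                         # 0b111
print(bin(kzbtz(0b1011_0111, width=8, consider=0b1111_0000)))   # 0b110111
```

In the second call the zero at bit 3 is ignored because the second mask has a 0 there, so the first considered zero is at bit 6.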
-
117.
Publication No.: DE102014003696A1
Publication date: 2014-09-18
Application No.: DE102014003696
Application date: 2014-03-13
Applicant: INTEL CORP
Inventor: ALBREKHT ILYA , OULD-AHMED-VALL ELMOUSTAPHA
IPC: G06F9/302
Abstract: Systems, methods, and apparatuses are described for calculating a square of a data value of a first source operand, a square of a data value of a second source operand, and a multiplication of the data of the first and second operands using only one multiplication.
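The abstract does not say how the three results are obtained from a single multiplication. One way to get this effect for small unsigned integers, shown here purely as an illustration and not as the patent's method, is to pack both values into one wide operand and square it, since (a·2^s + b)² = a²·2^(2s) + 2ab·2^s + b². The function name and the 16-bit default are assumptions.

```python
from typing import Tuple


def squares_and_product(a: int, b: int, bits: int = 16) -> Tuple[int, int, int]:
    """Return (a*a, a*b, b*b) using one wide multiplication.

    Illustrative packing trick for unsigned inputs below 2**bits; the field
    spacing `shift` is chosen so the three partial results cannot overlap.
    """
    if not (0 <= a < (1 << bits) and 0 <= b < (1 << bits)):
        raise ValueError("inputs must be unsigned and fit in `bits` bits")
    shift = 2 * bits + 1             # 2*a*b < 2**(2*bits + 1), b*b < 2**(2*bits)
    packed = (a << shift) | b
    sq = packed * packed             # the only multiplication
    field = (1 << shift) - 1
    b_squared = sq & field
    two_ab = (sq >> shift) & field
    a_squared = sq >> (2 * shift)
    return a_squared, two_ab >> 1, b_squared


print(squares_and_product(3, 5))  # (9, 15, 25)
```

The trade-off is that the single multiplication is much wider than the inputs, so this only pays off when a wide multiplier is available anyway.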
-
Publication No.: GB2511198A
Publication date: 2014-08-27
Application No.: GB201323062
Application date: 2013-12-27
Applicant: INTEL CORP
Inventor: ULIEL TAL , OULD-AHMED-VALL ELMOUSTAPHA , TOLL BRET L
Abstract: SIMD vectorisation of conditional loops is provided. A vector of counts is initialized (1610) to n count values, the vector having n data fields to store elements having a partition size of m bytes (e.g. 4-byte double words); a decision vector is obtained (1620) and used to generate a mask (1630); a vector expand instruction is received (1640), which has the count vector as a source, uses the generated mask and specifies a destination vector of n elements, each having a size of m bytes; the instruction causes the copying of consecutive source vector data into unmasked destination vector elements (1620). n varies according to the received instruction. Masked elements of the destination vector are set to zero. Counts of the condition decisions are also stored (1660, 1670). One application is processing loops in benchmark suites for online clustering based on finding medians to assign points to their nearest centre.
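The vector expand step in this abstract can be modelled in a few lines. The function name and the list representation are assumptions; a mask bit of 1 is taken to mean "unmasked".

```python
from typing import List, Sequence


def vector_expand(src: Sequence[int], mask: Sequence[int]) -> List[int]:
    """Hypothetical model of vector expand with zeroing.

    Consecutive source elements are copied, in order, into the unmasked
    destination positions; masked positions are set to zero.
    """
    it = iter(src)
    return [next(it) if bit else 0 for bit in mask]


counts = [1, 2, 3, 4]                          # e.g. a vector of count values
print(vector_expand(counts, [0, 1, 1, 0]))     # [0, 1, 2, 0]
```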
-
119.
Publication No.: DE112011105664T5
Publication date: 2014-08-21
Application No.: DE112011105664
Application date: 2011-09-26
Applicant: INTEL CORP
Inventor: DOSHI KSHITIJ A , YOUNT CHARLES R , OULD-AHMED-VALL ELMOUSTAPHA , SAIR SULEYMAN
Abstract: Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, in response to an instruction specifying a gather and a second operation, a destination register, an operand register, and a memory address, execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the index register for data elements in memory. A first mask value indicates that the element has not been gathered from memory, and a second value indicates that the element does not need to be gathered or has already been gathered. For each field having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.
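The gather-op flow in this abstract can be sketched as follows. The function name, the use of 1 for "not yet gathered" and 0 for "done", and addition as the default second operation are assumptions chosen for illustration.

```python
import operator
from typing import Callable, List, Sequence


def gather_op(memory: Sequence[int], base: int, indices: Sequence[int],
              mask: List[int], dest: List[int], operand: Sequence[int],
              op: Callable[[int, int], int] = operator.add) -> List[int]:
    """Hypothetical model of a fused gather + second operation.

    mask[i] == 1 means element i still has to be gathered; it is loaded from
    memory[base + indices[i]] and its mask field is flipped to 0. Once every
    field is 0, `op` is applied element-wise to dest and operand.
    """
    for i, pending in enumerate(mask):
        if pending:
            dest[i] = memory[base + indices[i]]
            mask[i] = 0          # records progress, e.g. across an interruption
    # All mask fields now hold the "done" value, so the second operation runs.
    return [op(d, o) for d, o in zip(dest, operand)]


mask = [1, 0, 1, 1]              # element 1 was already gathered earlier
dest = [0, 7, 0, 0]
result = gather_op([10, 20, 30, 40, 50], 1, [3, 0, 2, 1], mask, dest, [1, 1, 1, 1])
print(result, mask)              # [51, 8, 41, 31] [0, 0, 0, 0]
```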
-
Publication No.: GB2502936A
Publication date: 2013-12-11
Application No.: GB201317902
Application date: 2011-09-30
Applicant: INTEL CORP
Inventor: VALENTINE ROBERT , ADRIAN JESUS CORBAL SAN , SANS ROGER ESPASA , CAVIN ROBERT D , TOLL BRET L , DURAN SANTIAGO GALAN , WIEDEMEIER JEFFREY , SAMUDRALA SRIDHAR , GIRKAR MILIND BABURAO , GROCHOWSKI EDWARD THOMAS , HALL JONATHAN CANNON , BRADFORD DENNIS R , OULD-AHMED-VALL ELMOUSTAPHA , ABEL JAMES C , CHARNEY MARK J , ABRAHAM SETH , SAIR SULEYMAN , FORSYTH ANDREW THOMAS , YOUNT CHARLES , WU LISA K
Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
-