Patent search ap:("Intel Corporation") AND inv:"Zeev Sperber" Page 13

121.

Invention application
GATHER USING INDEX ARRAY AND FINITE STATE MACHINE

Status: Under examination (published)

Publication (announcement) number: US20170192934A1

Publication (announcement) date: 2017-07-06

Application number: US14616323

Application date: 2015-02-06

Applicant: Intel Corporation

Inventor: Zeev Sperber, Robert Valentine, Guy Patkin, Stanislav Shwartsman, Shlomo Raikin, Igor Yanover, Gal Ofir

IPC: G06F15/80, G06F9/38, G06F9/30

CPC classification number: G06F15/8007, G06F9/30018, G06F9/30036, G06F9/30043, G06F9/30145, G06F9/345, G06F9/3887

Abstract: Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiments of the apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position of the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
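
The masked gather described in this abstract can be sketched in software. This is only an illustration of the data flow, not the patented hardware: the function name `masked_gather`, the flat `memory` list, the `base` offset, and the choice of 1 and 0 for the "first" and "second" mask values are all assumptions made for the example.

```python
def masked_gather(memory, base, indices, mask, dest):
    """Load memory[base + index] into dest for every lane whose mask is set."""
    FIRST, SECOND = 1, 0  # assumed encodings of the "first" and "second" mask values
    for lane, index in enumerate(indices):
        if mask[lane] != FIRST:
            continue  # lane is masked off or already completed; dest keeps its old value
        address = base + index        # address generation from the lane's index
        dest[lane] = memory[address]  # data lands in the same in-register position as its index
        mask[lane] = SECOND           # record completion so a restarted gather skips this lane
    return dest, mask


memory = list(range(100, 116))
dest, mask = masked_gather(memory, 2, [3, 0, 7, 1], [1, 0, 1, 1], [-1, -1, -1, -1])
print(dest, mask)
```

Clearing each mask element as its load completes is what lets an interrupted gather resume without repeating finished lanes.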

122.

Invention application
FLOATING POINT (FP) ADD LOW INSTRUCTIONS FUNCTIONAL UNIT

Status: In force

Publication (announcement) number: US20170185377A1

Publication (announcement) date: 2017-06-29

Application number: US14998366

Application date: 2015-12-23

Applicant: Intel Corporation

Inventor: Cristina S. Anderson, Marius A. Cornea-Hasegan, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Nikita Astafev, Mark J. Charney, Milind B. Girkar, Amit Gradstein, Simon Rubanovich, Zeev Sperber

IPC: G06F7/485

CPC classification number: G06F7/485

Abstract: An example processor includes a register and an ADD low functional unit. The register stores first, second, and third floating point (FP) values. The ADD low functional unit receives a request to perform an ADD low operation and, responsive to the request: adds the first FP value with the second FP value to obtain a first sum value; rounds the first sum value to generate an ADD value; adds the first FP value with the second FP value to obtain a second sum value; subtracts the ADD value from the second sum value to generate a difference value; normalizes the difference value to obtain a normalized difference value; rounds the normalized difference value to generate an ADD low value; and sends the ADD low value to an application.
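
The sequence of steps in this abstract resembles computing the rounding error of a floating-point addition. The sketch below is one possible reading, assuming the second sum is kept at full precision (modelled here with exact rationals) and that values are IEEE double precision; the function name `add_low` is chosen for the example and is not an instruction mnemonic from the source.

```python
from fractions import Fraction


def add_low(a: float, b: float) -> float:
    """Return the low-order part of a + b that is lost when the sum is rounded."""
    add_value = a + b                              # first sum, rounded to double: the ADD value
    exact_sum = Fraction(a) + Fraction(b)          # second sum, assumed to be held unrounded
    difference = exact_sum - Fraction(add_value)   # subtract the ADD value from the second sum
    return float(difference)                       # normalize and round to double: the ADD low value


print(add_low(1.0, 2.0 ** -60))
```

Under these assumptions the ADD value and the ADD low value together represent the exact sum, which is the building block of double-double style extended-precision arithmetic.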

123.

Granted invention patent
Vector shuffle instructions operating on multiple lanes each having a plurality of data elements using a same set of per-lane control bits

Status: In force

Publication (announcement) number: US09672034B2

Publication (announcement) date: 2017-06-06

Application number: US13838048

Application date: 2013-03-15

Applicant: Intel Corporation

Inventor: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein

IPC: G06F9/315, G06F9/30, G06F9/38

CPC classification number: G06F9/30032, G06F9/30036, G06F9/3885, G06F9/3887

Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
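
A minimal sketch of the single-source in-lane shuffle, assuming four elements per lane and an 8-bit control field with two selector bits per destination element (these sizes are assumptions for illustration; the abstract does not fix them). The function name `in_lane_shuffle` is hypothetical.

```python
def in_lane_shuffle(src, control, lane_width=4):
    """Shuffle elements within each lane, reusing the same control bits for every lane."""
    if lane_width != 4 or len(src) % lane_width != 0:
        raise ValueError("this sketch assumes whole lanes of four elements")
    dest = []
    for lane_base in range(0, len(src), lane_width):
        for position in range(lane_width):
            selector = (control >> (2 * position)) & 0b11  # two control bits per destination element
            dest.append(src[lane_base + selector])         # source element comes from the same lane
    return dest


print(in_lane_shuffle([0, 1, 2, 3, 4, 5, 6, 7], 0b00011011))
```

With the control value `0b00011011` each lane is reversed independently, and no element crosses a lane boundary.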

124.

Invention application
METHODS, APPARATUS, INSTRUCTIONS AND LOGIC TO PROVIDE VECTOR PACKED TUPLE CROSS-COMPARISON FUNCTIONALITY

Status: Under examination (published)

Publication (announcement) number: US20160188336A1

Publication (announcement) date: 2016-06-30

Application number: US14588247

Application date: 2014-12-31

Applicant: Intel Corporation

Inventor: Robert Valentine, Christopher J. Hughes, Mark J. Charney, Zeev Sperber, Amit Gradstein, Simon Rubanovich, Elmoustapha Ould-Ahmed-Vall, Yuri Gebil

IPC: G06F9/30

CPC classification number: G06F9/30036, G06F9/30018, G06F9/30021, G06F9/3834

Abstract: Instructions and logic provide SIMD vector packed tuple cross-comparison functionality. Some processor embodiments include first and second registers with a variable plurality of data fields, each of the data fields to store an element of a first data type. The processor executes a SIMD instruction for vector packed tuple cross-comparison in some embodiments, which for each data field of a portion of data fields in a tuple of the first register, compares its corresponding element with every element of a corresponding portion of data fields in a tuple of the second register and sets a mask bit corresponding to each element of the second register portion, in a bit-mask corresponding to each unmasked element of the corresponding first register portion, according to the corresponding comparison. In some embodiments bit-masks are shifted by corresponding elements in data fields of a third register. The comparison type is indicated by an immediate operand.
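
One way to picture the cross-comparison is the sketch below. It is a simplified reading of the abstract: registers are plain lists, every tuple has the same size, the optional shift by a third register is omitted, and the immediate-to-comparison table `COMPARISONS` is hypothetical (the abstract only states that an immediate operand selects the comparison type).

```python
import operator

# Hypothetical immediate encodings, chosen only for this example.
COMPARISONS = {0: operator.eq, 1: operator.lt, 2: operator.le, 3: operator.ne}


def tuple_cross_compare(first, second, tuple_size, imm=0, write_mask=None):
    """For each element of `first`, build a bit-mask over its matching tuple in `second`."""
    compare = COMPARISONS[imm]
    masks = []
    for i, element in enumerate(first):
        if write_mask is not None and not write_mask[i]:
            masks.append(0)  # masked-off elements produce no comparison bits
            continue
        start = (i // tuple_size) * tuple_size  # first field of the corresponding tuple
        bits = 0
        for j in range(tuple_size):
            if compare(element, second[start + j]):
                bits |= 1 << j  # bit j reports the comparison against the tuple's j-th element
        masks.append(bits)
    return masks


print(tuple_cross_compare([1, 2, 3, 4], [2, 2, 4, 1], tuple_size=2))
```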

125.

Invention application
Performing Local Power Gating In A Processor

Status: In force

Publication (announcement) number: US20160085287A1

Publication (announcement) date: 2016-03-24

Application number: US14960887

Application date: 2015-12-07

Applicant: Intel Corporation

Inventor: Nadav Bonen, Ron Gabor, Zeev Sperber, Vjekoslav Svilan, David N. Mackintosh, Jose A. Baiocchi Paredes, Naveen Kumar, Shantanu Gupta

IPC: G06F1/32, G06F9/30

CPC classification number: G06F1/3243, G06F1/3287, G06F9/30083, G06F9/3869, G06F9/3885, Y02B70/123, Y02B70/126, Y02D10/152, Y02D10/171

Abstract: In an embodiment, the present invention includes an execution unit to execute instructions of a first type, a local power gate circuit coupled to the execution unit to power gate the execution unit while a second execution unit is to execute instructions of a second type, and a controller coupled to the local power gate circuit to cause it to power gate the execution unit when an instruction stream does not include the first type of instructions. Other embodiments are described and claimed.

126.

Invention application
THREAD PAUSE PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS

Status: Under examination (published)

Publication (announcement) number: US20160019063A1

Publication (announcement) date: 2016-01-21

Application number: US14336596

Application date: 2014-07-21

Applicant: Intel Corporation

Inventor: Lihu Rappoport, Zeev Sperber, Michael Mishaeli, Stanislav Shwartsman, Lev Makovsky, Adi Yoaz, Ofer Levy

IPC: G06F9/30

CPC classification number: G06F9/3851, G06F9/30, G06F9/30058, G06F9/3009

Abstract: A processor of an aspect includes a decode unit to decode a thread pause instruction from a first thread. A back-end portion of the processor is coupled with the decode unit. The back-end portion of the processor, in response to the thread pause instruction, is to pause processing of subsequent instructions of the first thread for execution. The subsequent instructions occur after the thread pause instruction in program order. The back-end portion, in response to the thread pause instruction, is also to keep at least a majority of the back-end portion of the processor, empty of instructions of the first thread, except for the thread pause instruction, for a predetermined period of time. The majority may include a plurality of execution units and an instruction queue unit.

127.

Granted invention patent
Method and apparatus for performing logical compare operations

Status: In force

Publication (announcement) number: US09037626B2

Publication (announcement) date: 2015-05-19

Application number: US13843236

Application date: 2013-03-15

Applicant: Intel Corporation

Inventor: Rajiv Kapoor, Ronen Zohar, Mark J. Buxton, Zeev Sperber, Koby Gottlieb

IPC: G06F7/38, G06F9/30, G06F7/02

CPC classification number: G06F9/30021, G06F7/026, G06F9/3001, G06F9/30029, G06F9/30036, G06F9/30058, G06F9/30094, G06F9/30098, G06F9/30145, G06F9/30149, G06F9/3016, G06F9/3887, G06F12/0875, G06F2212/452

Abstract: A method and apparatus for including in a processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location.

128.

Invention application
SCATTER USING INDEX ARRAY AND FINITE STATE MACHINE

Status: In force

Publication (announcement) number: US20150074373A1

Publication (announcement) date: 2015-03-12

Application number: US13977727

Application date: 2012-06-02

Applicant: INTEL CORPORATION

Inventor: Zeev Sperber, Robert Valentine, Shlomo Raikin, Stanislav Shwartsman, Gal Ofir, Igor Yanover, Guy Patkin, Levy Ofer

IPC: G06F15/78, G06F9/30

CPC classification number: G06F15/7839, G06F9/30018, G06F9/30036, G06F9/30043, G06F9/30145, G06F9/345, G06F9/3808, G06F9/383

Abstract: Methods and apparatus are disclosed using an index array and finite state machine for scatter/gather operations. Embodiments of the apparatus may comprise: decode logic to decode scatter/gather instructions and generate micro-operations. An index array holds a set of indices and a corresponding set of mask elements. A finite state machine facilitates the scatter operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. Storage is allocated in a buffer for each of the set of addresses being generated. Data elements corresponding to the set of addresses being generated are copied to the buffer. Addresses from the set are accessed to store data elements if a corresponding mask element has said first value, and the mask element is changed to a second value responsive to completion of its respective store.
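
The two phases in this abstract (stage addresses and data in a buffer, then drain the buffer to memory) can be sketched as follows. The names `masked_scatter` and `store_buffer`, the flat `memory` list, the `base` offset, and the 1/0 mask encoding are assumptions for the example, not details from the patent.

```python
def masked_scatter(memory, base, indices, mask, source):
    """Store source[lane] to memory[base + index] for every lane whose mask is set."""
    FIRST, SECOND = 1, 0  # assumed encodings of the "first" and "second" mask values

    # Phase 1: generate an address per active lane and copy its data element into a buffer.
    store_buffer = [
        (lane, base + index, source[lane])
        for lane, index in enumerate(indices)
        if mask[lane] == FIRST
    ]

    # Phase 2: drain the buffer in lane order, clearing each mask element as its store completes.
    for lane, address, value in store_buffer:
        memory[address] = value
        mask[lane] = SECOND
    return memory, mask


memory, mask = masked_scatter([0] * 8, 1, [0, 2, 5, 2], [1, 1, 0, 1], [10, 20, 30, 40])
print(memory, mask)
```

In this sketch two active lanes target the same address, and draining in lane order means the higher-numbered lane's value is the one left in memory.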

129.

Granted invention patent
Systems and methods to store a tile register pair to memory

Status: In force

Publication (announcement) number: US12293186B2

Publication (announcement) date: 2025-05-06

Application number: US18386407

Application date: 2023-11-02

Applicant: Intel Corporation

Inventor: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman

IPC: G06F9/34, G06F9/30

Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.

130.

Invention application
SYSTEMS, APPARATUSES, AND METHODS FOR ADDITION OF PARTIAL PRODUCTS

Status: In force

Publication (announcement) number: US20250004763A1

Publication (announcement) date: 2025-01-02

Application number: US18886639

Application date: 2024-09-16

Applicant: INTEL CORPORATION

Inventor: Robert Valentine, Galina Ryvchin, Piotr Majcher, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Milind B. Girkar, Zeev Sperber, Simon Rubanovich, Amit Gradstein

IPC: G06F9/30, G06F7/544, G06F9/38

Abstract: Embodiments of systems, apparatuses, and methods for fused multiple add. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operand are of a first, different size than a second size of packed data elements of the third packed data operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand, a multiplication of a M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, add of results from these multiplications to a full-sized packed data element of a packed data element position of the third packed data source, and storage of the addition result in a packed data element position destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element divided by N.
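
The per-position arithmetic in this abstract can be sketched as below, assuming plain Python integers with no overflow, saturation, or signedness handling (all of which real hardware would define). The function name `multiply_add_partial_products` and the list-based operands are hypothetical.

```python
def multiply_add_partial_products(src1, src2, src3, m):
    """For each full-sized element of src3, add the m products of the matching narrow elements."""
    if len(src1) != len(src2) or len(src1) != m * len(src3):
        raise ValueError("src1 and src2 must each hold m narrow elements per src3 element")
    dest = []
    for position, accumulator in enumerate(src3):
        first = position * m  # index of the first narrow element for this position
        products = [src1[first + k] * src2[first + k] for k in range(m)]
        dest.append(accumulator + sum(products))  # add the partial products to the wide element
    return dest


print(multiply_add_partial_products([1, 2, 3, 4], [5, 6, 7, 8], [100, 200], m=2))
```

Here `m` plays the role of M in the abstract: the full element size divided by the narrow element size N, for example four 8-bit elements per 32-bit accumulator.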
