Patent search ap:("Intel Corporation") AND inv:"Zeev Sperber" Page 3

21.

发明授权
Accelerator systems and methods for matrix operations 有权

公开(公告)号：US10942738B2

公开(公告)日：2021-03-09

申请号：US16368973

申请日：2019-03-29

Applicant: Intel Corporation

Inventor： Zeev Sperber , Amit Gradstein , Simon Rubanovich , Igor Yanover , Gavri Berger , Eyal Hadas , Saeed Kharouf , Ron Schneider , Sagi Meller , Jose Yallouz

IPC: G06F9/30 , G06F9/38

Abstract: The present disclosure is directed to systems and methods for performing one or more operations on a two dimensional tile register using an accelerator that includes a tiled matrix multiplication unit (TMU). The processor circuitry includes reservation station (RS) circuitry to communicatively couple the processor circuitry to the TMU. The RS circuitry coordinates the operations performed by the TMU. TMU dispatch queue (TDQ) circuitry in the TMU maintains the operations received from the RS circuitry in the order that the operations are received from the RS circuitry. Since the duration of each operation is not known prior to execution by the TMU, the RS circuitry maintains shadow dispatch queue (RS-TDQ) circuitry that mirrors the operations in the TDQ circuitry. Communication between the RS circuitry 134 and the TMU provides the RS circuitry with notification of successfully executed operations and allows the RS circuitry to cancel operations where the operations are associated with branch mispredictions and/or non-retired speculatively executed instructions.

22.

发明申请
APPARATUSES, METHODS, AND SYSTEMS FOR HASHING INSTRUCTIONS 审中-公开

公开(公告)号：US20200310802A1

公开(公告)日：2020-10-01

申请号：US16370459

申请日：2019-03-29

Applicant: Intel Corporation

Inventor： Regev Shemy , Zeev Sperber , Wajdi Feghali , Vinodh Gopal , Amit Gradstein , Simon Rubanovich , Sean Gulley , Ilya Albrekht , Jacob Doweck , Jose Yallouz , Ittai Anati

IPC: G06F9/30 , G06F9/38 , H04L9/06

Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.

23.

发明授权
Automatic predication of hard-to-predict convergent branches 有权

公开(公告)号：US10754655B2

公开(公告)日：2020-08-25

申请号：US16021838

申请日：2018-06-28

Applicant: Intel Corporation

Inventor： Adarsh Chauhan , Hong Wang , Jayesh Gaur , Zeev Sperber , Sumeet Bandishte , Lihu Rappoport , Stanislav Shwartsman , Kamil Garifullin , Sreenivas Subramoney , Adi Yoaz

IPC: G06F9/32 , G06F9/42 , G06F9/38 , G06F9/30

Abstract: A processing device includes a branch IP table and branch predication circuitry coupled to the branch IP table. The branch predication circuitry to: determine a dynamic convergence point in a conditional branch of set of instructions; store the dynamic convergence point in the branch IP table; fetch a first and second speculative path of the conditional branch; while determining which of the first speculative path and the second speculative path is a taken path of the conditional branch and determining whether a dynamic convergence point is fetched corresponding to the stored dynamic convergence point, stall scheduling of instructions of the first speculative path and the second speculative path; and in response to determining that one of the first speculative path and the second speculative path is the taken path and the fetched dynamic convergence point corresponds to the stored convergence point, resume scheduling of the instructions of the taken path.

24.

发明授权
Apparatus and method of improved packed integer permute instruction 有权

公开(公告)号：US10719316B2

公开(公告)日：2020-07-21

申请号：US15808800

申请日：2017-11-09

Applicant: Intel Corporation

Inventor： Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Jesus Corbal , Bret L. Toll , Mark J. Charney , Zeev Sperber , Amit Gradstein

IPC: G06F9/30

Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.

25.

发明授权
Multiply add functional unit capable of executing scale, round, getexp, round, getmant, reduce, range and class instructions 有权

公开(公告)号：US10649733B2

公开(公告)日：2020-05-12

申请号：US16436901

申请日：2019-06-10

Applicant: Intel Corporation

Inventor： Cristina S. Anderson , Zeev Sperber , Simon Rubanovich , Benny Eitan , Amit Gradstein

IPC: G06F7/57 , G06F7/483 , G06F7/499 , G06F7/544 , G06F9/38 , G06F9/30 , G06F5/01

Abstract: A method is described that involves executing a first instruction with a functional unit. The first instruction is a multiply-add instruction. The method further includes executing a second instruction with the functional unit. The second instruction is a round instruction.

26.

发明授权
Apparatus and method of improved insert instructions 有权

公开(公告)号：US10459728B2

公开(公告)日：2019-10-29

申请号：US15809721

申请日：2017-11-10

Applicant: Intel Corporation

Inventor： Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Jesus Corbal , Bret L. Toll , Mark J. Charney , Zeev Sperber , Amit Gradstein

IPC: G06F9/30 , G06F12/06 , G06F9/38

Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and, mask the second and fourth instructions at a second resultant vector granularity.

27.

发明授权
Accelerating memory fault resolution by performing fast re-fetching 有权

公开(公告)号：US10402263B2

公开(公告)日：2019-09-03

申请号：US15831195

申请日：2017-12-04

Applicant: Intel Corporation

Inventor： Zeev Sperber , Stanislav Shwartsman , Jared W. Stark, IV , Lihu Rappoport , Igor Yanover , George Leifman

IPC: G06F11/00 , G06F11/07 , G06F12/02 , G06F9/38 , G06F9/30

Abstract: A method for handling load faults in an out-of-order processor is described. The method includes detecting, by a memory ordering buffer of the out-of-order processor, a load fault corresponding to a load instruction that was executed out-of-order by the out-of-order processor; determining, by the memory ordering buffer, whether instant reclamation is available for resolving the load fault of the load instruction; and performing, in response to determining that instant reclamation is available for resolving the load fault of the load instruction, instant reclamation to re-fetch the load instruction for execution prior to attempting to retire the load instruction.

28.

发明授权
Instruction and logic for early underflow detection and rounder bypass 有权

公开(公告)号：US10157059B2

公开(公告)日：2018-12-18

申请号：US15280324

申请日：2016-09-29

Applicant: Intel Corporation

Inventor： Simon Rubanovich , Thierry Pons , Zeev Sperber , Amit Gradstein

IPC: G06F7/499 , G06F9/30 , G06F7/00

Abstract: A processor for floating point underflow detection includes circuitry to decode a first instruction and a floating point unit. The decoded instruction, when executed by the processor, may be for performing a fused multiply-add (FMA) operation. The floating point unit includes circuitry to determine a non-normalized result of the first instruction based on a first input, a second input, and a third input. The floating point unit further includes circuitry to determine whether underflow exists in the non-normalized result based on a first exponent of the first input, a second exponent of the second input, and a third exponent of the third input.

29.

发明授权
Scatter using index array and finite state machine 有权

公开(公告)号：US10152451B2

公开(公告)日：2018-12-11

申请号：US15490743

申请日：2017-04-18

Applicant: Intel Corporation

Inventor： Zeev Sperber , Robert Valentine , Shlomo Raikin , Stanislav Shwartsman , Gal Ofir , Igor Yanover , Guy Patkin , Ofer Levy

IPC: G06F9/00 , G06F15/78 , G06F9/30 , G06F9/345 , G06F9/38

Abstract: Methods and apparatus are disclosed using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode scatter/gather instructions and generate micro-operations. An index array holds a set of indices and a corresponding set of mask elements. A finite state machine facilitates the scatter operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. Storage is allocated in a buffer for each of the set of addresses being generated. Data elements corresponding to the set of addresses being generated are copied to the buffer. Addresses from the set are accessed to store data elements if a corresponding mask element has said first value and the mask element is changed to a second value responsive to completion of their respective stores.

30.

发明授权
Gather using index array and finite state machine 有权

公开(公告)号：US10146737B2

公开(公告)日：2018-12-04

申请号：US14616323

申请日：2015-02-06

Applicant: Intel Corporation

Inventor： Zeev Sperber , Robert Valentine , Guy Patkin , Stanislav Shwartsman , Shlomo Raikin , Igor Yanover , Gal Ofir

IPC: G06F9/00 , G06F15/80 , G06F9/30 , G06F9/38

Abstract: Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification