Patent search ap:("INTEL CORPORATION") AND inv:"Milind B. Girkar" Page 1

1.

Patent application
SYSTEMS, APPARATUSES, AND METHODS FOR ADDITION OF PARTIAL PRODUCTS (status: active)

Publication number: US20230048998A1

Publication date: 2023-02-16

Application number: US17964964

Application date: 2022-10-13

Applicant: Intel Corporation

Inventor: Robert Valentine, Galina Ryvchin, Piotr Majcher, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Milind B. Girkar, Zeev Sperber, Simon Rubanovich, Amit Gradstein

IPC: G06F9/30, G06F7/544, G06F9/38

Abstract: Embodiments of systems, apparatuses, and methods for fused multiply add are described. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for first, second, and third packed data source operands, wherein packed data elements of the first and second packed data source operands are of a first size that differs from a second size of packed data elements of the third packed data source operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand, a multiplication of M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, addition of the results of these multiplications to a full-sized packed data element of a packed data element position of the third packed data source, and storage of the addition result in a packed data element position of the destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element size divided by N.
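
As a rough illustration of the operation this abstract describes, the following Python sketch multiplies M pairs of N-bit elements per destination position and adds the products to the corresponding full-sized accumulator element. The function name, the default sizes (8-bit sources, 32-bit accumulators), the unsigned treatment of elements, and the wraparound on overflow are all assumptions made for the example; the abstract does not specify them.

```python
def packed_multiply_accumulate(src1, src2, src3, n_bits=8, full_bits=32):
    """Sketch: dest[i] = src3[i] + sum of M products from src1 and src2.

    src1, src2: lists of N-bit elements (treated as unsigned ints here, an assumption).
    src3: list of full-sized accumulator elements.
    """
    m = full_bits // n_bits  # M = full-sized element size divided by N
    if len(src1) != len(src2) or len(src1) != m * len(src3):
        raise ValueError("source lengths do not match the element-size ratio")

    mask = (1 << full_bits) - 1  # wraparound on overflow is an assumption
    dest = []
    for pos, acc in enumerate(src3):
        lo = pos * m
        # Multiply the M narrow element pairs that map to this position.
        products = [a * b for a, b in zip(src1[lo:lo + m], src2[lo:lo + m])]
        dest.append((acc + sum(products)) & mask)
    return dest
```

For instance, with eight 8-bit source elements and two 32-bit accumulators, each destination element receives the sum of four products plus its accumulator value.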

2.

Granted patent
Systems, methods, and apparatuses for tile store (status: active)

Publication number: US11288069B2

Publication date: 2022-03-29

Application number: US16487755

Application date: 2017-07-01

Applicant: Intel Corporation

Inventor: Robert Valentine, Menachem Adelman, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, Milind B. Girkar, Zeev Sperber, Mark J. Charney, Rinat Rappoport, Jesus Corbal, Stanislav Shwartsman, Igor Yanover, Alexander F. Heinecke, Barukh Ziv, Dan Baum, Yuri Gebil

IPC: G06F9/30, G06F7/485, G06F7/487, G06F17/16, G06F7/76, G06F9/38

Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information.
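
The store behavior in the last clause of this abstract can be sketched as follows. This is only a minimal model: the abstract says "destination memory information" without defining it, so the base address and row stride used here, like the function name and the flat-list model of memory, are assumptions for illustration.

```python
def tile_store(memory, tile, base, stride, rows, cols):
    """Sketch: write the configured rows of a source tile into flat memory.

    memory: mutable list modeling memory, one element per slot (assumption).
    tile:   list of rows (the source matrix operand).
    base, stride: hypothetical form of the destination memory information.
    rows, cols:   the configured portion of the tile to store.
    """
    for r in range(rows):
        start = base + r * stride
        if start < 0 or start + cols > len(memory):
            raise IndexError("tile row would fall outside memory")
        # Store each data element of this configured row.
        memory[start:start + cols] = tile[r][:cols]
    return memory
```

Only the configured rows and columns are written; memory slots between rows (when the stride exceeds the row width) are left untouched.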

3.

Granted patent
Systems, apparatuses, and methods for fused multiply add (status: active)

Publication number: US11169802B2

Publication date: 2021-11-09

Application number: US16338324

Application date: 2016-10-20

Applicant: Intel Corporation

Inventor: Robert Valentine, Galina Ryvchin, Piotr Majcher, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Milind B. Girkar, Zeev Sperber, Simon Rubanovich, Amit Gradstein

IPC: G06F9/30, G06F7/544, G06F9/38

Abstract: In some embodiments, packed data elements of first and second packed data source operands are of a first size that differs from a second size of packed data elements of a third packed data source operand. Execution circuitry executes a decoded single instruction to perform, for each packed data element position of a destination operand, a multiplication of M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, addition of the results of these multiplications to a full-sized packed data element of a packed data element position of the third packed data source, and storage of the addition result in a packed data element position of the destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element size divided by N.

4.

Patent application
Providing Multiple Memory Modes For A Processor Including Internal Memory (status: under examination, published)

Publication number: US20190286559A1

Publication date: 2019-09-19

Application number: US16433671

Application date: 2019-06-06

Applicant: Intel Corporation

Inventor: Avinash Sodani, Robert J. Kyanko, Richard J. Greco, Andreas Kleen, Milind B. Girkar, Christopher M. Cantalupo

IPC: G06F12/06, G06F12/02

Abstract: In one embodiment, a processor comprises: at least one core formed on a die to execute instructions; a first memory controller to interface with an in-package memory; a second memory controller to interface with a platform memory to couple to the processor; and the in-package memory located within a package of the processor, where the in-package memory is to be identified as a more distant memory with respect to the at least one core than the platform memory. Other embodiments are described and claimed.

5.

Granted patent
Architectural register replacement for instructions that use multiple architectural registers (status: active)

Publication number: US10255072B2

Publication date: 2019-04-09

Application number: US15201310

Application date: 2016-07-01

Applicant: Intel Corporation

Inventor: Mark J. Charney, Robert Valentine, Milind B. Girkar, Ashish Jha, Bret L. Toll, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal San Adrian, Jason W. Brandt

IPC: G06F9/30

Abstract: A processor of an aspect includes a decode unit to decode an instruction. The instruction is to explicitly specify a first architectural register and is to implicitly indicate at least a second architectural register. The second architectural register is implicitly to be at a higher register number than the first architectural register. The processor also includes an architectural register replacement unit coupled with the decode unit. The architectural register replacement unit is to replace the first architectural register with a third architectural register, and is to replace the second architectural register with a fourth architectural register. The third architectural register is to be at a lower register number than the first architectural register. The fourth architectural register is to be at a lower register number than the second architectural register. Other processors are also disclosed, as are methods and systems.

6.

Granted patent
Systems, apparatuses, and methods for data speculation execution (status: active)

Publication number: US10061583B2

Publication date: 2018-08-28

Application number: US14582776

Application date: 2014-12-24

Applicant: Intel Corporation

Inventor: Elmoustapha Ould-Ahmed-Vall, Christopher J. Hughes, Robert Valentine, Milind B. Girkar

IPC: G06F9/30, G06F9/34, G06F9/38, G06F9/46

CPC classification number: G06F9/3016, G06F9/30043, G06F9/30047, G06F9/30065, G06F9/30087, G06F9/30098, G06F9/34, G06F9/3824, G06F9/3834, G06F9/3842, G06F9/3861, G06F9/467

Abstract: Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode, and execution hardware to execute the decoded instruction to reset data speculative execution (DSX) tracking hardware to track speculative memory accesses, clear a DSX status indication in a DSX status register, and commit all speculatively executed stores of the DSX region and thereby end a DSX region.

7.

Patent application
GENERATING VECTOR BASED SELECTION CONTROL STATEMENTS (status: under examination, published)

Publication number: US20180181404A1

Publication date: 2018-06-28

Application number: US15391915

Application date: 2016-12-28

Applicant: Intel Corporation

Inventor: Hideki Saito Ido, Eric N. Garcia, Xinmin Tian, Milind B. Girkar, James Brodman

IPC: G06F9/38, G06F9/30

CPC classification number: G06F9/3844, G06F9/30058, G06F9/3806, G06F15/76

Abstract: In one example, a system for generating vector based selection control statements can include a processor to determine that a vector cost of a selection control statement is below a scalar cost and to determine that the selection control statement is to be executed in a sorted order based on dependencies between branch instructions of the selection control statement. The processor can also determine that a program ordering of labels of the selection control statement does not match a mathematical ordering of the labels and execute the selection control statement with a vector of values, wherein the selection control statement is to be executed based on a jump table and a sorted unique value technique, wherein the sorted unique value technique comprises selecting at least one of the plurality of branch instructions from the jump table.

8.

Patent application
Fused Multiply-Add (FMA) low functional unit (status: active)

Publication number: US20170185379A1

Publication date: 2017-06-29

Application number: US14757942

Application date: 2015-12-23

Applicant: Intel Corporation

Inventor: Cristina S. Anderson, Marius A. Cornea-Hasegan, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Nikita Astafev, Mark J. Charney, Milind B. Girkar, Amit Gradstein, Simon Rubanovich, Zeev Sperber

IPC: G06F7/487, G06F7/499, G06F7/485

CPC classification number: G06F7/4876, G06F7/485, G06F7/49915

Abstract: An example processor includes a register and a fused multiply-add (FMA) low functional unit. The register stores first, second, and third floating point (FP) values. The FMA low functional unit receives a request to perform an FMA low operation and, in response: multiplies the first FP value with the second FP value to obtain a first product value; adds the first product value to the third FP value to generate a first result value; rounds the first result value to generate a first FMA value; multiplies the first FP value with the second FP value to obtain a second product value; adds the second product value to the third FP value to generate a second result value; subtracts the first FMA value from the second result value to obtain a third result value, which can then be normalized and rounded (the FMA low result); and sends the FMA low result to an application.
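
The arithmetic in this abstract amounts to recovering the part of a*b + c that is lost when the fused multiply-add result is rounded. A minimal Python sketch follows, using exact rational arithmetic in place of the hardware's unrounded intermediate. The function name is hypothetical, the sketch assumes finite double-precision inputs and round-to-nearest, and it models only the arithmetic, not the functional unit itself.

```python
from fractions import Fraction


def fma_low_sketch(a, b, c):
    """Sketch: return (fma_high, fma_low) for finite floats a, b, c.

    fma_high is a*b + c rounded once to a double.
    fma_low is the rounded difference between the exact a*b + c and fma_high.
    """
    # Exact (unrounded) product and sum; Fraction represents floats exactly.
    exact = Fraction(a) * Fraction(b) + Fraction(c)

    # Single rounding of the exact result, as a fused multiply-add would do.
    fma_high = float(exact)

    # Subtract the rounded FMA value from the unrounded result, then round.
    fma_low = float(exact - Fraction(fma_high))
    return fma_high, fma_low
```

When a*b + c is exactly representable, the low part is zero; otherwise it holds the rounding error that an application could use for extended-precision arithmetic.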

9.

Granted patent
Vector address conflict resolution with vector population count functionality (status: active)

Publication number: US09411592B2

Publication date: 2016-08-09

Application number: US13731005

Application date: 2012-12-29

Applicant: INTEL CORPORATION

Inventor: Robert Valentine, Mark J. Charney, Jesus Corbal, Milind B. Girkar, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Brett L. Toll

IPC: G06F9/30, G06F9/38, G06F7/60, H03M7/20

CPC classification number: G06F9/30145, G06F7/607, G06F9/30014, G06F9/30018, G06F9/30032, G06F9/30036, G06F9/3836, G06F9/3887, H03M7/20

Abstract: Instructions and logic provide SIMD address conflict resolution with vector population count functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store a variable second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of bits set to one for the corresponding data field. Responsive to decoding a vector population count instruction, execution units count the number of bits set to one for each of the data fields in the register, and store the counts in the corresponding data fields of the destination register. Vector population count instructions can be used with variable sized elements and conflict masks to generate iteration counts and completion masks to be used in each iteration to resolve dependencies in gather-modify-scatter SIMD operations.
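
A small Python sketch of the two ideas in this abstract: a per-element population count over variable-width data fields, and its use on conflict masks to derive iteration counts. The function names are hypothetical, and the conflict-mask definition used here (bit j of lane i is set when an earlier lane j holds the same index) is an assumption; the abstract does not spell out how the masks are formed.

```python
def vector_popcount(elements, width=32):
    """Count the bits set to one in each data field of the given width."""
    mask = (1 << width) - 1
    return [bin(e & mask).count("1") for e in elements]


def conflict_masks(indices):
    """Assumed conflict-mask form: for lane i, set bit j for every earlier
    lane j that targets the same index."""
    masks = []
    for i, idx in enumerate(indices):
        mask = 0
        for j in range(i):
            if indices[j] == idx:
                mask |= 1 << j
        masks.append(mask)
    return masks


def iteration_counts(indices):
    """Number of earlier conflicting lanes per lane, i.e. the iteration in
    which each lane could safely perform its modify-scatter step."""
    return vector_popcount(conflict_masks(indices))
```

Under these assumptions, lanes whose count is zero have no earlier conflicting lane and can be processed first; lanes with count k wait until k earlier conflicting lanes have completed.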

10.

Patent application
SYSTEMS, APPARATUSES, AND METHODS FOR ADDITION OF PARTIAL PRODUCTS (status: active)

Publication number: US20250004763A1

Publication date: 2025-01-02

Application number: US18886639

Application date: 2024-09-16

Applicant: INTEL CORPORATION

Inventor: Robert Valentine, Galina Ryvchin, Piotr Majcher, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Milind B. Girkar, Zeev Sperber, Simon Rubanovich, Amit Gradstein

IPC: G06F9/30, G06F7/544, G06F9/38

Abstract: Embodiments of systems, apparatuses, and methods for fused multiply add are described. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for first, second, and third packed data source operands, wherein packed data elements of the first and second packed data source operands are of a first size that differs from a second size of packed data elements of the third packed data source operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand, a multiplication of M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, addition of the results of these multiplications to a full-sized packed data element of a packed data element position of the third packed data source, and storage of the addition result in a packed data element position of the destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element size divided by N.
