Patent search ap:("Intel Corporation") AND inv:"Mark A. Anders" Page 1

1.

Granted invention patent
Instructions and logic to perform floating point and integer operations for machine learning (Active)

Publication (announcement) number: US12217053B2

Publication (announcement) date: 2025-02-04

Application number: US18528340

Application date: 2023-12-04

Applicant: Intel Corporation

Inventor: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar

IPC: G06F9/30, G06F7/483, G06F7/544, G06F9/38, G06N3/044, G06N3/045, G06N3/063, G06N3/08, G09G5/393, G06F1/16, G06F17/16, G06N20/00, G06T15/00

Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
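
Illustrative sketch (not part of the patent record): the abstract describes a matrix multiply and accumulate in which 16-bit operands produce an intermediate product that feeds a 32-bit sum. The Python below models that numeric behavior in software only. It assumes the operands are IEEE 754 half-precision (FP16) values and the sum is single precision (FP32); the abstract does not name the formats, and the helper names `to_fp16`, `to_fp32`, and `matmul_accumulate` are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def matmul_accumulate(a, b, c):
    """Return a @ b + c using 16-bit operands and 32-bit accumulation.

    Assumption: operands are FP16 and the accumulator is FP32.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[to_fp32(v) for v in row] for row in c]
    for i in range(rows):
        for j in range(cols):
            acc = out[i][j]
            for k in range(inner):
                # Intermediate product of two 16-bit operands, held in 32 bits.
                product = to_fp32(to_fp16(a[i][k]) * to_fp16(b[k][j]))
                # 32-bit sum based on the intermediate product.
                acc = to_fp32(acc + product)
            out[i][j] = acc
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.5, 0.0], [0.0, 0.5]]
c = [[1.0, 1.0], [1.0, 1.0]]
result = matmul_accumulate(a, b, c)
```

Keeping the accumulator wider than the operands limits the rounding error that builds up over the inner loop, which is the usual motivation for mixed-precision accumulation.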

2.

Invention publication
MEMORY TIMING CHARACTERIZATION CIRCUITRY (Under examination, published)

Publication (announcement) number: US20240319269A1

Publication (announcement) date: 2024-09-26

Application number: US18124338

Application date: 2023-03-21

Applicant: Intel Corporation

Inventor: Amit Agarwal, Steven K. Hsu, Mark A. Anders, Ram Kumar Krishnamurthy

IPC: G01R31/317, G01R31/3185

CPC classification number: G01R31/31725, G01R31/31713, G01R31/318536

Abstract: An apparatus includes a plurality of delay generators, a first plurality of flip-flop circuits, a second plurality of flip-flop circuits, and a third plurality of flip-flop circuits. The plurality of delay generators includes a data delay generator, an enable delay generator, and a reference delay generator. The first plurality of flip-flop circuits is coupled to the data delay generator to receive a delayed data input signal, and provide the delayed data input signal to a plurality of data input terminals of a memory circuit. The second plurality of flip-flop circuits is coupled to the enable delay generator to receive a delayed enable signal and provide the delayed enable signal to a plurality of enable terminals of the memory circuit. The third plurality of flip-flop circuits is coupled to an output terminal of the memory circuit. The reference delay generator provides a synchronized clock signal to the flip-flop circuits.

3.

Invention application
INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING (Active)

Publication (announcement) number: US20220357945A1

Publication (announcement) date: 2022-11-10

Application number: US17834482

Application date: 2022-06-07

Applicant: Intel Corporation

Inventor: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar

IPC: G06F9/30, G09G5/393, G06F9/38, G06F7/483, G06F7/544, G06N3/04, G06N3/063, G06N3/08

Abstract: One embodiment provides a graphics processor comprising a memory controller and a graphics processing resource coupled with the memory controller. The graphics processing resource includes circuitry configured to execute an instruction to perform a matrix operation on first input including weight data and second input including input activation data, generate intermediate data based on a result of the matrix operation, quantize the intermediate data to a floating-point format determined based on a statistical distribution of first output data, and output, as second output data, quantized intermediate data in a determined floating-point format.

4.

Invention application
FLOATING POINT MULTIPLY-ACCUMULATE UNIT FOR DEEP LEARNING (Active)

Publication (announcement) number: US20220188075A1

Publication (announcement) date: 2022-06-16

Application number: US17688131

Application date: 2022-03-07

Applicant: Intel Corporation

Inventor: Arnab Raha, Mark A. Anders, Raymond Jit-Hung Sung, Debabrata Mohapatra, Deepak Abraham Mathaikutty, Ram K. Krishnamurthy, Himanshu Kaul

IPC: G06F7/544, G06F7/483, G06N3/08

Abstract: An FPMAC operation has two operands: an input operand and a weight operand. The operands may have a format of FP16, BF16, or INT8. Each operand is split into two portions. The two portions are stored in separate storage units. Then operands are transferred to register files of a PE, with each register file storing bits of an operand sequentially. The PE performs the FPMAC operation based on the operands. The PE may include an FPMAC unit configured to compute an individual partial sum of the PE. The PE may also include an FP adder to accumulate the individual partial sum with other data, such as an output from another PE or an output from another PE array. The FP adder may be fused with the FPMAC unit in a single circuit that can do speculative alignment and has separate critical paths for alignment and normalization.

5.

Invention application
AREA AND ENERGY EFFICIENT MULTI-PRECISION MULTIPLY-ACCUMULATE UNIT-BASED PROCESSOR (Active)

Publication (announcement) number: US20210397414A1

Publication (announcement) date: 2021-12-23

Application number: US17358868

Application date: 2021-06-25

Applicant: Intel Corporation

Inventor: Arnab Raha, Mark A. Anders, Martin Power, Martin Langhammer, Himanshu Kaul, Debabrata Mohapatra, Gautham Chinya, Cormac Brick, Ram Krishnamurthy

IPC: G06F7/544, G06F7/527, G06F5/01

Abstract: Systems, apparatuses and methods may provide for multi-precision multiply-accumulate (MAC) technology that includes a plurality of arithmetic blocks, wherein the plurality of arithmetic blocks each contain multiple multipliers, and wherein the logic is to combine multipliers one or more of within each arithmetic block or across multiple arithmetic blocks. In one example, one or more intermediate multipliers are of a size that is less than precisions supported by arithmetic blocks containing the one or more intermediate multipliers.
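
Illustrative sketch (not part of the patent record): the abstract describes combining smaller multipliers to support higher precisions. The Python below shows the general arithmetic idea with one common decomposition, building an unsigned 8-bit by 8-bit product from four 4-bit by 4-bit products. The bit widths, the unsigned-only handling, and the names `mul4` and `mul8_from_mul4` are assumptions chosen for illustration; the abstract does not specify them.

```python
def mul4(a: int, b: int) -> int:
    """Stand-in for one 4-bit x 4-bit unsigned hardware multiplier (hypothetical)."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def mul8_from_mul4(a: int, b: int) -> int:
    """Build an 8-bit x 8-bit unsigned product from four 4-bit multipliers."""
    assert 0 <= a < 256 and 0 <= b < 256
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    # (a_hi*16 + a_lo) * (b_hi*16 + b_lo), expanded into shifted partial products.
    high = mul4(a_hi, b_hi) << 8
    cross = (mul4(a_hi, b_lo) + mul4(a_lo, b_hi)) << 4
    low = mul4(a_lo, b_lo)
    return high + cross + low


product = mul8_from_mul4(200, 123)
```

The same four small multipliers could instead produce four independent 4-bit products in one pass, which is the kind of trade-off a multi-precision design exploits.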

6.

Granted invention patent
Parallel direction decode circuits for network-on-chip (Active)

Publication (announcement) number: US09866476B2

Publication (announcement) date: 2018-01-09

Application number: US14574106

Application date: 2014-12-17

Applicant: Intel Corporation

Inventor: Mark A. Anders, Gregory K. Chen, Himanshu Kaul

IPC: H04L12/771, H04L12/801, H04L12/933, H04L12/721, H04L12/773, H04L12/947

CPC classification number: H04L45/72, H04L45/16, H04L45/60, H04L47/33, H04L49/109, H04L49/25

Abstract: A first packet and a first direction associated with the first packet are received. The first packet is forwarded to an output port of a plurality of output ports of the first router based on the first direction associated with the first packet. A second direction associated with the first packet is determined. The second direction is based at least on an address of the first packet. The first packet and the second direction are forwarded through the output port of the first router to a second router.
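
Illustrative sketch (not part of the patent record): the abstract describes a router that forwards a packet using a direction it received along with the packet, while determining from the packet address the direction the next router should use. The Python below models that look-ahead flow in software. The 2D mesh, the dimension-ordered (XY) routing policy, and the names `Packet`, `xy_direction`, `route_hop`, and `trace` are all assumptions; the abstract does not name a topology or a routing algorithm.

```python
from dataclasses import dataclass

# Hypothetical mesh model: each direction moves one hop in (x, y).
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


@dataclass(frozen=True)
class Packet:
    dest: tuple  # destination router coordinates (x, y)


def xy_direction(here, dest):
    """Dimension-ordered routing (an assumed policy): resolve x first, then y."""
    if dest[0] != here[0]:
        return "E" if dest[0] > here[0] else "W"
    if dest[1] != here[1]:
        return "N" if dest[1] > here[1] else "S"
    return "LOCAL"


def route_hop(here, packet, direction):
    """Forward with the direction received alongside the packet, and
    compute the direction the next router will use from the address."""
    dx, dy = STEP[direction]
    next_router = (here[0] + dx, here[1] + dy)
    next_direction = xy_direction(next_router, packet.dest)
    return next_router, next_direction


def trace(src, dest):
    """Follow a packet from src to dest and return the routers visited."""
    packet = Packet(dest)
    here, direction = src, xy_direction(src, dest)
    path = [here]
    while direction != "LOCAL":
        here, direction = route_hop(here, packet, direction)
        path.append(here)
    return path


path = trace((0, 0), (2, 1))
```

Because each router already knows its output port on arrival, the address decode for the following hop does not sit on the forwarding path; in this software model the two steps simply run one after the other.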

7.

Granted invention patent
Spatially divided circuit-switched channels for a network-on-chip (Active)

Publication (announcement) number: US09680765B2

Publication (announcement) date: 2017-06-13

Application number: US14574258

Application date: 2014-12-17

Applicant: Intel Corporation

Inventor: Himanshu Kaul, Gregory K. Chen, Mark A. Anders

IPC: H04L12/58, H04L12/913, H04L12/933, H04L12/935

CPC classification number: H04L47/724, G06F15/7825, H04L12/50, H04L12/6402, H04L12/66, H04L49/109, H04L49/3018, H04L49/3027

Abstract: An apparatus may comprise a plurality of ports and a plurality of channel reservation banks. A channel reservation bank is to be associated with a port of the plurality of ports. The channel reservation bank is to comprise a plurality of channel reservation slots. The port of the plurality of ports is to comprise a plurality of circuit-switched channels through the port. The configuration of each of the plurality of circuit-switched channels to be based on information stored in a channel reservation slot of the channel reservation bank to be associated with the port.

8.

Granted invention patent
Scalable crossbar apparatus and method for arranging crossbar circuits (Active)
Publication (announcement) number: US09577634B2

Publication (announcement) date: 2017-02-21

Application number: US14751060

Application date: 2015-06-25

Applicant: Intel Corporation

Inventor: Gregory K. Chen, Mark A. Anders, Himanshu Kaul

IPC: H03K19/173, H03K19/00, H03K19/177

CPC classification number: H03K19/0008, H03K19/17704, H03K19/17744

Abstract: Described is an apparatus (e.g., a router) which comprises: multiple ports; and a plurality of crossbar circuits arranged such that at least one crossbar circuit receives all interconnects associated with a data bit of the multiple ports and is operable to re-route signals on those interconnects.

9.

Invention application
PARALLEL DIRECTION DECODE CIRCUITS FOR NETWORK-ON-CHIP (Active)
Publication (announcement) number: US20160182367A1

Publication (announcement) date: 2016-06-23

Application number: US14574106

Application date: 2014-12-17

Applicant: Intel Corporation

Inventor: Mark A. Anders, Gregory K. Chen, Himanshu Kaul

IPC: H04L12/721, H04L12/773, H04L12/801, H04L12/933

CPC classification number: H04L45/72, H04L45/16, H04L45/60, H04L47/33, H04L49/109, H04L49/25

Abstract: A first packet and a first direction associated with the first packet are received. The first packet is forwarded to an output port of a plurality of output ports of the first router based on the first direction associated with the first packet. A second direction associated with the first packet is determined. The second direction is based at least on an address of the first packet. The first packet and the second direction are forwarded through the output port of the first router to a second router.

10.

Invention application
INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING (Active)

Publication (announcement) number: US20250094170A1

Publication (announcement) date: 2025-03-20

Application number: US18901027

Application date: 2024-09-30

Applicant: Intel Corporation

Inventor: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar

IPC: G06F9/30, G06F1/16, G06F7/483, G06F7/544, G06F9/38, G06F17/16, G06N3/044, G06N3/045, G06N3/063, G06N3/08, G06N20/00, G06T15/00, G09G5/393

Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.
