Patent search ap:("Intel Corporation") AND inv:"Dipankar Das" Page 2

11.

Invention publication
COMMUNICATION OPTIMIZATIONS FOR DISTRIBUTED MACHINE LEARNING (Under examination, published)

Publication (announcement) number: US20230376762A1

Publication (announcement) date: 2023-11-23

Application number: US18320385

Application date: 2023-05-19

Applicant: Intel Corporation

Inventor: Srinivas Sridharan, Karthikeyan Vaidyanathan, Dipankar Das, Chandrasekaran Sakthivel, Mikhail E. Smorkalov

IPC: G06N3/08, G06N3/088, G06F9/50, G06N3/084, G06N3/044, G06N3/045, G06N3/04, G06N3/063

CPC classification number: G06N3/08, G06N3/088, G06F9/5061, G06F9/50, G06F9/5077, G06N3/084, G06N3/044, G06N3/045, G06N3/04, G06N3/063, G06N3/048

Abstract: Embodiments described herein provide an apparatus comprising an interconnect switch configured to couple with a plurality of graphics processors via a plurality of point-to-point interconnects and one or more processors including a graphics processor coupled with the interconnect switch via a point-to-point interconnect of the plurality of point-to-point interconnects.

12.

Invention grant
Scaling half-precision floating point tensors for training deep neural networks (In force)

Publication (announcement) number: US11507815B2

Publication (announcement) date: 2022-11-22

Application number: US17742138

Application date: 2022-05-11

Applicant: Intel Corporation

Inventor: Naveen Mellempudi, Dipankar Das

IPC: G06T1/20, G06F5/01, G06N3/063, G06F7/487, G06F7/544, G06N3/04, G06N3/08

Abstract: A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, where the multiprocessor includes a set of functional units to execute at least one of the parallel threads of the instructions. The set of functional units can include a mixed precision tensor processor to perform tensor computations. The functional units can also include circuitry to analyze statistics for output values of the tensor computations, determine a target format to convert the output values, the target format determined based on the statistics for the output values and a precision associated with a second layer of the neural network, and convert the output values to the target format.
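
The statistics-driven conversion in this abstract can be illustrated with a minimal sketch. It assumes the statistic is the peak magnitude of the output values and that the target format is IEEE half precision plus a shared power-of-two scale. Both choices, and the names `choose_scale_exponent` and `scale_and_convert`, are illustrative assumptions rather than the patented circuitry.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def choose_scale_exponent(values) -> int:
    """Return the smallest e >= 0 such that max|v| / 2**e fits in fp16."""
    peak = max((abs(v) for v in values), default=0.0)
    exponent = 0
    while peak / (2.0 ** exponent) > FP16_MAX:
        exponent += 1
    return exponent


def scale_and_convert(values):
    """Scale by a shared power of two, then convert each value to fp16."""
    exponent = choose_scale_exponent(values)
    scale = 2.0 ** exponent
    return [to_fp16(v / scale) for v in values], exponent


outputs = [1.5, -204800.0, 0.25]        # -204800.0 would overflow fp16 unscaled
scaled, exponent = scale_and_convert(outputs)
restored = [v * 2.0 ** exponent for v in scaled]
```

Here the shared exponent absorbs the dynamic range, so the largest value survives the conversion instead of overflowing.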

13.

Invention grant
Instructions for fused multiply-add operations with variable precision input operands (In force)

Publication (announcement) number: US11321086B2

Publication (announcement) date: 2022-05-03

Application number: US16735381

Application date: 2020-01-06

Applicant: Intel Corporation

Inventor: Dipankar Das, Naveen K. Mellempudi, Mrinmay Dutta, Arun Kumar, Dheevatsa Mudigere, Abhisek Kundu

IPC: G06F9/30, G06F7/544, G06F9/38, G06N3/063, G06F7/483

Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.
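
As a rough software model of the asymmetric FMA described in this abstract, the sketch below unpacks as many second-source elements as fit into one lane, multiplies each by the corresponding first-source element, and accumulates into the destination. The unsigned little-endian packing and the function names are assumptions made for illustration; the abstract does not specify an encoding.

```python
LANE_WIDTHS = (16, 32, 64)    # lane widths named in the abstract
FIRST_WIDTHS = (4, 8)         # element widths of the first source
SECOND_WIDTHS = (1, 2, 4)     # element widths of the second source


def unpack_unsigned(word: int, width: int, count: int):
    """Split `word` into `count` unsigned fields, lowest bits first (assumed layout)."""
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(count)]


def asymmetric_fma(dest, src1, src2_word, lane_width=16, first_width=8, second_width=4):
    """Accumulate src1[i] * src2[i] into dest for every src2 element in the lane."""
    if lane_width not in LANE_WIDTHS:
        raise ValueError("unsupported lane width")
    if first_width not in FIRST_WIDTHS or second_width not in SECOND_WIDTHS:
        raise ValueError("unsupported element width")
    count = lane_width // second_width          # elements that fit in one lane
    if len(src1) < count:
        raise ValueError("first source has too few elements")
    if any(not 0 <= a < (1 << first_width) for a in src1[:count]):
        raise ValueError("first-source element out of range")
    acc = dest
    for a, b in zip(src1, unpack_unsigned(src2_word, second_width, count)):
        acc += a * b
    return acc


# 16-bit lane holding four 4-bit elements: 0x4321 unpacks to [1, 2, 3, 4].
result = asymmetric_fma(5, [10, 20, 30, 40], 0x4321)
```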

14.

Invention grant
Communication optimizations for distributed machine learning (In force)

Publication (announcement) number: US11270201B2

Publication (announcement) date: 2022-03-08

Application number: US15859180

Application date: 2017-12-29

Applicant: Intel Corporation

Inventor: Srinivas Sridharan, Karthikeyan Vaidyanathan, Dipankar Das, Chandrasekaran Sakthivel, Mikhail E. Smorkalov

IPC: G06N3/08, G06F9/50, G06N3/04, G06N3/063, G06N7/00

Abstract: Embodiments described herein provide a system to configure distributed training of a neural network, the system comprising memory to store a library to facilitate data transmission during distributed training of the neural network; a network interface to enable transmission and receipt of configuration data associated with a set of worker nodes, the worker nodes configured to perform distributed training of the neural network; and a processor to execute instructions provided by the library, the instructions to cause the processor to create one or more groups of the worker nodes, the one or more groups of worker nodes to be created based on a communication pattern for messages to be transmitted between the worker nodes during distributed training of the neural network.
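
One simple way to picture grouping worker nodes by communication pattern is to treat the pattern as a list of sender/receiver pairs and put workers that exchange messages, directly or transitively, into the same group. The sketch below does this with a union-find structure; this grouping rule and the name `group_workers` are assumptions for illustration, not the library behavior described in the patent.

```python
def group_workers(num_workers, messages):
    """Group workers that are connected by the given (sender, receiver) pairs."""
    parent = list(range(num_workers))

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in messages:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)   # merge the two groups

    groups = {}
    for worker in range(num_workers):
        groups.setdefault(find(worker), []).append(worker)
    return sorted(groups.values())


# Workers 0-2 exchange messages, 3 and 4 exchange messages, 5 is idle.
groups = group_workers(6, [(0, 1), (1, 2), (3, 4)])
```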

15.

Invention application
OPTIMIZED COMPUTE HARDWARE FOR MACHINE LEARNING OPERATIONS (In force)

Publication (announcement) number: US20210019631A1

Publication (announcement) date: 2021-01-21

Application number: US16983107

Application date: 2020-08-03

Applicant: Intel Corporation

Inventor: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke

IPC: G06N3/08, G06N3/063, G06N3/04, G06F17/16, G06F9/30

Abstract: A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.
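
The relationship between element bit length and the number of parallel multiplies per 32-bit lane can be sketched as follows. The sketch assumes the multiply count is the lane width divided by the smallest bit length among the first operand's input values, which is one reading of the abstract; `multiplies_per_lane` and `lane_dot_product` are hypothetical names.

```python
LANE_BITS = 32  # lane width stated in the abstract


def multiplies_per_lane(smallest_bits: int) -> int:
    """Number of elements of the given bit length that fit in one 32-bit lane."""
    if smallest_bits <= 0 or LANE_BITS % smallest_bits:
        raise ValueError("bit length must evenly divide the lane width")
    return LANE_BITS // smallest_bits


def lane_dot_product(acc, a_vals, b_vals, a_bit_lengths):
    """Do one lane's worth of multiplies, then a single accumulate."""
    n = multiplies_per_lane(min(a_bit_lengths))
    products = [x * y for x, y in zip(a_vals[:n], b_vals[:n])]
    return acc + sum(products)


# Four 8-bit inputs fill one 32-bit lane, so four multiplies feed one accumulate.
total = lane_dot_product(100, [1, 2, 3, 4], [5, 6, 7, 8], [8, 8, 8, 8])
```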

16.

Invention application
CIRCUITRY FOR LOW-PRECISION DEEP LEARNING (Under examination, published)

Publication (announcement) number: US20190042939A1

Publication (announcement) date: 2019-02-07

Application number: US15994930

Application date: 2018-05-31

Applicant: Intel Corporation

Inventor: Martin Langhammer, Sudarshan Srinivasan, Gregg William Baeckler, Duncan Moss, Sasikanth Avancha, Dipankar Das

IPC: G06N3/08, G06N3/04, G06N3/063, G06F5/01, G06F17/16, G06F7/501

Abstract: The present disclosure relates generally to techniques for improving the implementation of certain operations on an integrated circuit. In particular, deep learning techniques, which may use a deep neural network (DNN) topology, may be implemented more efficiently using low-precision weights and activation values by efficiently performing down conversion of data to a lower precision and by preventing data overflow during suitable computations. Further, by more efficiently mapping multipliers to programmable logic on the integrated circuit device, the resources used by the DNN topology to perform, for example, inference tasks may be reduced, resulting in improved integrated circuit operating speeds.
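
Down conversion with overflow prevention is commonly modeled as a shift followed by saturation. The sketch below applies that generic technique to wide integer accumulators; the shift amount, the 8-bit target, and the function names are assumptions for illustration and are not taken from the disclosure.

```python
def saturate(value: int, bits: int) -> int:
    """Clamp `value` to the signed two's-complement range of `bits` bits."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, value))


def down_convert(accumulators, shift: int, bits: int = 8):
    """Drop `shift` low-order bits, then saturate instead of wrapping around."""
    return [saturate(acc >> shift, bits) for acc in accumulators]


wide = [1000, -70000, 40000]
narrow = down_convert(wide, shift=4)   # out-of-range values clamp to -128 / 127
```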

17.

Invention application
OPTIMIZED COMPUTE HARDWARE FOR MACHINE LEARNING OPERATIONS (Under examination, published)

Publication (announcement) number: US20180322390A1

Publication (announcement) date: 2018-11-08

Application number: US15869564

Application date: 2018-01-12

Applicant: Intel Corporation

Inventor: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke

IPC: G06N3/08, G06F17/16, G06N3/04, G06N3/063

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a fetch unit to fetch a single instruction having multiple input operands, wherein the multiple input operands have an unequal bit-length, a first input operand having a first bit-length and a second input operand having a second bit-length; a decode unit to decode the single instruction into a decoded instruction; an operand length unit to determine a smaller bit-length of the first bit-length and the second bit-length; and a compute unit to perform a matrix operation on the multiple input operands to generate an output value having a bit length of the smaller bit length.

18.

Invention application
HARDWARE IMPLEMENTED POINT TO POINT COMMUNICATION PRIMITIVES FOR MACHINE LEARNING (Under examination, published)

Publication (announcement) number: US20180322387A1

Publication (announcement) date: 2018-11-08

Application number: US15869510

Application date: 2018-01-12

Applicant: Intel Corporation

Inventor: Srinivas Sridharan, Karthikeyan Vaidyanathan, Dipankar Das

IPC: G06N3/08, G06F9/54, G06N3/04, G06N3/063

CPC classification number: G06N3/08, G06F9/547, G06N3/04, G06N3/063

Abstract: One embodiment provides for a system to compute and distribute data for distributed training of a neural network, the system including first memory to store a first set of instructions including a machine learning framework; a fabric interface to enable transmission and receipt of data associated with the set of trainable machine learning parameters; a first set of general-purpose processor cores to execute the first set of instructions, the first set of instructions to provide a training workflow for computation of gradients for the trainable machine learning parameters and to communicate with a second set of instructions, the second set of instructions facilitate transmission and receipt of the gradients via the fabric interface; and a graphics processor to perform compute operations associated with the training workflow to generate the gradients for the trainable machine learning parameters.

19.

Invention grant
Instructions and logic for vector multiply add with zero skipping (In force)

Publication (announcement) number: US11669329B2

Publication (announcement) date: 2023-06-06

Application number: US17723312

Application date: 2022-04-18

Applicant: Intel Corporation

Inventor: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George

IPC: G06F9/38, G06F9/30

CPC classification number: G06F9/3802, G06F9/3001, G06F9/30018, G06F9/30145

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instruction with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
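
A software analogy for a macro instruction with a predicate mask and a repeat count is a loop that expands into up to `repeat_count` multiply-add steps and skips any step whose mask bit is clear. The sketch assumes that a clear bit marks a zero source element; that convention and the names `build_predicate_mask` and `macro_multiply_add` are illustrative assumptions.

```python
def build_predicate_mask(values) -> int:
    """Set bit i when values[i] is nonzero (assumed mask convention)."""
    mask = 0
    for i, v in enumerate(values):
        if v != 0:
            mask |= 1 << i
    return mask


def macro_multiply_add(dest, src_a, src_b_rows, predicate_mask, repeat_count):
    """Compute dest[j] += src_a[i] * src_b_rows[i][j], skipping masked-off steps."""
    out = list(dest)
    issued = 0                                   # multiply-add steps actually run
    for i in range(repeat_count):
        if not (predicate_mask >> i) & 1:
            continue                             # zero input: skip this step
        issued += 1
        for j in range(len(out)):
            out[j] += src_a[i] * src_b_rows[i][j]
    return out, issued


sparse_a = [2, 0, 0, 3]
rows_b = [[1, 1], [5, 5], [7, 7], [2, 4]]
mask = build_predicate_mask(sparse_a)            # 0b1001
result, issued = macro_multiply_add([0, 0], sparse_a, rows_b, mask, repeat_count=4)
```

Only two of the four steps execute, yet the result matches the dense computation because the skipped products are zero.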

20.

Invention application
DATA PARALLELISM AND HALO EXCHANGE FOR DISTRIBUTED MACHINE LEARNING (In force)

Publication (announcement) number: US20220366526A1

Publication (announcement) date: 2022-11-17

Application number: US17849968

Application date: 2022-06-27

Applicant: Intel Corporation

Inventor: Dipankar Das, Karthikeyan Vaidyanathan, Srinivas Sridharan

IPC: G06T1/20, G06T1/60, G06N3/08, G06N3/063, G06N3/04

Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising multi-dimensionally partitioning data of a feature map across multiple nodes for distributed training of a convolutional neural network; performing a parallel convolution operation on the multiple partitions to train weight data of the neural network; and exchanging data between nodes to enable computation of halo regions, the halo regions having dependencies on data processed by a different node.
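
Halo exchange can be shown in one dimension: each node holds a slice of the feature map and needs a few boundary values from its neighbors before it can convolve its own slice. The sketch below is a simplified, single-process illustration with a 1-D signal and evenly sized partitions (the abstract describes multi-dimensional partitioning across nodes); all function names are hypothetical.

```python
def conv1d_valid(signal, kernel):
    """Sliding dot product without padding ('valid' mode, no kernel flip)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def partition(signal, parts):
    """Split the signal into equal contiguous chunks (length assumed divisible)."""
    size = len(signal) // parts
    return [signal[p * size:(p + 1) * size] for p in range(parts)]


def exchange_halos(chunks, radius):
    """Give each chunk `radius` boundary values from each neighboring chunk."""
    if radius < 1:
        raise ValueError("radius must be at least 1")
    padded = []
    for p, chunk in enumerate(chunks):
        left = chunks[p - 1][-radius:] if p > 0 else []
        right = chunks[p + 1][:radius] if p < len(chunks) - 1 else []
        padded.append(left + chunk + right)
    return padded


signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 2, 1]
chunks = exchange_halos(partition(signal, 2), radius=len(kernel) // 2)
distributed = [y for chunk in chunks for y in conv1d_valid(chunk, kernel)]
reference = conv1d_valid(signal, kernel)
```

With the halos in place, the concatenated per-node outputs equal the convolution of the unpartitioned signal.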
