Patent search ap:("Intel Corporation") AND inv:"Dipankar Das" Page 3

21.

发明授权
Scaling half-precision floating point tensors for training deep neural networks 有权

公开(公告)号：US11501139B2

公开(公告)日：2022-11-15

申请号：US15869582

申请日：2018-01-12

Applicant: Intel Corporation

Inventor： Naveen Mellempudi , Dipankar Das

IPC: G06T1/20 , G06F5/01 , G06N3/063 , G06F7/487 , G06F7/544 , G06N3/04 , G06N3/08

Abstract: One embodiment provides for a machine-learning accelerator device a multiprocessor to execute parallel threads of an instruction stream, the multiprocessor including a compute unit, the compute unit including a set of functional units, each functional unit to execute at least one of the parallel threads of the instruction stream. The compute unit includes compute logic configured to execute a single instruction to scale an input tensor associated with a layer of a neural network according to a scale factor, the input tensor stored in a floating-point data type, the compute logic to scale the input tensor to enable a data distribution of data of the input tensor to be represented by a 16-bit floating point data type.

22.

发明申请
COMMUNICATION OPTIMIZATIONS FOR DISTRIBUTED MACHINE LEARNING 有权

公开(公告)号：US20220245454A1

公开(公告)日：2022-08-04

申请号：US17685462

申请日：2022-03-03

Applicant: Intel Corporation

Inventor： Srinivas Sridharan , Karthikeyan Vaidyanathan , Dipankar Das , Chandrasekaran Sakthivel , Mikhail E. Smorkalov

IPC: G06N3/08 , G06F9/50 , G06N3/04 , G06N3/063

Abstract: Embodiments described herein provide a system to configure distributed training of a neural network, the system comprising memory to store a library to facilitate data transmission during distributed training of the neural network; a network interface to enable transmission and receipt of configuration data associated with a set of worker nodes, the worker nodes configured to perform distributed training of the neural network; and a processor to execute instructions provided by the library. The instructions cause the processor to create one or more groups of the worker nodes, the one or more groups of worker nodes to be created based on a communication pattern for messages to be transmitted between the worker nodes during distributed training of the neural network. The processor can transparently adjust communication paths between worker nodes based on the communication pattern.

23.

发明授权
Dynamic precision management for integer deep learning primitives 有权

公开(公告)号：US11321805B2

公开(公告)日：2022-05-03

申请号：US17083588

申请日：2020-10-29

Applicant: Intel Corporation

Inventor： Naveen Mellempudi , Dheevatsa Mudigere , Dipankar Das , Srinivas Sridharan

IPC: G06T1/20 , G06N3/063 , G06F17/16 , G06F7/523 , G06F5/01 , G06F7/501 , G06F17/15 , G06N3/04 , G06F7/544 , G06N3/08

Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.

24.

发明授权
Abstraction library to enable scalable distributed machine learning 有权

公开(公告)号：US11023803B2

公开(公告)日：2021-06-01

申请号：US15482925

申请日：2017-04-10

Applicant: Intel Corporation

Inventor： Dhiraj D. Kalamkar , Karthikeyan Vaidyanathan , Srinivas Sridharan , Dipankar Das

IPC: G06N3/04 , G06N3/063 , G06T1/20 , G06N3/08

Abstract: One embodiment provides for a non-transitory machine readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising providing an interface to define a neural network using machine-learning domain specific terminology, wherein the interface enables selection of a neural network topology and abstracts low-level communication details of distributed training of the neural network.

25.

发明申请
PARALLEL PROCESSING BASED ON INJECTION NODE BANDWIDTH 有权

公开(公告)号：US20210109888A1

公开(公告)日：2021-04-15

申请号：US16642483

申请日：2017-09-30

Applicant: Intel Corporation

Inventor： Karthikeyan Vaidyanathan , Srinivas Sridharan , Dipankar Das

IPC: G06F15/163

Abstract: A technique includes performing a collective operation among multiple nodes of a parallel processing computer system using multiple parallel processing stages. The technique includes regulating an ordering of the parallel processing stages so that an initial stage of the plurality of parallel processing stages is associated with a higher node injection bandwidth than a subsequent stage of the plurality of parallel processing stages.

26.

发明授权
Optimized compute hardware for machine learning operations 有权

公开(公告)号：US10776699B2

公开(公告)日：2020-09-15

申请号：US15869564

申请日：2018-01-12

Applicant: Intel Corporation

Inventor： Dipankar Das , Roger Gramunt , Mikhail Smelyanskiy , Jesus Corbal , Dheevatsa Mudigere , Naveen K. Mellempudi , Alexander F. Heinecke

IPC: G06F17/16 , G06F9/30 , G06N3/08 , G06N3/063 , G06N3/04

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a fetch unit to fetch a single instruction having multiple input operands, wherein the multiple input operands have an unequal bit-length, a first input operand having a first bit-length and a second input operand having a second bit-length; a decode unit to decode the single instruction into a decoded instruction; an operand length unit to determine a smaller bit-length of the first bit-length and the second bit-length; and a compute unit to perform a matrix operation on the multiple input operands to generate an output value having a bit length of the smaller bit length.

27.

发明授权
Instructions for fused multiply-add operations with variable precision input operands 有权

公开(公告)号：US12288062B2

公开(公告)日：2025-04-29

申请号：US18399578

申请日：2023-12-28

Applicant: Intel Corporation

Inventor： Dipankar Das , Naveen K. Mellempudi , Mrinmay Dutta , Arun Kumar , Dheevatsa Mudigere , Abhisek Kundu

IPC: G06F9/30 , G06F7/483 , G06F7/544 , G06F9/38 , G06N3/063

Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.

28.

发明授权
Scaling half-precision floating point tensors for training deep neural networks 有权

公开(公告)号：US12106210B2

公开(公告)日：2024-10-01

申请号：US18456272

申请日：2023-08-25

Applicant: Intel Corporation

Inventor： Naveen Mellempudi , Dipankar Das

IPC: G06N3/044 , G06F5/01 , G06F7/487 , G06F7/544 , G06N3/045 , G06N3/063 , G06N3/084 , G06T1/20

CPC classification number: G06N3/063 , G06F5/012 , G06F7/487 , G06F7/5443 , G06N3/044 , G06N3/045 , G06N3/084 , G06T1/20

Abstract: One embodiment provides for a machine-learning accelerator device a multiprocessor to execute parallel threads of an instruction stream, the multiprocessor including a compute unit, the compute unit including a set of functional units, each functional unit to execute at least one of the parallel threads of the instruction stream. The compute unit includes compute logic configured to execute a single instruction to scale an input tensor associated with a layer of a neural network according to a scale factor, the input tensor stored in a floating-point data type, the compute logic to scale the input tensor to enable a data distribution of data of the input tensor to be represented by a 16-bit floating point data type.

29.

发明授权
Dynamic precision management for integer deep learning primitives 有权

公开(公告)号：US12033237B2

公开(公告)日：2024-07-09

申请号：US18306033

申请日：2023-04-24

Applicant: Intel Corporation

Inventor： Naveen K. Mellempudi , Dheevatsa Mudigere , Dipankar Das , Srinivas Sridharan

IPC: G06T1/20 , G06F5/01 , G06F7/501 , G06F7/523 , G06F7/544 , G06F17/15 , G06F17/16 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084

CPC classification number: G06T1/20 , G06F5/01 , G06F7/501 , G06F7/523 , G06F7/5443 , G06F17/153 , G06F17/16 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06F2207/382 , G06F2207/4824

Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a hardware processing unit having a dynamic precision fixed-point unit that is configurable to convert elements of a floating-point tensor to convert the floating-point tensor into a fixed-point tensor.

30.

发明授权
Utilizing structured sparsity in systolic arrays 有权

公开(公告)号：US11977885B2

公开(公告)日：2024-05-07

申请号：US17107823

申请日：2020-11-30

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Jorge Parra , Ashutosh Garg , Chandra Gurram , Chunhui Mei , Durgesh Borkar , Shubra Marwaha , Supratim Pal , Varghese George , Wei Xiong , Yan Li , Yongsheng Liu , Dipankar Das , Sasikanth Avancha , Dharma Teja Vooturi , Naveen K. Mellempudi

IPC: G06F9/30 , G06F9/38 , G06F15/80

CPC classification number: G06F9/30036 , G06F9/3001 , G06F9/30101 , G06F9/3893 , G06F15/8046

Abstract: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification