Patent search ap:("Intel Corporation") AND inv:"Dipankar Das" Page 6

51.

Granted invention patent
Technologies for scaling deep learning training (Active)

Publication No.: US11068780B2

Publication date: 2021-07-20

Application No.: US15476998

Application date: 2017-04-01

Applicant: Intel Corporation

Inventor: Naveen K. Mellempudi , Srinivas Sridharan , Dheevatsa Mudigere , Dipankar Das

IPC: G06N3/08 , G06N3/06 , G06N3/04 , G06N3/063

Abstract: Technologies for artificial neural network training include a computing node with a host fabric interface that sends a message that includes one or more artificial neural network training algorithm values to another computing node in response to receipt of a request to send the message. Prior to sending the message, the host fabric interface may receive a request to quantize the message and quantize the message based on a quantization level included in the request to generate a quantized message. The quantized message includes one or more quantized values such that each quantized value has a lower precision than a corresponding artificial neural network training algorithm value. The host fabric interface then transmits the quantized message, which includes metadata indicative of the quantization level, to another computing node in response to quantization of the message for artificial neural network training. Other embodiments are described and claimed.
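
The quantize-then-transmit flow in this abstract can be illustrated with a small sketch. It is not the patented implementation: the symmetric linear quantizer, the `QuantizedMessage` type, and the `scale` field are assumptions introduced here for illustration. The only elements taken from the abstract are that values are reduced to a lower precision and that the message carries metadata indicating the quantization level.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuantizedMessage:
    """Hypothetical wire format: payload plus quantization metadata."""
    bits: int          # metadata: quantization level requested by the sender
    scale: float       # metadata (assumed): factor needed to dequantize
    values: List[int]  # lower-precision integer payload


def quantize_message(values: List[float], bits: int) -> QuantizedMessage:
    """Symmetric linear quantization of training values to `bits` bits."""
    qmax = (1 << (bits - 1)) - 1  # e.g. 127 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return QuantizedMessage(bits=bits, scale=scale, values=quantized)


def dequantize_message(msg: QuantizedMessage) -> List[float]:
    """Receiver side: rebuild approximate values from payload and metadata."""
    return [q * msg.scale for q in msg.values]


# Example: gradient-like values sent at an 8-bit quantization level.
gradients = [0.5, -1.0, 0.25, 0.0]
message = quantize_message(gradients, bits=8)
restored = dequantize_message(message)
```

The receiver needs only the metadata in the message to recover approximate values, which is why the sketch keeps `bits` and `scale` alongside the payload.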

52.

Invention application
UTILIZING STRUCTURED SPARSITY IN SYSTOLIC ARRAYS (Active)

Publication No.: US20210081201A1

Publication date: 2021-03-18

Application No.: US17107823

Application date: 2020-11-30

Applicant: Intel Corporation

Inventor: Subramaniam Maiyuran , Jorge Parra , Ashutosh Garg , Chandra Gurram , Chunhui Mei , Durgesh Borkar , Shubra Marwaha , Supratim Pal , Varghese George , Wei Xiong , Yan Li , Yongsheng Liu , Dipankar Das , Sasikanth Avancha , Dharma Teja Vooturi , Naveen K. Mellempudi

IPC: G06F9/30 , G06F9/38 , G06F15/80

Abstract: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.
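
As a rough software analogy for the data flow in this abstract, the sketch below packs a sparse row into nonzero values plus per-value position metadata, then uses that metadata to pick the matching elements of the unpacked (dense) operand before multiplying. The 2-of-4 pattern, the function names, and the metadata layout are assumptions chosen for illustration; the abstract does not specify them, and real hardware would do this inside the systolic array rather than in a loop.

```python
from typing import List, Tuple


def pack_structured(row: List[float], group: int = 4,
                    keep: int = 2) -> Tuple[List[float], List[int]]:
    """Keep the `keep` largest-magnitude entries of each `group`-wide block.

    Returns the packed values and, as metadata, each kept value's offset
    inside its block. Assumes len(row) is a multiple of `group`.
    """
    values: List[float] = []
    meta: List[int] = []
    for start in range(0, len(row), group):
        block = row[start:start + group]
        by_magnitude = sorted(range(len(block)), key=lambda i: -abs(block[i]))
        kept = sorted(by_magnitude[:keep])
        values.extend(block[i] for i in kept)
        meta.extend(kept)
    return values, meta


def sparse_dot(values: List[float], meta: List[int], dense: List[float],
               group: int = 4, keep: int = 2) -> float:
    """Multiply packed values by the dense elements selected via metadata."""
    acc = 0.0
    for k, (value, offset) in enumerate(zip(values, meta)):
        block_start = (k // keep) * group  # block this packed value came from
        acc += value * dense[block_start + offset]
    return acc


# Example: a row that already has at most 2 nonzeros per block of 4.
sparse_row = [0, 3, 0, -2, 5, 0, 0, 1]
dense_col = [1, 2, 3, 4, 5, 6, 7, 8]
packed, metadata = pack_structured(sparse_row)
result = sparse_dot(packed, metadata, dense_col)
```

Only half of the dense operand is read per block, which is the saving the packed format is meant to expose.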

53.

Granted invention patent
Instructions for fused multiply-add operations with variable precision input operands (Active)

Publication No.: US10528346B2

Publication date: 2020-01-07

Application No.: US15940774

Application date: 2018-03-29

Applicant: Intel Corporation

Inventor: Dipankar Das , Naveen K. Mellempudi , Mrinmay Dutta , Arun Kumar , Dheevatsa Mudigere , Abhisek Kundu

IPC: G06F9/30 , G06F7/544 , G06F9/38 , G06F7/483 , G06N3/063

Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.
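
The per-lane behavior described in this abstract can be modeled in a few lines. This is a sketch under assumptions, not the instruction's actual semantics: signed two's-complement elements, low-bits-first packing, and the names `unpack_signed` and `asymmetric_fma` are all hypothetical. What it takes from the abstract is that as many narrow second-source elements as fit in one lane are each multiplied by a corresponding first-source element and accumulated into the destination.

```python
from typing import List


def unpack_signed(word: int, width: int, count: int) -> List[int]:
    """Split `word` into `count` signed fields of `width` bits, low bits first."""
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    fields = []
    for i in range(count):
        raw = (word >> (i * width)) & mask
        fields.append(raw - (1 << width) if raw & sign_bit else raw)
    return fields


def asymmetric_fma(dest: int, src1: List[int], src2_lane: int,
                   lane_bits: int = 32, width2: int = 2) -> int:
    """Accumulate src1[i] * src2[i] into dest for one SIMD lane.

    `src2_lane` holds lane_bits // width2 packed narrow elements;
    `src1` supplies the corresponding wider elements as plain integers.
    """
    count = lane_bits // width2
    if len(src1) < count:
        raise ValueError("src1 must supply one element per packed src2 element")
    narrow = unpack_signed(src2_lane, width2, count)
    for wide, small in zip(src1, narrow):
        dest += wide * small
    return dest


# Example: a 16-bit lane holding four 4-bit elements (7, 2, -1, 1),
# multiplied by four 8-bit-range elements and added to a prior sum of 100.
total = asymmetric_fma(100, [10, -3, 4, 2], 0x1F27, lane_bits=16, width2=4)
```

Here `total` is 100 + 70 - 6 - 4 + 2 = 162. Narrowing `width2` raises the number of products per lane, which is the throughput trade-off the variable-precision design targets.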

54.

Invention application
DATA PARALLELISM AND HALO EXCHANGE FOR DISTRIBUTED MACHINE LEARNING (Under examination, published)

Publication No.: US20180322606A1

Publication date: 2018-11-08

Application No.: US15869551

Application date: 2018-01-12

Applicant: Intel Corporation

Inventor: Dipankar Das , Karthikeyan Vaidyanathan , Srinivas Sridharan

IPC: G06T1/20 , G06T1/60

CPC classification number: G06T1/20 , G06T1/60 , G06T2207/20081 , G06T2207/20084

Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising multi-dimensionally partitioning data of a feature map across multiple nodes for distributed training of a convolutional neural network; performing a parallel convolution operation on the multiple partitions to train weight data of the neural network; and exchanging data between nodes to enable computation of halo regions, the halo regions having dependencies on data processed by a different node.
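
A halo exchange of the kind this abstract describes can be shown in miniature. The sketch below is an illustration under simplifying assumptions, not the claimed method: it partitions along one dimension only (the abstract describes multi-dimensional partitioning), simulates the nodes as list entries in a single process, zero-pads the global edges, and assumes an odd kernel of length 3 or more with chunks at least as long as the halo. All function names are hypothetical.

```python
from typing import List


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """Reference: zero-padded sliding-window product over the whole signal."""
    r = len(kernel) // 2
    padded = [0.0] * r + list(x) + [0.0] * r
    return [sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
            for i in range(len(x))]


def partition(x: List[float], parts: int) -> List[List[float]]:
    """Split x into equal chunks, one per simulated node."""
    size = len(x) // parts
    return [list(x[i * size:(i + 1) * size]) for i in range(parts)]


def halo_conv(chunks: List[List[float]], kernel: List[float]) -> List[List[float]]:
    """Each node borrows r boundary values from its neighbors, then convolves."""
    r = len(kernel) // 2
    outputs = []
    for n, chunk in enumerate(chunks):
        # "Exchange": read the halo region owned by the neighboring nodes.
        left = chunks[n - 1][-r:] if n > 0 else [0.0] * r
        right = chunks[n + 1][:r] if n < len(chunks) - 1 else [0.0] * r
        extended = left + chunk + right
        outputs.append([sum(kernel[j] * extended[i + j] for j in range(len(kernel)))
                        for i in range(len(chunk))])
    return outputs


# Example: two nodes each own half of an 8-element feature map.
feature_map = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [1.0, 0.0, -1.0]
per_node = halo_conv(partition(feature_map, 2), kernel)
stitched = [value for chunk in per_node for value in chunk]
```

Stitching the per-node outputs reproduces `conv1d_same(feature_map, kernel)`, because the only values a node lacks are the few boundary elements it obtains through the exchange.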
