-
Publication No.: US12211117B2
Publication Date: 2025-01-28
Application No.: US17849968
Filing Date: 2022-06-27
Applicant: Intel Corporation
Inventor: Dipankar Das , Karthikeyan Vaidyanathan , Srinivas Sridharan
Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising multi-dimensionally partitioning data of a feature map across multiple nodes for distributed training of a convolutional neural network; performing a parallel convolution operation on the multiple partitions to train weight data of the neural network; and exchanging data between nodes to enable computation of halo regions, the halo regions having dependencies on data processed by a different node.
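The partitioning-with-halo-exchange idea in this abstract can be sketched in a few lines. This is an illustrative toy model, not Intel's implementation: a 1-D feature map is split across two "nodes", each node borrows the border elements it depends on from its neighbour (the halo regions), and the per-node convolutions then reproduce the single-node result.

```python
# Toy sketch of multi-node feature-map partitioning with halo exchange
# for a size-3 convolution kernel. Function names are illustrative.

def partition(feature_map, num_nodes):
    """Split a 1-D feature map into num_nodes contiguous chunks."""
    size = len(feature_map) // num_nodes
    return [feature_map[i * size:(i + 1) * size] for i in range(num_nodes)]

def exchange_halos(chunks, radius=1):
    """Attach `radius` border elements from each neighbour to every chunk."""
    padded = []
    for i, chunk in enumerate(chunks):
        left = chunks[i - 1][-radius:] if i > 0 else []
        right = chunks[i + 1][:radius] if i < len(chunks) - 1 else []
        padded.append(left + chunk + right)
    return padded

def conv1d_valid(xs, kernel):
    """'Valid' 1-D convolution (correlation) of xs with kernel."""
    k = len(kernel)
    return [sum(xs[i + j] * kernel[j] for j in range(k))
            for i in range(len(xs) - k + 1)]

feature_map = list(range(8))          # toy 8-element feature map
chunks = partition(feature_map, 2)    # two "nodes"
padded = exchange_halos(chunks)       # halo exchange between neighbours
kernel = [1, 1, 1]
distributed = sum((conv1d_valid(p, kernel) for p in padded), [])
reference = conv1d_valid(feature_map, kernel)
# distributed == reference: the halo data is exactly what each node
# needs to compute its border outputs independently.
```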
-
Publication No.: US11704565B2
Publication Date: 2023-07-18
Application No.: US17685462
Filing Date: 2022-03-03
Applicant: Intel Corporation
Inventor: Srinivas Sridharan , Karthikeyan Vaidyanathan , Dipankar Das , Chandrasekaran Sakthivel , Mikhail E. Smorkalov
IPC: G06N3/08 , G06N3/088 , G06F9/50 , G06N3/084 , G06N3/044 , G06N3/045 , G06N3/04 , G06N3/063 , G06N3/048 , G06N7/01
CPC classification number: G06N3/08 , G06F9/50 , G06F9/5061 , G06F9/5077 , G06N3/04 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06N3/088 , G06N3/048 , G06N7/01
Abstract: Embodiments described herein provide a system to configure distributed training of a neural network, the system comprising memory to store a library to facilitate data transmission during distributed training of the neural network; a network interface to enable transmission and receipt of configuration data associated with a set of worker nodes, the worker nodes configured to perform distributed training of the neural network; and a processor to execute instructions provided by the library. The instructions cause the processor to create one or more groups of the worker nodes, the one or more groups of worker nodes to be created based on a communication pattern for messages to be transmitted between the worker nodes during distributed training of the neural network. The processor can transparently adjust communication paths between worker nodes based on the communication pattern.
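The grouping behaviour this abstract describes — worker groups derived from a communication pattern — can be illustrated with a small sketch. The pattern names, the per-host group size, and the function shape are all assumptions for illustration, not the patented library's API.

```python
# Hypothetical sketch: partition worker ranks into communication groups
# based on the intended message pattern during distributed training.

def make_groups(num_workers, pattern, host_size=4):
    """Return lists of worker ranks grouped for `pattern`.

    'ring'         -> one group containing every rank, in ring order.
    'hierarchical' -> per-host groups of `host_size` ranks, plus one
                      group of the per-host leaders (rank 0 of each host).
    """
    ranks = list(range(num_workers))
    if pattern == "ring":
        return [ranks]
    if pattern == "hierarchical":
        hosts = [ranks[i:i + host_size]
                 for i in range(0, num_workers, host_size)]
        leaders = [h[0] for h in hosts]
        return hosts + [leaders]
    raise ValueError(f"unknown pattern: {pattern}")

ring_groups = make_groups(8, "ring")
hier_groups = make_groups(8, "hierarchical")
# ring:         [[0, 1, 2, 3, 4, 5, 6, 7]]
# hierarchical: [[0, 1, 2, 3], [4, 5, 6, 7], [0, 4]]
```

Adjusting communication paths "transparently", as the abstract puts it, would then amount to swapping the group layout without the training code changing.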
-
Publication No.: US11681529B2
Publication Date: 2023-06-20
Application No.: US17410934
Filing Date: 2021-08-24
Applicant: Intel Corporation
Inventor: Swagath Venkataramani , Dipankar Das , Sasikanth Avancha , Ashish Ranjan , Subarno Banerjee , Bharat Kaul , Anand Raghunathan
CPC classification number: G06F9/30145 , G06F9/3004 , G06F9/30043 , G06F9/30087 , G06F9/3834 , G06F9/52 , G06N3/04 , G06N3/063 , G06N3/084
Abstract: Systems, methods, and apparatuses relating to access synchronization in a shared memory are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction, and an execution unit to execute the decoded instruction to: receive a first input operand of a memory address to be tracked and a second input operand of an allowed sequence of memory accesses to the memory address, and cause a block of a memory access that violates the allowed sequence of memory accesses to the memory address. In one embodiment, a circuit separate from the execution unit compares a memory address for a memory access request to one or more memory addresses in a tracking table, and blocks a memory access for the memory access request when a type of access violates a corresponding allowed sequence of memory accesses to the memory address for the memory access request.
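A small software model makes the tracking-table mechanism concrete. This is a toy stand-in for the patented hardware: each tracked address carries an allowed sequence of access types, and an access that arrives out of order is blocked.

```python
# Toy model of the access-sequence tracking table from the abstract.
# Class and method names are illustrative, not from the patent.

class TrackingTable:
    def __init__(self):
        self._allowed = {}   # address -> allowed access types, in order
        self._pos = {}       # address -> index of next expected access

    def track(self, address, allowed_sequence):
        """Start tracking `address` with a sequence like ['W', 'R']."""
        self._allowed[address] = list(allowed_sequence)
        self._pos[address] = 0

    def access(self, address, kind):
        """Return True if the access proceeds, False if it is blocked."""
        if address not in self._allowed:
            return True  # untracked addresses are unrestricted
        seq, i = self._allowed[address], self._pos[address]
        if i < len(seq) and seq[i] == kind:
            self._pos[address] += 1
            return True
        return False  # violates the allowed sequence: block

table = TrackingTable()
table.track(0x1000, ["W", "R"])           # write must precede the read
early_read = table.access(0x1000, "R")    # blocked: no write yet
write_ok = table.access(0x1000, "W")      # proceeds
late_read = table.access(0x1000, "R")     # now allowed
```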
-
Publication No.: US20230177328A1
Publication Date: 2023-06-08
Application No.: US17972832
Filing Date: 2022-10-25
Applicant: Intel Corporation
Inventor: Srinivas Sridharan , Karthikeyan Vaidyanathan , Dipankar Das
Abstract: One embodiment provides for a graphics processing unit including a fabric interface configured to transmit gradient data stored in a memory device of the graphics processing unit according to a pre-defined communication operation. The memory device is a physical memory device shared with a compute block of the graphics processing unit and the fabric interface. The fabric interface automatically transmits the gradient data stored in memory to a second distributed training node based on an address of the gradient data in the memory device.
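The address-triggered transmission described here can be modelled in a few lines. This is an assumption-laden sketch, not the actual fabric-interface behaviour: a registered gradient buffer is watched, and writes landing inside it are forwarded to a peer training node while other writes are ignored.

```python
# Illustrative model of a fabric interface that forwards writes hitting a
# registered gradient buffer to a second training node. The callback
# shape and names are assumptions for illustration.

class FabricInterface:
    def __init__(self, send_fn):
        self._send = send_fn      # transport to the peer node (assumed)
        self._ranges = []         # registered (start, length) buffers

    def register_gradient_buffer(self, start, length):
        self._ranges.append((start, length))

    def on_memory_write(self, address, value):
        """Forward the write to the peer iff it hits a gradient buffer."""
        for start, length in self._ranges:
            if start <= address < start + length:
                self._send(address, value)
                return True
        return False

sent = []
fabric = FabricInterface(lambda addr, val: sent.append((addr, val)))
fabric.register_gradient_buffer(0x4000, 16)
hit = fabric.on_memory_write(0x4004, 0.5)    # inside the buffer: sent
miss = fabric.on_memory_write(0x9000, 1.0)   # outside: ignored
```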
-
Publication No.: US11494163B2
Publication Date: 2022-11-08
Application No.: US16562979
Filing Date: 2019-09-06
Applicant: Intel Corporation
Inventor: Naveen Mellempudi , Dipankar Das , Chunhui Mei , Kristopher Wong , Dhiraj D. Kalamkar , Hong H. Jiang , Subramaniam Maiyuran , Varghese George
Abstract: An apparatus to facilitate computer number format conversion is disclosed. The apparatus comprises a control unit to receive data format information indicating a first precision data format in which input data is to be received, and converter hardware to receive the input data and convert it from the first precision data format to a second precision data format based on the data format information.
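A minimal software sketch of such a converter, assuming fp32-to-bfloat16 as the format pair (one plausible instance; the patent is format-agnostic). Conversion here truncates the low 16 mantissa bits; real hardware would typically round.

```python
# Software sketch of precision-format conversion dispatched on data
# format information, as in the abstract. fp32 -> bf16 by truncation.

import struct

def fp32_to_bf16_bits(x):
    """Return the 16-bit bfloat16 pattern for float x (truncating)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bf16_bits_to_fp32(bits16):
    """Expand a bfloat16 bit pattern back to a Python float."""
    (x,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return x

def convert(value, src="fp32", dst="bf16"):
    """Dispatch on the data-format information (illustrative)."""
    if (src, dst) == ("fp32", "bf16"):
        return bf16_bits_to_fp32(fp32_to_bf16_bits(value))
    raise NotImplementedError(f"{src} -> {dst}")

approx = convert(3.14159)
# bf16 keeps only 8 mantissa bits, so approx == 3.140625
```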
-
Publication No.: US11488008B2
Publication Date: 2022-11-01
Application No.: US15869510
Filing Date: 2018-01-12
Applicant: Intel Corporation
Inventor: Srinivas Sridharan , Karthikeyan Vaidyanathan , Dipankar Das
Abstract: One embodiment provides for a system to compute and distribute data for distributed training of a neural network, the system including first memory to store a first set of instructions including a machine learning framework; a fabric interface to enable transmission and receipt of data associated with a set of trainable machine learning parameters; a first set of general-purpose processor cores to execute the first set of instructions, the first set of instructions to provide a training workflow for computation of gradients for the trainable machine learning parameters and to communicate with a second set of instructions, the second set of instructions to facilitate transmission and receipt of the gradients via the fabric interface; and a graphics processor to perform compute operations associated with the training workflow to generate the gradients for the trainable machine learning parameters.
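The division of labour in this abstract — a framework computes per-node gradients, a communication layer exchanges them over the fabric, then every node applies the same update — can be sketched with an in-process allreduce. Function names are illustrative assumptions.

```python
# Hedged sketch: per-node gradients are averaged across nodes (the
# communication layer's job) before an identical SGD update on each node.

def allreduce_mean(per_node_grads):
    """Average gradients elementwise across nodes."""
    n = len(per_node_grads)
    return [sum(g[i] for g in per_node_grads) / n
            for i in range(len(per_node_grads[0]))]

def sgd_step(weights, grads, lr=0.1):
    """The framework's update, applied identically on every node."""
    return [w - lr * g for w, g in zip(weights, grads)]

node_grads = [[1.0, 2.0], [3.0, 4.0]]   # gradients from two worker nodes
avg = allreduce_mean(node_grads)        # [2.0, 3.0]
weights = sgd_step([0.5, 0.5], avg)
```

Because every node sees the same averaged gradients, the replicas stay in lockstep without any explicit weight broadcast.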
-
Publication No.: US20220343174A1
Publication Date: 2022-10-27
Application No.: US17742581
Filing Date: 2022-05-12
Applicant: Intel Corporation
Inventor: Dipankar Das , Roger Gramunt , Mikhail Smelyanskiy , Jesus Corbal , Dheevatsa Mudigere , Naveen K. Mellempudi , Alexander F. Heinecke
Abstract: Described herein is a graphics processor including a processing resource with a multiplier configured to multiply input associated with an instruction at one of a first plurality of bit widths, an adder configured to add a product output from the multiplier to an accumulator value at one of a second plurality of bit widths, and circuitry to select a first bit width of the first plurality of bit widths for the multiplier and a second bit width of the second plurality of bit widths for the adder.
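Independently selectable multiplier and adder widths can be modelled with a small integer sketch. This is an assumption-heavy toy (the patent also covers floating-point widths): the product is wrapped to the multiplier's width and the accumulation to the adder's width.

```python
# Toy two's-complement model of a multiply-accumulate whose multiplier
# and adder run at independently selected bit widths.

def wrap(value, bits):
    """Truncate a two's-complement value to `bits` bits."""
    mask = (1 << bits) - 1
    v = value & mask
    return v - (1 << bits) if v >= (1 << (bits - 1)) else v

def mul_add(a, b, acc, mul_bits=8, add_bits=16):
    """acc + a*b, product at mul_bits and sum at add_bits."""
    product = wrap(a * b, mul_bits)
    return wrap(acc + product, add_bits)

exact = mul_add(5, 6, 100)      # 5*6 = 30 fits in 8 bits -> 130
overflow = mul_add(16, 16, 0)   # 16*16 = 256 wraps to 0 at 8 bits
```

A wider accumulator than multiplier, as the defaults here assume, is a common choice for keeping rounding error out of long dot-product chains.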
-
Publication No.: US11373266B2
Publication Date: 2022-06-28
Application No.: US15869551
Filing Date: 2018-01-12
Applicant: Intel Corporation
Inventor: Dipankar Das , Karthikeyan Vaidyanathan , Srinivas Sridharan
Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising multi-dimensionally partitioning data of a feature map across multiple nodes for distributed training of a convolutional neural network; performing a parallel convolution operation on the multiple partitions to train weight data of the neural network; and exchanging data between nodes to enable computation of halo regions, the halo regions having dependencies on data processed by a different node.
-
Publication No.: US11314515B2
Publication Date: 2022-04-26
Application No.: US16724831
Filing Date: 2019-12-23
Applicant: Intel Corporation
Inventor: Supratim Pal , Sasikanth Avancha , Ishwar Bhati , Wei-Yu Chen , Dipankar Das , Ashutosh Garg , Chandra S. Gurram , Junjie Gu , Guei-Yuan Lueh , Subramaniam Maiyuran , Jorge E. Parra , Sudarshan Srinivasan , Varghese George
Abstract: Embodiments described herein provide for an instruction and associated logic to enable vector multiply-add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
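The zero-skipping behaviour can be made visible with a scalar expansion of the idea. This is an illustrative sketch, not the actual ISA semantics: lanes disabled by the predicate mask, or whose source operand is zero, issue no multiply at all.

```python
# Illustrative predicated vector multiply-add with automatic zero
# skipping. The multiply counter exists only to show the skipping.

def sparse_vector_madd(dst, src0, src1, predicate_mask):
    """dst[i] += src0[i] * src1[i] for enabled, nonzero lanes."""
    multiplies = 0
    out = list(dst)
    for i, (a, b) in enumerate(zip(src0, src1)):
        lane_enabled = (predicate_mask >> i) & 1
        if not lane_enabled or a == 0 or b == 0:
            continue  # zero-skip: no multiply issued for this lane
        out[i] += a * b
        multiplies += 1
    return out, multiplies

dst = [1, 1, 1, 1]
src0 = [0, 2, 0, 3]          # sparse source operand
src1 = [5, 5, 5, 5]
result, mults = sparse_vector_madd(dst, src0, src1, predicate_mask=0b1111)
# result == [1, 11, 1, 16]; only 2 of the 4 lanes multiplied
```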
-
Publication No.: US11106464B2
Publication Date: 2021-08-31
Application No.: US16317501
Filing Date: 2016-09-27
Applicant: Intel Corporation
Inventor: Swagath Venkataramani , Dipankar Das , Sasikanth Avancha , Ashish Ranjan , Subarno Banerjee , Bharat Kaul , Anand Raghunathan
Abstract: Systems, methods, and apparatuses relating to access synchronization in a shared memory are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction, and an execution unit to execute the decoded instruction to: receive a first input operand of a memory address to be tracked and a second input operand of an allowed sequence of memory accesses to the memory address, and cause a block of a memory access that violates the allowed sequence of memory accesses to the memory address. In one embodiment, a circuit separate from the execution unit compares a memory address for a memory access request to one or more memory addresses in a tracking table, and blocks a memory access for the memory access request when a type of access violates a corresponding allowed sequence of memory accesses to the memory address for the memory access request.