Patent search: ap:("Intel Corporation") AND inv:"Dipankar Das" (results page 4)

31.

Granted invention patent
Scaling half-precision floating point tensors for training deep neural networks [Valid]

Publication (announcement) No.: US11823034B2

Publication (announcement) date: 2023-11-21

Application No.: US17960947

Application date: 2022-10-06

Applicant: Intel Corporation

Inventor: Naveen Mellempudi , Dipankar Das

IPC: G06N3/063 , G06T1/20 , G06F7/487 , G06F7/544 , G06F5/01 , G06N3/084 , G06N3/044 , G06N3/045

CPC classification number: G06N3/063 , G06F5/012 , G06F7/487 , G06F7/5443 , G06N3/044 , G06N3/045 , G06N3/084 , G06T1/20

Abstract: A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, where the multiprocessor includes a set of functional units to execute at least one of the parallel threads of the instructions. The set of functional units can include a mixed precision tensor processor to perform tensor computations to generate loss data. The loss data is stored as a first floating-point data type and scaled by a scaling factor to enable a data distribution of a gradient tensor generated based on the loss data to be represented by a second floating point data type.
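The loss-scaling idea in this abstract can be illustrated in software. The NumPy sketch below is a hypothetical model, not the claimed hardware. It assumes FP32 as the first data type and FP16 as the second, models backpropagation on a scaled loss by multiplying FP32 gradients by the same factor (by the chain rule, scaling the loss scales every gradient), and uses an assumed halve-on-overflow, double-after-clean-steps policy. The function names and `growth_interval` are illustrative only.

```python
import numpy as np


def scaled_fp16_gradients(grads_fp32, scale):
    """Model gradients of a loss that was multiplied by `scale`.

    By the chain rule every gradient is scaled by the same factor, so the
    FP32 gradients are multiplied by `scale` and then cast to FP16.
    Returns the FP16 gradients and a flag that is True on overflow (inf/nan).
    """
    scaled = np.asarray(grads_fp32, dtype=np.float32) * np.float32(scale)
    g16 = scaled.astype(np.float16)
    overflow = not bool(np.all(np.isfinite(g16)))
    return g16, overflow


def unscale(g16, scale):
    """Return FP16 gradients to FP32 and divide out the scaling factor."""
    return g16.astype(np.float32) / np.float32(scale)


def update_scale(scale, overflow, good_steps, growth_interval=2000):
    """Assumed dynamic policy: halve on overflow, double after clean steps."""
    if overflow:
        return scale / 2.0, 0
    good_steps += 1
    if good_steps >= growth_interval:
        return scale * 2.0, 0
    return scale, good_steps


# A gradient of 1e-8 is below the smallest FP16 subnormal (about 6e-8).
grads = np.array([1e-8, 3e-3], dtype=np.float32)
print(scaled_fp16_gradients(grads, 1.0)[0])   # first element flushes to 0
g16, overflow = scaled_fp16_gradients(grads, 1024.0)
print(unscale(g16, 1024.0), overflow)         # first element survives
```

Without scaling, the 1e-8 gradient is lost in FP16; with a factor of 1024 it lands in the FP16 subnormal range and is recovered approximately after unscaling in FP32.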

32.

Granted invention patent
Abstraction layers for scalable distributed machine learning [Valid]

Publication (announcement) No.: US11798120B2

Publication (announcement) date: 2023-10-24

Application No.: US17398295

Application date: 2021-08-10

Applicant: Intel Corporation

Inventor: Dhiraj D. Kalamkar , Karthikeyan Vaidyanathan , Srinivas Sridharan , Dipankar Das

IPC: G06N3/06 , G06T1/20 , G06N3/063 , G06N3/084 , G06N3/044 , G06N3/045

CPC classification number: G06T1/20 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084

Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.

33.

Granted invention patent
Dynamic precision management for integer deep learning primitives [Valid]

Publication (announcement) No.: US11669933B2

Publication (announcement) date: 2023-06-06

Application No.: US17730364

Application date: 2022-04-27

Applicant: Intel Corporation

Inventor: Naveen K. Mellempudi , Dheevatsa Mudigere , Dipankar Das , Srinivas Sridharan

IPC: G06T1/20 , G06F5/01 , G06F7/501 , G06F7/523 , G06F7/544 , G06F17/15 , G06F17/16 , G06N3/063 , G06N3/084 , G06N3/044 , G06N3/045

CPC classification number: G06T1/20 , G06F5/01 , G06F7/501 , G06F7/523 , G06F7/5443 , G06F17/153 , G06F17/16 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06F2207/382 , G06F2207/4824

Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a hardware processing unit having a dynamic precision fixed-point unit that is configurable to quantize elements of a floating-point tensor to convert the floating-point tensor into a dynamic fixed-point tensor.
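A dynamic fixed-point tensor, as the term is commonly used, stores integer mantissas that share a single exponent. The NumPy sketch below shows one plausible way to quantize a floating-point tensor into that form; the power-of-two exponent choice, the 16-bit default, and the function names are assumptions for illustration and are not taken from the patent.

```python
import numpy as np


def to_dynamic_fixed_point(x, bits=16):
    """Quantize a float tensor into integers that share one exponent.

    Chooses the smallest power-of-two exponent `exp` such that
    max|x| / 2**exp fits in a signed `bits`-bit integer, then stores
    round(x / 2**exp). The tensor is represented as q * 2**exp.
    """
    x = np.asarray(x, dtype=np.float32)
    qmax = 2 ** (bits - 1) - 1
    amax = float(np.max(np.abs(x)))
    if amax == 0.0:
        return np.zeros(x.shape, dtype=np.int32), 0
    exp = int(np.ceil(np.log2(amax / qmax)))
    q = np.clip(np.round(x / 2.0 ** exp), -qmax - 1, qmax).astype(np.int32)
    return q, exp


def from_dynamic_fixed_point(q, exp):
    """Reconstruct an approximate float tensor from mantissas and exponent."""
    return q.astype(np.float32) * np.float32(2.0 ** exp)


q, exp = to_dynamic_fixed_point([0.5, -1.25, 3.0], bits=8)
print(q.tolist(), exp)  # [16, -40, 96] -5
```

Because the exponent is derived from the tensor's largest magnitude, the reconstruction error per element is at most half of one step, 2**exp / 2.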

34.

Granted invention patent
Incremental precision networks using residual inference and fine-grain quantization [Valid]

Publication (announcement) No.: US11556772B2

Publication (announcement) date: 2023-01-17

Application No.: US15869515

Application date: 2018-01-12

Applicant: Intel Corporation

Inventor: Abhisek Kundu , Naveen Mellempudi , Dheevatsa Mudigere , Dipankar Das

IPC: G06N3/08 , G06N5/04 , G06N3/04 , G06T15/00 , G06F9/46 , G06N3/063 , G06T17/20 , G06T15/80 , G06T17/10 , G06T15/04 , G06V10/94

Abstract: One embodiment provides for a computing device comprising a parallel processor compute unit to perform a set of parallel integer compute operations; a ternarization unit including a weight ternarization circuit and an activation quantization circuit; wherein the weight ternarization circuit is to convert a weight tensor from a floating-point representation to a ternary representation including a ternary weight and a scale factor; wherein the activation quantization circuit is to convert an activation tensor from a floating-point representation to an integer representation; and wherein the parallel processor compute unit includes one or more circuits to perform the set of parallel integer compute operations on the ternary representation of the weight tensor and the integer representation of the activation tensor.
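The data flow in this abstract (ternary weights with a scale factor, integer activations, integer-only compute) can be sketched in NumPy. The abstract does not say how the ternary weight and scale are computed; the sketch swaps in the threshold delta = 0.7 * mean|w| heuristic that is common in the ternary-weight-network literature, and uses symmetric 8-bit activation quantization. All names and parameters are hypothetical.

```python
import numpy as np


def ternarize(w, ratio=0.7):
    """Map weights to {-1, 0, +1} plus one float scale factor.

    Weights whose magnitude exceeds delta = ratio * mean|w| keep their sign;
    the rest become 0. The scale alpha is the mean magnitude of the kept
    weights, so w is approximated by alpha * t.
    """
    w = np.asarray(w, dtype=np.float32)
    delta = ratio * float(np.mean(np.abs(w)))
    t = np.where(w > delta, 1, np.where(w < -delta, -1, 0)).astype(np.int8)
    kept = t != 0
    alpha = float(np.mean(np.abs(w[kept]))) if kept.any() else 0.0
    return t, alpha


def quantize_activations(a, bits=8):
    """Symmetric linear quantization of activations to signed integers."""
    a = np.asarray(a, dtype=np.float32)
    qmax = 2 ** (bits - 1) - 1
    amax = float(np.max(np.abs(a)))
    scale = amax / qmax if amax > 0 else 1.0
    q = np.clip(np.round(a / scale), -qmax, qmax).astype(np.int32)
    return q, scale


def ternary_int_dot(t, alpha, q, a_scale):
    """Integer-only multiply-accumulate, rescaled once at the end."""
    acc = int(np.dot(t.astype(np.int32), q))
    return alpha * a_scale * acc


t, alpha = ternarize([0.9, -0.8, 0.05, 1.1, -0.02])
q, s = quantize_activations([1.0, 2.0, 3.0, 4.0, 5.0])
print(t.tolist(), ternary_int_dot(t, alpha, q, s))
```

The inner loop touches only small integers; the two float scale factors are applied once per dot product, which is what makes a parallel integer compute unit usable here.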

35.

Published invention application
INSTRUCTIONS AND LOGIC FOR VECTOR MULTIPLY ADD WITH ZERO SKIPPING [Valid]

Publication (announcement) No.: US20220326953A1

Publication (announcement) date: 2022-10-13

Application No.: US17723312

Application date: 2022-04-18

Applicant: Intel Corporation

Inventor: Supratim Pal , Sasikanth Avancha , Ishwar Bhati , Wei-Yu Chen , Dipankar Das , Ashutosh Garg , Chandra S. Gurram , Junjie Gu , Guei-Yuan Lueh , Subramaniam Maiyuran , Jorge E. Parra , Sudarshan Srinivasan , Varghese George

IPC: G06F9/38 , G06F9/30

Abstract: Embodiments described herein provide for an instruction and associated logic to enable vector multiply-add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
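The zero-skipping behavior described above can be modeled in plain Python. This is a hypothetical software model, not the instruction's actual semantics: it assumes each of the `repeat` iterations multiplies one vector row by a broadcast scalar operand, that predicate-mask bits enable individual lanes, and that an iteration whose scalar is zero is skipped because it cannot change the accumulator.

```python
from typing import List, Sequence, Tuple


def vector_mad_zero_skip(acc: Sequence[float],
                         rows: Sequence[Sequence[float]],
                         scalars: Sequence[float],
                         pred_mask: int,
                         repeat: int) -> Tuple[List[float], int]:
    """Software model of a repeated vector multiply-add with zero skipping.

    Iteration r computes acc[lane] += rows[r][lane] * scalars[r] for every
    lane whose bit is set in pred_mask. If scalars[r] is zero the whole
    iteration would add nothing, so it is skipped. Returns the updated
    accumulator and the number of skipped iterations.
    """
    out = list(acc)
    skipped = 0
    for r in range(repeat):
        s = scalars[r]
        if s == 0:
            skipped += 1
            continue
        for lane, value in enumerate(rows[r]):
            if (pred_mask >> lane) & 1:
                out[lane] += value * s
    return out, skipped


result, skipped = vector_mad_zero_skip(
    acc=[0, 0, 0, 0],
    rows=[[1, 2, 3, 4], [5, 6, 7, 8], [1, 1, 1, 1]],
    scalars=[2, 0, 3],
    pred_mask=0b1011,  # lane 2 disabled
    repeat=3,
)
print(result, skipped)  # [5, 7, 0, 11] 1
```

The result is identical to the dense computation; the saving comes from the skipped iteration, which grows with the sparsity of the scalar operands.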

36.

Granted invention patent
Apparatus and method for vector multiply and accumulate of packed words [Valid]

Publication (announcement) No.: US11409525B2

Publication (announcement) date: 2022-08-09

Application No.: US15879420

Application date: 2018-01-24

Applicant: Intel Corporation

Inventor: Alexander Heinecke , Dipankar Das , Robert Valentine , Mark Charney

IPC: G06F9/38 , G06F9/30

Abstract: An apparatus and method for performing multiply-accumulate operations. For example, one embodiment of a processor comprises: a decoder to decode instructions; a first source register to store a first plurality of packed words; a second source register to store a second plurality of packed words; a third source register to store a plurality of packed quadwords; execution circuitry to execute a first instruction, the execution circuitry comprising: extension circuitry to sign-extend or zero-extend the first and second plurality of packed words to generate a first and second plurality of doublewords corresponding to the first and second plurality of packed words; multiplier circuitry to multiply each of the first plurality of doublewords with a corresponding one of the second plurality of doublewords to generate a plurality of temporary products; adder circuitry to add at least a first set of the temporary products to generate a first temporary sum; accumulation circuitry to combine the first temporary sum with a first packed quadword value from a first quadword location in the third source register to generate a first accumulated quadword result; a destination register to store the first accumulated quadword result in the first quadword location.
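The datapath in this abstract (extend words, multiply, sum products, accumulate into a quadword) can be modeled in Python. The sketch assumes that each 64-bit quadword lane lines up with the four 16-bit words at the same position in the two sources, which matches the register widths but is not stated in the abstract; the function names are hypothetical.

```python
from typing import List


def _sext16(word: int) -> int:
    """Sign-extend a 16-bit pattern to a Python int."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def _zext16(word: int) -> int:
    """Zero-extend a 16-bit pattern."""
    return word & 0xFFFF


def _wrap_s64(value: int) -> int:
    """Wrap to a signed 64-bit quadword, as a fixed-width register would."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >> 63 else value


def mac_words_to_qwords(src1: List[int], src2: List[int],
                        acc: List[int], signed: bool = True) -> List[int]:
    """Multiply packed words pairwise and accumulate into packed quadwords.

    Assumes each 64-bit quadword lane of the accumulator lines up with the
    four 16-bit words at the same position in the two sources.
    """
    if not len(src1) == len(src2) == 4 * len(acc):
        raise ValueError("each quadword lane must cover four words")
    extend = _sext16 if signed else _zext16
    result = []
    for lane, qword in enumerate(acc):
        words = range(4 * lane, 4 * lane + 4)
        # Extend words to doublewords, form products, then sum the products.
        temp_sum = sum(extend(src1[i]) * extend(src2[i]) for i in words)
        result.append(_wrap_s64(qword + temp_sum))
    return result


src1 = [1, 2, 3, 4, 0xFFFF, 0, 0, 0]
src2 = [5, 6, 7, 8, 2, 0, 0, 0]
print(mac_words_to_qwords(src1, src2, [100, 10]))                # [170, 8]
print(mac_words_to_qwords(src1, src2, [100, 10], signed=False))  # [170, 131080]
```

The word 0xFFFF shows why the extension mode matters: sign-extended it is -1, zero-extended it is 65535.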

37.

Published invention application
TECHNOLOGIES FOR SCALING DEEP LEARNING TRAINING [Valid]

Publication (announcement) No.: US20210342692A1

Publication (announcement) date: 2021-11-04

Application No.: US17321044

Application date: 2021-05-14

Applicant: Intel Corporation

Inventor: Naveen K. Mellempudi , Srinivas Sridharan , Dheevatsa Mudigere , Dipankar Das

IPC: G06N3/08 , G06N3/063 , G06N3/04

Abstract: Technologies for artificial neural network training include a computing node with a host fabric interface that sends a message that includes one or more artificial neural network training algorithm values to another computing node in response to receipt of a request to send the message. Prior to sending the message, the host fabric interface may receive a request to quantize the message and quantize the message based on a quantization level included in the request to generate a quantized message. The quantized message includes one or more quantized values such that each quantized value has a lower precision than a corresponding artificial neural network training algorithm value. The host fabric interface then transmits the quantized message, which includes metadata indicative of the quantization level, to another computing node in response to quantization of the message for artificial neural network training. Other embodiments are described and claimed.
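The abstract does not specify the quantization scheme, so the following Python sketch is only an illustration under stated assumptions: the quantization level is read as a bit width, values are quantized symmetrically and uniformly, and the metadata carried with the message is the bit width plus the step size. The `QuantizedMessage` type and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class QuantizedMessage:
    bits: int           # quantization level, sent as metadata
    scale: float        # step size, sent as metadata
    payload: List[int]  # lower-precision values


def quantize_message(values: Sequence[float], bits: int) -> QuantizedMessage:
    """Uniformly quantize training values to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / qmax if amax > 0 else 1.0
    payload = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return QuantizedMessage(bits=bits, scale=scale, payload=payload)


def dequantize_message(msg: QuantizedMessage) -> List[float]:
    """Receiver side: rebuild approximate values using the metadata."""
    return [q * msg.scale for q in msg.payload]


grads = [0.5, -1.0, 0.25, 0.0]
msg = quantize_message(grads, bits=8)
print(msg.bits, dequantize_message(msg))
```

Carrying the level and scale in the message lets the receiving node dequantize without prior agreement on the precision, so the sender can change the level from one message to the next.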

38.

Granted invention patent
Abstraction layers for scalable distributed machine learning [Valid]

Publication (announcement) No.: US11094029B2

Publication (announcement) date: 2021-08-17

Application No.: US15482953

Application date: 2017-04-10

Applicant: Intel Corporation

Inventor: Dhiraj D. Kalamkar , Karthikeyan Vaidyanathan , Srinivas Sridharan , Dipankar Das

IPC: G06T1/20 , G06N3/04 , G06N3/063 , G06N3/08

Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.

39.

Published invention application
INSTRUCTIONS AND LOGIC FOR VECTOR MULTIPLY ADD WITH ZERO SKIPPING [Valid]

Publication (announcement) No.: US20210191724A1

Publication (announcement) date: 2021-06-24

Application No.: US16724831

Application date: 2019-12-23

Applicant: Intel Corporation

Inventor: Supratim Pal , Sasikanth Avancha , Ishwar Bhati , Wei-Yu Chen , Dipankar Das , Ashutosh Garg , Chandra S. Gurram , Junjie Gu , Guei-Yuan Lueh , Subramaniam Maiyuran , Jorge E. Parra , Sudarshan Srinivasan , Varghese George

IPC: G06F9/38 , G06F9/30

Abstract: Embodiments described herein provide for an instruction and associated logic to enable vector multiply-add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.

40.

Granted invention patent
Dynamic precision management for integer deep learning primitives [Valid]

Publication (announcement) No.: US10825127B2

Publication (announcement) date: 2020-11-03

Application No.: US16853405

Application date: 2020-04-20

Applicant: Intel Corporation

Inventor: Naveen Mellempudi , Dheevatsa Mudigere , Dipankar Das , Srinivas Sridharan

IPC: G06T1/20 , G06N3/08 , G06N3/04 , G06F7/544 , G06F17/15 , G06F5/01 , G06F7/523 , G06F17/16 , G06N3/063 , G06F7/501

Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.
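The right-shift and shared-exponent steps in this abstract can be sketched in Python. In this hypothetical model, the "dynamic range" is taken to be a signed `target_bits`-bit integer range, the shift is the smallest one that brings the absolute maximum mantissa into that range, and the shared exponent is incremented by the same shift so each value m * 2**exp is preserved up to the bits shifted out. The subsequent compute operation is left out; the function name and parameters are assumptions.

```python
from typing import List, Tuple


def rebalance_right_shift(mantissas: List[int], shared_exp: int,
                          target_bits: int) -> Tuple[List[int], int, int]:
    """Right-shift integer mantissas so they fit a signed `target_bits` range.

    The shift is derived from the absolute maximum mantissa and the target
    dynamic range; the shared exponent is incremented by the same amount so
    that each value m * 2**shared_exp is preserved up to the bits shifted out.
    Python's >> is an arithmetic shift, so negative values round toward -inf.
    """
    limit = 2 ** (target_bits - 1) - 1
    amax = max((abs(m) for m in mantissas), default=0)
    shift = 0
    while (amax >> shift) > limit:
        shift += 1
    shifted = [m >> shift for m in mantissas]
    return shifted, shared_exp + shift, shift


vals, exp, shift = rebalance_right_shift([1000, -3000, 70000], -10, 16)
print(vals, exp, shift)  # [250, -750, 17500] -8 2
```

Here 70000 * 2**-10 and 17500 * 2**-8 denote the same value; only mantissas with nonzero low bits lose precision when shifted.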
