Patent search ap:("INTEL CORPORATION") AND inv:"Vasanth Ranganathan" Page 3

21.

Invention application
INTERCONNECTED SYSTEMS FENCE MECHANISM (in force)

Publication (announcement) number: US20220075746A1

Publication (announcement) date: 2022-03-10

Application number: US17014023

Application date: 2020-09-08

Applicant: Intel Corporation

Inventor: Hema Chand Nalluri , Ankur Shah , Joydeep Ray , Aditya Navale , Altug Koker , Murali Ramadoss , Niranjan L. Cooray , Jeffery S. Boles , Aravindh Anantaraman , David Puffer , James Valerio , Vasanth Ranganathan

IPC: G06F13/40 , G06F13/16 , G06F9/30 , G06F9/52 , G06F12/0837 , G06F12/0888

Abstract: An apparatus to facilitate memory barriers is disclosed. The apparatus comprises an interconnect; a device memory; a plurality of processing resources, coupled to the device memory, to execute a plurality of execution threads as memory data producers and memory data consumers to the device memory and a system memory; and fence hardware to generate fence operations to enforce data ordering on memory operations issued to the device memory and the system memory coupled via the interconnect.

22.

Granted invention patent
Source synchronized signaling mechanism (in force)

Publication (announcement) number: US11237993B2

Publication (announcement) date: 2022-02-01

Application number: US17011745

Application date: 2020-09-03

Applicant: Intel Corporation

Inventor: Altug Koker , Joydeep Ray , Vasanth Ranganathan , Abhishek R. Appu

IPC: G06F13/16 , G06T1/20 , G06F13/40 , G06F13/42

Abstract: An apparatus to facilitate source synchronous signaling is disclosed. The apparatus includes transfer protocol logic to provide for source synchronous transfer of data within an interconnect fabric, including one or more synchronizers having logic to transmit a data signal and a source clock (clk) signal during the transfer of data.

23.

Invention application
INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING (in force)

Publication (announcement) number: US20220019431A1

Publication (announcement) date: 2022-01-20

Application number: US17305355

Application date: 2021-07-06

Applicant: Intel Corporation

Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar

IPC: G06F9/30 , G06N3/04 , G06F9/38 , G06F7/544 , G06N3/08 , G06N3/063 , G06F7/483 , G09G5/393 , G06T15/00 , G06F17/16 , G06N20/00

Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.

24.

Invention application
SECTOR CACHE FOR COMPRESSION (in force)

Publication (announcement) number: US20210374062A1

Publication (announcement) date: 2021-12-02

Application number: US17400415

Application date: 2021-08-12

Applicant: Intel Corporation

Inventor: Abhishek R. Appu , Altug Koker , Joydeep Ray , David Puffer , Prasoonkumar Surti , Lakshminarayanan Striramassarma , Vasanth Ranganathan , Kiran C. Veernapu , Balaji Vembu , Pattabhiraman K

IPC: G06F12/0877 , G06F12/0802 , G06F12/0855 , G06F12/0806 , G06F12/0846 , G06F12/0868 , G06T1/60 , G06F12/126

Abstract: In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.
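Illustration (not part of the patent record): the sector structure described in the abstract can be pictured as an address decomposition in which several cache lines share one sector. The sketch below is a minimal model under stated assumptions: the line size, the number of sectors, the direct-mapped layout, and the names `locate` and `SectorLocation` are all hypothetical and are not taken from the patent. Only the "at least two cache lines per sector" constraint comes from the abstract.

```python
from typing import NamedTuple

LINE_BYTES = 64        # assumed cache-line size in bytes (hypothetical)
LINES_PER_SECTOR = 2   # abstract: each sector comprises at least two cache lines
NUM_SECTORS = 1024     # assumed number of sectors (hypothetical)


class SectorLocation(NamedTuple):
    tag: int      # identifies which memory region occupies the sector
    sector: int   # sector index within the cache
    line: int     # cache line within the sector
    offset: int   # byte offset within the cache line


def locate(address: int) -> SectorLocation:
    """Split a byte address into tag, sector, line, and offset fields."""
    offset = address % LINE_BYTES
    line_number = address // LINE_BYTES
    line = line_number % LINES_PER_SECTOR
    sector_number = line_number // LINES_PER_SECTOR
    sector = sector_number % NUM_SECTORS
    tag = sector_number // NUM_SECTORS
    return SectorLocation(tag, sector, line, offset)
```

In this model, two adjacent 64-byte lines map to the same sector and share one tag, which is one common way a sectored organization reduces tag storage.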

25.

Invention application
DYNAMIC PRECISION FOR NEURAL NETWORK COMPUTE OPERATIONS (in force)

Publication (announcement) number: US20210334637A1

Publication (announcement) date: 2021-10-28

Application number: US17317857

Application date: 2021-05-11

Applicant: INTEL CORPORATION

Inventor: Kamal Sinha , Balaji Vembu , Eriko Nurvitadhi , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Farshad Akhbari , Narayan Srinivasa , Feng Chen , Dukhwan Kim , Nadathur Rajagopalan Satish , John C. Weast , Mike B. MacPherson , Linda L. Hurd , Vasanth Ranganathan , Sanjeev S. Jahagirdar

IPC: G06N3/063 , G06N3/08 , G06N3/04 , G06T1/20 , G06F9/30 , G06T15/00 , G06F15/78 , G06F15/76 , G06F1/3287 , G06F1/3293

Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.

26.

Granted invention patent
Instruction prefetch based on thread dispatch commands (in force)

Publication (announcement) number: US11157283B2

Publication (announcement) date: 2021-10-26

Application number: US16243663

Application date: 2019-01-09

Applicant: Intel Corporation

Inventor: James Valerio , Vasanth Ranganathan , Joydeep Ray , Pradeep Ramani

IPC: G06F9/38 , G06T1/20 , G06F13/28

Abstract: A graphics processing device comprises a set of compute units to execute multiple threads of a workload, a cache coupled with the set of compute units, and a prefetcher to prefetch instructions associated with the workload. The prefetcher is configured to use a thread dispatch command that is used to dispatch threads to execute a kernel to prefetch instructions, parameters, and/or constants that will be used during execution of the kernel. Prefetch operations for the kernel can then occur concurrently with thread dispatch operations.

27.

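Invention application
BARRIER SYNCHRONIZATION MECHANISM (in force)

Publication (announcement) number: US20210263785A1

Publication (announcement) date: 2021-08-26

Application number: US16798603

Application date: 2020-02-24

Applicant: Intel Corporation

Inventor: James Valerio , Vasanth Ranganathan , Joydeep Ray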

IPC: G06F9/52 , G06F9/54 , G06F9/30 , G06F9/38 , G06F9/48

Abstract: An apparatus to facilitate thread barrier synchronization is disclosed. The apparatus includes a plurality of processing resources to execute a plurality of execution threads included in a thread workgroup and barrier synchronization hardware to assign a first named barrier to a first set of the plurality of execution threads in the thread workgroup, assign a second named barrier to a second set of the plurality of execution threads in the thread workgroup, synchronize execution of the first set of execution threads via the first named barrier and synchronize execution of the second set of execution threads via the second named barrier.
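Illustration (not part of the patent record): the abstract describes hardware that assigns different named barriers to different subsets of threads in one workgroup. A rough software analogy, assuming Python's standard `threading.Barrier` as a stand-in for the barrier hardware, is sketched below. The function name `run_workgroup` and the event log are hypothetical and exist only to make the ordering visible.

```python
import threading


def run_workgroup(groups):
    """Run one thread per slot; `groups` maps a barrier name to its thread count.

    Returns a list of ("arrive" | "pass", barrier_name) events in the order
    in which they were recorded.
    """
    # One independent barrier per name, sized to the threads assigned to it.
    barriers = {name: threading.Barrier(count) for name, count in groups.items()}
    events = []
    lock = threading.Lock()

    def worker(name):
        with lock:
            events.append(("arrive", name))
        # Blocks until every thread assigned to this named barrier has arrived.
        barriers[name].wait()
        with lock:
            events.append(("pass", name))

    threads = [
        threading.Thread(target=worker, args=(name,))
        for name, count in groups.items()
        for _ in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return events
```

Calling `run_workgroup({"A": 2, "B": 3})` synchronizes the two sets independently: no thread passes barrier "A" until both of its threads have arrived, regardless of how far the threads on barrier "B" have progressed.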

28.

Invention application
DATA PREFETCHING FOR GRAPHICS DATA PROCESSING (in force)

Publication (announcement) number: US20210255957A1

Publication (announcement) date: 2021-08-19

Application number: US17161465

Application date: 2021-01-28

Applicant: Intel Corporation

Inventor: Vikranth Vemulapalli , Lakshminarayanan Striramassarma , Mike MacPherson , Aravindh Anantaraman , Ben Ashbaugh , Murali Ramadoss , William B. Sadler , Jonathan Pearce , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter, JR. , Prasoonkumar Surti , Nicolas Galoppo von Borries , Joydeep Ray , Abhishek R. Appu , ElMoustapha Ould-Ahmed-Vall , Altug Koker , Sungye Kim , Subramaniam Maiyuran , Valentin Andrei

IPC: G06F12/0862 , G06T1/20 , G06T1/60

Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus is to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs, including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache; and upon determining that the hit rate for the L1 cache is less than the threshold value, allowing the prefetch of data to the L1 cache.
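Illustration (not part of the patent record): the threshold rule in the abstract can be written as a small decision function. This is a minimal sketch, not the patented implementation. The function name `prefetch_target`, the hit and access counters, and the behavior when no accesses have been measured are assumptions introduced for the example.

```python
def prefetch_target(l1_hits: int, l1_accesses: int, threshold: float) -> str:
    """Return the cache level a prefetch may fill, based on the L1 hit rate."""
    if l1_accesses == 0:
        # Assumption: with no measurements yet, allow prefetching into L1.
        return "L1"
    hit_rate = l1_hits / l1_accesses
    # Hit rate at or above the threshold: limit the prefetch to L3.
    # Hit rate below the threshold: allow the prefetch to reach L1.
    return "L3" if hit_rate >= threshold else "L1"
```

For example, with a threshold of 0.5, `prefetch_target(90, 100, 0.5)` limits the prefetch to L3, while `prefetch_target(10, 100, 0.5)` allows it into L1.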

29.

Granted invention patent
Dynamic load balancing of compute assets among different compute contexts (in force)

Publication (announcement) number: US11074109B2

Publication (announcement) date: 2021-07-27

Application number: US16367056

Application date: 2019-03-27

Applicant: Intel Corporation

Inventor: James Valerio , Vasanth Ranganathan , Joydeep Ray , Rahul A. Kulkarni , Abhishek R. Appu , Jeffery S. Boles , Hema C. Nalluri

IPC: G06F9/48 , G06T1/20 , G06F9/38 , G06F9/50

Abstract: Examples are described here that can be used to allocate commands from multiple sources for performance by one or more segments of a processing device. For example, a processing device can be segmented into multiple portions and each portion is allocated to process commands from a particular source. In the event a single source provides commands, the entire processing device (all segments) can be allocated to process commands from the single source. When a second source provides commands, some segments can be allocated to perform commands from the first source and other segments can be allocated to perform commands from the second source. Accordingly, commands from multiple applications can be executed by a processing unit at the same time.
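Illustration (not part of the patent record): the allocation behavior described in the abstract can be sketched as a mapping from command sources to device segments. The abstract does not say how segments are divided among sources, so the round-robin split below is an assumption, as are the function name `allocate_segments` and the source labels.

```python
from typing import Dict, List


def allocate_segments(num_segments: int, sources: List[str]) -> Dict[str, List[int]]:
    """Assign each device segment to one active command source.

    A single source receives every segment; several sources share the
    segments round-robin (an assumed policy, not stated in the abstract).
    """
    if not sources:
        return {}
    allocation: Dict[str, List[int]] = {source: [] for source in sources}
    for segment in range(num_segments):
        owner = sources[segment % len(sources)]
        allocation[owner].append(segment)
    return allocation
```

With four segments, `allocate_segments(4, ["app0"])` gives all four segments to the single source, and `allocate_segments(4, ["app0", "app1"])` gives two segments to each source so that both can execute at the same time.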

30.

Invention application
GRAPHICS PROCESSING UNIT PROCESSING AND CACHING IMPROVEMENTS (in force)

Publication (announcement) number: US20210150663A1

Publication (announcement) date: 2021-05-20

Application number: US17095590

Application date: 2020-11-11

Applicant: Intel Corporation

Inventor: Subramaniam Maiyuran , Durgaprasad Bilagi , Joydeep Ray , Scott Janus , Sanjeev Jahagirdar , Brent Insko , Lidong Xu , Abhishek R. Appu , James Holland , Vasanth Ranganathan , Nikos Kaburlasos , Altug Koker , Xinmin Tian , Guei-Yuan Lueh , Changliang Wang

IPC: G06T1/60 , G06T1/20 , G06N5/04 , G06F12/0802

Abstract: Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a system includes a producer intellectual property (IP) (e.g., a media IP), a compute core (e.g., a GPU or an AI-specific core of the GPU), and a streaming buffer logically interposed between the producer IP and the compute core. The producer IP is operable to consume data from memory and output results to the streaming buffer. The compute core is operable to perform AI inference processing based on data consumed from the streaming buffer and output AI inference processing results to the memory.
