-
公开(公告)号:US11080813B2
公开(公告)日:2021-08-03
申请号:US16584076
申请日:2019-09-26
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Linda L. Hurd , Dukhwan Kim , Mike B. Macpherson , John C. Weast , Feng Chen , Farshad Akhbari , Narayan Srinivasa , Nadathur Rajagopalan Satish , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman
IPC: G06T1/20 , G06F3/14 , G06F9/30 , G06F9/38 , G06N3/04 , G06N3/063 , G06N3/08 , G06T15/00 , G09G5/36 , G06T15/04
Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a mixed precision core to perform a mixed precision multi-dimensional matrix multiply and accumulate operation on 8-bit and/or 32 bit signed or unsigned integer elements.
-
公开(公告)号:US11074109B2
公开(公告)日:2021-07-27
申请号:US16367056
申请日:2019-03-27
Applicant: Intel Corporation
Inventor: James Valerio , Vasanth Ranganathan , Joydeep Ray , Rahul A. Kulkarni , Abhishek R. Appu , Jeffery S. Boles , Hema C. Nalluri
Abstract: Examples are described here that can be used to allocate commands from multiple sources to performance by one or more segments of a processing device. For example, a processing device can be segmented into multiple portions and each portion is allocated to process commands from a particular source. In the event a single source provides commands, the entire processing device (all segments) can be allocated to process commands from the single source. When a second source provides commands, some segments can be allocated to perform commands from the first source and other segments can be allocated to perform commands from the second source. Accordingly, commands from multiple applications can be executed by a processing unit at the same time.
-
公开(公告)号:US11049284B2
公开(公告)日:2021-06-29
申请号:US16511757
申请日:2019-07-15
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Joydeep Ray , Balaji Vembu , Prasoonkumar Surti , Kamal Sinha , Nadathur Rajagoplan Satish , Narayan Srinivasa , Feng Chen , Dukhwan Kim , Farshad Akhbari
Abstract: An apparatus to facilitate compute compression is disclosed. The apparatus includes a graphics processing unit including mapping logic to map a first block of integer pixel data to a compression block and compression logic to compress the compression block.
-
公开(公告)号:US11030713B2
公开(公告)日:2021-06-08
申请号:US15483829
申请日:2017-04-10
Applicant: Intel Corporation
Inventor: Karthik Vaidyanathan , Prasoonkumar Surti , Michael Apodaca , Murali Ramadoss , Abhishek Venkatesh , Joydeep Ray , Abhishek R. Appu
IPC: G06T1/60 , G06T9/00 , H04N19/503 , H04N19/436
Abstract: An embodiment of a graphics apparatus may include an embedded local memory, and a memory extender communicatively coupled to the embedded local memory to extend the embedded local memory. The memory extender may be configured to compress information and store the compressed information in the embedded local memory. Additionally, or alternatively, the memory extender may be configured to expose the embedded local memory for non-local access. Other embodiments are disclosed and claimed.
-
公开(公告)号:US20210150663A1
公开(公告)日:2021-05-20
申请号:US17095590
申请日:2020-11-11
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Durgaprasad Bilagi , Joydeep Ray , Scott Janus , Sanjeev Jahagirdar , Brent Insko , Lidong Xu , Abhishek R. Appu , James Holland , Vasanth Ranganathan , Nikos Kaburlasos , Altug Koker , Xinmin Tian , Guei-Yuan Lueh , Changliang Wang
IPC: G06T1/60 , G06T1/20 , G06N5/04 , G06F12/0802
Abstract: Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a system includes a producer intellectual property (IP) (e.g., a media IP), a compute core (e.g., a GPU or an AI-specific core of the GPU), a streaming buffer logically interposed between the producer IP and the compute core. The producer IP is operable to consume data from memory and output results to the streaming buffer. The compute core is operable to perform AI inference processing based on data consumed from the streaming buffer and output AI inference processing results to the memory.
-
公开(公告)号:US11010659B2
公开(公告)日:2021-05-18
申请号:US15495020
申请日:2017-04-24
Applicant: Intel Corporation
Inventor: Kamal Sinha , Balaji Vembu , Eriko Nurvitadhi , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Farshad Akhbari , Narayan Srinivasa , Feng Chen , Dukhwan Kim , Nadathur Rajagopalan Satish , John C. Weast , Mike B. MacPherson , Linda L. Hurd , Vasanth Ranganathan , Sanjeev S. Jahagirdar
IPC: G06F17/50 , G06N3/063 , G06N3/08 , G06N3/04 , G06T1/20 , G06F9/30 , G06T15/00 , G06F15/78 , G06F15/76 , G06F1/3287 , G06F1/3293 , G06T1/60
Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US10997086B1
公开(公告)日:2021-05-04
申请号:US16807430
申请日:2020-03-03
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , Aditya Navale , Ankur Shah , Murali Ramadoss , Ben Ashbaugh , Ronald Silvas
IPC: G06F12/10 , G06F12/1072
Abstract: Systems and methods for providing shared virtual memory addressing support for a host system are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations. A memory management unit (MMU) is coupled to the processing resources. The MMU to support a first virtual address size for managing allocation of non-shared virtual memory and to support a second virtual address size for managing allocation of shared virtual memory that is shared between the graphics processor and a host.
-
公开(公告)号:US20210125581A1
公开(公告)日:2021-04-29
申请号:US17062871
申请日:2020-10-05
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , Balaji Vembu , Murali Ramadoss , Guei-Yuan Lueh , James A. Valerio , Prasoonkumar Surti , Abhishek R. Appu , Vasanth Ranganathan , Kalyan K. Bhiravabhatla , Arthur D. Hunter, JR. , Wei-Yu Chen , Subramaniam M. Maiyuran
IPC: G09G5/36 , G06F12/0875 , G06F9/46 , G09G5/00
Abstract: A mechanism is described for facilitating using of a shared local memory for register spilling/filling relating to graphics processors at computing devices. A method of embodiments, as described herein, includes reserving one or more spaces of a shared local memory (SLM) to perform one or more of spilling and filling relating to registers associated with a graphics processor of a computing device.
-
公开(公告)号:US10956330B2
公开(公告)日:2021-03-23
申请号:US16727127
申请日:2019-12-26
Applicant: Intel Corporation
Inventor: Chandrasekaran Sakthivel , Prasoonkumar Surti , John C. Weast , Sara S. Baghsorkhi , Justin E. Gottschlich , Abhishek R. Appu , Nicolas C. Galoppo Von Borries , Joydeep Ray , Narayan Srinivasa , Feng Chen , Ben J. Ashbaugh , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha , Eriko Nurvitadhi , Balaji Vembu , Altug Koker
IPC: G06F12/0837 , G06N3/08 , G06N20/00 , G06T1/20 , G06F12/0815 , G06N3/063 , G06N3/04
Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20210065329A1
公开(公告)日:2021-03-04
申请号:US17036950
申请日:2020-09-29
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , Abhishek R. Appu , Balaji Vembu
Abstract: A mechanism is described for facilitating dynamic merging of atomic operations in computing devices. A method of embodiments, as described herein, includes facilitating detecting atomic messages and a plurality of slot addresses. The method further includes comparing one or more slot addresses of the plurality of slot addresses with other slot addresses of the plurality of slot addresses to seek one or more matched slot addresses, where the one or more matched slot addresses are merged into one or more merged groups. The method may further include generating one or more merged atomic operations based on and corresponding to the one or more merged groups.
-
-
-
-
-
-
-
-
-