-
Publication Number: US12117962B2
Publication Date: 2024-10-15
Application Number: US18450685
Application Date: 2023-08-16
Applicant: Intel Corporation
Inventor: Joydeep Ray , Aravindh Anantaraman , Abhishek R. Appu , Altug Koker , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Subramaniam Maiyuran , Nicolas Galoppo Von Borries , Varghese George , Mike Macpherson , Ben Ashbaugh , Murali Ramadoss , Vikranth Vemulapalli , William Sadler , Jonathan Pearce , Sungye Kim
CPC classification number: G06F15/8069 , G06F9/30163 , G06F9/3877 , G06T15/005 , G06F9/3836
Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
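The split described in this abstract amounts to partitioning a workload into a scalar-friendly subset and a vector-friendly subset and executing each on the matching processor complex. Below is a minimal Python sketch of that partition-and-dispatch idea; the names Op, partition_ops, and dispatch, and the vector_width test, are illustrative assumptions and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Op:
    name: str
    vector_width: int  # 1 for scalar work, >1 for SIMD-friendly work


def partition_ops(ops: List[Op]) -> Tuple[List[Op], List[Op]]:
    """Split a workload into a scalar subset and a vector subset."""
    scalar_subset = [op for op in ops if op.vector_width == 1]
    vector_subset = [op for op in ops if op.vector_width > 1]
    return scalar_subset, vector_subset


def dispatch(ops: List[Op], execute: Callable[[Op], float]) -> List[float]:
    """Run each operation on the chosen complex and collect its outputs."""
    return [execute(op) for op in ops]


# Control-heavy ops go to the scalar complex, wide data-parallel ops to the
# vector complex; the two output sets would be merged afterwards.
workload = [Op("bounds_check", 1), Op("vertex_transform", 16), Op("loop_counter", 1)]
scalar_ops, vector_ops = partition_ops(workload)
scalar_outputs = dispatch(scalar_ops, lambda op: 0.0)  # stand-in for scalar execution
vector_outputs = dispatch(vector_ops, lambda op: 0.0)  # stand-in for vector execution
```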
-
Publication Number: US11892950B2
Publication Date: 2024-02-06
Application Number: US17865666
Application Date: 2022-07-15
Applicant: Intel Corporation
Inventor: Vikranth Vemulapalli , Lakshminarayanan Striramassarma , Mike MacPherson , Aravindh Anantaraman , Ben Ashbaugh , Murali Ramadoss , William B. Sadler , Jonathan Pearce , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter, Jr. , Prasoonkumar Surti , Nicolas Galoppo von Borries , Joydeep Ray , Abhishek R. Appu , ElMoustapha Ould-Ahmed-Vall , Altug Koker , Sungye Kim , Subramaniam Maiyuran , Valentin Andrei
IPC: G06F12/084 , G06F12/0862 , G06T1/20 , G06T1/60
CPC classification number: G06F12/0862 , G06T1/20 , G06T1/60 , G06F2212/602 , G06F2212/608
Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.
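The prefetch behavior described here is essentially a threshold test on the measured L1 hit rate: when L1 is already serving well, prefetched data is kept out of L1 and placed only in L3. A minimal sketch of that decision follows; the threshold value and the names choose_prefetch_target and PrefetchTarget are assumptions made for illustration.

```python
from enum import Enum


class PrefetchTarget(Enum):
    L1 = "L1"
    L3 = "L3"


def choose_prefetch_target(l1_hits: int, l1_accesses: int,
                           threshold: float = 0.9) -> PrefetchTarget:
    """Limit prefetches to L3 when the L1 cache is already effective."""
    hit_rate = l1_hits / l1_accesses if l1_accesses else 0.0
    if hit_rate >= threshold:
        # L1 is effective: avoid polluting it and prefetch only into L3.
        return PrefetchTarget.L3
    # L1 is missing often: allow prefetched lines to be placed in L1.
    return PrefetchTarget.L1


print(choose_prefetch_target(l1_hits=950, l1_accesses=1000))  # PrefetchTarget.L3
print(choose_prefetch_target(l1_hits=500, l1_accesses=1000))  # PrefetchTarget.L1
```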
-
Publication Number: US11281496B2
Publication Date: 2022-03-22
Application Number: US16355130
Application Date: 2019-03-15
Applicant: Intel Corporation
Inventor: Ben Ashbaugh , Jonathan Pearce , Murali Ramadoss , Vikranth Vemulapalli , William B. Sadler , Sungye Kim , Marian Alin Petre
IPC: G06F9/46 , G06F9/50 , G06F9/38 , G06F9/54 , G06F12/0837 , G06F9/48 , G06F9/345 , G06T1/60 , G06F9/30 , G06T15/00 , G06F16/245 , G06T1/20
Abstract: Embodiments are generally directed to thread group scheduling for graphics processing. An embodiment of an apparatus includes a plurality of processors including a plurality of graphics processors to process data; a memory; and one or more caches for storage of data for the plurality of graphics processors, wherein the one or more processors are to schedule a plurality of groups of threads for processing by the plurality of graphics processors, the scheduling of the plurality of groups of threads including the plurality of processors to apply a bias for scheduling the plurality of groups of threads according to a cache locality for the one or more caches.
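The scheduling described here biases thread-group placement toward processors whose caches already hold the group's data. The sketch below models that bias as a simple locality-weighted score; the scoring function, the bias weight, and the data structures are assumptions for illustration, not the patent's mechanism.

```python
from typing import Dict, List


def schedule_groups(groups: List[dict], processors: List[int],
                    cache_residency: Dict[int, set],
                    bias: float = 10.0) -> Dict[int, List[dict]]:
    """Assign each thread group to the processor with the best locality score."""
    assignment: Dict[int, List[dict]] = {p: [] for p in processors}
    for group in groups:
        def score(p: int) -> float:
            # Reward processors whose cache already holds this group's working
            # set, and lightly penalize processors that are already loaded.
            locality = len(group["working_set"] & cache_residency.get(p, set()))
            return bias * locality - len(assignment[p])
        best = max(processors, key=score)
        assignment[best].append(group)
    return assignment


groups = [{"id": 0, "working_set": {"tileA"}}, {"id": 1, "working_set": {"tileB"}}]
residency = {0: {"tileA"}, 1: {"tileB"}}
print(schedule_groups(groups, processors=[0, 1], cache_residency=residency))
```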
-
Publication Number: US12124533B2
Publication Date: 2024-10-22
Application Number: US17482875
Application Date: 2021-09-23
Applicant: Intel Corporation
Inventor: Anbang Yao , Ming Lu , Yikai Wang , Scott Janus , Sungye Kim
IPC: G06F18/2136 , G06T11/00
CPC classification number: G06F18/2136 , G06T11/00 , G06T2207/20076 , G06T2207/20081
Abstract: Embodiments are generally directed to methods and apparatuses of spatially sparse convolution module for visual rendering and synthesis. An embodiment of a method for image processing, comprising: receiving an input image by a convolution layer of a neural network to generate a plurality of feature maps; performing spatially sparse convolution on the plurality of feature maps to generate spatially sparse feature maps; and upsampling the spatially sparse feature maps to generate an output image.
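The claimed pipeline is: convolve the input image into feature maps, make those maps spatially sparse, and upsample to produce the output image. Here is a hedged PyTorch sketch of that flow; the magnitude-threshold sparsification, layer sizes, and keep ratio are assumptions for illustration rather than the patent's actual module.

```python
import torch
import torch.nn.functional as F
from torch import nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
to_rgb = nn.Conv2d(in_channels=16, out_channels=3, kernel_size=1)


def sparse_render(image: torch.Tensor, keep_ratio: float = 0.25) -> torch.Tensor:
    feats = conv(image)                             # feature maps
    energy = feats.abs().mean(dim=1, keepdim=True)  # per-pixel activity
    k = int(keep_ratio * energy.numel())
    threshold = energy.flatten().kthvalue(energy.numel() - k).values
    mask = (energy > threshold).float()
    sparse_feats = feats * mask                     # spatially sparse feature maps
    upsampled = F.interpolate(sparse_feats, scale_factor=2, mode="bilinear",
                              align_corners=False)
    return to_rgb(upsampled)                        # output image


out = sparse_render(torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 128, 128])
```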
-
Publication Number: US20240256456A1
Publication Date: 2024-08-01
Application Number: US18391346
Application Date: 2023-12-20
Applicant: Intel Corporation
Inventor: Vikranth Vemulapalli , Lakshminarayanan Striramassarma , Mike MacPherson , Aravindh Anantaraman , Ben Ashbaugh , Murali Ramadoss , William B. Sadler , Jonathan Pearce , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter, Jr. , Prasoonkumar Surti , Nicolas Galoppo von Borries , Joydeep Ray , Abhishek R. Appu , ElMoustapha Ould-Ahmed-Vall , Altug Koker , Sungye Kim , Subramaniam Maiyuran , Valentin Andrei
IPC: G06F12/0862 , G06T1/20 , G06T1/60
CPC classification number: G06F12/0862 , G06T1/20 , G06T1/60 , G06F2212/602 , G06F2212/608
Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.
-
Publication Number: US20220261289A1
Publication Date: 2022-08-18
Application Number: US17686089
Application Date: 2022-03-03
Applicant: Intel Corporation
Inventor: Ben Ashbaugh , Jonathan Pearce , Murali Ramadoss , Vikranth Vemulapalli , William B. Sadler , Sungye Kim , Marian Alin Petre
Abstract: Embodiments are generally directed to thread group scheduling for graphics processing. An embodiment of an apparatus includes a plurality of processors including a plurality of graphics processors to process data; a memory; and one or more caches for storage of data for the plurality of graphics processors, wherein the one or more processors are to schedule a plurality of groups of threads for processing by the plurality of graphics processors, the scheduling of the plurality of groups of threads including the plurality of processors to apply a bias for scheduling the plurality of groups of threads according to a cache locality for the one or more caches.
-
Publication Number: US20210255957A1
Publication Date: 2021-08-19
Application Number: US17161465
Application Date: 2021-01-28
Applicant: Intel Corporation
Inventor: Vikranth Vemulapalli , Lakshminarayanan Striramassarma , Mike MacPherson , Aravindh Anantaraman , Ben Ashbaugh , Murali Ramadoss , William B. Sadler , Jonathan Pearce , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter, JR. , Prasoonkumar Surti , Nicolas Galoppo von Borries , Joydeep Ray , Abhishek R. Appu , ElMoustapha Ould-Ahmed-Vall , Altug Koker , Sungye Kim , Subramaniam Maiyuran , Valentin Andrei
IPC: G06F12/0862 , G06T1/20 , G06T1/60
Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.
-
Publication Number: US20240257316A1
Publication Date: 2024-08-01
Application Number: US18615050
Application Date: 2024-03-25
Applicant: Intel Corporation
Inventor: Anbang Yao , Ming Lu , Yikai Wang , Shandong Wang , Yurong Chen , Sungye Kim , Attila Tamas Afra
CPC classification number: G06T5/50 , G06N3/02 , G06T7/13 , G06V40/161 , G06V40/171 , G06T2207/20084 , G06T2207/30201
Abstract: The present disclosure provides an apparatus and method of guided neural network model for image processing. An apparatus may comprise a guidance map generator, a synthesis network and an accelerator. The guidance map generator may receive a first image as a content image and a second image as a style image, and generate a first plurality of guidance maps and a second plurality of guidance maps, respectively from the first image and the second image. The synthesis network may synthesize the first plurality of guidance maps and the second plurality of guidance maps to determine guidance information. The accelerator may generate an output image by applying the style of the second image to the first image based on the guidance information.
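The flow in this abstract is: derive guidance maps from both the content image and the style image, fuse them into guidance information, and apply the style to the content conditioned on that guidance. The PyTorch sketch below follows that outline; the layer shapes, the shared guidance generator, and the simple concatenation-based "apply style" step are illustrative assumptions, not the patent's architecture.

```python
import torch
from torch import nn

guidance_generator = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # image -> guidance maps
synthesis_network = nn.Conv2d(16, 8, kernel_size=1)             # fuses both sets of maps
stylizer = nn.Conv2d(3 + 8, 3, kernel_size=3, padding=1)        # content + guidance -> output


def guided_stylize(content: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
    content_maps = guidance_generator(content)   # first plurality of guidance maps
    style_maps = guidance_generator(style)       # second plurality of guidance maps
    guidance = synthesis_network(torch.cat([content_maps, style_maps], dim=1))
    return stylizer(torch.cat([content, guidance], dim=1))


out = guided_stylize(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```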
-
Publication Number: US11972545B2
Publication Date: 2024-04-30
Application Number: US17482998
Application Date: 2021-09-23
Applicant: Intel Corporation
Inventor: Anbang Yao , Ming Lu , Yikai Wang , Shandong Wang , Yurong Chen , Sungye Kim , Attila Tamas Afra
CPC classification number: G06T5/50 , G06N3/02 , G06T7/13 , G06V40/161 , G06V40/171 , G06T2207/20084 , G06T2207/30201
Abstract: The present disclosure provides an apparatus and method of guided neural network model for image processing. An apparatus may comprise a guidance map generator, a synthesis network and an accelerator. The guidance map generator may receive a first image as a content image and a second image as a style image, and generate a first plurality of guidance maps and a second plurality of guidance maps, respectively from the first image and the second image. The synthesis network may synthesize the first plurality of guidance maps and the second plurality of guidance maps to determine guidance information. The accelerator may generate an output image by applying the style of the second image to the first image based on the guidance information.
-
Publication Number: US11954062B2
Publication Date: 2024-04-09
Application Number: US17310540
Application Date: 2020-03-14
Applicant: Intel Corporation
Inventor: Joydeep Ray , Niranjan Cooray , Subramaniam Maiyuran , Altug Koker , Prasoonkumar Surti , Varghese George , Valentin Andrei , Abhishek Appu , Guadalupe Garcia , Pattabhiraman K , Sungye Kim , Sanjay Kumar , Pratik Marolia , Elmoustapha Ould-Ahmed-Vall , Vasanth Ranganathan , William Sadler , Lakshminarayanan Striramassarma
IPC: G06F12/00 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/78 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06N3/08 , G06T15/06
CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06
Abstract: Embodiments described herein provide techniques to enable the dynamic reconfiguration of memory on a general-purpose graphics processing unit. One embodiment described herein enables dynamic reconfiguration of cache memory bank assignments based on hardware statistics. One embodiment enables for virtual memory address translation using mixed four kilobyte and sixty-four kilobyte pages within the same page table hierarchy and under the same page directory. One embodiment provides for a graphics processor and associated heterogenous processing system having near and far regions of the same level of a cache hierarchy.
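One of the features named in this abstract, mixing 4 KiB and 64 KiB pages under the same page directory, can be illustrated with a toy address-translation model. The sketch below indexes a table at 64 KiB granularity and lets each entry either cover a whole 64 KiB page or point at a set of 4 KiB sub-pages; the table layout, field names, and addresses are assumptions for illustration and not the patent's page-table format.

```python
PAGE_4K = 4 * 1024
PAGE_64K = 64 * 1024


class PageTableEntry:
    def __init__(self, base, large: bool):
        self.base = base    # physical base (int) or dict of 4 KiB sub-page bases
        self.large = large  # True -> 64 KiB page, False -> 4 KiB sub-pages


def translate(va: int, page_table: dict) -> int:
    """Translate a virtual address using a per-entry page size."""
    entry = page_table[va // PAGE_64K]
    if entry.large:
        # A large entry maps the whole 64 KiB region in one step.
        return entry.base + (va % PAGE_64K)
    # A small entry maps each 4 KiB sub-page independently.
    sub_base = entry.base[(va % PAGE_64K) // PAGE_4K]
    return sub_base + (va % PAGE_4K)


table = {
    0: PageTableEntry(base=0x100000, large=True),                     # one 64 KiB page
    1: PageTableEntry(base={0: 0x200000, 1: 0x300000}, large=False),  # two 4 KiB pages
}
print(hex(translate(0x00_1234, table)))  # inside the 64 KiB page -> 0x101234
print(hex(translate(0x01_1ABC, table)))  # second 4 KiB sub-page -> 0x300abc
```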
-