-
Publication No.: US11494163B2
Publication Date: 2022-11-08
Application No.: US16562979
Application Date: 2019-09-06
Applicant: Intel Corporation
Inventor: Naveen Mellempudi, Dipankar Das, Chunhui Mei, Kristopher Wong, Dhiraj D. Kalamkar, Hong H. Jiang, Subramaniam Maiyuran, Varghese George
Abstract: An apparatus to facilitate a computer number format conversion is disclosed. The apparatus comprises a control unit to receive data format information indicating a first precision data format in which input data is to be received, and converter hardware to receive the input data and convert it from the first precision data format to a second precision data format based on the data format information.
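As a rough illustration of the kind of precision conversion the abstract describes, the sketch below converts FP32 values to bfloat16 bit patterns and back. The specific format pair is an assumption here; the abstract only speaks of a first and a second precision data format.

```python
import numpy as np

def fp32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Convert FP32 values to bfloat16 bit patterns (round-to-nearest-even).

    bfloat16 keeps the FP32 sign and exponent plus the top 7 mantissa bits,
    so conversion is a rounding shift that keeps only the upper 16 bits.
    """
    bits = x.astype(np.float32).view(np.uint32)
    # round-to-nearest-even on the lower 16 bits that will be discarded
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16).astype(np.uint16)

def bf16_bits_to_fp32(b: np.ndarray) -> np.ndarray:
    """Widen bfloat16 bit patterns back to FP32 by zero-filling the low mantissa bits."""
    return (b.astype(np.uint32) << 16).view(np.float32)

data = np.array([1.0, 3.14159, -2.5e-3], dtype=np.float32)
print(bf16_bits_to_fp32(fp32_to_bf16_bits(data)))   # values rounded to bf16 precision
```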
-
Publication No.: US20220180468A1
Publication Date: 2022-06-09
Application No.: US17674781
Application Date: 2022-02-17
Applicant: Intel Corporation
Inventor: Naveen Matam, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Altug Koker, Josh Mastronarde, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially and distinctly packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
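The chiplet composition described above can be pictured with a small, purely illustrative data model; the class and field names below are hypothetical and only mirror the idea of assembling separately manufactured dies with different IP core logic onto one common package.

```python
from dataclasses import dataclass, field

@dataclass
class Chiplet:
    """A separately manufactured die carrying one kind of IP core logic."""
    name: str
    ip_kind: str          # e.g. "compute", "media", "memory-io"
    process_node_nm: int  # dies on one package may come from different nodes

@dataclass
class Package:
    """A common chassis onto which diverse chiplets are assembled."""
    chiplets: list[Chiplet] = field(default_factory=list)

    def assemble(self, chiplet: Chiplet) -> None:
        self.chiplets.append(chiplet)

gpu = Package()
gpu.assemble(Chiplet("compute-tile-0", "compute", 5))
gpu.assemble(Chiplet("media-tile", "media", 7))
gpu.assemble(Chiplet("io-tile", "memory-io", 14))
print([c.ip_kind for c in gpu.chiplets])   # diverse IP assembled into a single device
```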
-
Publication No.: US11314515B2
Publication Date: 2022-04-26
Application No.: US16724831
Application Date: 2019-12-23
Applicant: Intel Corporation
Inventor: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George
Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply-add instruction with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
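A much-simplified software emulation of the behavior described above is sketched below; the real hardware macro instruction expands over register operands, which this sketch does not model. It only shows how a predicate mask and automatic zero skipping interact with a repeated multiply-add.

```python
import numpy as np

def masked_fma(dst, src1, src2, predicate_mask, repeat_count=1):
    """Emulate a predicated vector multiply-add with automatic zero skipping.

    dst, src1, src2 : equal-length 1-D float arrays (one element per SIMD lane)
    predicate_mask  : boolean array; lanes that are False keep their dst value
    repeat_count    : how many times the multiply-add is applied per lane
    """
    acc = dst.copy()
    for _ in range(repeat_count):
        for lane in range(acc.shape[0]):
            if not predicate_mask[lane]:
                continue                  # lane predicated off by the mask
            if src1[lane] == 0.0 or src2[lane] == 0.0:
                continue                  # sparse input: the multiply/add is skipped,
                                          # leaving the accumulator unchanged
            acc[lane] += src1[lane] * src2[lane]
    return acc

d = np.zeros(4, dtype=np.float32)
a = np.array([1.0, 0.0, 3.0, 4.0], dtype=np.float32)
b = np.array([2.0, 5.0, 0.0, 1.0], dtype=np.float32)
m = np.array([True, True, True, False])
print(masked_fma(d, a, b, m, repeat_count=2))   # [4. 0. 0. 0.]: lanes 1-2 zero-skipped, lane 3 masked
```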
-
Publication No.: US11232533B2
Publication Date: 2022-01-25
Application No.: US16355274
Application Date: 2019-03-15
Applicant: Intel Corporation
Inventor: Joydeep Ray, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Nicolas Galoppo von Borries, Varghese George, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Mike Macpherson, Subramaniam Maiyuran
Abstract: Embodiments are generally directed to memory prefetching in a multi-GPU environment. An embodiment of an apparatus includes multiple processors including a host processor and multiple graphics processing units (GPUs) to process data, each of the GPUs including a prefetcher and a cache; and a memory for storage of data, the memory including a plurality of memory elements, wherein the prefetcher of each of the GPUs is to prefetch data from the memory to the cache of the GPU; and wherein the prefetcher of a GPU is prohibited from prefetching from a page that is not owned by the GPU or by the host processor.
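The ownership rule in the last clause can be illustrated with a small model; the page-table layout and identifiers below are assumptions for illustration, not the patented design.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096

@dataclass
class Page:
    owner: str            # e.g. "gpu0", "gpu1", or "host"

class Prefetcher:
    """Per-GPU prefetcher that only prefetches pages owned by its GPU or the host.

    Illustrative only: ownership is a dictionary keyed by page number, and
    'prefetch' copies addresses into a set standing in for the GPU's cache.
    """
    def __init__(self, gpu_id: str, page_table: dict[int, Page]):
        self.gpu_id = gpu_id
        self.page_table = page_table
        self.cache: set[int] = set()

    def prefetch(self, address: int) -> bool:
        page = self.page_table.get(address // PAGE_SIZE)
        if page is None or page.owner not in (self.gpu_id, "host"):
            return False                  # prohibited: page not owned by this GPU or the host
        self.cache.add(address)
        return True

pages = {0: Page("gpu0"), 1: Page("host"), 2: Page("gpu1")}
pf = Prefetcher("gpu0", pages)
print(pf.prefetch(0x0100), pf.prefetch(0x1100), pf.prefetch(0x2100))  # True True False
```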
-
Publication No.: US11151769B2
Publication Date: 2021-10-19
Application No.: US16537140
Application Date: 2019-08-09
Applicant: Intel Corporation
Inventor: Hugues Labbe, Darrel Palke, Sherine Abdelhak, Jill Boyce, Varghese George, Scott Janus, Adam Lake, Zhijun Lei, Zhengmin Li, Mike Macpherson, Carl Marshall, Selvakumar Panneer, Prasoonkumar Surti, Karthik Veeramani, Deepak Vembar, Vallabhajosyula Srinivasa Somayazulu
Abstract: One embodiment provides for a graphics processor comprising a block of graphics compute units, a graphics processor pipeline coupled to the block of graphics compute units, and a programmable neural network unit including one or more neural network hardware blocks. The programmable neural network unit is coupled with the block of graphics compute units and the graphics processor pipeline. The one or more neural network hardware blocks include hardware to perform neural network operations and activation operations for a layer of a neural network. The programmable neural network unit can configure settings of one or more hardware blocks within the graphics processor pipeline based on a machine learning model trained to optimize performance of a set of workloads.
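A toy software model of the arrangement described above is sketched below; all class names are hypothetical, and the mapping from workload features to pipeline settings is a stand-in for the trained machine learning model mentioned in the abstract.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class NeuralNetworkHardwareBlock:
    """Toy model of one hardware block: a matrix multiply plus an activation for one layer."""
    def __init__(self, weights: np.ndarray, bias: np.ndarray, activation=relu):
        self.weights, self.bias, self.activation = weights, bias, activation

    def run_layer(self, inputs: np.ndarray) -> np.ndarray:
        return self.activation(inputs @ self.weights + self.bias)

class ProgrammableNeuralNetworkUnit:
    """Chains hardware blocks and exposes a hook for tuning pipeline settings."""
    def __init__(self, blocks):
        self.blocks = blocks
        self.pipeline_settings = {}        # hypothetical knobs, e.g. coarse shading on/off

    def infer(self, x: np.ndarray) -> np.ndarray:
        for block in self.blocks:
            x = block.run_layer(x)
        return x

    def tune_pipeline(self, workload_features: np.ndarray) -> None:
        # A trained model could map workload features to settings; here we simply
        # threshold the unit's own output as a stand-in for that decision.
        score = float(self.infer(workload_features).mean())
        self.pipeline_settings["coarse_shading"] = score > 0.5

rng = np.random.default_rng(0)
blocks = [NeuralNetworkHardwareBlock(rng.standard_normal((8, 4)), np.zeros(4)),
          NeuralNetworkHardwareBlock(rng.standard_normal((4, 2)), np.zeros(2))]
unit = ProgrammableNeuralNetworkUnit(blocks)
unit.tune_pipeline(rng.standard_normal(8))
print(unit.pipeline_settings)
```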
-
Publication No.: US11042370B2
Publication Date: 2021-06-22
Application No.: US15957728
Application Date: 2018-04-19
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran, Guei-Yuan Lueh, Supratim Pal, Ashutosh Garg, Chandra S. Gurram, Jorge E. Parra, Junjie Gu, Konrad Trifunovic, Hong Bin Liao, Mike B. Macpherson, Shubh B. Shah, Shubra Marwaha, Stephen Junkins, Timothy R. Bauer, Varghese George, Weiyu Chen
Abstract: Embodiments described herein provide for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
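The layer-to-layer accumulation can be emulated in a few lines; the slice width per layer below (four elements) is an assumption, and the loop only mimics how a partial dot product is forwarded from one systolic layer to the next.

```python
def systolic_dot_product(a, b, accumulator=0.0, elems_per_layer=4):
    """Emulate a dot product computed across successive systolic layers.

    Each 'layer' multiplies a slice of the operands and adds the result to the
    accumulator handed in from the previous layer, mirroring how a partial dot
    product is output from one systolic layer to the next.
    """
    assert len(a) == len(b)
    for start in range(0, len(a), elems_per_layer):
        layer_a = a[start:start + elems_per_layer]
        layer_b = b[start:start + elems_per_layer]
        partial = sum(x * y for x, y in zip(layer_a, layer_b))   # multipliers + adder tree
        accumulator += partial                                   # forwarded to the next layer
    return accumulator

print(systolic_dot_product([1, 2, 3, 4, 5, 6, 7, 8],
                           [8, 7, 6, 5, 4, 3, 2, 1]))   # 120
```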
-
Publication No.: US20210133913A1
Publication Date: 2021-05-06
Application No.: US17069188
Application Date: 2020-10-13
Applicant: Intel Corporation
Inventor: Naveen Matam, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Altug Koker, Josh Mastronarde, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
-
Publication No.: US20210090327A1
Publication Date: 2021-03-25
Application No.: US17112792
Application Date: 2020-12-04
Applicant: Intel Corporation
Inventor: Jill Boyce, Soethiha Soe, Selvakumar Panneer, Adam Lake, Nilesh Jain, Deepak Vembar, Glen J. Anderson, Varghese George, Carl Marshall, Scott Janus, Saurabh Tangri, Karthik Veeramani, Prasoonkumar Surti
Abstract: Embodiments are directed to neural network processing for multi-object three-dimensional (3D) modeling. An embodiment of a computer-readable storage medium includes executable computer program instructions for obtaining data from multiple cameras, the data including multiple images, and generating a 3D model for 3D imaging based at least in part on the data from the cameras, wherein generating the 3D model includes one or more of performing processing with a first neural network to determine temporal direction based at least in part on motion of one or more objects identified in an image of the multiple images or performing processing with a second neural network to determine semantic content information for an image of the multiple images.
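A skeletal pipeline matching the two-network split described above is sketched below; the frame structure and the callables standing in for the temporal and semantic networks are hypothetical placeholders, not the actual models.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CameraFrame:
    camera_id: int
    image: object          # placeholder for pixel data
    timestamp: float

def build_3d_model(frames: List[CameraFrame],
                   temporal_net: Callable[[List[CameraFrame]], str],
                   semantic_net: Callable[[CameraFrame], dict]) -> dict:
    """Combine per-image network outputs from multiple cameras into one 3D model record.

    temporal_net and semantic_net stand in for the two neural networks in the
    abstract: one infers temporal direction from object motion across frames,
    the other labels the semantic content of a single image.
    """
    return {
        "temporal_direction": temporal_net(frames),
        "semantics": [semantic_net(f) for f in frames],
        "num_views": len({f.camera_id for f in frames}),
    }

frames = [CameraFrame(0, None, 0.0), CameraFrame(1, None, 0.0)]
model = build_3d_model(frames,
                       temporal_net=lambda fs: "forward",
                       semantic_net=lambda f: {"objects": []})
print(model["temporal_direction"], model["num_views"])
```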
-
Publication No.: US20210081201A1
Publication Date: 2021-03-18
Application No.: US17107823
Application Date: 2020-11-30
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran, Jorge Parra, Ashutosh Garg, Chandra Gurram, Chunhui Mei, Durgesh Borkar, Shubra Marwaha, Supratim Pal, Varghese George, Wei Xiong, Yan Li, Yongsheng Liu, Dipankar Das, Sasikanth Avancha, Dharma Teja Vooturi, Naveen K. Mellempudi
Abstract: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.
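A small emulation of metadata-driven structured-sparse multiplication is given below; the 2:4 sparsity pattern is an assumption, since the abstract does not name a ratio, and the packing scheme is simplified relative to real systolic-array operands.

```python
import numpy as np

def structured_sparse_dot(dense, packed_values, metadata, group_size=4):
    """Dot product of a dense vector with a 2:4 structured-sparse vector.

    dense         : full dense operand (the "unpacked source data")
    packed_values : only the nonzero values of the sparse operand, 2 per group of 4
    metadata      : for each packed value, its position within its group of 4;
                    the metadata selects which dense elements participate
    """
    result = 0.0
    nnz_per_group = len(packed_values) // (len(dense) // group_size)
    for i, (value, pos) in enumerate(zip(packed_values, metadata)):
        group = i // nnz_per_group
        dense_index = group * group_size + pos
        result += value * dense[dense_index]      # only metadata-selected elements multiply
    return result

dense  = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
packed = np.array([10., 20., 30., 40.])           # 2 nonzeros kept per group of 4
meta   = np.array([0, 3, 1, 2])                   # positions within each group
print(structured_sparse_dot(dense, packed, meta)) # 10*1 + 20*4 + 30*6 + 40*7 = 550
```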
-
Publication No.: US20200294182A1
Publication Date: 2020-09-17
Application No.: US16355573
Application Date: 2019-03-15
Applicant: Intel Corporation
Inventor: Varghese George, Altug Koker, Aravindh Anantaraman, Subramaniam Maiyuran, SungYe Kim, Valentin Andrei, Elmoustapha Ould-Ahmed-Vall, Joydeep Ray, Abhishek R. Appu, Nicolas C. Galoppo von Borries, Prasoonkumar Surti, Mike Macpherson
Abstract: Apparatuses including general-purpose graphics processing units having on-chip dense memory for temporal buffering are disclosed. In one embodiment, a graphics multiprocessor includes a plurality of compute engines to perform first computations to generate a first set of data, a cache for storing data, and a high density memory that is integrated on chip with the plurality of compute engines and the cache. The high density memory is to receive the first set of data, temporarily store the first set of data, and provide the first set of data to the cache during a first time period that is prior to a second time period when the plurality of compute engines will use the first set of data for second computations.
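The temporal-buffering behavior can be pictured with a toy staging model; the class below and its capacities are illustrative assumptions and do not reflect the actual memory hierarchy.

```python
from collections import deque

class HighDensityStagingMemory:
    """Toy model of on-chip dense memory that buffers producer output and
    drains it into a (smaller) cache ahead of when consumers need it."""
    def __init__(self, cache_capacity: int):
        self.buffer = deque()              # large on-chip store
        self.cache = deque(maxlen=cache_capacity)

    def store(self, data_block) -> None:
        self.buffer.append(data_block)     # first computations write here

    def stage_to_cache(self, count: int) -> None:
        # runs during the "first time period", before the second computations start
        for _ in range(min(count, len(self.buffer))):
            self.cache.append(self.buffer.popleft())

mem = HighDensityStagingMemory(cache_capacity=2)
for block in ("tile0", "tile1", "tile2"):
    mem.store(block)                       # produced by the first computations
mem.stage_to_cache(2)                      # warmed into the cache ahead of use
print(list(mem.cache), list(mem.buffer))   # ['tile0', 'tile1'] ['tile2']
```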