-
Publication No.: US20230051227A1
Publication date: 2023-02-16
Application No.: US17978426
Filing date: 2022-11-01
Applicant: Intel Corporation
Inventor: Michal Mrozek , Bartosz Dunajski , Ben Ashbaugh , Brandon Fliflet
Abstract: An apparatus to facilitate processing in a multi-tile device is disclosed. In one embodiment, the apparatus includes a graphics processor comprising a first semiconductor die including a first high-bandwidth memory (HBM) device, a second semiconductor die including a second HBM device, and a third semiconductor die coupled with the first semiconductor die and the second semiconductor die in a 2.5-dimensional (2.5D) arrangement. The third semiconductor die includes a graphics processing resource and a cache coupled with the graphics processing resource. The cache is configurable to cache data associated with memory accessed by the graphics processing resource and the graphics processing resource includes a general-purpose graphics processor core and a tensor core.
-
Publication No.: US12199759B2
Publication date: 2025-01-14
Application No.: US17688028
Filing date: 2022-03-07
Applicant: Intel Corporation
Inventor: Michal Mrozek , Vinod Tipparaju
Abstract: Described herein is a graphics processor configured to perform asynchronous input dependency resolution among a group of interdependent workloads. The graphics processor can dynamically resolve input dependencies among the workloads according to a dependency relationship defined for the workloads. Dependency resolution can be performed via a deferred submission mode, which resolves input dependencies prior to thread dispatch to the processing resources, or via an immediate submission mode, which resolves input dependencies at the processing resources.
-
Publication No.: US10235732B2
Publication date: 2019-03-19
Application No.: US14142681
Filing date: 2013-12-27
Applicant: Intel Corporation
Inventor: Jayanth N. Rao , Michal Mrozek
Abstract: A method and system are described herein for an optimization technique on two aspects of thread scheduling and dispatch when the driver is allowed to pick the scheduling attributes. The present techniques rely on an enhanced GPGPU Walker hardware command and one dimensional local identification generation to maximize thread residency.
-
Publication No.: US12217327B2
Publication date: 2025-02-04
Application No.: US17978426
Filing date: 2022-11-01
Applicant: Intel Corporation
Inventor: Michal Mrozek , Bartosz Dunajski , Ben Ashbaugh , Brandon Fliflet
Abstract: An apparatus to facilitate processing in a multi-tile device is disclosed. In one embodiment, the apparatus includes a graphics processor comprising a first semiconductor die including a first high-bandwidth memory (HBM) device, a second semiconductor die including a second HBM device, and a third semiconductor die coupled with the first semiconductor die and the second semiconductor die in a 2.5-dimensional (2.5D) arrangement. The third semiconductor die includes a graphics processing resource and a cache coupled with the graphics processing resource. The cache is configurable to cache data associated with memory accessed by the graphics processing resource and the graphics processing resource includes a general-purpose graphics processor core and a tensor core.
-
Publication No.: US10521874B2
Publication date: 2019-12-31
Application No.: US14498220
Filing date: 2014-09-26
Applicant: Intel Corporation
Inventor: Jayanth N. Rao , Pavan K. Lanka , Michal Mrozek
Abstract: An apparatus and method are described for executing workloads without host intervention. For example, one embodiment of an apparatus comprises: a host processor; and a graphics processor unit (GPU) to execute a hierarchical workload responsive to one or more commands issued by the host processor, the hierarchical workload comprising a parent workload and a plurality of child workloads interconnected in a logical graph structure; and a scheduler kernel implemented by the GPU to schedule execution of the plurality of child workloads without host intervention, the scheduler kernel to evaluate conditions required for execution of the child workloads and determine an order in which to execute the child workloads on the GPU based on the evaluated conditions; the GPU to execute the child workloads in the order determined by the scheduler kernel and to provide results of parent and child workloads to the host processor following execution of all of the child workloads.
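The ordering step this abstract describes, in which a scheduler evaluates the conditions of child workloads arranged in a graph and picks an execution order, can be illustrated with a minimal sketch. It uses Kahn's topological sort as a stand-in; the abstract does not say which algorithm the scheduler kernel uses. The function name `schedule_children` and the dictionary layout are hypothetical, and every workload is assumed to appear as a key.

```python
from collections import deque


def schedule_children(deps):
    """Return one valid execution order for child workloads.

    deps maps each workload name to the set of workloads that must finish
    first. This representation is an assumption made for illustration.
    """
    # Count unmet prerequisites and record who depends on whom.
    remaining = {name: len(prereqs) for name, prereqs in deps.items()}
    dependents = {name: [] for name in deps}
    for name, prereqs in deps.items():
        for prereq in prereqs:
            dependents[prereq].append(name)

    # Workloads with no unmet prerequisites can run right away.
    ready = deque(sorted(n for n, count in remaining.items() if count == 0))
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        # Finishing a workload may satisfy the conditions of its dependents.
        for dependent in sorted(dependents[current]):
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                ready.append(dependent)

    if len(order) != len(deps):
        raise ValueError("dependency cycle: some workloads can never run")
    return order


graph = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(schedule_children(graph))
```

The sketch runs on the host in plain Python, so it only models the ordering logic, not the GPU-side execution without host intervention that the abstract claims.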
-
Publication No.: US20190259129A1
Publication date: 2019-08-22
Application No.: US16260031
Filing date: 2019-01-28
Applicant: Intel Corporation
Inventor: Jayanth N. Rao , Michal Mrozek
Abstract: A method and system are described herein for an optimization technique on two aspects of thread scheduling and dispatch when the driver is allowed to pick the scheduling attributes. The present techniques rely on an enhanced GPGPU Walker hardware command and one dimensional local identification generation to maximize thread residency.
-
Publication No.: US20220291955A1
Publication date: 2022-09-15
Application No.: US17688028
Filing date: 2022-03-07
Applicant: Intel Corporation
Inventor: Michal Mrozek , Vinod Tipparaju
Abstract: Described herein is a graphics processor configured to perform asynchronous input dependency resolution among a group of interdependent workloads. The graphics processor can dynamically resolve input dependencies among the workloads according to a dependency relationship defined for the workloads. Dependency resolution can be performed via a deferred submission mode, which resolves input dependencies prior to thread dispatch to the processing resources, or via an immediate submission mode, which resolves input dependencies at the processing resources.
-
Publication No.: US10937118B2
Publication date: 2021-03-02
Application No.: US16260031
Filing date: 2019-01-28
Applicant: Intel Corporation
Inventor: Jayanth N. Rao , Michal Mrozek
Abstract: A method and system are described herein for an optimization technique on two aspects of thread scheduling and dispatch when the driver is allowed to pick the scheduling attributes. The present techniques rely on an enhanced GPGPU Walker hardware command and one dimensional local identification generation to maximize thread residency.
-
Publication No.: US11907756B2
Publication date: 2024-02-20
Application No.: US16796836
Filing date: 2020-02-20
Applicant: Intel Corporation
Inventor: Bartosz Dunajski , Brandon Fliflet , Michal Mrozek
CPC classification number: G06F9/4881 , G06F9/30003 , G06F9/3009 , G06F9/4482 , G06F9/485 , G06F9/544 , G06F9/546 , G06T1/20 , G06F9/505 , G06F9/5044 , G06T1/60
Abstract: A graphics processing apparatus that includes at least a memory device and an execution unit coupled to the memory. The memory device can store a command buffer with at least one command that is dependent on completion of at least one other command. The command buffer can include a jump command that causes a jump to a location in the command buffer to identify any unscheduled command. The execution unit is to jump to a location in the command buffer based on execution of the jump command. The execution unit is to perform one or more jumps to one or more locations in the command buffer to attempt to schedule a command with dependency on completion of at least one other command until the command with a dependency on completion of at least one other command is scheduled.
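The jump-and-retry behavior in this abstract can be pictured with a small host-side simulation. This is only a sketch under stated assumptions: the list-of-tuples buffer layout and the function name `run_command_buffer` are hypothetical, reaching the end of the list stands in for the jump command, and a command is treated as completed the moment it is scheduled, which real hardware does not guarantee.

```python
def run_command_buffer(commands):
    """Simulate repeated scans of a command buffer that ends in a jump.

    commands is a list of (name, prerequisites) tuples. This layout is
    hypothetical and chosen only to illustrate the retry loop.
    """
    scheduled = []  # names in the order they were scheduled
    done = set()    # simplification: scheduled commands count as completed
    while len(scheduled) < len(commands):
        progressed = False
        for name, prereqs in commands:
            if name in done:
                continue  # already scheduled on an earlier pass
            if set(prereqs) <= done:
                scheduled.append(name)
                done.add(name)
                progressed = True
        if not progressed:
            raise RuntimeError("no command can be scheduled; dependencies never complete")
        # End of buffer reached: the modeled jump returns to the first entry
        # so that any still-unscheduled command is examined again.
    return scheduled


buffer = [("c", ["b"]), ("b", ["a"]), ("a", [])]
print(run_command_buffer(buffer))
```

With the commands stored in reverse dependency order as above, the scan needs three passes, which shows why the jump back into the buffer is needed before a dependent command can be scheduled.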
-
Publication No.: US20240054595A1
Publication date: 2024-02-15
Application No.: US17884755
Filing date: 2022-08-10
Applicant: Intel Corporation
Inventor: Joydeep Ray , Vasanth Ranganathan , James Valerio , Jeffery S. Boles , Hema Chand Nalluri , Aditya Navale , Ben J. Ashbaugh , Michal Mrozek , Murali Ramadoss , Hong Jiang , Ankur Shah
CPC classification number: G06T1/20 , G06T1/60 , G06F9/3855
Abstract: Embodiments described herein provide a system of concurrent compute queues that enable the scheduling of a large number of compute contexts simultaneously on graphics processor hardware. One embodiment provides an apparatus comprising a system interface and a general-purpose graphics processor coupled with the system interface. The general-purpose graphics processor comprises a plurality of graphics processor hardware resources configured to be partitioned into a plurality of isolated partitions, each of the plurality of isolated partitions including a first command streamer, a second command streamer, and circuitry configured to schedule general-purpose graphics compute workloads submitted to a first plurality of command queues associated with the first command streamer and a second plurality of command queues associated with the second command streamer.
-