-
Publication No.: US11442795B2
Publication Date: 2022-09-13
Application No.: US16567993
Filing Date: 2019-09-11
Applicant: NVIDIA Corp.
Inventor: Daniel Robert Johnson, Jack Choquette, Olivier Giroux, Michael Patrick McKeown, Mark Stephenson, Sana Damani
Abstract: Convergence of threads executing common code sections is facilitated using instructions inserted at strategic locations in computer code sections. The inserted instructions enable the threads in a warp or other group to cooperate with a thread scheduler to promote thread convergence.
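To make the convergence idea concrete, here is a minimal toy model. It is not NVIDIA's mechanism. It assumes a single warp, a two-way branch followed by a common code section, and a naive scheduler that follows the lowest-numbered runnable thread. The `WAIT` marker is a hypothetical stand-in for an inserted convergence instruction, and `build_program` and `simulate` are illustrative names only. Without the marker, each divergent group issues the common section separately. With it, early arrivals yield to the scheduler until the whole warp is present, so the common section issues once.

```python
"""Toy SIMT model: does a convergence barrier at the reconvergence point save issue slots?"""


def build_program(taken: bool, a: int, b: int, c: int, converge: bool) -> list[str]:
    # Divergent part: path A (a instructions) if the branch is taken, else path B (b instructions).
    body = [f"A{i}" for i in range(a)] if taken else [f"B{i}" for i in range(b)]
    if converge:
        # Hypothetical "WAIT" marker standing in for an inserted convergence instruction.
        body.append("WAIT")
    # Common code section C (c instructions) shared by every thread.
    return body + [f"C{i}" for i in range(c)]


def simulate(programs: list[list[str]]) -> int:
    """Count issue slots; one slot issues one label for all runnable threads at that label."""
    pcs = [0] * len(programs)
    issues = 0
    while True:
        active = [t for t, p in enumerate(programs) if pcs[t] < len(p)]
        if not active:
            return issues
        waiting = [t for t in active if programs[t][pcs[t]] == "WAIT"]
        if len(waiting) == len(active):
            # Every unfinished thread has arrived: release the barrier (no issue slot charged).
            for t in waiting:
                pcs[t] += 1
            continue
        runnable = [t for t in active if programs[t][pcs[t]] != "WAIT"]
        # Naive scheduler: follow the lowest-numbered runnable thread.
        leader = min(runnable)
        label = programs[leader][pcs[leader]]
        group = [t for t in runnable if programs[t][pcs[t]] == label]
        for t in group:
            pcs[t] += 1
        issues += 1


if __name__ == "__main__":
    for converge in (False, True):
        progs = [build_program(t % 2 == 0, a=3, b=2, c=4, converge=converge) for t in range(8)]
        print("barrier" if converge else "no barrier", simulate(progs))
```

With 8 threads split evenly, a 3-instruction path A, a 2-instruction path B and a 4-instruction common section, this prints 13 issue slots without the barrier and 9 with it, because section C is issued once for the whole warp instead of once per divergent group.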
-
Publication No.: US20220027194A1
Publication Date: 2022-01-27
Application No.: US17184420
Filing Date: 2021-02-24
Applicant: NVIDIA Corp.
Inventor: Sana Damani, Mark Stephenson, Ram Rangan, Daniel Robert Johnson, Rishkul Kulkarni
Abstract: Warp sharding techniques to switch execution between divergent shards on instructions that trigger a long stall, thereby interleaving execution between diverged threads within a warp instead of across warps. The technique may be applied to mitigate pipeline stalls in applications with low warp occupancy and high divergence. Warp data cache locality may also be improved by concentrating memory accesses within a warp rather than spreading them across warps.
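The benefit of switching between shards within one warp can be illustrated with a small cycle-counting sketch. This is an assumed toy model, not the patented design: one warp with two divergent shards, one issue per cycle, and every `load` assumed to feed the shard's next instruction so that the shard cannot issue again for `latency` cycles. The function name `run` and the op names are hypothetical. The baseline runs one shard to completion before the other; the sharded version stays on the current shard until it stalls and then switches to any ready shard.

```python
"""Toy single-warp pipeline: serialize divergent shards vs. switch shards on a long stall."""


def run(shards: list[list[str]], latency: int, interleave: bool) -> int:
    """Return the cycle after the last instruction issues.

    Each op takes one issue cycle. A "load" is assumed to feed the shard's next
    instruction, so that shard cannot issue again for `latency` cycles.
    """
    pcs = [0] * len(shards)
    ready_at = [0] * len(shards)
    cycle, current = 0, 0
    while True:
        live = [s for s in range(len(shards)) if pcs[s] < len(shards[s])]
        if not live:
            return cycle
        if interleave:
            # Sharding: any unfinished shard whose operands are ready may issue.
            candidates = [s for s in live if ready_at[s] <= cycle]
        else:
            # Baseline: run the lowest-numbered unfinished shard to completion.
            candidates = [live[0]] if ready_at[live[0]] <= cycle else []
        if not candidates:
            cycle += 1  # nothing can issue: pipeline stall
            continue
        # Stay on the current shard while it can issue; switch only when it stalls.
        s = current if current in candidates else candidates[0]
        current = s
        op = shards[s][pcs[s]]
        pcs[s] += 1
        ready_at[s] = cycle + (latency if op == "load" else 1)
        cycle += 1


if __name__ == "__main__":
    shards = [["load", "alu", "alu"], ["load", "alu", "alu"]]
    print(run(shards, latency=10, interleave=False), run(shards, latency=10, interleave=True))
```

For two shards that each begin with a 10-cycle load, this prints 24 and 14: the second shard's load issues during the first shard's stall instead of after it, which is the low-occupancy, high-divergence case the abstract targets.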
-
Publication No.: US20200081748A1
Publication Date: 2020-03-12
Application No.: US16567993
Filing Date: 2019-09-11
Applicant: NVIDIA Corp.
Inventor: Daniel Robert Johnson, Jack Choquette, Olivier Giroux, Michael Patrick McKeown, Mark Stephenson, Sana Damani
Abstract: Convergence of threads executing common code sections is facilitated using instructions inserted at strategic locations in computer code sections. The inserted instructions enable the threads in a warp or other group to cooperate with a thread scheduler to promote thread convergence.
-
Publication No.: US11847508B2
Publication Date: 2023-12-19
Application No.: US17819243
Filing Date: 2022-08-11
Applicant: NVIDIA Corp.
Inventor: Daniel Robert Johnson, Jack Choquette, Olivier Giroux, Michael Patrick McKeown, Mark Stephenson, Sana Damani
CPC classification number: G06F9/522, G06F9/3836, G06F9/3887, G06F9/4881
Abstract: Convergence of threads executing common code sections is facilitated using instructions inserted at strategic locations in computer code sections. The inserted instructions enable the threads in a warp or other group to cooperate with a thread scheduler to promote thread convergence.
-
Publication No.: US20230144553A1
Publication Date: 2023-05-11
Application No.: US17697325
Filing Date: 2022-03-17
Applicant: NVIDIA Corp.
Inventor: Sana Damani, Sean Treichler, Mark Stephenson
CPC classification number: G06F9/30098, G06F9/321, G06F9/3009, G06F9/4881, G06F9/30065
Abstract: A computing system includes one or more processors and one or more memories that store application code configuring the processor to execute an application. The system includes logic to identify high and low register utilization regions of the application code, and the compiler inserts register acquire instructions and register release instructions into the application code such that, when executed by the processor, the application code borrows registers from an inter-block register pool when execution enters a high register utilization region and returns them to the pool when execution enters a low register utilization region.
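A compiler pass of this kind, together with a runtime pool, might be sketched as follows. This is a hypothetical illustration, not the claimed implementation. It assumes the per-instruction live-register count is already known, that each block owns a fixed `base` allocation, and that every contiguous run above `base` is one high-utilization region whose borrow size is its peak excess. `RegisterPool`, `insert_register_ops`, `execute` and the `ACQUIRE`/`RELEASE` markers are invented names.

```python
"""Toy compiler pass + runtime for borrowing registers from a shared pool."""


class RegisterPool:
    """Hypothetical inter-block pool of spare registers."""

    def __init__(self, size: int) -> None:
        self.free = size

    def acquire(self, n: int) -> bool:
        if n > self.free:
            return False  # a real block would have to wait here
        self.free -= n
        return True

    def release(self, n: int) -> None:
        self.free += n


def insert_register_ops(pressure: list[int], base: int) -> list[tuple[str, int]]:
    """Wrap each run of instructions whose live-register count exceeds `base`
    in ACQUIRE/RELEASE markers sized to that run's peak excess."""
    out: list[tuple[str, int]] = []
    i = 0
    while i < len(pressure):
        if pressure[i] <= base:
            out.append(("OP", pressure[i]))  # low-utilization instruction
            i += 1
            continue
        j = i
        while j < len(pressure) and pressure[j] > base:
            j += 1  # extend the high-utilization region
        extra = max(pressure[i:j]) - base
        out.append(("ACQUIRE", extra))
        out.extend(("OP", p) for p in pressure[i:j])
        out.append(("RELEASE", extra))
        i = j
    return out


def execute(stream: list[tuple[str, int]], base: int, pool: RegisterPool) -> int:
    """Run the annotated stream and return the peak number of borrowed registers."""
    held = peak = 0
    for kind, n in stream:
        if kind == "ACQUIRE":
            if not pool.acquire(n):
                raise RuntimeError("register pool exhausted")
            held += n
            peak = max(peak, held)
        elif kind == "RELEASE":
            pool.release(n)
            held -= n
        elif n > base + held:
            raise RuntimeError("live registers exceed the current allocation")
    return peak


if __name__ == "__main__":
    stream = insert_register_ops([16, 20, 40, 48, 24, 16, 56, 18], base=32)
    pool = RegisterPool(size=32)
    print(stream)
    print(execute(stream, base=32, pool=pool), pool.free)
```

In this example the block borrows 16 registers around the `[40, 48]` region and 24 around the `[56]` region, so its peak borrow is 24 and the pool is back to 32 free registers afterward. Between the two regions those registers are available to other blocks, which is the point of returning them in low-utilization code.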
-
Publication No.: US20230115044A1
Publication Date: 2023-04-13
Application No.: US17568514
Filing Date: 2022-01-04
Applicant: NVIDIA Corp.
Inventor: Sana Damani, Sean Treichler, Mark Stephenson, Daniel Robert Johnson
Abstract: Instruction set architecture extensions configure the priority ordering of divergent branch targets on SIMT computing platforms, enabling tools such as compilers (e.g., under the influence of execution profilers) or human software developers to configure branch direction prioritization explicitly in code. Extensions are disclosed for simple (two-way) branch instructions as well as for multi-target branch instructions (more than two branch targets).
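Why an explicit ordering matters can be shown with a toy serialization of a multi-target branch. This sketch is an assumption, not the disclosed ISA encoding: threads that pick the same target run together, groups run one after another, the "default" hardware order is assumed to be by target label, and `priority` is a hypothetical stand-in for a compiler- or profiler-supplied ordering. The function `serialize` and the target names are illustrative.

```python
"""Toy model of an explicit priority order for a divergent multi-target branch."""
from typing import Optional


def serialize(
    targets: list[str],
    lengths: dict[str, int],
    priority: Optional[list[str]] = None,
) -> list[tuple[str, list[int], int]]:
    """targets[t] is the branch target chosen by thread t.

    Returns (target, threads, start_cycle) in the order the divergent groups run.
    """
    groups: dict[str, list[int]] = {}
    for tid, tgt in enumerate(targets):
        groups.setdefault(tgt, []).append(tid)
    default = sorted(groups)  # assumed hardware default: order by target label
    if priority is None:
        order = default
    else:
        # Listed targets first (given order, duplicates dropped), then the rest by default order.
        first = list(dict.fromkeys(g for g in priority if g in groups))
        order = first + [g for g in default if g not in first]
    schedule, cycle = [], 0
    for g in order:
        schedule.append((g, groups[g], cycle))
        cycle += lengths[g]
    return schedule


if __name__ == "__main__":
    targets = ["T0", "T2", "T1", "T2", "T2", "T0", "T1", "T2"]
    lengths = {"T0": 10, "T1": 10, "T2": 2}
    print(serialize(targets, lengths))
    print(serialize(targets, lengths, priority=["T2"]))
```

Here half the warp takes the short target `T2`. In the default order those four threads start at cycle 20; with `T2` prioritized they start at cycle 0 and finish by cycle 2, while `T0` and `T1` are delayed by only 2 cycles. A profiler that knows which target is hot or latency-critical could choose such an order.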
-
Publication No.: US11934867B2
Publication Date: 2024-03-19
Application No.: US17184420
Filing Date: 2021-02-24
Applicant: NVIDIA Corp.
Inventor: Sana Damani, Mark Stephenson, Ram Rangan, Daniel Robert Johnson, Rishkul Kulkarni
CPC classification number: G06F9/4881, G06F9/3009, G06F9/522
Abstract: Warp sharding techniques to switch execution between divergent shards on instructions that trigger a long stall, thereby interleaving execution between diverged threads within a warp instead of across warps. The technique may be applied to mitigate pipeline stalls in applications with low warp occupancy and high divergence. Warp data cache locality may also be improved by concentrating memory accesses within a warp rather than spreading them across warps.
-
Publication No.: US20230038061A1
Publication Date: 2023-02-09
Application No.: US17819243
Filing Date: 2022-08-11
Applicant: NVIDIA Corp.
Inventor: Daniel Robert Johnson, Jack Choquette, Olivier Giroux, Michael Patrick McKeown, Mark Stephenson, Sana Damani
Abstract: Convergence of threads executing common code sections is facilitated using instructions inserted at strategic locations in computer code sections. The inserted instructions enable the threads in a warp or other group to cooperate with a thread scheduler to promote thread convergence.
-