-
11.
Publication No.: US11048514B2
Publication date: 2021-06-29
Application No.: US16898189
Application date: 2020-06-10
Applicant: Intel Corporation
Inventor: Konstantin Levit-Gurevich, Orr Goldman
Abstract: Embodiments are disclosed for inserting profiling instructions into graphics processing unit (GPU) kernels. An example apparatus includes an entry point detector to detect a first entry point address and a second entry point address of an original GPU kernel, the first entry point address including a first entry point instruction, the second entry point address including a second entry point instruction. An instruction inserter is to create a corresponding instrumented GPU kernel from the original GPU kernel by inserting first profiling initialization instructions at a first address of the instrumented GPU kernel, the instruction inserter to insert profiling measurement instructions into the instrumented GPU kernel. An entry point adjuster is to adjust a list of entry points of the instrumented GPU kernel to replace the first entry point address with the first address and the second entry point address with the second address.
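The entry point adjustment described in this abstract can be sketched roughly as follows. This is a toy model under stated assumptions, not the patented implementation: a kernel is a list of instruction strings, addresses are list indices, and the names `Kernel`, `instrument`, `PROF_INIT`, `PROF_MEASURE`, and `JMP` are hypothetical. The jump from each initialization stub back into the relocated kernel body is also an assumption made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Kernel:
    instructions: list[str]  # toy model: one string per instruction
    entry_points: list[int]  # addresses (list indices) where execution may start


def instrument(original: Kernel) -> Kernel:
    """Return an instrumented copy with one initialization stub per entry point."""
    stub_len = 2 * len(original.entry_points)  # each stub is PROF_INIT + JMP

    # Copy the body, placing a (hypothetical) measurement instruction before
    # every original instruction and recording where each one ends up.
    body: list[str] = []
    relocated: dict[int, int] = {}
    for addr, ins in enumerate(original.instructions):
        relocated[addr] = stub_len + len(body)
        body.append("PROF_MEASURE")
        body.append(ins)

    # Build the stubs and the adjusted entry point list: each original entry
    # point address is replaced by the address of its initialization stub.
    stubs: list[str] = []
    new_entries: list[int] = []
    for i, ep in enumerate(original.entry_points):
        new_entries.append(len(stubs))
        stubs.append(f"PROF_INIT {i}")
        stubs.append(f"JMP {relocated[ep]}")

    return Kernel(stubs + body, new_entries)


original = Kernel(["A", "B", "C"], entry_points=[0, 2])
instrumented = instrument(original)
```

In this sketch the original entry points 0 and 2 are replaced by the stub addresses 0 and 2 of the instrumented kernel, which only coincide numerically because each stub is two instructions long.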
-
12.
Publication No.: US10949330B2
Publication date: 2021-03-16
Application No.: US16296357
Application date: 2019-03-08
Applicant: Intel Corporation
Inventor: Konstantin Levit-Gurevich
Abstract: An embodiment of a semiconductor package apparatus may include technology to determine a size for a trace buffer based on instrumented code to be executed on a graphics processor, initialize the trace buffer in a shared memory based on the determined size, provide the instrumented code to the graphics processor to be executed, collect data in the trace buffer from the executed instrumented code, analyze the data collected in the trace buffer on a processor, and generate a trace of the instrumented code on the processor based on the analyzed data. Other embodiments are disclosed and claimed.
-
13.
Publication No.: US10705846B2
Publication date: 2020-07-07
Application No.: US15998681
Application date: 2018-08-15
Applicant: Intel Corporation
Inventor: Konstantin Levit-Gurevich, Orr Goldman
Abstract: Embodiments are disclosed for inserting profiling instructions into graphics processing unit (GPU) kernels. An example apparatus includes an entry point detector to detect a first entry point address and a second entry point address of an original GPU kernel. An instruction inserter is to create a corresponding instrumented GPU kernel from the original GPU kernel by adding instructions of the original GPU kernel and one or more profiling instructions to the instrumented GPU kernel. The instruction inserter is to insert, at the first entry point address of the instrumented GPU kernel, a first jump instruction to jump to first profiling initialization instructions, the instruction inserter to insert, at the second entry point address of the instrumented GPU kernel, a second jump instruction to jump to second profiling initialization instructions. The instruction inserter is to insert profiling measurement instructions of the profiling instructions into the instrumented GPU kernel.
-
14.
Publication No.: US10467118B2
Publication date: 2019-11-05
Application No.: US15718435
Application date: 2017-09-28
Applicant: INTEL CORPORATION
Inventor: Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis
Abstract: Techniques and apparatus for performance analysis of a program are described. In one embodiment, for example, an apparatus may include at least one memory, and logic, at least a portion of which is comprised in hardware coupled to the at least one memory, to access a program for performance analysis, the program comprising at least one producer instruction and at least one consumer instruction for the at least one producer instruction, and generate an analysis program based on the program, the analysis program comprising a stall time instruction set to determine a stall time of the at least one producer instruction, the stall time instruction set comprising a first time stamp instruction immediately preceding a consumer instruction, a second time stamp instruction immediately following the consumer instruction, and a stall time instruction to determine the stall time as the difference between the second time stamp and the first time stamp. Other embodiments are described and claimed.
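The stall time instruction set in this abstract can be illustrated with a minimal sketch. It assumes a program is a list of instruction strings and that the index of the consumer instruction is already known; the function name `add_stall_timing` and the `TIMESTAMP`/`SUB` mnemonics are hypothetical and not taken from the patent.

```python
def add_stall_timing(program: list[str], consumer_index: int) -> list[str]:
    """Return an analysis program that times one consumer instruction.

    A first time stamp is placed immediately before the consumer, a second
    immediately after it, and a subtraction computes stall = t1 - t0.
    """
    before = program[:consumer_index]
    consumer = program[consumer_index]
    after = program[consumer_index + 1:]
    timing = [
        "TIMESTAMP t0",        # first time stamp, just before the consumer
        consumer,
        "TIMESTAMP t1",        # second time stamp, just after the consumer
        "SUB stall, t1, t0",   # stall time as the difference t1 - t0
    ]
    return before + timing + after


program = [
    "LOAD r1, [addr]",   # producer: result may arrive late
    "ADD r2, r1, r1",    # consumer: waits on r1
    "STORE [out], r2",
]
analysis_program = add_stall_timing(program, consumer_index=1)
```

The original program is left unmodified; the analysis program is a new list with three extra instructions around the consumer.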
-
15.
Publication No.: US20190205243A1
Publication date: 2019-07-04
Application No.: US16296357
Application date: 2019-03-08
Applicant: Intel Corporation
Inventor: Konstantin Levit-Gurevich
CPC classification number: G06F11/3636, G06F8/427, G06F9/52, G06T1/20
Abstract: An embodiment of a semiconductor package apparatus may include technology to determine a size for a trace buffer based on instrumented code to be executed on a graphics processor, initialize the trace buffer in a shared memory based on the determined size, provide the instrumented code to the graphics processor to be executed, collect data in the trace buffer from the executed instrumented code, analyze the data collected in the trace buffer on a processor, and generate a trace of the instrumented code on the processor based on the analyzed data. Other embodiments are disclosed and claimed.
-
16.
Publication No.: US20180173291A1
Publication date: 2018-06-21
Application No.: US15385184
Application date: 2016-12-20
Applicant: Intel Corporation
Inventor: Konstantin Levit-Gurevich, Gadi Haber
CPC classification number: G06F9/3867, G06F1/3206, G06F9/30065
Abstract: Embodiments described herein relate to improving processor power-performance using a binary analyzer routine. In one example, a processor includes a memory interface to couple to a memory, at least one hardware accelerator circuit, and an execution pipeline including at least fetch, decode, and execute stages, wherein the processor, in response to a hot-spot hardware event indicating presence of a hot-spot sequence, is to switch context to a binary analyzer routine stored in the memory, the binary analyzer routine including instructions that, when fetched, decoded, and executed by the processor, cause the processor to analyze a region in the memory containing the hot-spot sequence, analyze hardware metrics relating to execution of the hot-spot sequence, and generate, based on the analyses, a recommendation for the at least one hardware accelerator circuit to improve at least one of power consumption and performance.
-
17.
Publication No.: US12147809B2
Publication date: 2024-11-19
Application No.: US18463142
Application date: 2023-09-07
Applicant: Intel Corporation
Inventor: Konstantin Levit-Gurevich, Orr Goldman
Abstract: Embodiments are disclosed for inserting profiling instructions into graphics processing unit (GPU) kernels. An example apparatus includes instructions, and at least one processor to execute the instructions to determine whether a GPU supports modification of entry point addresses, detect a first entry point address and a second entry point address of an original GPU kernel, create a corresponding instrumented GPU kernel from the original GPU kernel based on the determination by inserting at least one of first profiling initialization instructions or first jump instructions at the first entry point address of the instrumented GPU kernel, inserting at least one of second profiling initialization instructions or second jump instructions at the second entry point address of the instrumented GPU kernel, and inserting profiling measurement instructions into the instrumented GPU kernel.
-
18.
Publication No.: US11694299B2
Publication date: 2023-07-04
Application No.: US17484942
Application date: 2021-09-24
Applicant: Intel Corporation
Inventor: Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis, Orr Goldman
CPC classification number: G06T1/20, G06F9/3877, G06F9/455, G06F9/5055, G06T1/60
Abstract: Embodiments are disclosed for emulation of graphics processing unit instructions. An example method includes executing an instrumented kernel using a logic circuit, the instrumented kernel including an emulation sequence; saving, in response to determination that the emulation sequence is to be executed, source data to a shared memory; setting an emulation request flag to indicate to processor circuitry separate from the logic circuit that offloaded execution of the emulation sequence is to be executed; monitoring the emulation request flag to determine whether the offloaded execution of the emulation sequence is complete; and accessing resulting data from the shared memory.
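The request-flag handshake in this abstract can be approximated with two Python threads standing in for the logic circuit and the separate processor circuitry. This is only a sketch under assumptions: the `SharedMemory` class, the `offload` and `emulator_loop` functions, and the use of `threading.Event` as the emulation request flag are all illustrative choices, not details from the patent.

```python
import threading
import time


class SharedMemory:
    """Toy stand-in for memory visible to both sides."""

    def __init__(self):
        self.source = None
        self.result = None
        self.request = threading.Event()   # emulation request flag
        self.shutdown = threading.Event()  # demo-only: stops the emulator


def emulator_loop(shared, emulate):
    """Processor side: wait for a request, emulate, then clear the flag."""
    while not shared.shutdown.is_set():
        if shared.request.wait(timeout=0.01):
            shared.result = emulate(shared.source)
            shared.request.clear()  # signals that offloaded execution is done


def offload(shared, source):
    """Kernel side: save source data, set the flag, monitor it, read the result."""
    shared.source = source
    shared.request.set()
    while shared.request.is_set():  # monitor the flag until it is cleared
        time.sleep(0.001)
    return shared.result


def run_demo():
    shared = SharedMemory()
    worker = threading.Thread(
        target=emulator_loop, args=(shared, lambda x: x * x), daemon=True
    )
    worker.start()
    out = offload(shared, 7)
    shared.shutdown.set()
    worker.join()
    return out


demo_result = run_demo()
```

A single flag carries both directions of the handshake here: the kernel side sets it to request emulation, and the processor side clears it only after the result has been written.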
-
19.
Publication No.: US11461954B2
Publication date: 2022-10-04
Application No.: US17223464
Application date: 2021-04-06
Applicant: Intel Corporation
Inventor: Michael Apodaca, John Feit, David Cimini, Thomas Raoux, Konstantin Levit-Gurevich
IPC: G06T15/00
Abstract: An apparatus to facilitate an update of shader data constants. The apparatus includes one or more processors to detect a change to one or more data constants in a shader program, generate a micro-code block including updated constants data during execution of the shader program and transmit the micro-code block to the shader program.
-
20.
Publication No.: US20220100512A1
Publication date: 2022-03-31
Application No.: US17547765
Application date: 2021-12-10
Applicant: Intel Corporation
Inventor: Konstantin Levit-Gurevich, Alexander Skaletsky
Abstract: A deterministic replay of a multi-threaded trace on a multi-threaded processor is described. An example of a computer-readable storage medium includes instructions to cause at least one processor to receive graphics processing unit (GPU) program code for tracing, the program code including a plurality of instructions; analyze the plurality of instructions to identify instructions of the program code that are events requiring synchronization; instrument each of the identified events to generate instrumented program code; execute the instrumented program code on a plurality of hardware threads of the GPU to generate trace data; and emulate the trace data utilizing an emulator on a plurality of hardware traces of a central processing unit (CPU), including replaying the identified events according to an order of occurrence of the identified events.
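Replaying synchronization events "according to an order of occurrence" can be sketched as a merge of per-thread traces. The sketch assumes each instrumented event was recorded with a global sequence number and that each thread's trace is already sorted by that number; the abstract does not say how order is captured, so the sequence numbers, the `replay_order` function, and the event names are hypothetical.

```python
import heapq


def replay_order(per_thread_traces):
    """Merge per-thread traces into one global replay order.

    per_thread_traces maps a thread id to a list of (sequence_number, event)
    pairs, each list sorted by sequence number.
    """
    streams = [
        [(seq, tid, event) for seq, event in events]
        for tid, events in per_thread_traces.items()
    ]
    # heapq.merge lazily merges already-sorted inputs into one sorted stream.
    merged = heapq.merge(*streams)
    return [(tid, event) for _seq, tid, event in merged]


traces = {
    0: [(0, "lock"), (3, "unlock")],
    1: [(1, "barrier"), (2, "atomic_add")],
}
order = replay_order(traces)
```

An emulator would then step the thread named in each pair up to the listed event, which makes the replay independent of how the host happens to schedule its threads.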