Abstract:
Embodiments of the invention provide a programming model for CPU-GPU platforms. In particular, embodiments of the invention provide a uniform programming model for both integrated and discrete devices. The model also works uniformly for multiple GPU cards and hybrid GPU systems (discrete and integrated). This allows software vendors to write a single application stack and target it to all the different platforms. Additionally, embodiments of the invention provide a shared memory model between the CPU and GPU. Instead of sharing the entire virtual address space, only a part of the virtual address space needs to be shared. This allows efficient implementation in both discrete and integrated settings.
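The key idea above is that only a designated window of the virtual address space is shared between the CPU and GPU, rather than the whole space. The following is a minimal sketch of that idea, not code from the patent; the `SharedWindow` and `Device` names and the toy dictionary-backed storage are illustrative assumptions.

```python
class SharedWindow:
    """A reserved address range visible to both the CPU and GPU sides."""
    def __init__(self, base, size):
        self.base = base
        self.size = size
        self.store = {}          # backing storage keyed by address

    def contains(self, addr):
        return self.base <= addr < self.base + self.size


class Device:
    """Either processor: private memory plus the shared window."""
    def __init__(self, name, window):
        self.name = name
        self.window = window
        self.private = {}

    def write(self, addr, value):
        if self.window.contains(addr):
            self.window.store[addr] = value   # visible to the other device
        else:
            self.private[addr] = value        # device-local only

    def read(self, addr):
        if self.window.contains(addr):
            return self.window.store.get(addr)
        return self.private.get(addr)


window = SharedWindow(base=0x1000, size=0x1000)
cpu = Device("cpu", window)
gpu = Device("gpu", window)

cpu.write(0x1008, "shared-value")    # inside the window: both sides see it
cpu.write(0x9000, "private-value")   # outside the window: CPU-only
```

Because coherence only has to be maintained for the window, the same model can be implemented over PCIe copies for a discrete GPU or over a common physical memory for an integrated one.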
Abstract:
A processing device includes a processing core, coupled to a memory, to execute a task that includes a code segment identified as being monitored, and a kernel recorder coupled to the processing core via a core interface. The kernel recorder includes a first filter circuit to, responsive to determining that the executing task enters the code segment, set the kernel recorder to a first mode under which the kernel recorder records, in a first record, a plurality of memory addresses accessed by the code segment, and, responsive to determining that execution of the task exits the code segment, set the kernel recorder to a second mode under which the kernel recorder detects a write operation to a memory address recorded in the first record and records that memory address in a second record.
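The two-mode behavior described above can be sketched as a small state machine. This is a hedged software model of the hardware recorder, assuming illustrative names (`KernelRecorder`, `on_access`, and the two mode constants are not from the patent).

```python
MODE_RECORD = 1   # while execution is inside the monitored code segment
MODE_DETECT = 2   # after the monitored segment exits

class KernelRecorder:
    def __init__(self):
        self.mode = None
        self.first_record = set()    # addresses accessed by the segment
        self.second_record = set()   # recorded addresses later written to

    def on_segment_enter(self):
        self.mode = MODE_RECORD

    def on_segment_exit(self):
        self.mode = MODE_DETECT

    def on_access(self, addr, is_write):
        if self.mode == MODE_RECORD:
            # first mode: log every address the segment touches
            self.first_record.add(addr)
        elif self.mode == MODE_DETECT and is_write:
            # second mode: flag writes that hit a recorded address
            if addr in self.first_record:
                self.second_record.add(addr)


rec = KernelRecorder()
rec.on_segment_enter()
rec.on_access(0x100, is_write=False)   # recorded in the first record
rec.on_access(0x200, is_write=True)    # also recorded
rec.on_segment_exit()
rec.on_access(0x100, is_write=True)    # write to a recorded address: flagged
rec.on_access(0x300, is_write=True)    # untracked address: ignored
```

The second record thus ends up holding exactly the monitored addresses that were modified after the segment finished.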
Abstract:
Various embodiments are generally directed to an apparatus and method for configuring an execution environment in user space for device driver operations and redirecting a device driver operation for execution in that user-space execution environment, including copying instructions of the device driver operation from kernel space to a user process in user space. In addition, the redirected device driver operation may be executed in the execution environment in user space.
Abstract:
A page table entry dirty bit system may be utilized to record dirty information for a software distributed shared memory system. In some embodiments, this may improve performance without substantially increasing overhead because the dirty bit recording system is already available in certain processors. By providing extra bits, coherence can be obtained with respect to all the other uses of the existing page table entry dirty bits.
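The mechanism above amounts to letting the hardware-set dirty bit identify which pages a software distributed shared memory (DSM) layer must propagate. A minimal sketch of that scan-and-clear cycle, with hypothetical `PageTableEntry` and `Pager` names standing in for the real page table structures:

```python
class PageTableEntry:
    def __init__(self):
        self.dirty = False       # in hardware, set automatically on any store

class Pager:
    def __init__(self, num_pages, page_size=4096):
        self.page_size = page_size
        self.entries = [PageTableEntry() for _ in range(num_pages)]

    def write(self, addr):
        # Model the hardware side effect: a store marks its page dirty.
        self.entries[addr // self.page_size].dirty = True

    def collect_dirty(self):
        # The DSM layer scans for dirty pages (those needing propagation
        # to other nodes) and clears the bits for the next epoch.
        dirty = [i for i, e in enumerate(self.entries) if e.dirty]
        for i in dirty:
            self.entries[i].dirty = False
        return dirty


pager = Pager(num_pages=4)
pager.write(0x0010)       # touches page 0
pager.write(0x2005)       # touches page 2
```

Because the bit is set by the processor as a side effect of ordinary stores, the software layer avoids the usual write-protect-and-fault tracking overhead; the extra bits mentioned above keep this use coherent with other consumers of the dirty bit.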
Abstract:
A computer system 100 has several processors. A first processor, such as a central processing unit 110, produces data to be used by a second processor, such as a graphics processor (GPU) 180. When the first processor creates a new version of the data, it determines the difference between the previous version and the new version. It writes a list of the differences to a memory 130 shared by the first and second processors. When the second processor needs to use the new version of the data, it reads the difference lists that have been written by the first processor and applies them to the data to produce the current version. The shared memory may be part of the memory in the second processor, to which the first processor has access. A backup copy of the old version of the data may be kept.
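The producer/consumer flow above can be sketched in a few lines: the producer computes a difference list between versions and posts it to shared memory, and the consumer replays pending lists to reconstruct the current version. The function names and the list-of-pairs diff format are illustrative assumptions, not the patent's encoding.

```python
def diff(old, new):
    """Return (index, new_value) pairs where the two versions differ."""
    return [(i, b) for i, (a, b) in enumerate(zip(old, new)) if a != b]

def apply_diff(data, delta):
    """Apply a difference list to a copy of the data."""
    data = list(data)
    for i, value in delta:
        data[i] = value
    return data


old = [1, 2, 3, 4]            # previous version held by the first processor
new = [1, 9, 3, 7]            # newly produced version

shared_memory = []            # stands in for memory both processors can see
shared_memory.append(diff(old, new))   # producer writes the difference list

current = old                 # consumer starts from its last-known version
for delta in shared_memory:   # ...and applies the pending diff lists
    current = apply_diff(current, delta)
```

Only the (typically small) difference list crosses between the processors, rather than the whole data set; the backup copy of the old version mentioned above is what makes the diff computable on the producer side.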
Abstract:
Apparatuses, methods, and storage media associated with multi-phase/stage boot technology are disclosed herein. A method may include resuming an initial execution image of a computing platform from persistent storage to perform an initial task, and subsequently resuming a full execution image of the computing platform from the persistent storage to perform one of a plurality of operational tasks.