Abstract:
Various embodiments may include methods executed by processors of computing devices for geometry based work execution prioritization. The processor may receive events, such as images. The processor may overlay a boundary shape on the event to identify discard regions of the event lying outside the boundary shape. The processor may identify work regions of the events lying within the working boundary shape. The devices may determine a cancellation likelihood for each of the identified work regions of the events. The processor may assign a trimming weight to each of the identified work regions based on the determined cancellation likelihoods. The processor may then add each of the identified work regions as a work item to an execution work list in an order based on the assigned trimming weights. The work items may be processed in order of trimming weight priority.
Abstract:
Various embodiments include methods for data management in a computing device utilizing a plurality of processing units. Embodiment methods may include generating a data transfer heuristic model based on measurements from a plurality of sample data transfers between a plurality of data storage units. The generated data transfer heuristic model may be used to calculate data transfer costs for each of a plurality of tasks. The calculated data transfer costs may be used to schedule execution of the plurality of tasks in an execution order on selected ones of the plurality of processing units. The data transfer heuristic model may be updated based on measurements of data transfers occurring during the executions of the plurality of tasks (e.g., time, power consumption, etc.). Code executing on the processing units may indicate to a runtime when certain data blocks are no longer needed and thus may be evicted and/or pre-fetched for others.
Abstract:
Methods, devices, and non-transitory processor-readable storage media for a computing device to merge concurrent writes from a plurality of processing units to a buffer associated with an application. An embodiment method executed by a processor may include identifying a plurality of concurrent requests to access the buffer that are sparse, disjoint, and write-only, configuring a write-set for each of the plurality of processing units, executing the plurality of concurrent requests to access the buffer using the write-sets, determining whether each of the plurality of concurrent requests to access the buffer is complete, obtaining a buffer index and data via the write-set of each of the plurality of processing units, and writing to the buffer using the received buffer index and data via the write-set of each of the plurality of processing units in response to determining that each of the plurality of concurrent requests to access the buffer is complete.
Abstract:
Embodiments include computing devices, apparatus, and methods implemented by a computing device for task scheduling in the presence of task conflict edges on a computing device. The computing device may determine whether a first task and a second task are related by a task conflict edge. In response to determining that the first task and the second task are related by the task conflict edge, the computing device may determine whether the second task acquires a resource required for execution of the first task and the second task. In response to determining that the second task fails to acquire the resource, the computing device may assign a dynamic task dependency edge from the first task to the second task.
Abstract:
Embodiments include computing devices, apparatus, and methods implemented by a computing device for task scheduling in the presence of task conflict edges on a computing device. The computing device may determine whether a first task and a second task are related by a task conflict edge. In response to determining that the first task and the second task are related by the task conflict edge, the computing device may determine whether the second task acquires a resource required for execution of the first task and the second task. In response to determining that the second task fails to acquire the resource, the computing device may assign a dynamic task dependency edge from the first task to the second task.
Abstract:
Embodiments include computing devices, apparatus, and methods implemented by a computing device for accelerating execution of a plurality of tasks belonging to a common property task graph. The computing device may identify a first successor task dependent upon a bundled task such that an available synchronization mechanism is a common property for the bundled task and the first successor task, and such that the first successor task only depends upon predecessor tasks for which the available synchronization mechanism is a common property. The computing device may add the first successor task to a common property task graph and add the plurality of tasks belonging to the common property task graph to a ready queue. The computing device may recursively identify successor tasks. The synchronization mechanism may include a synchronization mechanism for control logic flow or a synchronization mechanism for data access.
Abstract:
Embodiments include computing devices, systems, and methods identifying enhanced synchronization operation outcomes. A computing device may receive a first resource access request for a first resource of a computing device including a first requester identifier from a first computing element of the computing device. The computing device may also receive a second resource access request for the first resource including a second requester identifier from a second computing element of the computing device. The computing device may grant the first computing element access to the first resource based on the first resource access request, and return a response to the second computing element including the first requester identifier as a winner computing element identifier.
Abstract:
Various embodiments may include methods executed by processors of computing devices for geometry based work execution prioritization. The processor may receive events, such as images. The processor may overlay a boundary shape on the event to identify discard regions of the event lying outside the boundary shape. The processor may identify work regions of the events lying within the working boundary shape. The devices may determine a cancellation likelihood for each of the identified work regions of the events. The processor may assign a trimming weight to each of the identified work regions based on the determined cancellation likelihoods. The processor may then add each of the identified work regions as a work item to an execution work list in an order based on the assigned trimming weights. The work items may be processed in order of trimming weight priority.
Abstract:
Embodiments include computing devices, apparatus, and methods implemented by a computing device for accelerating execution of a plurality of tasks belonging to a common property task graph. The computing device may identify a first successor task dependent upon a bundled task such that an available synchronization mechanism is a common property for the bundled task and the first successor task, and such that the first successor task only depends upon predecessor tasks for which the available synchronization mechanism is a common property. The computing device may add the first successor task to a common property task graph and add the plurality of tasks belonging to the common property task graph to a ready queue. The computing device may recursively identify successor tasks. The synchronization mechanism may include a synchronization mechanism for control logic flow or a synchronization mechanism for data access.