Abstract:
Methods, devices, systems, and non-transitory processor-readable storage media for a multi-processor computing device to schedule multi-versioned tasks on a plurality of processing units. An embodiment method may include processor-executable operations for enqueuing a specialized version of a multi-versioned task in a task queue for each of the plurality of processing units, wherein each specialized version is configured to be executed by a different processing unit of the plurality of processing units, providing ownership over the multi-versioned task to a first processing unit when the first processing unit is available to immediately execute a corresponding specialized version of the multi-versioned task, and discarding other specialized versions of the multi-versioned task in response to providing ownership over the multi-versioned task to the first processing unit. Various operations of the method may be performed via a runtime functionality.
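The ownership scheme described in this abstract can be illustrated with a minimal sketch. Everything named here (`MultiVersionedTask`, `enqueue`, `run_next`, the unit names) is hypothetical and not taken from the patent. As a simplification, the sketch discards a losing version lazily, when its unit dequeues it, rather than at the moment ownership is granted.

```python
import threading
from collections import deque


class MultiVersionedTask:
    """One logical task with one specialized version per processing unit (hypothetical model)."""

    def __init__(self, name, versions):
        self.name = name
        self.versions = versions  # maps unit name -> callable for that unit
        self._owner = None
        self._lock = threading.Lock()

    def try_acquire(self, unit):
        """Grant ownership to the first unit that asks; later requests fail."""
        with self._lock:
            if self._owner is None:
                self._owner = unit
                return True
            return False

    @property
    def owner(self):
        return self._owner


def enqueue(task, queues):
    """Place the task in the queue of every unit that has a specialized version."""
    for unit in task.versions:
        queues[unit].append(task)


def run_next(unit, queues):
    """Dequeue the next entry for `unit`; run it only if ownership is obtained."""
    if not queues[unit]:
        return None
    task = queues[unit].popleft()
    if task.try_acquire(unit):
        return task.versions[unit]()
    return None  # another unit owns the task, so this version is discarded


queues = {"cpu": deque(), "gpu": deque(), "dsp": deque()}
task = MultiVersionedTask("blur", {
    "cpu": lambda: "blur on cpu",
    "gpu": lambda: "blur on gpu",
    "dsp": lambda: "blur on dsp",
})
enqueue(task, queues)
first = run_next("gpu", queues)   # the GPU is ready first and takes ownership
second = run_next("cpu", queues)  # the CPU version is discarded
```

The single lock-protected owner field is one possible way to guarantee that exactly one specialized version runs; a real runtime might use an atomic compare-and-swap instead.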
Abstract:
A computing device (e.g., a mobile computing device, etc.) may be configured to better exploit the concurrency and parallelism enabled by modern multiprocessor architectures by identifying a sequence of tasks via a task dependency controller, commencing execution of a first task in the sequence of tasks, and setting a value of a register so that each remaining task in the sequence of tasks executes after its predecessor task finishes execution without transferring control to a runtime system of the computing device. The task dependency controller may be a hardware component that is shared by the processor cores and/or otherwise configured to transfer control between tasks executing on different processor cores independent of the runtime system and/or without performing the relatively slow and memory-based inter-task, inter-thread or inter-process communications required by conventional solutions.
Abstract:
Embodiments include computing devices, apparatus, and methods implemented by the apparatus for implementing speculative loop iteration partitioning (SLIP) for heterogeneous processing devices. A computing device may receive iteration information for a first partition of iterations of a repetitive process and select a SLIP heuristic based on available SLIP information and iteration information for the first partition. The computing device may determine a split value for the first partition using the SLIP heuristic, and partition the first partition using the split value to produce a plurality of next partitions.
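The select-then-split flow in this abstract can be sketched as follows. The abstract does not say which heuristics exist, so both heuristics here (`even_split` and `throughput_split`), the `slip_info` dictionary layout, and the two-way split are assumptions made purely for illustration.

```python
def even_split(start, end, slip_info):
    """Assumed fallback heuristic: split the iteration range in half."""
    return start + (end - start) // 2


def throughput_split(start, end, slip_info):
    """Assumed heuristic: split in proportion to the measured throughput of two devices."""
    first, second = slip_info["first"], slip_info["second"]
    share = first / (first + second)
    return start + round((end - start) * share)


def select_heuristic(slip_info, start, end):
    """Pick a heuristic from the available SLIP information and the partition size."""
    if slip_info is None or end - start < 2:
        return even_split
    return throughput_split


def partition(start, end, slip_info=None):
    """Split the half-open range [start, end) into two next partitions."""
    heuristic = select_heuristic(slip_info, start, end)
    split = heuristic(start, end, slip_info)
    return [(start, split), (split, end)]


no_info = partition(0, 100)
with_info = partition(0, 100, {"first": 3.0, "second": 1.0})
```

With no SLIP information the range is halved; with a 3:1 throughput ratio the first device receives three quarters of the iterations.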
Abstract:
Embodiments include computing devices, systems, and methods for identifying enhanced synchronization operation outcomes. A computing device may receive, from a first computing element of the computing device, a first resource access request for a first resource of the computing device, the request including a first requester identifier. The computing device may also receive, from a second computing element of the computing device, a second resource access request for the first resource, the request including a second requester identifier. The computing device may grant the first computing element access to the first resource based on the first resource access request, and return a response to the second computing element including the first requester identifier as a winner computing element identifier.
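A minimal sketch of a synchronization outcome that reports the winner, assuming a software lock stands in for whatever mechanism the device actually uses. The `Resource` class, its method names, and the `(granted, winner_id)` response shape are hypothetical.

```python
import threading


class Resource:
    """A resource whose access response names the requester that currently holds it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def request_access(self, requester_id):
        """Return (granted, winner_id); a losing requester learns who won."""
        with self._lock:
            if self._owner is None:
                self._owner = requester_id
                return True, requester_id
            return False, self._owner

    def release(self, requester_id):
        """Free the resource, but only if the caller is the current owner."""
        with self._lock:
            if self._owner == requester_id:
                self._owner = None


resource = Resource()
first_response = resource.request_access("core0")
second_response = resource.request_access("core1")
```

Returning the winner identifier lets the losing element decide what to do next, for example wait on that specific element, instead of only learning that its request failed.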
Abstract:
Various embodiments may include methods executed by processors of computing devices for geometry based work execution prioritization. The processor may receive events, such as images. The processor may overlay a boundary shape on each event to identify discard regions of the event lying outside the boundary shape. The processor may identify work regions of the event lying within the working boundary shape. The processor may determine a cancellation likelihood for each of the identified work regions of the event. The processor may assign a trimming weight to each of the identified work regions based on the determined cancellation likelihoods. The processor may then add each of the identified work regions as a work item to an execution work list in an order based on the assigned trimming weights. The work items may be processed in order of trimming weight priority.
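The pipeline in this abstract (discard, score, weight, order) can be sketched under several assumptions that the abstract does not state: the boundary shape is a circle, a region is represented by its center point, the cancellation likelihood grows linearly with distance from the circle's center, the trimming weight equals that likelihood, and regions least likely to be cancelled run first. The function name `prioritize` is hypothetical.

```python
import math


def prioritize(regions, center, radius):
    """Return work-region names ordered by assumed trimming weight, lowest first.

    regions: maps region name -> (x, y) center of the region.
    """
    cx, cy = center
    work_list = []
    for name, (x, y) in regions.items():
        distance = math.hypot(x - cx, y - cy)
        if distance > radius:
            continue  # discard region: lies outside the boundary shape
        likelihood = distance / radius  # assumed cancellation likelihood in [0, 1]
        trimming_weight = likelihood    # assumed weight: equal to the likelihood
        work_list.append((trimming_weight, name))
    work_list.sort()
    return [name for _, name in work_list]


regions = {"a": (0, 0), "b": (3, 0), "c": (10, 0), "d": (1, 0)}
order = prioritize(regions, center=(0, 0), radius=5)
```

Region `c` falls outside the circle and is dropped; the remaining regions are ordered from the center outward.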
Abstract:
Embodiments include computing devices, apparatus, and methods implemented by a computing device for accelerating execution of a plurality of tasks belonging to a common property task graph. The computing device may identify a first successor task dependent upon a bundled task such that an available synchronization mechanism is a common property for the bundled task and the first successor task, and such that the first successor task only depends upon predecessor tasks for which the available synchronization mechanism is a common property. The computing device may add the first successor task to a common property task graph and add the plurality of tasks belonging to the common property task graph to a ready queue. The computing device may recursively identify successor tasks. The synchronization mechanism may include a synchronization mechanism for control logic flow or a synchronization mechanism for data access.
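The recursive identification of successor tasks can be sketched as a graph walk. The data layout (dictionaries for successors, predecessors, and each task's synchronization mechanism) and the function name are assumptions for illustration only.

```python
def build_common_property_graph(root, successors, predecessors, mechanism):
    """Collect tasks reachable from `root` that share its synchronization mechanism.

    A successor is added only if it uses the shared mechanism and every one of
    its predecessors also uses that mechanism.
    """
    shared = mechanism[root]
    graph = [root]
    seen = {root}

    def visit(task):
        for succ in successors.get(task, []):
            if succ in seen:
                continue
            if mechanism[succ] != shared:
                continue
            if any(mechanism[p] != shared for p in predecessors.get(succ, [])):
                continue
            seen.add(succ)
            graph.append(succ)
            visit(succ)  # recursively identify further successors

    visit(root)
    return graph


successors = {"A": ["B", "C"], "B": ["D", "E"], "C": ["D"]}
predecessors = {"B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["B"]}
mechanism = {"A": "gpu_event", "B": "gpu_event", "C": "cpu_lock",
             "D": "gpu_event", "E": "gpu_event"}
ready_queue = build_common_property_graph("A", successors, predecessors, mechanism)
```

Task `D` is excluded even though it uses the shared mechanism, because one of its predecessors (`C`) does not.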
Abstract:
Embodiments include computing devices, systems, and methods for task-based handling of repetitive processes in parallel. At least one processor of the computing device, or a specialized hardware controller, may be configured to partition iterations of a repetitive process and assign the partitions to initialized tasks to be executed in parallel by a plurality of processor cores. Upon completion of a task, the remaining divisible partition of an ongoing task may be subpartitioned, with the subpartitions assigned to the ongoing task and to either the completed task or a newly initialized task. Information about the iteration space for a repetitive process may be stored in a descriptor table, and status information for all partitions of the repetitive process may be stored in a status table. Each processor core may have an associated local table that tracks iteration execution of each task and is synchronized with the status table.
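The partition-then-subpartition idea can be sketched with a plain dictionary standing in for the status table. The function names, the `[next, end)` status layout, and the policy of splitting the ongoing task with the most remaining iterations are assumptions, not details from the abstract.

```python
def initial_partitions(n_iters, n_workers):
    """Split range(n_iters) into n_workers contiguous [start, end) partitions."""
    base, extra = divmod(n_iters, n_workers)
    parts, start = [], 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        parts.append([start, start + size])
        start += size
    return parts


def subpartition(status, idle):
    """Give the idle worker part of the largest remaining partition.

    status: maps worker -> [next_iteration, end]; acts as the status table.
    Returns True if work was reassigned, False if nothing was divisible.
    """
    candidates = [w for w in status if w != idle]
    if not candidates:
        return False
    victim = max(candidates, key=lambda w: status[w][1] - status[w][0])
    nxt, end = status[victim]
    remaining = end - nxt
    if remaining < 2:
        return False  # a single iteration cannot be divided further
    mid = end - remaining // 2  # the ongoing task keeps the larger half
    status[victim] = [nxt, mid]
    status[idle] = [mid, end]
    return True


parts = initial_partitions(10, 3)
status = {0: [4, 4], 1: [5, 7], 2: [7, 10]}  # worker 0 has finished its partition
reassigned = subpartition(status, idle=0)
```

After the call, worker 2 keeps iterations 7 and 8 while worker 0 picks up iteration 9.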