Abstract:
Apparatus and methods are described that implement a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus includes multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate at which the cores may submit requests, to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.
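A minimal sketch in C of the credit-style rate control described above, under assumed names and sizes (qmd_t, qmd_enqueue, qmd_dequeue, CREDITS_PER_CORE); this is not the device's actual interface. Each core holds a fixed budget of submit credits, a submission consumes a credit and fails rather than stalling when no credit or queue slot is available, and the device returns the credit when it drains the request.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_CORES        8
    #define CREDITS_PER_CORE 16
    #define QMD_CAPACITY     128

    typedef struct {
        unsigned src_core;   /* core that submitted the request */
        uint64_t payload;    /* descriptor for the inter-core transfer */
    } qmd_req_t;

    typedef struct {
        qmd_req_t entries[QMD_CAPACITY];
        unsigned  head, tail, count;
        unsigned  credits[NUM_CORES];   /* per-core submit budget */
    } qmd_t;

    void qmd_init(qmd_t *q)
    {
        q->head = q->tail = q->count = 0;
        for (unsigned c = 0; c < NUM_CORES; c++)
            q->credits[c] = CREDITS_PER_CORE;
    }

    /* Core-side submit: consumes one credit; fails instead of stalling the core. */
    bool qmd_enqueue(qmd_t *q, unsigned core, uint64_t payload)
    {
        if (q->credits[core] == 0 || q->count == QMD_CAPACITY)
            return false;                              /* caller retries later */
        q->credits[core]--;
        q->entries[q->tail] = (qmd_req_t){ .src_core = core, .payload = payload };
        q->tail = (q->tail + 1) % QMD_CAPACITY;
        q->count++;
        return true;
    }

    /* Device-side drain: completes one request and returns its credit. */
    bool qmd_dequeue(qmd_t *q, qmd_req_t *out)
    {
        if (q->count == 0)
            return false;
        *out = q->entries[q->head];
        q->head = (q->head + 1) % QMD_CAPACITY;
        q->count--;
        q->credits[out->src_core]++;
        return true;
    }

Returning a credit only when a request is drained bounds each core's outstanding requests, which is the mechanism by which such rate control can reduce core stalls and dropped requests.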
Abstract:
The disclosure includes, in general, among other aspects, an apparatus having multiple programmable units integrated within a processor. The apparatus has circuitry to map addresses in a single address space to resources within the multiple programmable units, where the single address space includes addresses for different ones of the resources in different ones of the multiple programmable units, and where there is a one-to-one correspondence between respective addresses in the single address space and resources within the multiple programmable units.
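One way to realize such a mapping, sketched below in C with assumed field widths and names (RESOURCE_BITS, decode, encode), is to split each flat address into a unit index and a resource index, so that every address selects exactly one resource in exactly one programmable unit and every such resource has exactly one address; this only illustrates the one-to-one correspondence and is not the disclosed circuitry.

    #include <stdint.h>

    #define RESOURCE_BITS 12                           /* 4096 resources per unit */
    #define RESOURCE_MASK ((1u << RESOURCE_BITS) - 1)

    typedef struct {
        uint32_t unit;       /* which programmable unit */
        uint32_t resource;   /* which resource inside that unit */
    } decoded_addr_t;

    /* Decode one address from the single address space. */
    static inline decoded_addr_t decode(uint32_t addr)
    {
        decoded_addr_t d = { .unit = addr >> RESOURCE_BITS,
                             .resource = addr & RESOURCE_MASK };
        return d;
    }

    /* Encode the unique address of a given resource. */
    static inline uint32_t encode(uint32_t unit, uint32_t resource)
    {
        return (unit << RESOURCE_BITS) | (resource & RESOURCE_MASK);
    }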
Abstract:
A processor of an aspect includes a register to store a condition code bit, and a decode unit to decode a bit check instruction. The bit check instruction is to indicate a first source operand that is to include a first bit, and is to indicate a check bit value for the first bit. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the bit check instruction, is to compare the first bit with the check bit value, and update a condition code bit to indicate whether the first bit equals or does not equal the check bit value. Other processors, methods, systems, and instructions are disclosed.
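A minimal C sketch of the described semantics; the function name, the bit-position argument, and the return convention are assumptions here, since the abstract only states that the instruction indicates a first bit of a source operand and a check bit value for it.

    #include <stdint.h>

    /* Returns the new condition code bit: 1 if the selected bit of the source
     * operand equals the check bit value, 0 otherwise. */
    static inline unsigned bit_check(uint64_t src, unsigned bit_pos, unsigned check_bit)
    {
        unsigned bit = (src >> bit_pos) & 1u;
        return (bit == (check_bit & 1u)) ? 1u : 0u;
    }

For example, bit_check(0xA, 1, 1) returns 1 because bit 1 of 0xA (binary 1010) is 1, so the condition code would be set to indicate equality.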
Abstract:
Technologies for a distributed hardware queue manager include a compute device having a processor. The processor includes two or more hardware queue managers as well as two or more processor cores. Each processor core can enqueue data to or dequeue data from a hardware queue manager. Each hardware queue manager can be configured to contain several queue data structures. In some embodiments, the queues are addressed by the processor cores using virtual queue addresses, which are translated into physical queue addresses for accessing the corresponding hardware queue manager. A virtual queue can be moved from one physical queue in one hardware queue manager to a different physical queue in a different hardware queue manager without changing the virtual address of the virtual queue.
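A minimal sketch of the translation step in C, with hypothetical structures (vq_table, queue_mapping_t): a table maps each virtual queue address to a (hardware queue manager, physical queue) pair, so moving a virtual queue only rewrites its table entry while software keeps using the same virtual queue address.

    #include <stdint.h>

    #define NUM_VIRTUAL_QUEUES 256

    typedef struct {
        uint16_t hqm_id;    /* which hardware queue manager */
        uint16_t phys_q;    /* physical queue within that manager */
    } queue_mapping_t;

    static queue_mapping_t vq_table[NUM_VIRTUAL_QUEUES];

    /* Translate a virtual queue address into its current physical location. */
    queue_mapping_t translate(uint16_t virtual_q)
    {
        return vq_table[virtual_q];
    }

    /* Move a virtual queue; its virtual address (the table index) is unchanged. */
    void migrate(uint16_t virtual_q, uint16_t new_hqm, uint16_t new_phys_q)
    {
        vq_table[virtual_q].hqm_id = new_hqm;
        vq_table[virtual_q].phys_q = new_phys_q;
    }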
Abstract:
Technologies for moving workloads between hardware queue managers include a compute device. The compute device includes a set of hardware queue managers. Each hardware queue manager is to manage one or more queues of queue elements, and each queue element is indicative of a data set to be operated on by a thread. The compute device also includes circuitry to execute a workload with a first hardware queue manager of the set, determine whether a workload migration condition is present, determine whether a second hardware queue manager of the set has sufficient capacity to manage a set of queues associated with the workload, move the workload to the second hardware queue manager in response to a determination that it does have sufficient capacity, and, after the move, reduce a power usage of the first hardware queue manager.
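The migration policy reads as a small decision procedure; the C sketch below uses hypothetical helper names (migration_condition_present, has_capacity, move_queues, reduce_power), since the actual conditions and power controls are hardware-specific.

    #include <stdbool.h>

    typedef struct hqm hqm_t;   /* opaque hardware queue manager handle */

    extern bool migration_condition_present(const hqm_t *src);
    extern bool has_capacity(const hqm_t *dst, const hqm_t *src);
    extern void move_queues(hqm_t *dst, hqm_t *src);
    extern void reduce_power(hqm_t *h);

    /* Returns true if the workload was moved from src to dst. */
    bool maybe_migrate(hqm_t *src, hqm_t *dst)
    {
        if (!migration_condition_present(src))
            return false;                  /* nothing to do */
        if (!has_capacity(dst, src))
            return false;                  /* destination cannot hold the queues */
        move_queues(dst, src);             /* move the workload's queues */
        reduce_power(src);                 /* idle or power down the source */
        return true;
    }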
Abstract:
In an embodiment, a shared memory fabric is configured to receive memory requests from multiple agents, where at least some of the requests have an associated deadline value to indicate a maximum latency prior to completion of the memory request. Responsive to the requests, the fabric is to arbitrate between the requests based at least in part on the deadline values. Other embodiments are described and claimed.
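A minimal C sketch of arbitration "based at least in part on the deadline values": an earliest-deadline-first pick over the pending requests. The request structure and units are assumptions; a real shared memory fabric would also weigh agent class, bandwidth, and ordering rules.

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint64_t deadline;   /* maximum latency budget, e.g. in fabric clocks */
        int      valid;      /* nonzero if the slot holds a pending request */
    } mem_request_t;

    /* Returns the index of the pending request with the nearest deadline,
     * or -1 if no request is pending. */
    int arbitrate(const mem_request_t *reqs, size_t n)
    {
        int best = -1;
        for (size_t i = 0; i < n; i++) {
            if (!reqs[i].valid)
                continue;
            if (best < 0 || reqs[i].deadline < reqs[(size_t)best].deadline)
                best = (int)i;
        }
        return best;
    }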