Abstract:
Embodiments may be generally directed to apparatuses, systems, methods, and techniques to detect a message to communicate via an interconnect coupled with a device capable of communication via a plurality of interconnect protocols, the plurality of interconnect protocols comprising a non-coherent interconnect protocol, a coherent interconnect protocol, and a memory interconnect protocol. Embodiments also include determining, based on the message, an interconnect protocol of the plurality of interconnect protocols to communicate the message via the interconnect, and providing the message to a multi-protocol multiplexer coupled with the interconnect, the multi-protocol multiplexer to communicate the message with the device via the interconnect utilizing the determined interconnect protocol.
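As a rough illustration of the selection step, the C sketch below picks one of the three protocol classes from the message itself and hands the message to a stand-in for the multi-protocol multiplexer. The message kinds, enum values, and mux_send function are hypothetical assumptions for illustration; the abstract does not specify them.

    #include <stdio.h>

    /* Hypothetical identifiers for the three protocol classes named above. */
    enum protocol { PROTO_NONCOHERENT, PROTO_COHERENT, PROTO_MEMORY };

    /* Hypothetical message kinds, used only to drive the selection below. */
    enum msg_kind { MSG_IO_CONFIG, MSG_CACHE_SNOOP, MSG_MEM_WRITE };

    struct message { enum msg_kind kind; unsigned payload; };

    /* Determine an interconnect protocol based on the message. */
    static enum protocol select_protocol(const struct message *m)
    {
        switch (m->kind) {
        case MSG_CACHE_SNOOP: return PROTO_COHERENT;
        case MSG_MEM_WRITE:   return PROTO_MEMORY;
        default:              return PROTO_NONCOHERENT;
        }
    }

    /* Stand-in for providing the message to the multi-protocol multiplexer. */
    static void mux_send(enum protocol p, const struct message *m)
    {
        printf("message kind %d routed over protocol %d\n", m->kind, p);
    }

    int main(void)
    {
        struct message m = { MSG_MEM_WRITE, 0xABCD };
        mux_send(select_protocol(&m), &m);
        return 0;
    }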
Abstract:
In embodiments, an apparatus for computing includes a protection key register (PKR) having 2N bits, where N is an integer, to store a plurality of permission entries corresponding to protected memory domains, and a protected memory domain controller, coupled to the PKR. In embodiments, the memory domain controller is to: obtain protection key (PK) bits from a page table entry for a target page address; obtain one or more additional PK bits from a target linear memory address; and combine the PK bits and the additional PK bits to form a PK domain number to index into the plurality of permission entries in the PKR to obtain a permission entry for a protected memory domain.
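A minimal C sketch of the key-combining step follows, assuming (for illustration only) two protection-key bits in the page table entry, two additional key bits taken from the linear address, and a 32-bit PKR holding two permission bits per domain. The bit positions, widths, and names are assumptions, not the layout claimed above.

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed, illustrative bit positions and widths. */
    #define PTE_PK_SHIFT  59u   /* protection-key bits stored in the page table entry */
    #define PTE_PK_MASK   0x3u
    #define LA_PK_SHIFT   62u   /* additional key bits taken from the linear address  */
    #define LA_PK_MASK    0x3u

    /* Two permission bits per protected memory domain, packed into the PKR. */
    static unsigned pkr_entry(uint32_t pkr, unsigned domain)
    {
        return (pkr >> (2u * domain)) & 0x3u;
    }

    /* Combine the PTE key bits and the additional address bits into a PK domain number. */
    static unsigned pk_domain(uint64_t pte, uint64_t linear_addr)
    {
        unsigned pk_lo = (unsigned)((pte >> PTE_PK_SHIFT) & PTE_PK_MASK);
        unsigned pk_hi = (unsigned)((linear_addr >> LA_PK_SHIFT) & LA_PK_MASK);
        return (pk_hi << 2) | pk_lo;            /* 4-bit domain number, 16 domains */
    }

    int main(void)
    {
        uint64_t pte  = (uint64_t)0x1 << PTE_PK_SHIFT;   /* PTE key bits = 1     */
        uint64_t addr = (uint64_t)0x2 << LA_PK_SHIFT;    /* address key bits = 2 */
        uint32_t pkr  = 0;                               /* all domains permitted */
        unsigned dom  = pk_domain(pte, addr);            /* domain 9              */
        printf("domain %u permission entry 0x%x\n", dom, pkr_entry(pkr, dom));
        return 0;
    }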
Abstract:
Disclosed embodiments relate to spatial and temporal merging of remote atomic operations. In one example, a system includes an RAO instruction queue stored in a memory and having entries grouped by destination cache line, each entry to enqueue an RAO instruction including an opcode, a destination identifier, and source data, and optimization circuitry to receive an incoming RAO instruction and scan the RAO instruction queue to detect a matching enqueued RAO instruction identifying a same destination cache line as the incoming RAO instruction. The optimization circuitry is further to, responsive to no matching enqueued RAO instruction being detected, enqueue the incoming RAO instruction, and, responsive to a matching enqueued RAO instruction being detected, determine whether the incoming and matching RAO instructions have a same opcode and target non-overlapping cache line elements, and, if so, spatially combine the incoming and matching RAO instructions by enqueuing both RAO instructions in a same group of cache line queue entries at different offsets.
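A simplified C sketch of the spatial-combining decision is shown below, assuming a small queue with one group per destination cache line and one slot per cache-line element offset. The group and slot counts, the field names, and the fallback behaviour when opcodes differ are illustrative assumptions, not the claimed circuitry.

    #include <stdio.h>

    #define GROUPS 4    /* queue groups, one destination cache line each (assumed) */
    #define SLOTS  8    /* cache-line element offsets per group (assumed)          */

    struct rao_entry { int valid; int opcode; long src; };

    struct rao_group {
        int used;
        unsigned long cache_line;       /* destination cache line of this group */
        struct rao_entry slot[SLOTS];   /* one slot per cache-line offset        */
    };

    static struct rao_group queue[GROUPS];

    /* Enqueue an incoming RAO instruction; spatially combine it with a matching
     * group when the same cache line is already queued with the same opcode and
     * the targeted cache-line element does not overlap an existing entry. */
    static int rao_enqueue(unsigned long line, unsigned offset, int opcode, long src)
    {
        unsigned s = offset % SLOTS;

        for (int g = 0; g < GROUPS; g++) {
            if (!queue[g].used || queue[g].cache_line != line)
                continue;
            if (queue[g].slot[s].valid)
                return -1;                       /* overlapping element      */
            for (int i = 0; i < SLOTS; i++)
                if (queue[g].slot[i].valid && queue[g].slot[i].opcode != opcode)
                    return -1;                   /* different opcode         */
            queue[g].slot[s] = (struct rao_entry){ 1, opcode, src };
            return g;                            /* spatially combined       */
        }
        for (int g = 0; g < GROUPS; g++) {       /* no match: take a free group */
            if (!queue[g].used) {
                queue[g].used = 1;
                queue[g].cache_line = line;
                queue[g].slot[s] = (struct rao_entry){ 1, opcode, src };
                return g;
            }
        }
        return -1;                               /* queue full */
    }

    int main(void)
    {
        printf("first ADD  -> group %d\n", rao_enqueue(0x1000, 0, 1, 5));
        printf("second ADD -> group %d (same line, different element)\n",
               rao_enqueue(0x1000, 3, 1, 7));
        return 0;
    }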
Abstract:
A processor includes a processing core to generate a memory request for application data of an application. The processor also includes a virtual page group memory management (VPGMM) unit coupled to the processing core to assign a caching priority (CP) to the application data for the application. The caching priority identifies the importance of the application data in a cache.
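The abstract gives no concrete interface, but a toy C sketch of the idea might look as follows: a per-page-group caching priority set through a hypothetical VPGMM-style path and consulted when choosing an eviction victim. All names, sizes, and values here are assumptions for illustration.

    #include <stdio.h>

    #define PAGE_GROUPS 4   /* assumed number of virtual page groups */

    /* Caching priority (CP) per page group: higher = more important in the cache. */
    static int caching_priority[PAGE_GROUPS];

    static void set_caching_priority(int group, int cp) { caching_priority[group] = cp; }

    /* Between two candidate victims, evict the line whose data matters less. */
    static int choose_victim(int group_a, int group_b)
    {
        return caching_priority[group_a] <= caching_priority[group_b] ? group_a : group_b;
    }

    int main(void)
    {
        set_caching_priority(0, 3);   /* e.g. frequently reused application data */
        set_caching_priority(1, 1);   /* e.g. streaming data with little reuse   */
        printf("evict a line from group %d first\n", choose_victim(0, 1));
        return 0;
    }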
Abstract:
A processing system includes a processing core to execute a process and an input output (IO) memory management unit coupled to the core. The IO memory management unit includes a storage unit to store a page table entry including an identifier of a memory domain and a protection key associated with the identifier. The protection key indicates whether a memory page in the memory domain is accessible. The IO memory management unit also includes a protection key register comprising a field indexed by the protection key, the field including a set of bits reflecting a memory access permission associated with the protection key. The protection key register is, responsive to receiving a request from an IO device to store data associated with the process or a thread of the process, to allow or deny permission to access the memory page in the memory domain for storage of the data associated with the process or the thread of the process based on the protection key.
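A minimal C sketch of the permission check, assuming a PKRU-style register with two permission bits per protection key (access-disable and write-disable); the layout, sizes, and names are illustrative and not taken from the abstract.

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed layout: 16 protection keys, 2 permission bits each
     * (bit 0 = access-disable, bit 1 = write-disable). */
    static uint32_t io_pkr;

    struct io_pte { unsigned domain_id; unsigned pkey; };  /* simplified page-table entry */

    /* Decide whether an IO device may store data to the page described by the entry. */
    static int io_write_allowed(const struct io_pte *p)
    {
        unsigned perm = (io_pkr >> (2u * p->pkey)) & 0x3u;
        return perm == 0;                       /* neither disable bit set */
    }

    int main(void)
    {
        struct io_pte page = { .domain_id = 7, .pkey = 3 };
        io_pkr |= 0x2u << (2u * 3u);            /* set write-disable for key 3 */
        printf("IO write to the page is %s\n",
               io_write_allowed(&page) ? "allowed" : "denied");
        return 0;
    }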
Abstract:
A processor includes multiple physical cores that support multiple logical cores of different core types, where the core types include a big core type and a small core type. A multi-threaded application includes multiple software threads that are concurrently executed by a first subset of the logical cores in a first time slot. Based on data gathered from monitoring the execution in the first time slot, the processor selects a second subset of the logical cores for concurrent execution of the software threads in a second time slot. Each logical core in the second subset has a core type that matches the characteristics of one of the software threads.
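A rough C sketch of the matching step, assuming a single monitored metric (a hypothetical per-thread demand value gathered in the first time slot) and a fixed threshold; the processor's actual monitoring data and selection policy are not described at this level in the abstract.

    #include <stdio.h>

    enum core_type { SMALL_CORE, BIG_CORE };

    struct thread_stats { int id; double ipc_demand; };   /* gathered in time slot 1 (assumed metric) */

    /* Match each software thread to a core type for the next time slot. */
    static enum core_type pick_core(const struct thread_stats *t)
    {
        return t->ipc_demand > 1.5 ? BIG_CORE : SMALL_CORE;
    }

    int main(void)
    {
        struct thread_stats threads[] = { {0, 2.3}, {1, 0.6}, {2, 1.8} };
        for (int i = 0; i < 3; i++)
            printf("thread %d -> %s core\n", threads[i].id,
                   pick_core(&threads[i]) == BIG_CORE ? "big" : "small");
        return 0;
    }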
Abstract:
Enforcing memory operand types using protection keys is generally described herein. A processor system to provide sandbox execution support against protection key rights attacks includes a processor core to execute a task associated with an untrusted application and to execute the task using a designated page of a memory, and a memory management unit to designate the page of the memory to support execution of the untrusted application.
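As a loose illustration of confining the untrusted task to its designated page, the C sketch below checks each access against a single designated page. The page size, function names, and enforcement point are assumptions, since the abstract leaves them unspecified.

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096u

    /* Base of the page designated for the untrusted task (assumed single page). */
    static uintptr_t sandbox_page_base;

    static void designate_page(uintptr_t base) { sandbox_page_base = base; }

    /* Allow the untrusted task to touch only its designated page. */
    static int sandbox_access_ok(uintptr_t addr)
    {
        return addr >= sandbox_page_base && addr < sandbox_page_base + PAGE_SIZE;
    }

    int main(void)
    {
        designate_page(0x10000);
        printf("access 0x10010: %s\n", sandbox_access_ok(0x10010) ? "ok" : "blocked");
        printf("access 0x20000: %s\n", sandbox_access_ok(0x20000) ? "ok" : "blocked");
        return 0;
    }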
Abstract:
Techniques for increasing link efficiency are disclosed. In one embodiment, a device handle table is created at each end of a link. Device handle allocation messages can be used to associate a particular device handle with a particular domain identifier, such as a bus/device/function (BDF) identifier or a process address space identifier (PASID). Once a device handle is allocated, messages sent between the two ends of the link can include the device handle, and the device handle can be used to determine the domain identifier associated with the message. Because the device handle can be encoded in fewer bits than the domain identifier, link efficiency can be increased.
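A small C sketch of the handle-table idea, assuming (for illustration) an 8-bit device handle standing in for a 16-bit BDF/PASID-style domain identifier; the table size, widths, and function names are not specified by the abstract.

    #include <stdint.h>
    #include <stdio.h>

    #define HANDLES 256   /* assumed number of 8-bit device handles */

    static uint16_t handle_table[HANDLES];   /* handle -> domain identifier */
    static int      handle_valid[HANDLES];

    /* Applied at both ends of the link when a handle allocation message arrives. */
    static void allocate_handle(uint8_t handle, uint16_t domain_id)
    {
        handle_table[handle] = domain_id;
        handle_valid[handle] = 1;
    }

    /* Receiving side: recover the domain identifier from the short handle. */
    static int lookup_domain(uint8_t handle, uint16_t *domain_id)
    {
        if (!handle_valid[handle])
            return -1;
        *domain_id = handle_table[handle];
        return 0;
    }

    int main(void)
    {
        uint16_t dom;
        allocate_handle(5, 0x01A3);          /* e.g. bus/device/function 01:14.3 */
        if (lookup_domain(5, &dom) == 0)
            printf("handle 5 -> domain 0x%04X\n", (unsigned)dom);
        return 0;
    }

Carrying the one-byte handle instead of the full BDF or PASID on every message is what yields the efficiency gain; the allocation messages amortize the cost of establishing the mapping once per device handle.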