Abstract:
Processor affinity of an application/thread may be used to deliver an interrupt caused by the application/thread to a best processor at runtime. The processor to which the interrupt is delivered may either run the target application/thread or be located in the same socket as the processor that runs the target application/thread. The processor affinity of the application/thread may be pushed down at runtime to a network device, a chipset, a memory control hub (“MCH”), or an input/output hub (“IOH”), which will facilitate delivery of the interrupt using that affinity information.
Abstract:
Technologies for dividing work across one or more accelerator devices include a compute device. The compute device is to determine a configuration of each of multiple accelerator devices of the compute device, receive a job to be accelerated from a requester device remote from the compute device, and divide the job into multiple tasks for a parallelization of the multiple tasks among the one or more accelerator devices, as a function of a job analysis of the job and the configuration of each accelerator device. The compute engine is further to schedule the tasks to the one or more accelerator devices based on the job analysis and execute the tasks on the one or more accelerator devices for the parallelization of the multiple tasks to obtain an output of the job.
Abstract:
An apparatus to facilitate disaggregated computing for a distributed confidential computing environment is disclosed. The apparatus includes one or more processors to facilitate receiving a manifest corresponding to graph nodes representing regions of memory of a remote client machine, the graph nodes corresponding to a command buffer and to associated data structures and kernels of the command buffer used to initialize a hardware accelerator and execute the kernels, and the manifest indicating a destination memory location of each of the graph nodes and dependencies of each of the graph nodes; identifying, based on the manifest, the command buffer and the associated data structures to copy to the host memory; identifying, based on the manifest, the kernels to copy to local memory of the hardware accelerator; and patching addresses in the command buffer copied to the host memory with updated addresses of corresponding locations in the host memory.
Abstract:
An apparatus to facilitate disaggregated computing for a distributed confidential computing environment is disclosed. The apparatus includes one or more processors to: provide a remote GPU middleware layer to act as a proxy for an application stack on a client platform that is separate from the remote server platform, wherein the remote GPU middleware layer comprises is to expose an abstraction of the remote GPU to userspace components of a remote GPU stack, the userspace components running on the client machine; communicate with a kernel mode driver of the one or more processors to cause the host memory to be allocated for data structures used to communicate commands between the client and the remote GPU; and invoke the kernel mode driver to submit a workload generated by the application stack, the workload submitted for processing by the remote GPU using the data structures allocated in the host memory.
Abstract:
An apparatus to facilitate disaggregated computing for a distributed confidential computing environment is disclosed. The apparatus includes a source remote direct memory access (RDMA) network interface controller (RNIC); a queue to store a data entry corresponding to an RDMA request between the source RNIC and a sink RNIC; a data buffer to store data for an RDMA transfer corresponding to the RDMA request, the RDMA transfer between the source RNIC and the sink RNIC; and a trusted execution environment (TEE) comprising an authentication tag controller to: initialize a first authentication tag calculated using a first key known between a source consumer generating the RDMA request and the source RNIC; associate the first authentication tag with the data entry as integrity verification; initialize a second authentication tag calculated using a second key; and associate the second authentication tag with the data buffer as integrity verification for the data buffer.
Abstract:
Technologies for dividing work across one or more accelerator devices include a compute device. The compute device is to determine a configuration of each of multiple accelerator devices of the compute device, receive a job to be accelerated from a requester device remote from the compute device, and divide the job into multiple tasks for a parallelization of the multiple tasks among the one or more accelerator devices, as a function of a job analysis of the job and the configuration of each accelerator device. The compute engine is further to schedule the tasks to the one or more accelerator devices based on the job analysis and execute the tasks on the one or more accelerator devices for the parallelization of the multiple tasks to obtain an output of the job.
Abstract:
Technologies for remote direct memory access (RDMA) queue pair quality of service (QoS) management are disclosed. In the illustrative embodiment, several queue pairs associated with a virtual machine on a compute sled may be created in a network interface controller of the compute sled. A QoS parameter such as a class of service identifier or a weighting may be assigned to each queue pair such that each queue pair has a different available bandwidth. The compute sled may also predict future RDMA queue pair bandwidth usage and adjust RDMA queue pair bandwidth allocation based on the prediction.
Abstract:
Technologies for dividing work across one or more accelerator devices include a compute device. The compute device is to determine a configuration of each of multiple accelerator devices of the compute device, receive a job to be accelerated from a requester device remote from the compute device, and divide the job into multiple tasks for a parallelization of the multiple tasks among the one or more accelerator devices, as a function of a job analysis of the job and the configuration of each accelerator device. The compute engine is further to schedule the tasks to the one or more accelerator devices based on the job analysis and execute the tasks on the one or more accelerator devices for the parallelization of the multiple tasks to obtain an output of the job.
Abstract:
An apparatus to facilitate disaggregated computing for a distributed confidential computing environment is disclosed. The apparatus includes a programmable integrated circuit (IC) comprising secure device manager (SDM) hardware circuitry to: receive a tenant bitstream of a tenant and a tenant use policy for utilization of the programmable IC via the tenant bitstream, wherein the tenant use policy is cryptographically bound to the tenant bitstream by a cloud service provider (CSP) authorizing entity and signed with a signature of the CSP authorizing entity; in response to successfully verifying the signature, extract the tenant use policy to provide to a policy manager of the programmable IC for verification; in response to the policy manager verifying the tenant bitstream based on the tenant use policy, configure a partial reconfiguration (PR) region of the programable IC using the tenant bitstream; and associate a slot ID of the PR region with the tenant use policy.
Abstract:
An apparatus to facilitate disaggregated computing for a distributed confidential computing environment is disclosed. The apparatus includes a source remote direct memory access (RDMA) network interface controller (RNIC); a queue to store a data entry corresponding to an RDMA request between the source RNIC and a sink RNIC; a data buffer to store data for an RDMA transfer corresponding to the RDMA request, the RDMA transfer between the source RNIC and the sink RNIC; and a trusted execution environment (TEE) comprising an authentication tag controller to: initialize a first authentication tag calculated using a first key known between a source consumer generating the RDMA request and the source RNIC; associate the first authentication tag with the data entry as integrity verification; initialize a second authentication tag calculated using a second key; and associate the second authentication tag with the data buffer as integrity verification for the data buffer.