Abstract:
Methods and apparatus relating to data streaming accelerators are described. In an embodiment, a hardware accelerator, such as Data Streaming Accelerator (DSA) logic circuitry, performs data movement and/or data transformation for data to be transferred between a processor (having one or more processor cores) and a storage device. Other embodiments are also disclosed and claimed.
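A minimal C sketch of the idea above: a data-move request is expressed as a self-contained descriptor naming an operation, a source, a destination, and a size, which an engine then executes. The descriptor layout, the opcode values, and the software stand-in for the accelerator engine are illustrative assumptions, not the hardware interface.

```c
/* Sketch of a DSA-style data-move request; descriptor fields and the
 * software fallback engine are assumptions for illustration. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

enum dsa_opcode { DSA_OP_MEMMOVE = 3, DSA_OP_MEMFILL = 4 };

struct dsa_desc {            /* one self-contained work descriptor */
    uint32_t opcode;
    uint64_t src_addr;       /* source buffer (virtual address) */
    uint64_t dst_addr;       /* destination buffer */
    uint32_t xfer_size;      /* bytes to move */
};

/* Software fallback standing in for the accelerator engine. */
static void dsa_execute(const struct dsa_desc *d)
{
    if (d->opcode == DSA_OP_MEMMOVE)
        memmove((void *)(uintptr_t)d->dst_addr,
                (const void *)(uintptr_t)d->src_addr, d->xfer_size);
}

int main(void)
{
    char src[16] = "accelerate me!", dst[16] = {0};
    struct dsa_desc d = { DSA_OP_MEMMOVE,
                          (uint64_t)(uintptr_t)src,
                          (uint64_t)(uintptr_t)dst, sizeof(src) };
    dsa_execute(&d);
    printf("%s\n", dst);
    return 0;
}
```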
Abstract:
Embodiments of apparatuses, methods, and systems for highly scalable accelerators are described. In an embodiment, an apparatus includes an interface to receive a plurality of work requests from a plurality of clients and a plurality of engines to perform the plurality of work requests. The work requests are to be dispatched to the plurality of engines from a plurality of work queues. The work queues are to store a work descriptor per work request. Each work descriptor is to include all information needed to perform a corresponding work request.
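A small C sketch of the dispatch structure described above: several clients submit self-contained descriptors into several work queues, and an arbiter hands them to several engines. The queue depths, the engine count, and the round-robin arbitration policy are assumptions for illustration.

```c
/* Sketch of work-queue to engine dispatch; sizes and the round-robin
 * arbiter are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

#define NUM_QUEUES  2
#define NUM_ENGINES 4
#define QUEUE_DEPTH 8

struct work_desc {           /* everything an engine needs, inline */
    uint32_t client_id;
    uint32_t opcode;
    uint64_t src, dst, size;
};

struct work_queue {
    struct work_desc ring[QUEUE_DEPTH];
    unsigned head, tail;     /* head: next to dispatch; tail: next free */
};

static int wq_submit(struct work_queue *q, struct work_desc d)
{
    if (q->tail - q->head == QUEUE_DEPTH)
        return -1;           /* queue full: client must retry */
    q->ring[q->tail++ % QUEUE_DEPTH] = d;
    return 0;
}

static void engine_run(unsigned engine, const struct work_desc *d)
{
    printf("engine %u: client %u opcode %u (%llu bytes)\n",
           engine, d->client_id, d->opcode, (unsigned long long)d->size);
}

int main(void)
{
    struct work_queue wq[NUM_QUEUES] = {0};
    wq_submit(&wq[0], (struct work_desc){ .client_id = 1, .opcode = 3, .size = 64 });
    wq_submit(&wq[1], (struct work_desc){ .client_id = 2, .opcode = 4, .size = 32 });

    unsigned engine = 0;     /* round-robin dispatch across queues */
    for (unsigned q = 0; q < NUM_QUEUES; q++)
        while (wq[q].head != wq[q].tail)
            engine_run(engine++ % NUM_ENGINES,
                       &wq[q].ring[wq[q].head++ % QUEUE_DEPTH]);
    return 0;
}
```

Because each descriptor carries all the information needed for its request, any engine can execute any descriptor without consulting per-client state, which is what makes the dispatch freely schedulable.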
Abstract:
In one embodiment, an apparatus includes a processor comprising an address translation cache (ATC), a shared work queue (SWQ) associated with the ATC, and a port to couple to a host processor over a Peripheral Component Interconnect Express (PCIe)-based link. The apparatus also includes circuitry to receive address translation information from a memory management unit of the host processor that includes virtual memory address to physical memory address translations, store the address translation information in the ATC, receive an invalidation command from the host processor indicating an invalidation of address translation information stored in the ATC, modify the address translation information in the ATC based on the invalidation command, and store completion data in a memory location indicated by the invalidation command.
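A C sketch of the invalidation flow in the abstract above: drop the matching ATC entries, then write completion data where the command points. The entry format, the command layout, and the single-word completion record are assumptions, not the PCIe ATS wire format.

```c
/* Sketch of ATC invalidation with a completion write; all layouts are
 * illustrative assumptions. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define ATC_ENTRIES 64

struct atc_entry { uint64_t va, pa; bool valid; };

struct inval_cmd {
    uint64_t va;             /* translation to drop */
    uint64_t *completion;    /* where to store completion data */
};

static struct atc_entry atc[ATC_ENTRIES];

static void atc_invalidate(const struct inval_cmd *cmd)
{
    for (int i = 0; i < ATC_ENTRIES; i++)
        if (atc[i].valid && atc[i].va == cmd->va)
            atc[i].valid = false;        /* modify ATC per the command */
    *cmd->completion = 1;                /* signal the host: done */
}

int main(void)
{
    uint64_t done = 0;
    atc[0] = (struct atc_entry){ .va = 0x1000, .pa = 0xabc000, .valid = true };
    struct inval_cmd cmd = { .va = 0x1000, .completion = &done };
    atc_invalidate(&cmd);
    printf("valid=%d completion=%llu\n", atc[0].valid, (unsigned long long)done);
    return 0;
}
```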
Abstract:
Systems, methods, and devices can include a processing engine implemented at least partially in hardware, the processing engine to process memory transactions; a memory element to index physical address and virtual address translations; and memory controller logic implemented at least partially in hardware, the memory controller logic to receive an index from the processing engine, the index corresponding to a physical address and a virtual address; identify a physical address based on the received index; and provide the physical address to the processing engine. The processing engine can use the physical address for memory transactions in response to a streaming workload job request.
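A C sketch of the index-based lookup described above: the engine hands the controller a small index rather than a full virtual address, and the controller returns the physical address filed under that index. The table size and entry layout are assumptions for illustration.

```c
/* Sketch of index -> translation lookup; table shape is an assumption. */
#include <stdint.h>
#include <stdio.h>

#define XLATE_SLOTS 16

struct xlate_entry { uint64_t va, pa; };    /* index maps to both */

static struct xlate_entry xlate_table[XLATE_SLOTS];

static uint64_t controller_lookup(unsigned index)
{
    return xlate_table[index % XLATE_SLOTS].pa;  /* index -> physical address */
}

int main(void)
{
    xlate_table[3] = (struct xlate_entry){ .va = 0x7f0000, .pa = 0x220000 };
    /* A streaming job submits index 3; the engine uses the returned PA. */
    printf("pa = 0x%llx\n", (unsigned long long)controller_lookup(3));
    return 0;
}
```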
Abstract:
Technologies for software testing include a computing device having persistent memory that includes a platform simulator and an application or other code module to be tested. The computing device generates a checkpoint for the application at a test location using the platform simulator. The computing device executes the application from the test location to an end location and traces all writes to persistent memory using the platform simulator. The computing device generates permutations of persistent memory writes that are allowed by the hardware specification of the computing device simulated by the platform simulator. The computing device replays each permutation from the checkpoint, simulates a power failure, and then invokes a user-defined test function using the platform simulator. The computing device may test different permutations of memory writes until the application's use of persistent memory is validated. Other embodiments are described and claimed.
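A toy C reconstruction of the replay loop described above: every ordering of the traced writes is replayed from the checkpoint, a power failure is simulated after each prefix of the ordering, and a user-supplied check runs on the surviving state. The four-byte "persistent memory", the three-write trace, and the check are invented for illustration.

```c
/* Sketch of permutation replay with simulated power failure after each
 * write prefix; trace and test function are illustrative assumptions. */
#include <string.h>
#include <stdio.h>

#define MEM_SIZE 4
struct write_op { int offset; char value; };

static char checkpoint[MEM_SIZE] = "____";

static void test_state(const char mem[MEM_SIZE], int writes_done)
{
    printf("after %d write(s): %.4s\n", writes_done, mem);
    /* a real test would assert application invariants here */
}

static void replay(struct write_op *w, int n, int k)
{
    if (k == n) {                            /* one full ordering chosen */
        char mem[MEM_SIZE];
        memcpy(mem, checkpoint, MEM_SIZE);   /* restore the checkpoint */
        for (int i = 0; i <= n; i++) {       /* power fails after i writes */
            if (i > 0)
                mem[w[i - 1].offset] = w[i - 1].value;
            test_state(mem, i);
        }
        return;
    }
    for (int i = k; i < n; i++) {            /* permute remaining writes */
        struct write_op t = w[k]; w[k] = w[i]; w[i] = t;
        replay(w, n, k + 1);
        t = w[k]; w[k] = w[i]; w[i] = t;     /* undo swap */
    }
}

int main(void)
{
    struct write_op trace[] = { {0, 'A'}, {1, 'B'}, {2, 'C'} };
    replay(trace, 3, 0);
    return 0;
}
```

A real implementation would only generate the orderings the hardware specification actually permits, rather than all n! of them as this sketch does.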
Abstract:
In one embodiment, an apparatus comprises a storage device and a processor. The storage device stores a feature vector index, wherein the feature vector index comprises a sparse-array data structure representing a feature space for a set of labeled feature vectors, wherein the labeled feature vectors are assigned to a plurality of classes. The processor is to: receive a query corresponding to a target feature vector; access, via the storage device, a first portion of the feature vector index, wherein the first portion of the feature vector index comprises a subset of labeled feature vectors that correspond to a same portion of the feature space as the target feature vector; determine the corresponding class of the target feature vector based on the subset of labeled feature vectors; and provide a response to the query based on the corresponding class.
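A C sketch of the query path above: hash the target vector into a region of the feature space, fetch only the labeled vectors indexed under that region, and classify by nearest neighbor. The grid hash, the tiny two-entry index, and nearest-neighbor voting are assumptions for illustration.

```c
/* Sketch of region-local nearest-neighbor classification over a sparse
 * bucketed index; all data and the hash are illustrative assumptions. */
#include <stdio.h>
#include <math.h>

#define DIM 2
#define BUCKETS 4

struct labeled_vec { float v[DIM]; int cls; };

/* sparse index: one short list of vectors per feature-space region */
static struct labeled_vec bucket2[] = { {{2.1f, 2.0f}, 7}, {{2.4f, 2.2f}, 7} };
static struct labeled_vec *index_buckets[BUCKETS] = { 0, 0, bucket2, 0 };
static int bucket_len[BUCKETS] = { 0, 0, 2, 0 };

static int region_of(const float v[DIM])
{
    return ((int)v[0]) % BUCKETS;        /* crude grid hash over dim 0 */
}

static int classify(const float target[DIM])
{
    int r = region_of(target), best = -1;
    float best_d = INFINITY;
    for (int i = 0; i < bucket_len[r]; i++) {       /* same-region subset */
        float d = 0;
        for (int k = 0; k < DIM; k++) {
            float diff = index_buckets[r][i].v[k] - target[k];
            d += diff * diff;
        }
        if (d < best_d) { best_d = d; best = index_buckets[r][i].cls; }
    }
    return best;                          /* -1: no neighbors in region */
}

int main(void)
{
    float q[DIM] = { 2.3f, 2.1f };
    printf("class = %d\n", classify(q));
    return 0;
}
```

Only the vectors indexed under the query's region are touched, which is the point of the sparse layout: classification cost scales with the bucket rather than with the whole labeled set.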
Abstract:
Process address space identifier virtualization uses a hardware paging hint. A processing device (100) comprises: a processing core (110); and a translation circuit coupled to the processing core, the translation circuit to: receive a workload instruction from a guest application being executed by the processing device, the workload instruction comprising an untranslated guest process address space identifier (gPASID), a workload for an input/output (I/O) target device, and an identifier of a submission register on the I/O target device (410); access a paging data structure (PDS) associated with the guest application to retrieve a page table entry corresponding to the gPASID and the identifier of the submission register (420); determine a value of an I/O hint bit of the page table entry corresponding to the gPASID and the identifier of the submission register (430); responsive to determining that the I/O hint bit is enabled, keep the untranslated gPASID in the workload instruction (440); and provide the workload instruction to a work queue of the I/O target device (450).
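A C sketch of the hint-bit decision above: look up the page-table entry for the (gPASID, submission register) pair, then either keep the guest PASID untranslated or substitute a host PASID. The bit position, the PDS walk stand-in, and the translation table are assumptions for illustration.

```c
/* Sketch of the I/O hint-bit check on workload submission; PTE layout
 * and translation logic are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

#define IO_HINT_BIT (1ull << 11)   /* assumed position of the hint bit */

struct workload { uint32_t pasid; uint32_t opcode; };

static uint64_t lookup_pte(uint32_t gpasid, uint32_t submit_reg)
{
    (void)submit_reg;
    return gpasid == 5 ? IO_HINT_BIT : 0;  /* stand-in for the PDS walk */
}

static uint32_t gpasid_to_hpasid(uint32_t gpasid)
{
    return gpasid + 100;                   /* stand-in translation table */
}

static void submit(struct workload *w, uint32_t submit_reg)
{
    uint64_t pte = lookup_pte(w->pasid, submit_reg);
    if (!(pte & IO_HINT_BIT))              /* hint clear: translate */
        w->pasid = gpasid_to_hpasid(w->pasid);
    /* hint set: keep the untranslated gPASID in the instruction */
    printf("enqueue pasid=%u to register %u\n", w->pasid, submit_reg);
}

int main(void)
{
    struct workload a = { .pasid = 5 }, b = { .pasid = 6 };
    submit(&a, 0);   /* hint enabled: gPASID 5 kept as-is */
    submit(&b, 0);   /* hint disabled: translated to 106 */
    return 0;
}
```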
Abstract:
Techniques for offload device address translation fetching are disclosed. In the illustrative embodiment, a processor of a compute device sends a translation fetch descriptor to an offload device before sending a corresponding work descriptor to the offload device. The offload device can request translations for virtual memory addresses and cache the corresponding physical addresses for later use. While the offload device is fetching virtual address translations, the compute device can perform other tasks before sending the corresponding work descriptor, including operations that modify the contents of the memory addresses whose translations are being cached. Even if the offload device does not cache the translations, the fetching can warm up the cache in a translation lookaside buffer. Such an approach can reduce the latency overhead that the offload device may otherwise incur in sending the memory address translation requests required to execute the work descriptor.
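A C sketch of the two-phase submission above: a translation fetch descriptor warms the device's cached translations, the host does other work, and the later work descriptor hits in that cache instead of stalling on a page walk. The device TLB, its indexing, and the page-walk stand-in are assumptions for illustration.

```c
/* Sketch of translation prefetch ahead of the work descriptor; the
 * device-side TLB and walk are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

#define TLB_SLOTS 8
struct tlb_entry { uint64_t va, pa; int valid; };
static struct tlb_entry dev_tlb[TLB_SLOTS];

static uint64_t page_walk(uint64_t va) { return va | 0x100000; } /* stand-in */

static void fetch_descriptor(uint64_t va)      /* phase 1: prefetch */
{
    unsigned slot = (va >> 12) % TLB_SLOTS;
    dev_tlb[slot] = (struct tlb_entry){ va, page_walk(va), 1 };
}

static void work_descriptor(uint64_t va)       /* phase 2: real work */
{
    unsigned slot = (va >> 12) % TLB_SLOTS;
    if (dev_tlb[slot].valid && dev_tlb[slot].va == va)
        printf("hit: pa=0x%llx, no walk needed\n",
               (unsigned long long)dev_tlb[slot].pa);
    else
        printf("miss: walking for va=0x%llx\n", (unsigned long long)va);
}

int main(void)
{
    fetch_descriptor(0x3000);  /* host sends the fetch descriptor first ... */
    /* ... and is free to do other work here ... */
    work_descriptor(0x3000);   /* ... before the work descriptor arrives */
    return 0;
}
```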
Abstract:
Embodiments of apparatuses, methods, and systems for unified address translation for virtualization of input/output devices are described. In an embodiment, an apparatus includes first circuitry to use at least an identifier of a device to locate a context entry and second circuitry to use at least a process address space identifier (PASID) to locate a PASID entry. The context entry is to include at least one of a page-table pointer to a page-table translation structure and a PASID. The PASID entry is to include at least one of a first-level page-table pointer to a first-level translation structure and a second-level page-table pointer to a second-level translation structure. The PASID is to be supplied by the device. At least one of the apparatus, the context entry, and the PASID entry is to include one or more control fields to indicate whether the first-level page-table pointer or the second-level page-table pointer is to be used.
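A C sketch of the two-step lookup above: the device identifier selects a context entry, the PASID selects a PASID entry, and a control field picks the first- or second-level page-table pointer for the walk. All field layouts and table sizes are assumptions for illustration.

```c
/* Sketch of context-entry / PASID-entry resolution; layouts are
 * illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

struct pasid_entry {
    uint64_t flpt;           /* first-level page-table pointer */
    uint64_t slpt;           /* second-level page-table pointer */
    int use_first_level;     /* control field */
};

struct context_entry {
    struct pasid_entry *pasid_table;
    uint32_t default_pasid;  /* used if the device supplies none */
};

static struct pasid_entry pasids[4] = {
    [2] = { .flpt = 0x11000, .slpt = 0x22000, .use_first_level = 1 },
};
static struct context_entry contexts[4] = {
    [1] = { .pasid_table = pasids, .default_pasid = 2 },
};

static uint64_t root_pointer(unsigned device_id, uint32_t pasid)
{
    struct context_entry *ce = &contexts[device_id];   /* step 1: device id */
    struct pasid_entry *pe = &ce->pasid_table[pasid];  /* step 2: PASID */
    return pe->use_first_level ? pe->flpt : pe->slpt;  /* control field */
}

int main(void)
{
    printf("walk starts at 0x%llx\n",
           (unsigned long long)root_pointer(1, 2));
    return 0;
}
```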