Abstract:
Technologies for processing network packets include a compute device with a network interface controller (NIC) that includes a host interface, a packet processor, and a network interface. The host interface is configured to receive a transaction from a compute engine of the compute device, wherein the transaction includes latency-sensitive data, determine a context of the latency-sensitive data, and verify the latency-sensitive data against one or more server policies as a function of the determined context. The packet processor is configured to identify a trust associated with the latency-sensitive data, determine whether to verify the latency-sensitive data against one or more network policies as a function of the identified trust, apply the one or more network policies, and encapsulate the latency-sensitive data into a network packet. The network interface is configured to transmit the network packet via an associated Ethernet port of the NIC. Other embodiments are described herein.
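The flow the abstract describes can be sketched in C as below; all names (transaction_t, trust_level, verify_server_policies, and the placeholder policy checks) are illustrative assumptions, not the patent's actual interfaces.

    /* Hypothetical sketch of the host-interface / packet-processor pipeline.
     * Types, names, and policy checks are illustrative assumptions. */
    #include <stdio.h>
    #include <stdbool.h>
    #include <string.h>

    typedef enum { TRUST_NONE, TRUST_PARTIAL, TRUST_FULL } trust_level;

    typedef struct {
        int         context_id;  /* context determined from the transaction */
        trust_level trust;       /* trust identified by the packet processor */
        char        payload[64]; /* latency-sensitive data */
    } transaction_t;

    /* Host interface: verify the data against server policies for its context. */
    static bool verify_server_policies(const transaction_t *t) {
        return t->context_id >= 0;  /* placeholder policy check */
    }

    /* Packet processor: network-policy checks are skipped for fully trusted data. */
    static bool needs_network_policies(const transaction_t *t) {
        return t->trust != TRUST_FULL;
    }

    static void encapsulate_and_transmit(const transaction_t *t) {
        printf("tx: [hdr|%s] via Ethernet port\n", t->payload);
    }

    int main(void) {
        transaction_t t = { .context_id = 7, .trust = TRUST_PARTIAL };
        strcpy(t.payload, "latency-sensitive data");

        if (!verify_server_policies(&t)) return 1;
        if (needs_network_policies(&t)) {
            /* apply the one or more network policies here */
        }
        encapsulate_and_transmit(&t);
        return 0;
    }

The design point worth noting is that fully trusted data can bypass the network-policy check, shortening the path for latency-sensitive traffic.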
Abstract:
System, method, and apparatus for improving the performance of collective operations in High Performance Computing (HPC). Compute nodes in a networked HPC environment form collective groups to perform collective operations. A spanning tree is formed including the compute nodes and the switches and links used to interconnect them, wherein the spanning tree is configured such that there is only a single route between any pair of nodes in the tree. The compute nodes implement processes for performing the collective operations, which include exchanging messages with processes executing on other compute nodes, wherein the messages contain indicia identifying the collective operations to which they belong. Each switch is configured to implement message forwarding operations for its portion of the spanning tree. Each of the nodes in the spanning tree implements a ratcheted cyclical state machine, used together with status messages exchanged between nodes to synchronize collective operations. Transaction IDs are also used to detect out-of-order and lost messages.
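A minimal C sketch of the ratcheted cyclical state machine and the transaction-ID checks follows; the cycle length, state encoding, and field widths are assumptions made for illustration.

    /* Illustrative ratcheted cyclical state machine with transaction-ID
     * checking; NUM_STATES and field widths are assumptions. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_STATES 4   /* assumed cycle length */

    typedef struct {
        uint8_t  state;        /* cycles 0..NUM_STATES-1, only advances forward */
        uint16_t expected_tid; /* next transaction ID expected on this link */
    } collective_node;

    /* Ratchet: the state may only advance to the next position in the cycle. */
    static bool ratchet_advance(collective_node *n, uint8_t proposed) {
        if (proposed == (uint8_t)((n->state + 1) % NUM_STATES)) {
            n->state = proposed;
            return true;
        }
        return false;  /* stale or out-of-order status message: ignore */
    }

    /* Transaction IDs detect out-of-order and lost messages. */
    static bool check_tid(collective_node *n, uint16_t tid) {
        if (tid == n->expected_tid) { n->expected_tid++; return true; }
        printf("lost/out-of-order: expected %u, got %u\n",
               (unsigned)n->expected_tid, (unsigned)tid);
        return false;
    }

    int main(void) {
        collective_node n = { .state = 0, .expected_tid = 0 };
        ratchet_advance(&n, 1);  /* accepted */
        ratchet_advance(&n, 3);  /* rejected: skips state 2 */
        check_tid(&n, 0);        /* accepted */
        check_tid(&n, 2);        /* detects a lost message with TID 1 */
        printf("state=%u\n", (unsigned)n.state);
        return 0;
    }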
Abstract:
Method and apparatus for sending packets using optimized PIO write sequences without sfences. Sequences of Programmed Input/Output (PIO) write instructions to write packet data to a PIO send memory are received at a processor supporting out-of-order execution. The PIO write instructions are received in an original order and executed out of order, with each PIO write instruction writing a store unit of data or a store block of data to a store buffer. Logic is provided for the store buffer to detect when store blocks are filled, whereupon the data in those store blocks is drained via PCIe posted writes to send blocks in the PIO send memory at addresses defined by the PIO write instructions. Logic is also employed for detecting the fill size of packets; when a packet's send blocks have all been filled, the packet data becomes eligible for egress.
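The store-buffer fill/drain behavior can be modeled roughly as follows; the 64-byte block size and buffer layout are assumptions, and the drain stands in for the PCIe posted write.

    /* Illustrative model of store-buffer fill detection and draining;
     * BLOCK_SIZE and the buffer layout are assumptions. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    #define BLOCK_SIZE 64            /* assumed store-block size in bytes */
    #define NUM_BLOCKS 8

    static uint8_t pio_send_mem[NUM_BLOCKS][BLOCK_SIZE]; /* PIO send memory */
    static uint8_t store_buf[BLOCK_SIZE];
    static size_t  store_fill;       /* bytes accumulated in the store buffer */
    static int     next_send_block;

    /* Each "PIO write" lands in the store buffer; a full block drains to the
     * send memory, modeling a PCIe posted write of a whole send block. */
    static void pio_write(const uint8_t *unit, size_t len) {
        memcpy(store_buf + store_fill, unit, len);
        store_fill += len;
        if (store_fill == BLOCK_SIZE) {
            memcpy(pio_send_mem[next_send_block++], store_buf, BLOCK_SIZE);
            store_fill = 0;
            printf("drained block %d to send memory\n", next_send_block - 1);
        }
    }

    int main(void) {
        uint8_t unit[8] = {0};
        /* A 64-byte packet becomes eligible for egress once its block drains. */
        for (int i = 0; i < 8; i++) pio_write(unit, sizeof unit);
        return 0;
    }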
Abstract:
Examples described herein relate to a first graphics processing unit (GPU) with at least one integrated communications system. The integrated communications system applies a reliability protocol to communicate with a corresponding integrated communications system associated with a second GPU in order to copy data from a first memory region to a second memory region, wherein the first memory region is associated with the first GPU and the second memory region is associated with the second GPU.
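A hypothetical sketch of such a reliability protocol is shown below, using sequence-numbered chunks with acknowledgment and retransmission; the abstract does not specify a wire format or API, so every name and mechanism here is assumed.

    /* Assumed sketch of a reliability protocol for a GPU-to-GPU memory copy:
     * sequence-numbered chunks with acknowledgment and retransmission. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <stdbool.h>

    #define CHUNK 16

    /* Stand-in for the peer GPU's memory region and its integrated
     * communications system; drops the first transmission of chunk 1. */
    static uint8_t remote_mem[64];
    static bool deliver(uint32_t seq, const uint8_t *data) {
        static int drops = 1;
        if (seq == 1 && drops-- > 0) return false;  /* simulated loss */
        memcpy(remote_mem + seq * CHUNK, data, CHUNK);
        return true;                                /* acknowledgment */
    }

    /* Copy src -> remote, retransmitting any chunk that is not acknowledged. */
    static void reliable_copy(const uint8_t *src, size_t len) {
        for (uint32_t seq = 0; seq * CHUNK < len; seq++) {
            while (!deliver(seq, src + seq * CHUNK))
                printf("chunk %u lost, retransmitting\n", seq);
        }
    }

    int main(void) {
        uint8_t local_mem[64];
        memset(local_mem, 0xAB, sizeof local_mem);
        reliable_copy(local_mem, sizeof local_mem);
        printf("copy %s\n", memcmp(local_mem, remote_mem, 64) ? "failed" : "ok");
        return 0;
    }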
Abstract:
In one embodiment, a system includes a device and a host. The device includes a device stream buffer. The host includes a processor to execute at least a first application and a second application, a host stream buffer, and a host scheduler. The first application is associated with a first transmit streaming channel to stream first data from the first application to the device stream buffer. The first transmit streaming channel has a first allocated amount of buffer space in the device stream buffer. The host scheduler schedules enqueue of the first data from the first application to the first transmit streaming channel based at least in part on availability of space in the first allocated amount of buffer space in the device stream buffer. Other embodiments are described and claimed.
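The scheduling decision reduces to a credit check against the channel's allocation in the device stream buffer, as in this C sketch; the sizes and names are illustrative.

    /* Sketch of credit-style enqueue scheduling against a channel's
     * allocation in the device stream buffer; sizes are assumptions. */
    #include <stdio.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct {
        size_t allocated;  /* channel's share of the device stream buffer */
        size_t in_flight;  /* bytes enqueued but not yet consumed by the device */
    } tx_channel;

    /* Host scheduler: admit the enqueue only if allocated space is available. */
    static bool schedule_enqueue(tx_channel *ch, size_t len) {
        if (ch->in_flight + len > ch->allocated)
            return false;  /* back-pressure the application */
        ch->in_flight += len;
        return true;
    }

    /* Device side: consuming data frees space in the allocation. */
    static void device_consume(tx_channel *ch, size_t len) {
        ch->in_flight -= len;
    }

    int main(void) {
        tx_channel ch = { .allocated = 4096, .in_flight = 0 };
        printf("enqueue 3000: %d\n", schedule_enqueue(&ch, 3000)); /* 1: fits */
        printf("enqueue 2000: %d\n", schedule_enqueue(&ch, 2000)); /* 0: full */
        device_consume(&ch, 3000);
        printf("enqueue 2000: %d\n", schedule_enqueue(&ch, 2000)); /* 1: fits */
        return 0;
    }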
Abstract:
Described herein are optimized packet headers for Ethernet IP networks and related methods and devices. An example packet header includes a field comprising a source identifier (SID), the SID comprising a shortened representation of a complete Internet Protocol (IP) address of a source network device, a field comprising a destination identifier (DID), the DID comprising a shortened representation of a complete IP address of a destination network device, and a field having a total number of bits that is less than 8 and comprising a shortened representation of a type of encapsulation protocol for the packet. The packet header excludes fields comprising the complete IP address and a media access controller (MAC) address of the source network device, fields comprising the complete IP address and the MAC address of the destination network device, a field comprising a header checksum, and a field comprising a total size of the packet.
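One possible C rendering of such a header follows; the SID and DID widths are assumptions, with only the constraint from the abstract that the encapsulation-type field is under 8 bits. Note what is absent: the complete IP and MAC addresses, the header checksum, and the total-size field.

    /* One assumed layout for the shortened header; field widths are
     * illustrative except that proto is under 8 bits per the abstract. */
    #include <stdio.h>
    #include <stdint.h>

    typedef struct {
        uint32_t sid   : 16;  /* shortened source identifier (assumed 16 bits) */
        uint32_t did   : 16;  /* shortened destination identifier (assumed) */
        uint32_t proto : 4;   /* encapsulation type, < 8 bits per the claim */
    } opt_header;

    int main(void) {
        opt_header h = { .sid = 0x12A, .did = 0x0B3, .proto = 0x2 };
        printf("header: sid=%u did=%u proto=%u, %zu bytes\n",
               (unsigned)h.sid, (unsigned)h.did, (unsigned)h.proto, sizeof h);
        return 0;
    }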
Abstract:
Method and apparatus for implementing an optimized credit return mechanism for packet sends. A Programmed Input/Output (PIO) send memory is partitioned into a plurality of send contexts, each comprising a memory buffer including a plurality of send blocks configured to store packet data. A storage scheme using FIFO semantics is implemented, with each send block associated with a respective FIFO slot. In response to receiving packet data written to the send blocks and detecting that the data in those send blocks has egressed from a send context, the corresponding freed FIFO slots are detected, and the lowest slot for which credit return indicia has not been returned is determined. The highest slot in the sequence of freed slots extending from that lowest slot is then determined, and corresponding credit return indicia is returned. In one embodiment, an absolute credit return count is implemented for each send context, with an associated absolute credit sent count tracked by the software that writes to the PIO send memory; the two absolute counts are used for flow control.
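The freed-slot scan can be sketched as below; the slot count and names are illustrative, and the aggregate return models the absolute credit return count.

    /* Sketch of the freed-slot scan: find the lowest FIFO slot whose credit
     * has not been returned, extend through the contiguous run of freed
     * slots, and return one aggregate credit count. Names are assumptions. */
    #include <stdio.h>
    #include <stdbool.h>

    #define SLOTS 8

    static bool freed[SLOTS];         /* send block has egressed */
    static bool returned[SLOTS];      /* credit already returned for this slot */
    static unsigned absolute_credits; /* running absolute credit return count */

    /* Return credits for the run of freed, unreturned slots starting at the
     * lowest unreturned slot. */
    static void return_credits(void) {
        int lo = 0;
        while (lo < SLOTS && returned[lo]) lo++;  /* lowest unreturned slot */
        int hi = lo;
        while (hi < SLOTS && freed[hi]) returned[hi++] = true;
        if (hi > lo) {
            absolute_credits += (unsigned)(hi - lo);
            printf("returned slots %d..%d, absolute count now %u\n",
                   lo, hi - 1, absolute_credits);
        }
    }

    int main(void) {
        freed[0] = freed[1] = true;  /* slots 0-1 egressed */
        freed[3] = true;             /* slot 3 egressed, but 2 still pending */
        return_credits();            /* returns 0..1 only; 3 must wait for 2 */
        freed[2] = true;
        return_credits();            /* now returns 2..3 */
        return 0;
    }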