Abstract:
A network processor dataflow chip and method for flexible dataflow are provided. The dataflow chip comprises a plurality of on-chip data transmission and scheduling circuit structures. The data transmission and scheduling circuit structures are selected responsive to indicators. Data transmission circuit structures may comprise selectable frame processing and data transmission functions. Selectable frame processing may comprise cut and paste, full dispatch and store and dispatch frame processing. Scheduling functions include full internal scheduling, calendar scheduling in communication with an external scheduler, and external calendar scheduling. In another aspect of the present invention, data transmission functions may comprise low latency and normal latency external processor interfaces for selectively providing privileged access to dataflow chip resources.
Abstract:
A system for reducing latency in a host Ethernet adapter (HEA) includes the following. First, the HEA receives a packet with an internet protocol (IP) header and data in the HEA. The HEA parses a connection identifier from the IP header and accesses a negative cache in the HEA to determine if the connection identifier is not in a memory external to the HEA. The HEA applies a default treatment to the packet if the connection identifier is not in the memory, thereby reducing latency by decreasing access to the memory.
Abstract:
A system and method for multicore processing of communications between data processing devices are provided. With the mechanisms of the illustrative embodiments, a set of techniques that enables sustaining media speed by distributing transmit and receive-side processing over multiple processing cores is provided. In addition, these techniques also enable designing multi-threaded network interface controller (NIC) hardware that efficiently hides the latency of direct memory access (DMA) operations associated with data packet transfers over an input/output (I/O) bus. Multiple processing cores may operate concurrently using separate instances of a communication protocol stack and device drivers to process data packets for transmission with separate hardware implemented send queue managers in a network adapter processing these data packets for transmission. Multiple hardware receive packet processors in the network adapter may be used, along with a flow classification engine, to route received data packets to appropriate receive queues and processing cores for processing.
Abstract:
A scalable method for efficient dynamic allocation of buffer resources in a store-and-forward device, such that high utilization can be maintained with small average buffer occupancy by providing asymmetric congestion control with opportune random detection. Also provided is tolerance of transient onset of congestion and fairness toward bursty traffic with ready reaction to declines in congestion.
Abstract:
Systems and methods for implementing counters in a network processor with cost effective memory are disclosed. Embodiments include systems and methods for implementing counters in a network processor using less expensive memory such as DRAM. A network processor receives packets and implements accounting functions including counting packets in each of a plurality of flow queues. Embodiments include a counter controller that may increment counter values more than once during a R-M-W cycle. Each time a counter controller receives a request to update a counter during a R-M-W cycle that has been initiated for the counter, the counter controller increments the counter value received from memory. The incremented value is written to memory during the write cycle of the R-M-W cycle. A write disable unit disables writes that would otherwise occur during R-M-W cycles initiated for the counter during the earlier initiated R-M-W cycle.
Abstract:
A mechanism for offloading the management of receive queues in a split (e.g. split socket, split iSCSI, split DAFS) stack environment, including efficient queue flow control and TCP/IP retransmission support. An Upper Layer Protocol (ULP) creates receive work queues and completion queues that are utilized by an Internet Protocol Suite Offload Engine (IPSOE) and the ULP to transfer information and carry out send operations. As consumers initiate receive operations, receive work queue entries (RWQEs) are created by the ULP and written to the receive work queue (RWQ). The ISPOE is notified of a new entry to the RWQ and it subsequently reads this entry that contains pointers to the data that is to be received. After the data is received, the IPSOE creates a completion queue entry (CQE) that is written into the completion queue (CQ). After the CQE is written, the ULP subsequently processes the entry and removes it from the CQE, freeing up a space in both the RWQ and CQ. The number of entries available in the RWQ are monitored by the ULP so that it does not overwrite any valid entries. Likewise, the IPSOE monitors the number of entries available in the CQ, so as not overwrite the CQ.
Abstract:
Systems and methods for scheduling data packets in a network processor are disclosed. Embodiments provide a network processor that comprises a best-effort scheduler with a minimal calendar structure for addressing schedule control blocks. In one embodiment, a four-entry calendar structure provides for rate-limited weighted best effort scheduling. Each of a plurality of different flows has associated schedule control blocks. Schedule control blocks are stored as linked lists in a last-in-first-out buffer. Each calendar entry is associated with a different linked list by storing in the calendar entry the address of the first-out schedule control block in the linked list. Each schedule control block has a counter and is assigned a rate limit according to the bandwidth priority of the flow to which the corresponding packet belongs. Each time a schedule control block is accessed from a last-in-first-out buffer storing the linked list, the scheduler generates a scheduling event and the counter of the schedule control block is incremented. When an incremented counter of a schedule control block equals its rate limit, the schedule control block is temporarily removed from further scheduling until a time interval concludes.
Abstract:
A method and system for performing a lookup for a packet in a computer network are disclosed. The packet includes a header. The method and system include providing a parser, providing a lookup engine coupled with the parser, and providing a processor coupled with the lookup engine. The parser is for parsing the packet for the header prior to receipt of the packet being completed. The lookup engine performs a lookup for the header and returns a resultant. In one aspect, the lookup includes performing a local lookup of a cache that includes resultants of previous lookups. The processor processes the resultant.
Abstract:
A system and method for reducing latency in a host Ethernet adapter (HEA) includes the following. First, the HEA receives a packet with an internet protocol (IP) header and data in the HEA. The HEA parses a connection identifier from the IP header and accesses a negative cache in the HEA to determine if the connection identifier is not in a memory external to the HEA. The HEA applies a default treatment to the packet if the connection identifier is not in the memory, thereby reducing latency by decreasing access to the memory.
Abstract:
A pipeline configuration is described for use in network traffic management for the hardware scheduling of events arranged in a hierarchical linkage. The configuration reduces costs by minimizing the use of external SRAM memory devices. This results in some external memory devices being shared by different types of control blocks, such as flow queue control blocks, frame control blocks and hierarchy control blocks. Both SRAM and DRAM memory devices are used, depending on the content of the control block (Read-Modify-Write or ‘read’ only) at enqueue and dequeue, or Read-Modify-Write solely at dequeue. The scheduler utilizes time-based calendars and weighted fair queueing calendars in the egress calendar design. Control blocks that are accessed infrequently are stored in DRAM memory while those accessed frequently are stored in SRAM.