Abstract:
A method for managing a queue descriptor cache of a host channel adaptor (HCA) includes obtaining a queue descriptor from memory. The queue descriptor includes data describing a queue, and the memory is located in a host system. The method further includes storing a copy of the queue descriptor in the queue descriptor cache of the HCA. The HCA accesses the copy of the queue descriptor to obtain the data, accesses the queue using the data, and updates the data to reflect the access to the queue. The method further includes calculating, using the data, a value corresponding to utilization of the queue, comparing the value against a threshold, fetching, if the value exceeds the threshold, a new copy of the queue descriptor from memory, and replacing the copy of the queue descriptor in the queue descriptor cache with the new copy obtained from the memory.
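The refresh-on-threshold behavior described in this abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: the descriptor fields (`head`, `tail`, `size`), the utilization formula, and the write-back of `head` to host memory are all assumptions made for the example.

```python
from dataclasses import dataclass, replace

@dataclass
class QueueDescriptor:
    head: int  # next entry the HCA consumes (assumed field)
    tail: int  # next entry the host fills (assumed field)
    size: int  # ring capacity (assumed field)

def utilization(d):
    # Assumed metric: share of the ring that the cached view
    # believes holds no pending entries.
    pending = (d.tail - d.head) % d.size
    return 1 - pending / d.size

class DescriptorCache:
    def __init__(self, host_memory, threshold):
        self.host_memory = host_memory  # dict standing in for host memory
        self.threshold = threshold
        self.cache = {}

    def fetch(self, qid):
        # Store a copy of the descriptor, not a reference to it.
        self.cache[qid] = replace(self.host_memory[qid])

    def consume(self, qid):
        d = self.cache[qid]
        d.head = (d.head + 1) % d.size       # update cached data after the access
        self.host_memory[qid].head = d.head  # assumed write-back of HCA progress
        if utilization(d) > self.threshold:
            self.fetch(qid)                  # replace the cached copy
            return True
        return False
```

With a threshold of 0.7 and an 8-entry ring, the cached copy is only replaced once the HCA has consumed enough entries that its stale view of `tail` is nearly exhausted.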
Abstract:
A method for optimized address pre-translation for a host channel adapter (HCA) static memory structure is disclosed. The method involves determining whether the HCA static memory structure spans a contiguous block of physical address space; when it does, requesting a translation from a guest physical address (GPA) to a machine physical address (MPA) of the HCA static memory structure; storing a received MPA corresponding to the HCA static memory structure in an address control and status register (CSR) associated with the HCA static memory structure; marking the received MPA stored in the address CSR as a pre-translated address; and using the pre-translated MPA stored in the address CSR when a request to access the static memory structure is received.
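A rough sketch of the pre-translation flow follows. The `AddressCSR` class, the `translate` callback, the 4 KiB page size, and the page-aligned base address are assumptions chosen for illustration; the abstract does not specify how contiguity is determined.

```python
PAGE = 4096  # assumed page size

class AddressCSR:
    def __init__(self, gpa):
        self.address = gpa          # holds a GPA until pre-translated
        self.pretranslated = False

def is_contiguous(gpa, length, translate):
    # Contiguous if each guest page maps to the machine page that
    # directly follows the previous one (gpa assumed page-aligned).
    base = translate(gpa)
    return all(translate(gpa + off) == base + off
               for off in range(0, length, PAGE))

def pretranslate(csr, length, translate):
    if is_contiguous(csr.address, length, translate):
        csr.address = translate(csr.address)  # store the MPA in the CSR
        csr.pretranslated = True              # mark it as pre-translated

def resolve(csr, offset, translate):
    if csr.pretranslated:
        return csr.address + offset           # no translation request needed
    return translate(csr.address + offset)    # fall back to per-access translation
```

When the structure is not contiguous, the CSR keeps the GPA and every access still goes through `translate`.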
Abstract:
A method for optimizing completion building is disclosed. The method involves receiving a work request by a host channel adapter (HCA), caching a portion of the work request in a completion cache in the HCA, wherein the cached portion of the work request includes information for building a completion for the work request, receiving, by the HCA, a response to the work request, querying the completion cache upon receiving the response to the work request to obtain the cached portion of the work request, and building the completion for the work request using the cached portion of the work request, wherein the completion informs a software application of at least a status of the work request as executed by the HCA.
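The completion cache can be pictured as a small lookup table keyed by work request. In this sketch the dictionary-based work request, its field names (`id`, `qp`, `opcode`), and the shape of the completion are hypothetical.

```python
class CompletionCache:
    def __init__(self):
        self._entries = {}

    def on_work_request(self, wr):
        # Cache only the fields needed later to build the completion.
        self._entries[wr["id"]] = {"qp": wr["qp"], "opcode": wr["opcode"]}

    def on_response(self, wr_id, status):
        # Query the cache on response and build the completion from the
        # cached portion, without re-reading the original work request.
        cached = self._entries.pop(wr_id)
        return {"wr_id": wr_id, "status": status, **cached}
```

The payload of the work request is deliberately not cached; only the completion-relevant portion is kept.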
Abstract:
A method for processing commands includes receiving, for multiple commands, doorbells for writing to a send queue scheduler buffer on a host channel adapter (HCA). The send queue scheduler buffer is associated with a send queue scheduler. The method further includes detecting a potential deadlock of the send queue scheduler from processing a portion of the doorbells, writing, based on detecting the potential deadlock, a subset of the doorbells to a doorbell overflow buffer on a host operatively connected to the HCA, and discarding, by the send queue scheduler, the subset of the doorbells without processing it.
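The overflow path can be sketched as below. Treating a full scheduler buffer as the potential-deadlock condition is an assumption made for this example, as are the class and attribute names; the abstract does not say how the deadlock is detected or how overflowed doorbells are recovered later.

```python
from collections import deque

class SendQueueScheduler:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()         # on-HCA scheduler buffer
        self.host_overflow = deque()  # stand-in for the host overflow buffer

    def ring(self, doorbell):
        if len(self.buffer) >= self.capacity:
            # Assumed deadlock signal: the buffer is full. The doorbell is
            # written to the host and dropped on the HCA without processing.
            self.host_overflow.append(doorbell)
            return False
        self.buffer.append(doorbell)
        return True
```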
Abstract:
A method for offloading includes a host channel adapter (HCA) receiving a first work request identifying a queue pair (QP), making a first determination that the QP is a proxy QP, and offloading the first work request to a proxy central processing unit (CPU) based on the first determination and based on the first work request satisfying a filter criterion. The HCA further receives a second work request identifying the QP and processes the second work request without offloading based on the QP being a proxy QP and based on the second work request failing to satisfy the filter criterion. The HCA redirects a first completion for the first work request and a second completion for the second work request to the proxy CPU based on the first determination. The proxy CPU processes the first completion and the second completion in order.
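The routing decisions in this abstract can be sketched as follows. The `Hca` class, the dictionary-based work requests, and the `matches_filter` callback are hypothetical names introduced for illustration.

```python
class Hca:
    def __init__(self, proxy_qps, matches_filter):
        self.proxy_qps = proxy_qps            # set of QP numbers marked as proxy QPs
        self.matches_filter = matches_filter  # assumed filter criterion callback
        self.offloaded = []          # work requests handed to the proxy CPU
        self.processed = []          # work requests handled on the HCA
        self.proxy_completions = []  # completions redirected to the proxy CPU
        self.host_completions = []

    def receive(self, wr):
        if wr["qp"] in self.proxy_qps and self.matches_filter(wr):
            self.offloaded.append(wr)
            return "offloaded"
        self.processed.append(wr)
        return "processed"

    def complete(self, wr):
        # Every completion for a proxy QP goes to the proxy CPU, whether or
        # not the request itself was offloaded, so ordering is preserved.
        if wr["qp"] in self.proxy_qps:
            self.proxy_completions.append(wr["id"])
        else:
            self.host_completions.append(wr["id"])
```

Sending both kinds of completion through one list is what lets the proxy CPU process them in arrival order.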
Abstract:
A method for deallocation of a memory region involves transmitting, by a host channel adapter (HCA), a first invalidation command for invalidating at least one key associated with the memory region, transmitting, by the HCA, a second invalidation command for invalidating a translation lookaside buffer (TLB) entry for the memory region, invalidating the at least one key associated with the memory region, determining whether all memory access requests to the memory region have been processed by the HCA, stalling processing of the second invalidation command when outstanding memory access requests to the memory region are present, and processing the outstanding memory access requests for the memory region by the HCA before executing the second invalidation command invalidating the TLB entry for the memory region.
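The ordering constraint between the two invalidation commands can be sketched as below. The `MemoryRegion` class and the event log are illustrative assumptions; the loop stands in for the stall on the second command.

```python
from collections import deque

class MemoryRegion:
    def __init__(self):
        self.key_valid = True
        self.tlb_valid = True
        self.outstanding = deque()  # accepted but not yet processed requests

    def submit(self, request):
        if not self.key_valid:
            return False            # key invalidated: reject new requests
        self.outstanding.append(request)
        return True

def deallocate(region, log):
    region.key_valid = False        # first command: invalidate the key
    while region.outstanding:       # second command stalls until drained
        log.append(("served", region.outstanding.popleft()))
    region.tlb_valid = False        # only now invalidate the TLB entry
    log.append(("tlb_invalidated",))
```

Invalidating the key first stops new requests from arriving, so the drain loop is guaranteed to terminate.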
Abstract:
A method for transmitting a message includes a communication adapter receiving, from a transmitting device, a request to send the message. The method further includes modifying a maximum transfer unit (MTU) to obtain a modified MTU, transmitting, from the communication adapter to a receiving system, a first sub-unit of the message using the modified MTU, iteratively increasing the MTU for transmitting intermediate sub-units of the message until an MTU limit is reached, and transmitting, to the receiving system, the intermediate sub-units of the message. The intermediate sub-units are transmitted after the first sub-unit and before a second sub-unit. The method further includes transmitting, from the communication adapter to the receiving system, the second sub-unit using a full path MTU.
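The MTU ramp can be illustrated with a short sketch that computes the sub-unit sizes. Doubling the MTU at each step is an assumption; the abstract only states that the MTU is increased iteratively until a limit is reached. The function and parameter names are hypothetical.

```python
def sub_unit_sizes(message_len, initial_mtu, path_mtu):
    """Split a message into sub-units, starting with a reduced MTU."""
    sizes = []
    mtu = initial_mtu              # the modified (reduced) MTU
    remaining = message_len
    while remaining > 0:
        size = min(mtu, remaining)
        sizes.append(size)
        remaining -= size
        mtu = min(mtu * 2, path_mtu)  # assumed growth rule, capped at the limit
    return sizes

print(sub_unit_sizes(10000, 256, 4096))
# prints [256, 512, 1024, 2048, 4096, 2064]
```

The final sub-unit is sent once the MTU has reached the full path MTU, so it carries whatever remains of the message up to that size.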