Abstract:
Method and system for performing data movement operations is described herein. One embodiment of a method includes: storing data for a first memory address in a cache line of a memory of a first processing unit, the cache line associated with a coherency state indicating that the memory has sole ownership of the cache line; decoding an instruction for execution by a second processing unit, the instruction comprising a source data operand specifying the first memory address and a destination operand specifying a memory location in the second processing unit; and responsive to executing the decoded instruction, copying data from the cache line of the memory of the first processing unit as identified by the first memory address, to the memory location of the second processing unit, wherein responsive to the copy, the cache line is to remain in the memory and the coherency state is to remain unchanged.
Abstract:
Technologies for providing information to a user while traveling include a mobile computing device to determine network condition information associated with a route segment. The route segment may be one of a number of route segments defining at least one route from a starting location to a destination. The mobile computing device may determine a route from the starting location to the destination based on the network condition information. The mobile computing device may upload the network condition information to a crowdsourcing server. A mobile computing device may predict a future location of the device based on device context, determine a safety level for the predicted location, and notify the user if the safety level is below a threshold safety level. The device context may include location, time of day, and other data. The safety level may be determined based on predefined crime data. Other embodiments are described and claimed.
Abstract:
Method and apparatus for efficient range-based memory writeback is described herein. One embodiment of an apparatus includes a system memory, a plurality of hardware processor cores each of which includes a first cache, a decoder circuitry to decode an instruction having fields for a first memory address and a range indicator, and an execution circuitry to execute the decoded instruction. Together, the first memory address and the range indicator define a contiguous region in the system memory that includes one or more cache lines. An execution of the decoded instruction causes any instances of the one or more cache lines in the first cache to be invalidated. Additionally, any invalidated instances of the one or more cache lines that are dirty are to be stored to system memory.
Abstract:
Technologies for identifying a cache line of a network packet for eviction from an on-processor cache of a network device communicatively coupled to a network controller. The network device is configured to determine whether a cache line of the cache corresponding to the network packet is to be evicted from the cache based on a determination that the network packet is not needed subsequent to processing the network packet, and provide an indication that the cache line is to be evicted from the cache based on an eviction policy received from the network controller.
Abstract:
Technologies for distributed table lookup via a distributed router includes an ingress computing node, an intermediate computing node, and an egress computing node. Each computing node of the distributed router includes a forwarding table to store a different set of network routing entries obtained from a routing table of the distributed router. The ingress computing node generates a hash key based on the destination address included in a received network packet. The hash key identifies the intermediate computing node of the distributed router that stores the forwarding table that includes a network routing entry corresponding to the destination address. The ingress computing node forwards the received network packet to the intermediate computing node for routing. The intermediate computing node receives the forwarded network packet, determines a destination address of the network packet, and determines the egress computing node for transmission of the network packet from the distributed router.
Abstract:
Apparatus and methods implementing a hardware predictor for reducing performance inversions caused by intra-core data transfer during inter-core data transfer optimization for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache or mid-level cache (MLC) for each core and a shared L3 or last-level cache (LLC). A hardware predictor to monitor accesses to sample cache lines and, based on these accesses, adaptively control the enablement of cache line demotion instructions for proactively demoting cache lines from lower cache levels to higher cache levels, including demoting cache lines from L1 or L2 caches (MLC) to L3 cache (LLC).
Abstract:
Technologies for identifying a cache line of a network packet for eviction from an on-processor cache of a network device communicatively coupled to a network controller. The network device is configured to determine whether a cache line of the cache corresponding to the network packet is to be evicted from the cache based on a determination that the network packet is not needed subsequent to processing the network packet, and provide an indication that the cache line is to be evicted from the cache based on an eviction policy received from the network controller.
Abstract:
Techniques for modifying a transmission rate of a device having a plurality of transmission rate options are described herein. The techniques include a method comprising receiving data from a sensor indicating movement of an electronic device, the electronic device having a plurality of transmission rate options. Fail ratio metrics are gathered. The fail ratio metrics indicate a ratio of failed transmissions to successful transmissions for rate option during device movement. The method includes determining whether a given rate option has a fail ratio above a predetermined threshold; and, if so, disabling the given rate option while the device is moving.
Abstract:
Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
Abstract:
Examples described herein include a device interface; a first set of one or more processing units; and a second set of one or more processing units. In some examples, the first set of one or more processing units are to perform heavy flow detection for packets of a flow and the second set of one or more processing units are to perform processing of packets of a heavy flow. In some examples, the first set of one or more processing units and second set of one or more processing units are different. In some examples, the first set of one or more processing units is to allocate pointers to packets associated with the heavy flow to a first set of one or more queues of a load balancer and the load balancer is to allocate the packets associated with the heavy flow to one or more processing units of the second set of one or more processing units based, at least in part on a packet receive rate of the packets associated with the heavy flow.