Abstract:
A processor core that includes a hardware decode unit and an execution engine unit. The hardware decode unit is to decode a vector frequency expand instruction, wherein the vector frequency expand instruction includes a source operand and a destination operand, and wherein the source operand specifies a source vector register that includes one or more pairs of a value and a run length that are to be expanded into a run of that value based on the run length. The execution engine unit is to execute the decoded vector frequency expand instruction, which causes a set of one or more source data elements in the source vector register to be expanded into a set of destination data elements comprising more elements than the set of source data elements and including at least one run of identical values which were run length encoded in the source vector register.
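A minimal scalar sketch (in C) of the expand semantics this abstract describes, assuming 32-bit elements and a flat array standing in for the source vector register; the element types, pair layout, and function name are illustrative assumptions, not the patented encoding.

/* Hypothetical scalar sketch of the vector frequency expand semantics:
 * each (value, run_length) pair in the source is expanded into
 * run_length copies of value in the destination. */
#include <stddef.h>
#include <stdint.h>

size_t vfreq_expand(const int32_t *src_pairs, size_t num_pairs,
                    int32_t *dst, size_t dst_capacity)
{
    size_t out = 0;
    for (size_t i = 0; i < num_pairs; i++) {
        int32_t value = src_pairs[2 * i];      /* the value       */
        int32_t run   = src_pairs[2 * i + 1];  /* its run length  */
        for (int32_t r = 0; r < run && out < dst_capacity; r++)
            dst[out++] = value;                /* write the run   */
    }
    return out;  /* number of destination elements produced */
}
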
Abstract:
An apparatus and method for performing vector index loads and stores. For example, one embodiment of a processor comprises: a vector index register to store a plurality of index values; a mask register to store a plurality of mask bits; a vector register to store a plurality of vector data elements loaded from memory; and vector index load logic to identify an index stored in the vector index register to be used for a load operation using an immediate value and to responsively combine the index with a base memory address to determine a memory address for the load operation, the vector index load logic to load vector data elements from the memory address to the vector register in accordance with the plurality of mask bits.
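As a hedged scalar model of the load described above, the sketch below (in C) assumes eight 32-bit lanes, a per-lane bit mask, and 64-bit indices; the immediate selects one index, which is combined with the base address before the masked load. All names and widths here are assumptions.

/* Scalar model of the vector index load: the immediate selects one
 * index from the index register, the index is added to a base address,
 * and elements are loaded under the mask. */
#include <stdint.h>

#define VLEN 8  /* assumed number of vector lanes */

void vindex_load(const int64_t *vindex, unsigned imm,
                 const int32_t *base, uint8_t mask, int32_t dst[VLEN])
{
    const int32_t *addr = base + vindex[imm];  /* base + selected index     */
    for (int i = 0; i < VLEN; i++) {
        if (mask & (1u << i))                  /* load only masked-on lanes */
            dst[i] = addr[i];
    }
}
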
Abstract:
A processor includes packed data registers, a decode unit, and an execution unit. The decode unit is to decode a four-dimensional (4D) Morton coordinate conversion instruction. The 4D Morton coordinate conversion instruction is to indicate a source packed data operand that is to include a plurality of 4D Morton coordinates, and is to indicate one or more destination storage locations. The execution unit is coupled with the packed data registers and the decode unit. The execution unit, in response to the decode unit decoding the 4D Morton coordinate conversion instruction, is to store one or more result packed data operands in the one or more destination storage locations. The one or more result packed data operands are to include a plurality of sets of four 4D coordinates. Each of the sets of the four 4D coordinates is to correspond to a different one of the 4D Morton coordinates.
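The de-interleaving implied by the abstract can be illustrated for a single Morton code; the sketch below (in C) assumes a 64-bit Morton coordinate holding four 16-bit components, whereas the instruction itself operates on packed vectors of such coordinates.

/* Illustrative conversion of one 4D Morton coordinate into its four
 * component coordinates by de-interleaving every fourth bit. */
#include <stdint.h>

static uint32_t extract_every_4th_bit(uint64_t m, unsigned start)
{
    uint32_t coord = 0;
    for (unsigned i = 0; i < 16; i++)          /* 16 bits per coordinate */
        coord |= (uint32_t)((m >> (start + 4 * i)) & 1u) << i;
    return coord;
}

void morton4d_to_coords(uint64_t morton, uint32_t coords[4])
{
    for (unsigned c = 0; c < 4; c++)
        coords[c] = extract_every_4th_bit(morton, c);
}
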
Abstract:
A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field (including an alpha field and a beta field), and a data element width field, wherein the vector friendly instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the vector friendly instruction format in instruction streams.
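As a loose illustration only, the fields named above can be grouped as in the following C struct; the actual encoding, ordering, and bit widths are not specified by this abstract and are deliberately left out here.

/* Illustrative grouping of the named fields; not the real encoding. */
struct vector_friendly_fields {
    unsigned base_op;     /* base operation field                     */
    unsigned modifier;    /* modifier field                           */
    unsigned alpha;       /* alpha field (part of augmentation field) */
    unsigned beta;        /* beta field (part of augmentation field)  */
    unsigned elem_width;  /* data element width field                 */
};
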
Abstract:
A processor core including a hardware decode unit to decode vector instructions for decompressing a run length encoded (RLE) set of source data elements and an execution unit to execute the decoded instructions. The execution unit generates a first mask by comparing the set of source data elements with a set of zeros and then counts the trailing zeros in the first mask. A second mask is made based on the count of trailing zeros. The execution unit then copies the set of source data elements to a buffer using the second mask and then reads the number of RLE zeros from the set of source data elements. The buffer is shifted and copied to a result, and the set of source data elements is shifted to the right. If more valid data elements remain in the set of source data elements, this is repeated until all valid data is processed.
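A scalar sketch (in C) of the loop the abstract walks through, assuming literal (non-zero) elements are copied out and each zero element is followed by a count of RLE-encoded zeros; this stream layout is an assumption drawn from the abstract, not a definitive encoding.

/* Scalar RLE-zero decompression following the described loop. */
#include <stddef.h>
#include <stdint.h>

size_t rle_zero_decompress(const int32_t *src, size_t src_len,
                           int32_t *dst, size_t dst_cap)
{
    size_t in = 0, out = 0;
    while (in < src_len && out < dst_cap) {
        /* count how many leading elements are non-zero ("first mask") */
        size_t literals = 0;
        while (in + literals < src_len && src[in + literals] != 0)
            literals++;
        /* copy the literal run (selected by the "second mask") */
        for (size_t i = 0; i < literals && out < dst_cap; i++)
            dst[out++] = src[in + i];
        in += literals;
        if (in >= src_len)
            break;                    /* no zero marker left          */
        in++;                         /* skip the zero marker         */
        if (in >= src_len)
            break;
        int32_t zeros = src[in++];    /* number of RLE-encoded zeros  */
        for (int32_t z = 0; z < zeros && out < dst_cap; z++)
            dst[out++] = 0;
    }
    return out;
}
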
Abstract:
A method of an aspect includes receiving a unique packed data element identification instruction. The unique packed data element identification instruction indicates a source packed data having a plurality of packed data elements and indicates a destination storage location. A unique packed data element identification result is stored in the destination storage location in response to the unique packed data element identification instruction. The unique packed data element identification result indicates which of the plurality of the packed data elements are unique in the source packed data. Other methods, apparatus, systems, and instructions are disclosed.
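One way to picture the result is a bitmask over lanes; the sketch below (in C) assumes eight 32-bit elements and interprets "unique" as a value occurring exactly once in the source, which is an assumption about the semantics.

/* Produce a bitmask whose set bits mark source elements whose value
 * appears exactly once in the source vector. */
#include <stdint.h>

#define LANES 8  /* assumed number of packed elements */

uint8_t unique_elements_mask(const int32_t src[LANES])
{
    uint8_t result = 0;
    for (int i = 0; i < LANES; i++) {
        int count = 0;
        for (int j = 0; j < LANES; j++)
            if (src[j] == src[i])
                count++;
        if (count == 1)                  /* value occurs only once */
            result |= (uint8_t)(1u << i);
    }
    return result;
}
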
Abstract:
In one embodiment, a processing device implements a set of instructions to perform an inverse centrifuge operation using vector or general purpose registers. The inverse centrifuge operation interleaves bits from opposite regions of a source and writes the interleaved bits to a destination. The instructions use a control mask in which each bit with a mask value of one is obtained from one side of the source register or vector element, while each bit with a mask value of zero is obtained from the opposing side.
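A scalar bit-level sketch (in C) of the inverse centrifuge, assuming a 64-bit source and that mask-one destination positions are filled from the low end of the source while mask-zero positions are filled from the high end; the mapping of sides to mask values is an assumption.

/* Scatter bits from opposite ends of the source according to the mask. */
#include <stdint.h>

uint64_t inverse_centrifuge(uint64_t src, uint64_t mask)
{
    uint64_t dst = 0;
    int lo = 0, hi = 63;                 /* read cursors at opposite ends     */
    for (int i = 0; i < 64; i++) {
        uint64_t bit;
        if ((mask >> i) & 1u)
            bit = (src >> lo++) & 1u;    /* mask 1: take from the low side    */
        else
            bit = (src >> hi--) & 1u;    /* mask 0: take from the high side   */
        dst |= bit << i;
    }
    return dst;
}
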
Abstract:
A processor includes a plurality of packed data registers, a decode unit, and an execution unit. The decode unit is to decode a three-dimensional (3D) Morton coordinate conversion instruction. The 3D Morton coordinate conversion instruction is to indicate a source packed data operand that is to include a plurality of 3D Morton coordinates, and is to indicate one or more destination storage locations. The execution unit is coupled with the packed data registers and the decode unit. The execution unit, in response to the decode unit decoding the 3D Morton coordinate conversion instruction, is to store one or more result packed data operands in the one or more destination storage locations. The one or more result packed data operands are to include a plurality of sets of three 3D coordinates. Each of the sets of the three 3D coordinates is to correspond to a different one of the 3D Morton coordinates.
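The analogous single-code sketch for the 3D case de-interleaves every third bit; a 32-bit Morton code with three 10-bit components is assumed here (in C), while the instruction operates on packed vectors of such codes.

/* Illustrative conversion of one 3D Morton coordinate into x, y, z. */
#include <stdint.h>

void morton3d_to_coords(uint32_t morton, uint16_t coords[3])
{
    for (unsigned c = 0; c < 3; c++) {
        uint16_t v = 0;
        for (unsigned i = 0; i < 10; i++)   /* 10 bits per coordinate */
            v |= (uint16_t)((morton >> (c + 3 * i)) & 1u) << i;
        coords[c] = v;
    }
}
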
Abstract:
In several embodiments, vector extensions to an instruction set architecture include instructions to perform saturated signed and unsigned integer additions. In one embodiment, a vector signed integer add with signed saturation is provided. In one embodiment, a vector unsigned integer add with unsigned saturation is provided. In one embodiment, packed doubleword and quadword integers are supported for both signed and unsigned instructions.
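A minimal scalar sketch (in C) of the quadword cases, using explicit overflow checks to clamp the result; the instruction-set versions described above operate per lane on packed data.

/* Saturating 64-bit signed and unsigned additions. */
#include <stdint.h>

int64_t add_sat_i64(int64_t a, int64_t b)
{
    if (b > 0 && a > INT64_MAX - b)
        return INT64_MAX;                 /* clamp on positive overflow */
    if (b < 0 && a < INT64_MIN - b)
        return INT64_MIN;                 /* clamp on negative overflow */
    return a + b;
}

uint64_t add_sat_u64(uint64_t a, uint64_t b)
{
    uint64_t sum = a + b;
    return (sum < a) ? UINT64_MAX : sum;  /* clamp on unsigned wrap */
}
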
Abstract:
Technologies for sharing route navigation data in a community cloud include a mobile navigation device of a vehicle and a remote mobile navigation device of a remote vehicle. The mobile navigation device generates sensor data associated with a current route of the vehicle and determines whether a reference traffic event occurs within a segment of the current route of the vehicle. In response to a determination that a reference traffic event occurs, the mobile navigation device transmits route update data to the remote mobile navigation device. Based on the route update data, the remote mobile navigation device updates a current route of the remote vehicle to avoid the reference traffic event within a corresponding segment of the current route of the remote vehicle. The mobile navigation device may also transmit the sensor data to a community compute device, which may transmit route update data to the remote mobile navigation device.