Abstract:
In one embodiment, an apparatus includes: an instruction fetch circuit to fetch instructions; a decode circuit coupled to the instruction fetch circuit to decode the fetched instructions into micro-operations (pops); a scheduler coupled to the decode circuit to schedule the pops for execution; and an execution circuit coupled to the scheduler, the execution circuit comprising a plurality of execution ports to execute the pops. The scheduler may be configured to: schedule at least some pops of a first type for redundant execution on symmetric execution ports of the plurality of execution ports; and schedule pops of a second type for non-redundant execution on a single execution port of the plurality of execution ports. Other embodiments are described and claimed.
Abstract:
Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
Abstract:
Methods and apparatus are disclosed using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode scatter/gather instructions and generate micro-operations. An index array holds a set of indices and a corresponding set of mask elements. A finite state machine facilitates the scatter operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. Storage is allocated in a buffer for each of the set of addresses being generated. Data elements corresponding to the set of addresses being generated are copied to the buffer. Addresses from the set are accessed to store data elements if a corresponding mask element has said first value and the mask element is changed to a second value responsive to completion of their respective stores.
Abstract:
Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
Abstract:
A processor includes a core with locally-gated circuitry, a decode unit, a local power gate (LPG) coupled to the locally-gated circuitry, and an execution unit. The decode unit includes logic to decode a store broadcast instruction of a specified width. The LPG includes logic to selectively provide power to the locally-gated circuitry, activate power to a first portion of the locally-gated circuitry for execution of full cache-line memory operations, and deactivate power to a second portion of the locally-gated circuitry the locally-gated circuitry. The execution unit includes logic to execute, by the first portion of the locally-gated circuitry for execution of full cache-line memory operations, the store broadcast instruction, the store broadcast instruction to store data of the specified width to storage of the processor.
Abstract:
In some disclosed embodiments instruction execution logic provides conditional memory fault assist suppression. Some embodiments of processors comprise a decode stage to decode one or more instruction specifying: a set of memory operations, one or more register, and one or more memory address. One or more execution units, responsive to the one or more decoded instruction, generate said one or more memory address for the set of memory operations. Instruction execution logic records one or more fault suppress bits to indicate whether one or more portion of the set of memory operations are masked. Fault generation logic is suppressed from considering a memory fault corresponding to a faulting one of the set of memory operations when said faulting one of the set of memory operations corresponds to a portion of the set of memory operations that is indicated as masked by said one or more fault suppress bits.
Abstract:
Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
Abstract:
Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
Abstract:
A processor includes a core with locally-gated circuitry, a decode unit, a local power gate (LPG) coupled to the locally-gated circuitry, and an execution unit. The decode unit includes logic to decode a store broadcast instruction of a specified width. The LPG includes logic to selectively provide power to the locally-gated circuitry, activate power to a first portion of the locally-gated circuitry for execution of full cache-line memory operations, and deactivate power to a second portion of the locally-gated circuitry the locally-gated circuitry. The execution unit includes logic to execute, by the first portion of the locally-gated circuitry for execution of full cache-line memory operations, the store broadcast instruction, the store broadcast instruction to store data of the specified width to storage of the processor.
Abstract:
Methods and apparatus are disclosed using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode scatter/gather instructions and generate micro-operations. An index array holds a set of indices and a corresponding set of mask elements. A finite state machine facilitates the scatter operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. Storage is allocated in a buffer for each of the set of addresses being generated. Data elements corresponding to the set of addresses being generated are copied to the buffer. Addresses from the set are accessed to store data elements if a corresponding mask element has said first value and the mask element is changed to a second value responsive to completion of their respective stores.