Abstract:
A method of storing register data elements to interleave with data elements of a different register, a processor thereof, and a system thereof, wherein each non-consecutive data elements of a register is retrieved to be stored to interleave with each non-consecutive data elements of a different register upon an executive of an interleaving store instruction, wherein a mask instruction directing a lane of a storage space in which the non-consecutive data elements are stored is executed in conjunction with the interleaving store instruction, and wherein a processor of a second type is configured to emulate a processor of a first type to store the non-consecutive data elements the same as non-consecutive data elements stored in the first type processor.
Abstract:
A method for compiling and executing a nested loop includes initializing a nested loop controller with an outer loop count value and an inner loop count value. The nested loop controller includes a predicate FIFO. The method also includes coalescing the nested loop and, during execution of the coalesced nested loop, causing the nested loop controller to populate the predicate FIFO and executing a get predicate instruction having an offset value, where the get predicate returns a value from the predicate FIFO specified by the offset value. The method further includes predicating an outer loop instruction on the returned value from the predicate FIFO.
Abstract:
An example method includes generating a first interleave instruction based on compilation of a source file configured for execution by the first processor; generating a predication instruction to mask lane(s) of a first source register and a second source register of the second processor, in which the first source register stores a first vector and the second source register stores a second vector, based on translation of the source file; and generating a second interleave instruction based on compilation of the translated source file. The method further includes, based on the predication instruction and the second interleave instruction, reading respective portions of the first and second vectors from unmasked lanes of the first and second source registers, and interleaving the read portions to produce a third vector, which is then stored in a destination register of the second processor.
Abstract:
A method of storing register data elements to interleave with data elements of a different register, a processor thereof, and a system thereof, wherein each non-consecutive data elements of a register is retrieved to be stored to interleave with each non-consecutive data elements of a different register upon an executive of an interleaving store instruction, wherein a mask instruction directing a lane of a storage space in which the non-consecutive data elements are stored is executed in conjunction with the interleaving store instruction, and wherein a processor of a second type is configured to emulate a processor of a first type to store the non-consecutive data elements the same as non-consecutive data elements stored in the first type processor.
Abstract:
A computer system includes a processor and program storage coupled to the processor. The program storage stores a software instruction translator that, when executed by the processor, is configured to receive source code and translate the source code to a low-level language. The source code is restricted to a subset of a high-level language and the low-level language is a specialized instruction set. Each statement of the subset of the high-level language directly maps to an instruction of the low-level language.
Abstract:
A method for compiling and executing a nested loop includes initializing a nested loop controller with an outer loop count value and an inner loop count value. The nested loop controller includes a predicate FIFO. The method also includes coalescing the nested loop and, during execution of the coalesced nested loop, causing the nested loop controller to populate the predicate FIFO and executing a get predicate instruction having an offset value, where the get predicate returns a value from the predicate FIFO specified by the offset value. The method further includes predicating an outer loop instruction on the returned value from the predicate FIFO.
Abstract:
In an example, a system includes a processor, where the processor includes a plurality of processor registers, and where the processor is configured to execute a first instruction in a first execution context. The processor is also configured to receive a PRESERVE instruction that indicates at least one processor register among the plurality of processor registers. The processor is configured to, responsive to the PRESERVE instruction, preserve parameters in the at least one processor register and clear other processor registers in the plurality of processor registers in the first execution context. The processor is also configured to execute a second instruction in a second execution context.
Abstract:
A nested loop controller includes a first register having a first value initialized to an initial first value, a second register having a second value initialized to an initial second value, and a third register configured as a predicate FIFO, initialized to have a third value. The second value is advanced in response to a tick instruction during execution of a loop. In response to the second value reaching a second threshold, the second register is reset to the initial second value. The nested loop controller further includes a comparator coupled to the second register and to the predicate FIFO and configured to provide an outer loop indicator value as input to the predicate FIFO when the second value is equal to the second threshold, and provide an inner loop indicator value as input to the predicate FIFO when the second value is not equal to the second threshold.