Abstract:
A programmable processor and method for improving the performance of processors by incorporating an execution unit operable to decode and execute single instructions specifying both a shift amount and a register containing a plurality of data elements, wherein the execution unit is operable to shift a subfield of each of the plurality of data elements by the shift amount to produce a second plurality of data elements; and provide the second plurality of data elements as a catenated result.
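As an illustration only, the group shift described above can be sketched in software. The sketch assumes the register is modeled as a Python integer, that every element is shifted left as a whole, and that bits shifted past an element boundary are discarded. The function name, the 16-bit element width, and the 128-bit register width are hypothetical choices, not taken from the abstract.

```python
def group_shift_left(reg: int, shift: int, elem_bits: int = 16, reg_bits: int = 128) -> int:
    """Shift each elem_bits-wide field of reg left by shift, independently.

    Hypothetical software model: names and default widths are illustrative.
    """
    if reg_bits % elem_bits != 0:
        raise ValueError("register width must be a multiple of the element width")
    if not 0 <= shift < elem_bits:
        raise ValueError("shift amount must be smaller than the element width")

    mask = (1 << elem_bits) - 1
    result = 0
    for i in range(reg_bits // elem_bits):
        elem = (reg >> (i * elem_bits)) & mask      # extract element i
        shifted = (elem << shift) & mask            # drop bits that leave the field
        result |= shifted << (i * elem_bits)        # catenate into the result
    return result
```

For example, with two 16-bit elements in a 32-bit register, `group_shift_left(0x800100FF, 4, elem_bits=16, reg_bits=32)` shifts `0x8001` and `0x00FF` separately, so no bits cross from one element into its neighbor.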
Abstract:
A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
Abstract:
A processor and method for performing outer product and outer product accumulation operations on vector operands requiring large numbers of multiplies and accumulations are disclosed.
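The operation itself can be sketched as follows. This is a plain software model of outer product accumulation, assuming the accumulator is a list of rows; the function name is hypothetical and nothing here reflects how the disclosed hardware arranges its multipliers.

```python
def outer_product_accumulate(acc, a, b):
    """Add the outer product of vectors a and b into acc, in place.

    acc must have len(a) rows of len(b) entries each: acc[i][j] += a[i] * b[j].
    """
    if len(acc) != len(a) or any(len(row) != len(b) for row in acc):
        raise ValueError("accumulator shape must be len(a) x len(b)")
    for i, ai in enumerate(a):
        row = acc[i]
        for j, bj in enumerate(b):
            row[j] += ai * bj   # one multiply and one accumulate per element
    return acc
```

A pair of vectors of lengths m and n produces m*n multiply-accumulates per call, which is why the operation rewards wide hardware.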
Abstract:
A programmable processor and method for improving the performance of processors by incorporating an execution unit configurable to execute a plurality of instruction streams from the plurality of threads, wherein each instruction stream includes a group instruction that operates on a plurality of data elements in partitioned fields of at least one of the registers to produce a catenated result.
Abstract:
A system and software for improving the performance of processors by incorporating an execution unit operable to decode and execute single instructions specifying both a mask and a register containing data, the mask comprising fields that each correspond to a field of the data contained in the register, the execution unit operable to detect some of the fields of the mask as having a predetermined value and identify corresponding fields of the data contained in the register as write-enabled data fields; and cause the write-enabled data fields to be written to a specified memory location.
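A minimal sketch of such a masked store, under stated assumptions: each mask field is a single bit, each data field is one byte, and the predetermined value that enables a write is 1. The function name and parameters are hypothetical.

```python
def masked_store(memory: bytearray, addr: int, data: bytes, mask: int,
                 enable_value: int = 1) -> None:
    """Write only the write-enabled bytes of data to memory starting at addr.

    Bit i of mask corresponds to data[i]; a byte is written only when that
    bit equals enable_value (assumed to be 1 here). Other bytes in memory
    are left unchanged.
    """
    if addr < 0 or addr + len(data) > len(memory):
        raise IndexError("store would fall outside memory")
    for i, byte in enumerate(data):
        if (mask >> i) & 1 == enable_value:
            memory[addr + i] = byte
```

Calling `masked_store(mem, 2, bytes([0xAA, 0xBB, 0xCC, 0xDD]), 0b0101)` updates only `mem[2]` and `mem[4]`.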
Abstract:
A method and software for improving the performance of processors by incorporating an execution unit operable to decode and execute single instructions specifying three registers each containing a plurality of data elements, the execution unit operable to multiply the first and second registers and add the third register to produce a catenated result containing a plurality of data elements. Additional instructions provide group floating-point subtract, add, multiply, set less, and set greater equal operations. The set less and set greater equal operations produce alternatively zero or an identity element for each element of a catenated result, the result facilitating alternative selection of individual data elements using bitwise Boolean operations and without requiring conditional branch operations.
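The selection idiom described above can be sketched as follows. The sketch assumes 32-bit floating-point elements held in Python lists, and it interprets the "identity element" as an all-ones bit pattern (the identity for bitwise AND); both are assumptions, and all function names are hypothetical. The comparison in `group_set_less` stands in for the per-element hardware compare; the point of the sketch is that the selection step uses only bitwise operations.

```python
import struct

ALL_ONES = 0xFFFFFFFF  # assumed "identity element" for a 32-bit lane


def f32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Return the float whose single-precision bit pattern is b."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def group_multiply_add(a, b, c):
    """Element-wise a[i] * b[i] + c[i]."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def group_set_less(a, b):
    """Per element: all ones where a[i] < b[i], otherwise zero."""
    return [ALL_ONES if x < y else 0 for x, y in zip(a, b)]


def group_select(mask, a, b):
    """Per element: a[i] where mask[i] is all ones, b[i] where it is zero."""
    out = []
    for m, x, y in zip(mask, a, b):
        bits = (m & f32_bits(x)) | (~m & ALL_ONES & f32_bits(y))
        out.append(bits_f32(bits))
    return out
```

An element-wise minimum, for instance, is `group_select(group_set_less(a, b), a, b)`, with no conditional branch in the selection step.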
Abstract:
A system and software for improving the performance of processors by incorporating an execution unit operable to decode and execute single instructions specifying a data selection operand and a first and a second register providing a plurality of data elements, the data selection operand comprising a plurality of fields each selecting one of the plurality of data elements, the execution unit operable to provide the data element selected by each field of the data selection operand to a predetermined position in a catenated result.
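One way to picture this data selection is the sketch below. It assumes the elements of the two registers form a single indexed pool, with the first register's elements numbered before the second's, and that each selector field is a plain index into that pool. The numbering and the function name are hypothetical.

```python
def group_select_elements(selectors, first, second):
    """Build a result whose position k holds the element chosen by selectors[k].

    Elements of first are numbered 0..len(first)-1, followed by those of
    second (an assumed numbering, for illustration only).
    """
    pool = list(first) + list(second)
    for s in selectors:
        if not 0 <= s < len(pool):
            raise IndexError("selector out of range")
    return [pool[s] for s in selectors]
```

Because any selector may name any element, the same call can express reversal, interleaving, or broadcasting of one element to every result position.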
Abstract:
A digital FM demodulator and method for determining phase changes in highly oversampled complex FM digital signals is described. In a first embodiment the FM signal is oversampled with respect to the frequency of its associated modulating signal. In this embodiment a first digital processing stage delays and conjugates the original FM signal. This delayed conjugated original FM signal is then multiplied with the original FM signal to generate a second signal that represents the changes in the phase between samples of the original FM signal. A second processing stage then delays and conjugates the second signal. The delayed conjugated second signal is then multiplied with the original second signal to generate a third signal that represents changes in the phase between samples of the second signal. The imaginary component of the third signal is passed through a digital integrator which outputs the phase changes of the original FM signal. In a second embodiment, the highly oversampled signal is oversampled with respect to the deviation frequency of its associated modulating signal. In this embodiment the center frequency of the original FM signal is frequency shifted to approximately zero frequency. This frequency shifted signal is then delayed and conjugated. The delayed conjugated shifted signal is then multiplied with the original frequency shifted signal, yielding an output signal where the imaginary portion of the output signal is equal to the phase changes of the original FM signal.
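Both embodiments can be sketched numerically. The sketch assumes unit-amplitude complex samples, a one-sample delay, a running sum as the digital integrator, and a center frequency given in cycles per sample; all function and parameter names are hypothetical. The imaginary part of each product is the sine of a phase difference, so it approximates that difference only when the signal is highly oversampled and the angles are small. In the two-stage version the integrator output is the phase change relative to the first measured phase change, since a constant carrier term cancels in the second stage.

```python
import cmath
import math


def delay_conjugate_multiply(x):
    """Return x[n] * conj(x[n-1]) for n >= 1 (one-sample delay assumed)."""
    return [cur * prev.conjugate() for prev, cur in zip(x, x[1:])]


def demodulate_two_stage(x):
    """First embodiment: two delay-conjugate-multiply stages, then integrate."""
    second = delay_conjugate_multiply(x)       # phase = change between samples
    third = delay_conjugate_multiply(second)   # phase = change of that change
    out, acc = [], 0.0
    for s in third:
        acc += s.imag                          # digital integrator (running sum)
        out.append(acc)
    return out


def demodulate_baseband(x, center_cycles_per_sample):
    """Second embodiment: shift the center frequency to zero, then one stage."""
    shifted = [
        s * cmath.exp(-2j * math.pi * center_cycles_per_sample * n)
        for n, s in enumerate(x)
    ]
    return [s.imag for s in delay_conjugate_multiply(shifted)]
```

For a real signal with varying amplitude, each output would also be scaled by the product of adjacent sample magnitudes, so some form of amplitude normalization would be needed before these values could be read as phase changes.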
Abstract:
The present invention is an improvement of a digital topology including a logic block portion and a buffer portion. The improved buffer portion of the present invention is implemented with first and second parallel, same conductivity type transmission gates. The transmission gates couple either a first (V1) or second (V2) voltage onto the output of the buffer (55) in response to a logic signal originating from the logic block portion. The first (V1) and second (V2) voltages are selected to be relatively close in magnitude such that the peak-to-peak voltage of the digital output signal seen on the output of the buffer is relatively small. As a result, power consumption for charging the output of the buffer is minimized. In addition, the parallel transmission gates only consume power while charging the output of the buffer so that quiescent power consumption of the buffer is eliminated. Quiescent power dissipation is also eliminated in certain types of logic block designs that include logic gates having constant current sources. This is achieved by enabling the current sources with a pulse signal. The pulse width and magnitude of the pulse signal are selected to allow a latched sense amplifier to sense valid data from the output of the logic block portion during a specified interval. After valid data is sensed, the logic block's current sources are disabled, and the logic block portion no longer consumes any power. The sense amplifier is enabled for intervals long enough to capture the data from the logic block and drive the transmission gates with the data. In this configuration, none of the elements in the topology dissipate quiescent power since none of them are constantly operating.
Abstract:
An improvement for reducing proximity effects comprises adding lines, referred to as intensity leveling bars, to the mask pattern. The leveling bars perform the function of adjusting the edge intensity gradients of isolated edges in the mask pattern to match the edge intensity gradients of densely packed edges. Leveling bars are placed parallel to isolated edges such that intensity gradient leveling occurs on all isolated edges of the mask pattern. In addition, the leveling bars are designed to have a width significantly less than the resolution of the exposure tool. Therefore, leveling bars that are present in the mask pattern produce resist patterns that are completely developed away when a nominal exposure energy is utilized during exposure of the photoresist.