Abstract:
A processing core is described having execution unit logic circuitry with a first register to store a first vector input operand, a second register to store a second vector input operand, and a third register to store a packed data structure containing scalar input operands a, b, c. The execution unit logic circuitry further includes a multiplier to perform the operation (a*(first vector input operand)) + (b*(second vector input operand)) + c.
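The operation named in this abstract can be illustrated in software. The following is only a minimal sketch, not the claimed hardware: the function name `scaled_vector_sum` is hypothetical, the packed data structure is modeled as a plain tuple, and adding the scalar c to every element is an assumption, since the abstract does not say how c is combined with the vector terms.

```python
def scaled_vector_sum(v1, v2, packed):
    """Compute a*v1 + b*v2 + c element by element.

    `packed` stands in for the third register holding scalars (a, b, c).
    Assumption: c is added to every element of the result.
    """
    a, b, c = packed
    if len(v1) != len(v2):
        raise ValueError("vector operands must have the same length")
    return [a * x + b * y + c for x, y in zip(v1, v2)]


result = scaled_vector_sum([1, 2, 3], [4, 5, 6], (2, 3, 1))
print(result)  # [15, 20, 25]
```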
Abstract:
A processor including a first execution core section clocked to perform execution operations at a first clock frequency, and a second execution core section clocked to perform execution operations at a second clock frequency which is different than the first clock frequency. The second execution core section runs faster and includes a data cache and critical ALU functions, while the first execution core section includes latency-tolerant functions such as instruction fetch and decode units and non-critical ALU functions. The processor may further include an I/O ring which may be still slower than the first execution core section. Optionally, the first execution core section may include a third execution core section whose clock rate is between that of the first and second execution core sections. Clock multipliers/dividers may be used between the various sections to derive their clocks from a single source, such as the I/O clock.
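As a rough illustration of deriving several section clocks from one source, the sketch below scales a single I/O clock by per-section ratios. The section names, the 100 MHz source, and the ratios themselves are hypothetical; the abstract gives only the relative ordering of the clock rates.

```python
from fractions import Fraction


def derive_clocks(source_mhz, ratios):
    """Scale one source clock by per-section multiplier/divider ratios."""
    return {section: source_mhz * ratio for section, ratio in ratios.items()}


# Hypothetical ratios relative to the I/O clock (not taken from the abstract).
ratios = {
    "io_ring": Fraction(1),
    "first_section": Fraction(2),
    "third_section": Fraction(3),
    "second_section": Fraction(4),
}
clocks = derive_clocks(100, ratios)

# Ordering described in the abstract: the I/O ring is no faster than the first
# section, and the third section runs between the first and the second.
assert (clocks["io_ring"] <= clocks["first_section"]
        < clocks["third_section"] < clocks["second_section"])
```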
Abstract:
In one embodiment, a fused multiply-add (FMA) circuit is configured to receive a plurality of input data values in order to execute an FMA instruction on the input data values. The circuit includes a multiplier unit, an adder unit coupled to an output of the multiplier unit, and control logic to receive the input data values and to reduce switching activity, and thus the power consumption, of one or more components of the circuit based on a value of one or more of the input data values. Other embodiments are described and claimed.
Abstract:
Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more of a plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.
Abstract:
A logic structure adapted to receive pulsed active input signals produces a logical output with a very small inherent switching delay. Pull-down transistors and complementary pull-up transistors are ratioed such that the default logical output level remains close to nominal even when the logic structure sinks or sources a DC current. When the pulsed input signals are inactive, no DC current path is enabled.
Abstract:
A pulse generating circuit includes a first pulse generating circuit for generating a first output pulse, and a second pulse generating circuit for outputting a second output pulse. Each pulse generating circuit comprises a stack of two n-channel transistors and a reset circuit. The reset circuit includes two p-channel transistors and two inverters and is provided for automatically resetting the pulse generating circuits. The second pulse generating circuit includes a delay element for introducing an additional gate delay in the generation of the second output pulse. The additional gate delay introduces an asymmetry in the output pulses which offsets or cancels a previously introduced asymmetry of an input clock signal to generate an output clock signal having a constant period. Clock gating circuitry is provided for selectively enabling and disabling at least one of said pulse generator circuits.
Abstract:
A processor and method comprise a fused multiply-add (FMA) circuit which includes a multiplier unit 110 and an adder unit 125 to compute a fused multiply-add operation. If certain input data values are received at the FMA circuit, an exception occurs and components of the circuit are clock gated to disable them and prevent them from toggling. If either of the two inputs to the multiplier unit is zero, then the multiplier and adder are gated and the addend is provided as output 135. If one of the multiplier inputs is equal to one, then the multiplier is gated and the other multiplier input is provided directly to the adder. If the addend is zero, then the adder is gated and the product of the multiplier is provided as output. If one of the multiplier inputs is equal to 2^N, then the multiplier is gated and the other multiplier input is directed to a left or right shifter 114. This processor might form part of a multi-core processing system, and it is implemented to save power by bypassing the arithmetic units when they are not required.
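The bypass cases listed in this abstract can be sketched as a software model that reports which units would stay active for a given set of inputs. This is an illustrative sketch under stated assumptions, not the patented circuit: the function names are hypothetical, operands are integers, only non-negative N (a left shift) is modeled, and the shifter output is assumed to feed the adder, which the abstract does not state.

```python
def is_power_of_two(x):
    """True for positive integers of the form 2**N."""
    return x > 0 and (x & (x - 1)) == 0


def fma_with_bypass(a, b, c):
    """Return (a*b + c, list of units left active) for integer operands."""
    active = []
    if a == 0 or b == 0:
        return c, active              # multiplier and adder both gated
    if a == 1:
        product = b                   # multiplier gated, b passed through
    elif b == 1:
        product = a                   # multiplier gated, a passed through
    elif is_power_of_two(a):
        product = b << (a.bit_length() - 1)   # shift replaces the multiply
        active.append("shifter")
    elif is_power_of_two(b):
        product = a << (b.bit_length() - 1)
        active.append("shifter")
    else:
        product = a * b
        active.append("multiplier")
    if c == 0:
        return product, active        # adder gated, product passed through
    active.append("adder")
    return product + c, active


print(fma_with_bypass(0, 5, 7))  # (7, [])
print(fma_with_bypass(8, 5, 2))  # (42, ['shifter', 'adder'])
print(fma_with_bypass(3, 5, 2))  # (17, ['multiplier', 'adder'])
```

Combining two bypass conditions at once, for example a multiplier input of one together with a zero addend, is handled here by applying both shortcuts; the abstract does not say how the hardware treats such combinations.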