-
Publication No.: US11126439B2
Publication date: 2021-09-21
Application No.: US16686060
Filing date: 2019-11-15
Applicant: Apple Inc.
Inventor: Christopher A. Burns , Liang-Kai Wang , Robert D. Kenney , Terence M. Potter
Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
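One plausible reading of the shift and fill instruction mentioned in this abstract can be sketched in software. The semantics below are an assumption made for illustration, not the claimed circuit: each lane reads the first register from a lane `shift` positions away, and lanes that would read past the end of the SIMD group are filled from the second register. The function name `shift_and_fill` and the list-per-register model are hypothetical.

```python
def shift_and_fill(reg_a, reg_b, shift):
    """Model one SIMD group: each list index is a lane (thread).

    Assumed semantics: lane i receives reg_a[i + shift] when that lane
    exists, otherwise it is filled from reg_b[i + shift - n].
    """
    n = len(reg_a)
    if len(reg_b) != n:
        raise ValueError("both registers must cover the same number of lanes")
    if not 0 <= shift <= n:
        raise ValueError("shift must be between 0 and the lane count")
    out = []
    for lane in range(n):
        src = lane + shift
        # Route from another lane's copy of reg_a, or fill from reg_b.
        out.append(reg_a[src] if src < n else reg_b[src - n])
    return out


# Two registers holding adjacent pixel columns of a frame.
window = shift_and_fill([0, 1, 2, 3], [4, 5, 6, 7], 1)
```

With two registers holding adjacent spans of a frame, varying `shift` slides a lane-wide window across them, which is the sense in which an arbitrary portion of the frame ends up in one register.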
-
Publication No.: US20210109761A1
Publication date: 2021-04-15
Application No.: US16597625
Filing date: 2019-10-09
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Robert D. Kenney , Terence M. Potter , Vinod Reddy Nalamalapu , Sivayya V. Ayinala
Abstract: Techniques are disclosed relating to sharing operands among SIMD threads for a larger arithmetic operation. In some embodiments, a set of multiple hardware pipelines is configured to execute single-instruction multiple-data (SIMD) instructions for multiple threads in parallel, where ones of the hardware pipelines include execution circuitry configured to perform floating-point operations using one or more pipeline stages of the pipeline and first routing circuitry configured to select, from among thread-specific operands stored for the hardware pipeline and from one or more other pipelines in the set, a first input operand for an operation by the execution circuitry. In some embodiments, a device is configured to perform a mathematical operation on source input data structures stored across thread-specific storage for the set of hardware pipelines, by executing multiple SIMD floating-point operations using the execution circuitry and the first routing circuitry. This may improve performance and reduce power consumption for matrix multiply and reduction operations, for example.
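The reduction case named in this abstract can be illustrated with a butterfly pattern, in which every lane repeatedly adds its own operand to an operand routed from a partner lane. This is a generic sketch of cross-lane reduction, assuming a power-of-two lane count and an XOR partner pattern; neither detail is taken from the application, and `butterfly_sum` is a hypothetical name.

```python
def butterfly_sum(lanes):
    """Sum per-lane values so that every lane ends with the total.

    Each step models one SIMD add whose second operand is routed
    from the lane whose index differs by `stride` (lane XOR stride).
    """
    n = len(lanes)
    if n == 0 or n & (n - 1):
        raise ValueError("lane count must be a power of two")
    vals = [float(v) for v in lanes]
    stride = 1
    while stride < n:
        # All lanes execute the same add; only the routed source differs.
        vals = [vals[i] + vals[i ^ stride] for i in range(n)]
        stride *= 2
    return vals


totals = butterfly_sum([1, 2, 3, 4])
```

A group of n lanes finishes in log2(n) SIMD adds without writing intermediate values to shared memory, which is the kind of saving the abstract attributes to operand sharing.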
-
Publication No.: US10481869B1
Publication date: 2019-11-19
Application No.: US15809648
Filing date: 2017-11-10
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Ting Yu , Yu Sun
Abstract: Techniques are disclosed relating to circuitry configured to perform floating-point operations such as fused multiply-addition (FMA) with multiple paths and power control. In some embodiments, an FMA unit includes a near path and multiple far paths and is configured to select a path based on a determined exponent difference. In some embodiments, the FMA unit is configured to operate portions of non-selected paths in a low power state.
-
Publication No.: US11645084B2
Publication date: 2023-05-09
Application No.: US17470682
Filing date: 2021-09-09
Applicant: Apple Inc.
Inventor: Christopher A. Burns , Liang-Kai Wang , Robert D. Kenney , Terence M. Potter
CPC classification number: G06F9/3887 , G06F9/30098 , G06T1/20 , G06T1/60
Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
-
Publication No.: US20210382687A1
Publication date: 2021-12-09
Application No.: US16893051
Filing date: 2020-06-04
Applicant: Apple Inc.
Inventor: Anthony Y. Tai , Liang-Kai Wang , Ian R. Ollmann , Anand Poovekurussi
IPC: G06F7/483 , G06F7/499 , G06F1/3206 , G06F9/30 , G06F9/38
Abstract: Techniques are disclosed relating to floating-point circuitry configured to perform a corner check instruction for a floating-point power operation. In some embodiments, the power operation is performed by executing multiple instructions, including one or more instructions that specify generating an initial power result of a first input raised to the power of a second input as 2^(second input * log2(first input)). In some embodiments, the corner check instruction operates on the first and second inputs and outputs a corrected power result based on detection of a corner condition for the first and second inputs. Corner check circuitry may share circuits with other datapaths. In various embodiments, the disclosed techniques may reduce code size and power consumption for the power operation.
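The two-step structure described here (an initial result from the exp2/log2 identity, then a single corner check that patches special inputs) can be sketched as follows. The set of corner cases, the use of |x| inside the logarithm, and the names `initial_power`, `corner_check`, and `power` are assumptions for illustration; the application may define the corner conditions differently, and the sign of zero is ignored here.

```python
import math


def initial_power(x, y):
    """Initial result 2**(y * log2(|x|)); assumes sign is fixed up later."""
    if x == 0.0 or math.isnan(x) or math.isnan(y):
        return math.nan
    try:
        return 2.0 ** (y * math.log2(abs(x)))
    except OverflowError:
        return math.inf  # Python raises on overflow; hardware would saturate


def corner_check(x, y, initial):
    """Return a corrected power result for assumed corner conditions."""
    if y == 0.0 or x == 1.0:
        return 1.0
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if x == 0.0:
        return 0.0 if y > 0.0 else math.inf
    if x < 0.0:
        if math.isinf(y):
            return initial
        if y != math.floor(y):
            return math.nan  # negative base, non-integer exponent
        return -initial if y % 2.0 == 1.0 else initial
    return initial


def power(x, y):
    return corner_check(x, y, initial_power(x, y))
```

Folding all the special cases into one check, rather than a chain of compare and select instructions, is what would shorten the instruction sequence in the way the abstract describes.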
-
Publication No.: US10387119B2
Publication date: 2019-08-20
Application No.: US16146147
Filing date: 2018-09-28
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Terence M. Potter , Brian K. Reynolds , Justin Friesenhahn
Abstract: Techniques are disclosed relating to performing arithmetic operations to generate values for different related threads. In some embodiments, the threads are graphics threads and the values are operand locations. In some embodiments, an apparatus includes circuitry configured to generate results for multiple threads by performing a plurality of arithmetic operations indicated by an instruction. In some embodiments, the instruction specifies: an input value that is common to the multiple threads and, for at least one of the multiple threads, a type value that indicates whether to generate a result for the thread by performing an arithmetic operation based on a first input that is a result of an arithmetic operation from another thread of the multiple threads or to generate a result for the thread using the input value that is common to the multiple threads. In some embodiments, the circuitry is configured to generate a result for the at least one of the multiple threads by selectively performing the arithmetic operation or using the input value that is common to the multiple threads based on the type value.
-
Publication No.: US20170357506A1
Publication date: 2017-12-14
Application No.: US15180725
Filing date: 2016-06-13
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Terence M. Potter , Andrew M. Havlir
IPC: G06F9/30
CPC classification number: G06F9/30021 , G06F9/3001 , G06F9/30083
Abstract: Techniques are disclosed relating to comparison circuitry. In some embodiments, compare circuitry is configured to generate comparison results for sets of inputs in both one or more integer formats and one or more floating-point formats. In some embodiments, the compare circuitry includes padding circuitry configured to add one or more bits to each of first and second input values to generate first and second padded values. In some embodiments, the compare circuitry also includes integer subtraction circuitry configured to subtract the first padded value from the second padded value to generate a subtraction result. In some embodiments, the compare circuitry includes output logic configured to generate the comparison result based on the subtraction result. In various embodiments, using at least a portion of the same circuitry (e.g., the subtractor) for both integer and floating-point comparisons may reduce processor area.
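The pad, subtract, and inspect sequence in this abstract can be modeled in a few lines. The sketch assumes one padding bit (sign extension for signed inputs, zero extension for unsigned), reads the comparison result from the top bit of the difference, and maps float32 inputs onto the signed integer path by their sign and magnitude bits. NaN handling is omitted, and the names `pad`, `greater_than`, and `float_greater_than` are hypothetical.

```python
import struct


def pad(value, bits, signed):
    """Extend a bits-wide pattern by one bit (sign- or zero-extend)."""
    value &= (1 << bits) - 1
    if signed and value >> (bits - 1):
        value |= 1 << bits
    return value


def greater_than(a, b, bits, signed):
    """Return a > b using one (bits + 1)-wide integer subtraction."""
    # Subtract the first padded value from the second; the extra bit
    # means the difference cannot overflow, so its top bit is the sign.
    diff = (pad(b, bits, signed) - pad(a, bits, signed)) & ((1 << (bits + 1)) - 1)
    return bool((diff >> bits) & 1)


def float_greater_than(x, y):
    """Compare float32 values on the same subtractor (NaN not handled)."""
    def key(f):
        raw = struct.unpack("<I", struct.pack("<f", f))[0]
        mag = raw & 0x7FFFFFFF
        # Sign-magnitude to two's complement, so integer order matches.
        return (-mag if raw >> 31 else mag) & 0xFFFFFFFF
    return greater_than(key(x), key(y), 32, True)
```

Because both the integer and floating-point cases end in the same subtraction, only the small conversion step in front of it differs per format, which matches the area argument in the abstract.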
-
Publication No.: US20170293470A1
Publication date: 2017-10-12
Application No.: US15092401
Filing date: 2016-04-06
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Terence M. Potter , Andrew M. Havlir , Yu Sun , Nicolas X. Pena , Xiao-Long Wu , Christopher A. Burns
IPC: G06F7/483
CPC classification number: G06F7/483 , G06F7/5443
Abstract: Techniques are disclosed relating to floating-point operations with down-conversion. In some embodiments, a floating-point unit is configured to perform fused multiply-addition operations based on first and second different instruction types. In some embodiments, the first instruction type specifies a result in the first floating-point format and the second instruction type specifies fused multiply addition of input operands in the first floating-point format to generate a result in a second, lower-precision floating-point format. For example, the first format may be a 32-bit format and the second format may be a 16-bit format. In some embodiments, the floating-point unit includes rounding circuitry, exponent circuitry, and/or increment circuitry configured to generate signals for the second instruction type in the same pipeline stage as for the first instruction type. In some embodiments, disclosed techniques may reduce the number of pipeline stages included in the floating-point circuitry.
-
Publication No.: US09785567B2
Publication date: 2017-10-10
Application No.: US14851859
Filing date: 2015-09-11
Applicant: Apple Inc.
Inventor: Andrew M. Havlir , Terence M. Potter , Liang-Kai Wang
IPC: G06F12/00 , G06F13/00 , G06F13/28 , G06F12/0884 , G06F12/0846 , G06F9/38 , G06F9/30
CPC classification number: G06F12/0884 , G06F9/3012 , G06F9/3824 , G06F9/383 , G06F9/3834 , G06F9/3838 , G06F9/3859 , G06F12/0848 , G06F2212/604
Abstract: Techniques are disclosed relating to per-pipeline control for an operand cache. In some embodiments, an apparatus includes a register file and multiple execution pipelines. In some embodiments, the apparatus also includes an operand cache that includes multiple entries that each include multiple portions that are each configured to store an operand for a corresponding execution pipeline. In some embodiments, the operand cache is configured, during operation of the apparatus, to store data in only a subset of the portions of an entry. In some embodiments, the apparatus is configured to store, for each entry in the operand cache, a per-entry validity value that indicates whether the entry is valid and per-portion state information that indicates whether data for each portion is valid and whether data for each portion is modified relative to data in a corresponding entry in the register file.
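The bookkeeping described in this abstract (one validity value per entry, plus valid and modified state per portion) maps naturally onto a small data structure. The class below is a hypothetical software model, not the patented hardware: the names `OperandCacheEntry`, `fill`, `write`, and `dirty_lanes` are invented for illustration, and one portion is assumed per execution pipeline.

```python
from dataclasses import dataclass


@dataclass
class OperandCacheEntry:
    num_pipelines: int
    valid: bool = False  # per-entry validity value

    def __post_init__(self):
        n = self.num_pipelines
        self.data = [None] * n
        self.portion_valid = [False] * n  # portion holds usable data
        self.portion_dirty = [False] * n  # portion differs from register file

    def fill(self, lane, value):
        """Load a portion from the register file (clean copy)."""
        self.data[lane] = value
        self.portion_valid[lane] = True
        self.portion_dirty[lane] = False
        self.valid = True

    def write(self, lane, value):
        """Store a result for one pipeline; the register file is now stale."""
        self.data[lane] = value
        self.portion_valid[lane] = True
        self.portion_dirty[lane] = True
        self.valid = True

    def dirty_lanes(self):
        """Portions that must be written back before eviction."""
        return [i for i, d in enumerate(self.portion_dirty) if d]


entry = OperandCacheEntry(4)
entry.fill(0, 1.0)   # only lanes 0 and 2 are populated
entry.write(2, 5.0)
```

Tracking state per portion lets an entry hold data for only a subset of pipelines and limits write-back to the portions that were actually modified.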
-
Publication No.: US20170244391A1
Publication date: 2017-08-24
Application No.: US15046926
Filing date: 2016-02-18
Applicant: Apple Inc.
Inventor: James Wang , Benjiman L. Goodman , Liang-Kai Wang , Robert D. Kenney
CPC classification number: H03K5/00006 , G06F1/3228 , G06F1/324 , G06F1/329 , Y02D10/126 , Y02D10/24 , Y02D50/20
Abstract: A method and apparatus for saving power in integrated circuits is disclosed. An IC includes functional circuit blocks which are not placed into a sleep mode when idle. A power management circuit may monitor the activity levels of the functional circuit blocks not placed into a sleep mode. When the power management circuit detects that an activity level of one of the non-sleep functional circuit blocks is less than a predefined threshold, it reduces the frequency of a clock signal provided thereto by scheduling only one pulse of a clock signal for every N pulses of the full frequency clock signal. The remaining N−1 pulses of the clock signal may be inhibited. If a high priority transaction inbound for the functional circuit block is detected, an inserted pulse of the clock signal may be provided to the functional unit irrespective of when a most recent regular pulse was provided.
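The pulse scheduling in this abstract can be simulated cycle by cycle. The sketch assumes the regular pulse falls on every cycle whose index is a multiple of N and that high priority transactions are given as a list of cycle indices; both are modeling choices, and `gated_clock` is a hypothetical name.

```python
def gated_clock(num_cycles, n, priority_cycles=()):
    """Return 1 for each full-rate cycle in which a pulse is delivered.

    One regular pulse is scheduled per n cycles; the other n - 1 are
    inhibited unless a high priority transaction forces an inserted pulse.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    priority = set(priority_cycles)
    return [1 if (cycle % n == 0 or cycle in priority) else 0
            for cycle in range(num_cycles)]


idle = gated_clock(8, 4)          # block below the activity threshold
urgent = gated_clock(8, 4, [2])   # high priority transaction at cycle 2
```

In this model the inserted pulse does not shift the regular schedule, so the block still receives its next regular pulse at cycle 4.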