Computation Engine with Upsize/Interleave and Downsize/Deinterleave Options

    Publication No.: US20190310854A1

    Publication Date: 2019-10-10

    Application No.: US15946719

    Filing Date: 2018-04-05

    Applicant: Apple Inc.

    Abstract: In an embodiment, a computation engine may perform computations on input vectors having vector elements of a first precision and data type. The computation engine may convert the vector elements from the first precision to a second precision and may also interleave the vector elements as specified by an instruction issued by the processor to the computation engine. The interleave may be based on a ratio of a result precision and the second precision. An extract instruction may be supported to extract results from the computations and convert and deinterleave the vector elements to provide a compact result in a desired order.
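The interleave/deinterleave behavior described in the abstract can be modeled as follows. This is a minimal sketch, assuming the interleave group count equals the result-precision-to-second-precision ratio; the function names and layout rule are illustrative assumptions, not the patented design.

```python
# Hypothetical sketch of the upsize/interleave step; the group-size rule
# (ratio = result precision // second precision) is an assumption.

def upsize_interleave(vec, ratio):
    """Widen each element (upsize) and interleave in groups of `ratio`."""
    widened = [int(x) for x in vec]  # stand-in for the precision conversion
    # Element i goes to lane i % ratio, preserving order within a lane.
    lanes = [widened[i::ratio] for i in range(ratio)]
    return [x for lane in lanes for x in lane]

def deinterleave_extract(vec, ratio):
    """Inverse step: undo the interleave to recover the compact order."""
    n = len(vec)
    lane_len = n // ratio
    out = [0] * n
    for lane in range(ratio):
        for j in range(lane_len):
            out[j * ratio + lane] = vec[lane * lane_len + j]
    return out
```

A round trip through both functions returns the input vector in its original, compact order, matching the role of the extract instruction.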

    Software updating
    23.
    Granted Patent

    Publication No.: US09792109B2

    Publication Date: 2017-10-17

    Application No.: US14941269

    Filing Date: 2015-11-13

    Applicant: Apple Inc.

    CPC classification number: G06F8/71 G06F8/65 G06F8/658

    Abstract: A novel method for updating a bundle of files from an update package that minimizes the free-space requirement on disk is provided. The method segments the update of the entire package and performs the update in multiple passes. The method divides the archive payload of the entire update package into pieces and expands one piece of the archive in each pass. At the end of each pass, some embodiments remove from the disk the archive piece expanded in that pass in order to free additional space for the next pass.
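The multi-pass scheme above can be sketched in a few lines. This is an illustrative model only: the piece representation and the `expand` callback are hypothetical stand-ins for the archive format, which the abstract does not specify.

```python
# Illustrative model of the segmented, multi-pass update: expand one archive
# piece per pass and delete it afterwards so its space is freed for the next
# pass. All names here are assumptions, not the patent's implementation.

def apply_update(pieces, expand):
    """Expand one archive piece per pass, removing each piece once expanded."""
    expanded = []
    disk = list(pieces)          # models the archive pieces sitting on disk
    while disk:
        piece = disk.pop(0)      # one pass: take the next piece...
        expanded.extend(expand(piece))
        # ...and the piece is now gone from `disk`, freeing its space
    return expanded
```

The key property is that at most one unexpanded piece and its expanded contents coexist in a pass, which is what keeps the peak free-space requirement low.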

    Instruction Support for Matrix Multiplication

    Publication No.: US20240103858A1

    Publication Date: 2024-03-28

    Application No.: US18045928

    Filing Date: 2022-10-12

    Applicant: Apple Inc.

    CPC classification number: G06F9/3001 G06F9/3013 G06F17/16

    Abstract: Techniques are disclosed relating to instruction set architecture support for matrix manipulations. In disclosed embodiments, front-end circuitry is configured to fetch and decode a matrix multiply instruction for execution, including to encode a given matrix input operand of the matrix multiply instruction to identify one or more vector registers defined according to an instruction set architecture. In some embodiments, datapath circuitry is configured to execute the matrix multiply instruction, where during execution of the instruction, the one or more vector registers corresponding to the given matrix operand are mapped within the datapath circuitry to at least two dimensions of the given matrix operand. In some embodiments, power management circuitry is configured to, during execution of the instruction, operate at least a portion of the front-end circuitry in a reduced-power mode. Disclosed techniques may advantageously increase throughput and reduce power consumption, relative to traditional implementations using vector operations.
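The register-to-matrix mapping can be illustrated with a small model. This is a sketch under assumed dimensions; the actual register width, layout, and datapath mapping are not given in the abstract.

```python
# Hypothetical model of mapping flat vector registers onto a 2-D matrix
# operand, then multiplying. Register sizes and row-major layout are
# assumptions for illustration.

def registers_to_matrix(regs, rows, cols):
    """Flatten a list of vector registers and reshape to rows x cols."""
    flat = [x for reg in regs for x in reg]
    assert len(flat) == rows * cols
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

def matmul(a, b):
    """Plain matrix multiply over the reshaped operands."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[r][k] * b[k][c] for k in range(inner)) for c in range(cols)]
            for r in range(rows)]
```

A single matrix instruction over operands mapped this way replaces a loop of vector multiply-accumulate instructions, which is where the claimed throughput and power benefits come from.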

    Execution Circuitry for Floating-Point Power Operation

    Publication No.: US20240094989A1

    Publication Date: 2024-03-21

    Application No.: US18045577

    Filing Date: 2022-10-11

    Applicant: Apple Inc.

    CPC classification number: G06F7/4876

    Abstract: Techniques are disclosed relating to dedicated power function circuitry for a floating-point power instruction. In some embodiments, execution circuitry is configured to execute a floating-point power instruction to evaluate the power function xy as 2y log2x. In some embodiments, base-2 logarithm circuitry is configured to evaluate a base-2 logarithm for a first input (e.g., log2 x) by determining coefficients for a polynomial function and evaluating the polynomial function using the determined coefficients and the first input. In some embodiments, multiplication circuitry multiplies the base-2 logarithm result by a second input to generate a multiplication result. In some embodiments, base-2 power function circuitry is configured to evaluate a base-2 power function for the multiplication result. Disclosed techniques may advantageously increase performance and reduce power consumption of floating-point power function operations with reasonable area and accuracy, relative to traditional techniques.
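The identity the circuitry exploits is easy to check numerically. The sketch below only demonstrates the decomposition x^y = 2^(y · log2 x); it does not model the polynomial-based base-2 logarithm hardware described in the abstract.

```python
# Numeric check of the decomposition used by the execution circuitry:
# x**y == 2**(y * log2(x)) for x > 0. The hardware's polynomial log2
# evaluation is not modeled here.
import math

def pow_via_log2(x, y):
    """Evaluate x**y as 2**(y * log2 x), mirroring the decomposition."""
    return 2.0 ** (y * math.log2(x))
```

In hardware this splits one hard operation (arbitrary power) into three tractable ones: a base-2 logarithm, a multiply, and a base-2 exponential.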

    Compression assist instructions
    26.
    Granted Patent

    Publication No.: US11822921B2

    Publication Date: 2023-11-21

    Application No.: US18054017

    Filing Date: 2022-11-09

    Applicant: Apple Inc.

    Abstract: In an embodiment, a processor supports one or more compression assist instructions which may be employed in compression software to improve the performance of the processor when performing compression/decompression. That is, the compression/decompression task may be performed more rapidly and consume less power when the compression assist instructions are employed than when they are not. In some cases, the cost of a more effective, more complex compression algorithm may be reduced to the cost of a less effective, less complex compression algorithm.

    Lossless Compression Techniques
    27.
    Patent Application

    Publication No.: US20230078235A1

    Publication Date: 2023-03-16

    Application No.: US17588114

    Filing Date: 2022-01-28

    Applicant: Apple Inc.

    Abstract: Compression techniques are described. In an embodiment, a first plane of sensor data is accessed, the first plane of sensor data is divided into a plurality of slices, each sample is encoded in each slice from the plurality of slices, where encoding a sample include computing a median based prediction for the sample, computing an error for the sample comprising a difference between the sample and the computed median based prediction, determining a context for the sample, selecting a model for the sample by using the determined context, and encoding the computed error by using the selected model.
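The median-based prediction step can be sketched concretely. As an assumption, the sketch uses the median-edge-detecting (MED) predictor known from JPEG-LS as a stand-in; the patent's exact predictor and its context/model selection are not specified in the abstract.

```python
# Minimal sketch of median-based prediction over three causal neighbors,
# using the JPEG-LS MED predictor as an assumed stand-in for the patent's
# (unspecified) predictor.

def med_predict(left, above, above_left):
    """Median-edge-detecting prediction from three causal neighbors."""
    if above_left >= max(left, above):
        return min(left, above)
    if above_left <= min(left, above):
        return max(left, above)
    return left + above - above_left

def encode_sample(sample, left, above, above_left):
    """Prediction error that would be handed to the context-selected coder."""
    return sample - med_predict(left, above, above_left)
```

The prediction error, not the sample, is what gets entropy-coded, since errors cluster near zero and compress well.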

    Compression assist instructions
    28.
    Granted Patent

    Publication No.: US11086625B2

    Publication Date: 2021-08-10

    Application No.: US16566344

    Filing Date: 2019-09-10

    Applicant: Apple Inc.

    Abstract: In an embodiment, a processor supports one or more compression assist instructions which may be employed in compression software to improve the performance of the processor when performing compression/decompression. That is, the compression/decompression task may be performed more rapidly and consume less power when the compression assist instructions are employed than when they are not. In some cases, the cost of a more effective, more complex compression algorithm may be reduced to the cost of a less effective, less complex compression algorithm.

    Computation engine with strided dot product

    Publication No.: US10990401B2

    Publication Date: 2021-04-27

    Application No.: US16837631

    Filing Date: 2020-04-01

    Applicant: Apple Inc.

    Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.
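The strided selection can be modeled directly from the abstract's description: the first operand is a vector of vectors and the stride picks one element of each inner vector. Names and operand shapes below are illustrative assumptions, not the instruction's actual encoding.

```python
# Hedged sketch of the strided dot product: the stride selects element
# `stride_index` of each inner vector of the first operand, and the dot
# product pairs those with the elements of the second operand.

def strided_dot(vec_of_vecs, stride_index, second):
    """Dot product of the stride-selected elements with `second`."""
    selected = [inner[stride_index] for inner in vec_of_vecs]
    return sum(a * b for a, b in zip(selected, second))
```

Skipping elements via the stride avoids a separate gather/pack step before the dot product.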

    Systems and methods for performing memory compression

    Publication No.: US10769065B2

    Publication Date: 2020-09-08

    Application No.: US16436635

    Filing Date: 2019-06-10

    Applicant: Apple Inc.

    Abstract: Systems, apparatuses, and methods for efficiently moving data for storage and processing are described. A compression unit within a processor includes multiple hardware lanes, selects two or more input words to compress, and assigns them to two or more of the multiple hardware lanes. As each assigned input word is processed, each word is compared to an entry of a plurality of entries of a table. If each of the assigned input words indexes the same entry of the table, the hardware lane with the oldest input word generates a single read request for the table entry, and the hardware lane with the youngest input word generates a single write request for updating the table entry upon completing compression. Each hardware lane generates a compressed packet based on its assigned input word.
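The lane coordination can be sketched for the two-lane case: when both words index the same table entry, one read (oldest lane) and one write (youngest lane) suffice, with the intermediate update forwarded between lanes. The table, index function, and packet format below are assumptions for illustration.

```python
# Simplified model of the lane/table coordination: lanes sharing a table
# entry issue a single read (oldest word) and a single write (youngest
# word). Table, index function, and packet format are hypothetical.

def compress_pair(words, table, index_of):
    """Compress two words assigned to two lanes; count table accesses."""
    assert len(words) == 2
    i0, i1 = index_of(words[0]), index_of(words[1])
    reads = writes = 0
    packets = []
    if i0 == i1:
        reads += 1                      # oldest lane: the single read
        entry = table.get(i0)
        for w in words:
            packets.append(("match" if w == entry else "miss", w))
            entry = w                   # update is forwarded between lanes
        table[i0] = words[-1]           # youngest lane: the single write
        writes += 1
    else:
        for idx, w in zip((i0, i1), words):
            reads += 1
            packets.append(("match" if w == table.get(idx) else "miss", w))
            table[idx] = w
            writes += 1
    return packets, reads, writes
```

Collapsing the table traffic this way removes a read/write-port conflict between lanes that would otherwise serialize them.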
