Systems and methods for assigning tasks in a neural network processor

    Publication Number: US12282838B2

    Publication Date: 2025-04-22

    Application Number: US15971872

    Application Date: 2018-05-04

    Applicant: Apple Inc.

    Abstract: Embodiments relate to managing tasks that, when executed by a neural processor circuit, instantiate a neural network. The neural processor circuit includes neural engine circuits and a neural task manager circuit. The neural task manager circuit includes multiple task queues and a task arbiter circuit. Each task queue stores a reference to a task list of tasks for a machine learning operation. Each task queue may be associated with a priority parameter. Based on the priorities of the task queues, the task arbiter circuit retrieves configuration data for a task from a memory external to the neural processor circuit and provides the configuration data to components of the neural processor circuit, including the neural engine circuits. The configuration data programs the neural processor circuit to execute the task. For example, the configuration data may include input data and kernel data processed by the neural engine circuits to execute the task.
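    The following is a minimal software sketch of the arbitration behavior this abstract describes. The patent covers hardware circuits, not code, and all names here (Task, TaskQueue, TaskArbiter) are hypothetical; the sketch assumes a simple highest-priority-non-empty-queue policy, which the abstract implies but does not fully specify.

```cpp
// Hypothetical sketch of priority-based task arbitration across task queues.
#include <cstdint>
#include <deque>
#include <optional>
#include <vector>

struct Task {
    uint32_t config_addr;  // address of configuration data in external memory
};

struct TaskQueue {
    int priority = 0;        // higher value wins arbitration (assumed policy)
    std::deque<Task> tasks;  // references into a task list
};

class TaskArbiter {
public:
    explicit TaskArbiter(std::vector<TaskQueue>& queues) : queues_(queues) {}

    // Select the next task from the highest-priority non-empty queue.
    std::optional<Task> next_task() {
        TaskQueue* best = nullptr;
        for (auto& q : queues_) {
            if (!q.tasks.empty() && (!best || q.priority > best->priority))
                best = &q;
        }
        if (!best) return std::nullopt;
        Task t = best->tasks.front();
        best->tasks.pop_front();
        return t;  // caller would fetch config data at t.config_addr
    }

private:
    std::vector<TaskQueue>& queues_;
};
```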

    Filtering of keypoint descriptors based on orientation angle

    Publication Number: US12169959B2

    Publication Date: 2024-12-17

    Application Number: US17693007

    Application Date: 2022-03-11

    Applicant: Apple Inc.

    Abstract: Embodiments of the present disclosure relate to selecting a subset of keypoint descriptors of two images for a matching operation based on the orientation angles indicated in the headers of the keypoint descriptors. The keypoint descriptors in the two images are matched by first comparing their headers and then performing vector distance determination. During the header comparison operation, the header of a descriptor of a first image is compared only with headers of keypoint descriptors of a second image in the discrete orientation angle range corresponding to the orientation angle indicated by the first image descriptor's header, or in adjacent discrete orientation angle ranges. After the keypoint descriptors whose headers satisfy one or more matching criteria are identified, distance determination operations are performed between those keypoint descriptors, while the remaining keypoint descriptors are discarded without determining their distances.
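    Below is a minimal sketch of the header pre-filter this abstract describes: descriptors of the second image are considered only if their quantized orientation falls in the same or an adjacent discrete angle range, and vector distances are computed only for those survivors. The bin count, descriptor length, and use of L2 distance are illustrative assumptions.

```cpp
// Hypothetical sketch: orientation-angle binning filters match candidates
// before any expensive vector-distance computation.
#include <array>
#include <cmath>
#include <cstdlib>
#include <vector>

constexpr int kBins = 16;      // discrete orientation angle ranges (assumed)
constexpr int kDescLen = 64;   // descriptor vector length (assumed)
constexpr float kTwoPi = 6.283185307f;

struct Descriptor {
    float angle_rad;                // orientation angle from the header
    std::array<float, kDescLen> v;  // descriptor vector
};

// Quantize an orientation angle into one of kBins discrete ranges.
int angle_bin(float angle_rad) {
    float norm = std::fmod(angle_rad, kTwoPi);
    if (norm < 0.0f) norm += kTwoPi;
    int b = static_cast<int>(norm / kTwoPi * kBins);
    return b >= kBins ? kBins - 1 : b;  // guard against float edge case
}

// Full vector distance, computed only for surviving candidates.
float l2_dist(const Descriptor& a, const Descriptor& b) {
    float s = 0.0f;
    for (int i = 0; i < kDescLen; ++i) {
        float d = a.v[i] - b.v[i];
        s += d * d;
    }
    return std::sqrt(s);
}

// Return indices of image-2 descriptors in the same or an adjacent bin;
// only these get l2_dist(), the rest are discarded without a distance.
std::vector<int> candidates(const Descriptor& d1,
                            const std::vector<Descriptor>& img2) {
    int b = angle_bin(d1.angle_rad);
    std::vector<int> out;
    for (int i = 0; i < static_cast<int>(img2.size()); ++i) {
        int diff = std::abs(b - angle_bin(img2[i].angle_rad));
        if (diff == 0 || diff == 1 || diff == kBins - 1)  // wrap-around adjacency
            out.push_back(i);
    }
    return out;
}
```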

    Memory fetch granule
    Invention Grant

    Publication Number: US11467988B1

    Publication Date: 2022-10-11

    Application Number: US17230490

    Application Date: 2021-04-14

    Applicant: Apple Inc.

    Abstract: Systems, apparatuses, and methods for implementing a memory fetch granule for real-time agents are described. A computing system includes a plurality of real-time agents coupled to memory via an interconnect fabric and a memory controller. The efficiency of the memory controller is determined by the number of bank groups in the memory devices coupled to the memory controller. A memory fetch granule is defined for the memory controller based on the amount of data that can be accessed in parallel on the memory devices in back-to-back access cycles. Each real-time agent accumulates memory requests for sequential physical addresses until the amount of data referenced by the requests reaches the size of the memory fetch granule. Once the memory fetch granule is reached, the real-time agent sends the requests to the memory controller via the fabric. This helps to ensure that the requests arrive at the memory controller close enough together to be grouped.
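    A minimal sketch of the accumulation behavior described above: a real-time agent buffers requests to sequential physical addresses and releases the batch to the fabric only once the buffered bytes reach the fetch granule. The 2 KiB granule and the flush-on-nonsequential policy are assumptions for illustration; the abstract specifies neither.

```cpp
// Hypothetical sketch: accumulate sequential requests up to a fetch granule.
#include <cstdint>
#include <vector>

constexpr uint64_t kFetchGranule = 2048;  // bytes accessible back-to-back (assumed)

struct MemRequest {
    uint64_t phys_addr;
    uint32_t size;  // bytes
};

class RealTimeAgent {
public:
    // Returns a non-empty batch when the granule is reached; the caller
    // (standing in for the fabric) forwards it to the memory controller.
    std::vector<MemRequest> enqueue(MemRequest req) {
        // Assumed policy: flush early if the new request breaks sequentiality.
        if (!pending_.empty() &&
            req.phys_addr != pending_.back().phys_addr + pending_.back().size) {
            auto batch = std::move(pending_);
            pending_.clear();
            bytes_ = 0;
            add(req);
            return batch;
        }
        add(req);
        if (bytes_ >= kFetchGranule) {
            auto batch = std::move(pending_);
            pending_.clear();
            bytes_ = 0;
            return batch;
        }
        return {};  // keep accumulating
    }

private:
    void add(const MemRequest& r) {
        pending_.push_back(r);
        bytes_ += r.size;
    }
    std::vector<MemRequest> pending_;
    uint64_t bytes_ = 0;
};
```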

    Compression of kernel data for neural network operations

    Publication Number: US11120327B2

    Publication Date: 2021-09-14

    Application Number: US15971657

    Application Date: 2018-05-04

    Applicant: Apple Inc.

    Abstract: Embodiments relate to a neural processor circuit that includes a kernel access circuit and multiple neural engine circuits. The kernel access circuit reads compressed kernel data from memory external to the neural processor circuit. Each neural engine circuit receives compressed kernel data from the kernel access circuit. Each neural engine circuit includes a kernel extract circuit and a kernel multiply-add (MAD) circuit. The kernel extract circuit extracts uncompressed kernel data from the compressed kernel data. The kernel MAD circuit receives the uncompressed kernel data from the kernel extract circuit and performs neural network operations on a portion of input data using the uncompressed kernel data.
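    A minimal sketch of the extract-then-multiply-add flow described above, with the kernel extract and MAD stages as functions. The compression format here (run-length coding of zero weights) is purely a stand-in; the abstract does not specify how the kernel data is compressed.

```cpp
// Hypothetical sketch: decompress kernel data, then multiply-accumulate.
#include <cstdint>
#include <vector>

// Each entry encodes a nonzero weight preceded by a count of skipped zeros
// (an assumed format, not the patent's).
struct CompressedEntry {
    uint16_t zero_run;  // zeros preceding this weight
    float weight;       // the nonzero kernel value
};

// Stand-in for the "kernel extract circuit": expand to a dense kernel.
std::vector<float> extract_kernel(const std::vector<CompressedEntry>& comp,
                                  size_t kernel_len) {
    std::vector<float> kernel(kernel_len, 0.0f);
    size_t pos = 0;
    for (const auto& e : comp) {
        pos += e.zero_run;
        if (pos < kernel_len) kernel[pos++] = e.weight;
    }
    return kernel;
}

// Stand-in for the "kernel MAD circuit": multiply-accumulate against input.
float mad(const std::vector<float>& kernel, const std::vector<float>& input) {
    float acc = 0.0f;
    for (size_t i = 0; i < kernel.size() && i < input.size(); ++i)
        acc += kernel[i] * input[i];
    return acc;
}
```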

    COMPRESSION OF KERNEL DATA FOR NEURAL NETWORK OPERATIONS

    Publication Number: US20190340488A1

    Publication Date: 2019-11-07

    Application Number: US15971657

    Application Date: 2018-05-04

    Applicant: Apple Inc.

    Abstract: Embodiments relate to a neural processor circuit that includes a kernel access circuit and multiple neural engine circuits. The kernel access circuit reads compressed kernel data from memory external to the neural processor circuit. Each neural engine circuit receives compressed kernel data from the kernel access circuit. Each neural engine circuit includes a kernel extract circuit and a kernel multiply-add (MAD) circuit. The kernel extract circuit extracts uncompressed kernel data from the compressed kernel data. The kernel MAD circuit receives the uncompressed kernel data from the kernel extract circuit and performs neural network operations on a portion of input data using the uncompressed kernel data.

    Latency Events in Multi-Die Architecture
    Invention Publication

    Publication Number: US20240184355A1

    Publication Date: 2024-06-06

    Application Number: US18438665

    Application Date: 2024-02-12

    Applicant: Apple Inc.

    CPC Classification: G06F1/3296; G06F1/3206

    Abstract: Techniques are disclosed that pertain to synchronizing power states between integrated circuit dies. A system includes an integrated circuit that includes a plurality of integrated circuit dies coupled together. A particular integrated circuit die may include a primary power manager circuit, and one or more remaining integrated circuit dies may include respective secondary power manager circuits. The primary power manager circuit is configured to issue a transition request to the secondary power manager circuits to transition their integrated circuit dies from a first power state to a second power state. A given secondary power manager circuit is configured to receive the transition request, transition its integrated circuit die to the second power state, and issue an acknowledgement to the primary power manager circuit that its integrated circuit die has been transitioned to the second power state. Techniques are further disclosed relating to managing latency tolerance events within a multi-die integrated circuit.
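    A minimal sketch of the request/acknowledge handshake this abstract describes: the primary power manager broadcasts a transition request and collects acknowledgements from every secondary before treating the transition as complete. The PowerState values and class names are illustrative assumptions.

```cpp
// Hypothetical sketch: primary/secondary power-state synchronization.
#include <cstddef>
#include <vector>

enum class PowerState { Active, LowPower };  // assumed states

class SecondaryPowerManager {
public:
    // Receive the request, transition the local die, return the acknowledgement.
    bool handle_transition(PowerState target) {
        state_ = target;  // stand-in for the die's actual power sequencing
        return true;      // acknowledgement back to the primary
    }
    PowerState state() const { return state_; }

private:
    PowerState state_ = PowerState::Active;
};

class PrimaryPowerManager {
public:
    explicit PrimaryPowerManager(std::vector<SecondaryPowerManager>& secs)
        : secondaries_(secs) {}

    // Issue the transition request and wait for every acknowledgement.
    bool transition_all(PowerState target) {
        size_t acks = 0;
        for (auto& s : secondaries_)
            if (s.handle_transition(target)) ++acks;
        return acks == secondaries_.size();  // all dies reached the new state
    }

private:
    std::vector<SecondaryPowerManager>& secondaries_;
};
```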

    Multi-Die Power Synchronization
    Invention Application

    Publication Number: US20230059725A1

    Publication Date: 2023-02-23

    Application Number: US17933168

    Application Date: 2022-09-19

    Applicant: Apple Inc.

    Abstract: Techniques are disclosed that pertain to synchronizing power states between integrated circuit dies. A system includes an integrated circuit that includes a plurality of integrated circuit dies coupled together. A particular integrated circuit die may include a primary power manager circuit, and one or more remaining integrated circuit dies may include respective secondary power manager circuits. The primary power manager circuit is configured to issue a transition request to the secondary power manager circuits to transition their integrated circuit dies from a first power state to a second power state. A given secondary power manager circuit is configured to receive the transition request, transition its integrated circuit die to the second power state, and issue an acknowledgement to the primary power manager circuit that its integrated circuit die has been transitioned to the second power state. Techniques are further disclosed relating to managing latency tolerance events within a multi-die integrated circuit.

    Memory Fetch Granule
    Invention Application

    Publication Number: US20220334984A1

    Publication Date: 2022-10-20

    Application Number: US17230490

    Application Date: 2021-04-14

    Applicant: Apple Inc.

    Abstract: Systems, apparatuses, and methods for implementing a memory fetch granule for real-time agents are described. A computing system includes a plurality of real-time agents coupled to memory via an interconnect fabric and a memory controller. The efficiency of the memory controller is determined by the number of bank groups in the memory devices coupled to the memory controller. A memory fetch granule is defined for the memory controller based on the amount of data that can be accessed in parallel on the memory devices in back-to-back access cycles. Each real-time agent accumulates memory requests for sequential physical addresses until the amount of data referenced by the requests reaches the size of the memory fetch granule. Once the memory fetch granule is reached, the real-time agent sends the requests to the memory controller via the fabric. This helps to ensure that the requests arrive at the memory controller close enough together to be grouped.
