ACCELERATING NEURAL NETWORKS WITH LOW PRECISION-BASED MULTIPLICATION AND EXPLOITING SPARSITY IN HIGHER ORDER BITS

    Publication number: US20240005135A1

    Publication date: 2024-01-04

    Application number: US18135958

    Application date: 2023-04-18

    CPC classification number: G06N3/063 G06N3/045 G06N3/088

    Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computation on zero values at the multiply-add circuit, wherein the processor is to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
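
A minimal sketch of the re-encoding idea described in this abstract, assuming the higher precision format is signed 8-bit and the lower precision format is signed 4-bit; the function name and the exact nibble decomposition are illustrative assumptions, not the patented circuit:

```python
# Hypothetical illustration: split a signed 8-bit value x into two signed
# 4-bit values (hi, lo) such that x == hi * 16 + lo. For values in [-8, 7]
# the high part is zero, which is what a sparsity circuit could then skip.
def reencode_int8_as_signed_nibbles(x: int) -> tuple[int, int]:
    assert -128 <= x <= 119, "values 120..127 need a corner-case encoding not shown here"
    lo = ((x + 8) % 16) - 8      # signed low nibble in [-8, 7]
    hi = (x - lo) // 16          # signed high nibble in [-8, 7]
    return hi, lo

for x in (-128, -7, 0, 5, 100, 119):
    hi, lo = reencode_int8_as_signed_nibbles(x)
    assert hi * 16 + lo == x
    print(f"{x:4d} -> hi={hi:2d}, lo={lo:2d}")
```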

    Technique for monitoring activity within an integrated circuit

    Publication number: US09189302B2

    Publication date: 2015-11-17

    Application number: US14167688

    Application date: 2014-01-29

    Inventor: Lance Hacking

    Abstract: A technique to monitor events within a computer system or integrated circuit. In one embodiment, software-accessible event monitoring storage and hardware-specific monitoring logic are selectable, and their corresponding outputs may be monitored by accessing a counter that counts events corresponding to each of the software-accessible storage and the hardware-specific monitoring logic.

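As a rough software model of the selectable-source counting idea in this abstract (the class and method names are hypothetical and not taken from the patent):

```python
# Hypothetical model: either software-accessible storage or hardware-specific
# monitoring logic is selected as the event source, and a counter tallies the
# events reported by whichever source is currently selected.
class EventMonitor:
    def __init__(self):
        self.counters = {"software_storage": 0, "hardware_logic": 0}
        self.selected = "software_storage"

    def select(self, source: str) -> None:
        if source not in self.counters:
            raise ValueError(f"unknown monitoring source: {source}")
        self.selected = source

    def record_event(self) -> None:
        # In hardware this would be a counter incremented by the selected logic.
        self.counters[self.selected] += 1

    def read(self, source: str) -> int:
        return self.counters[source]

mon = EventMonitor()
mon.select("hardware_logic")
for _ in range(3):
    mon.record_event()
print(mon.read("hardware_logic"))   # 3
```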

    ACCELERATING NEURAL NETWORKS WITH LOW PRECISION-BASED MULTIPLICATION AND EXPLOITING SPARSITY IN HIGHER ORDER BITS

    Publication number: US20200320375A1

    Publication date: 2020-10-08

    Application number: US16909295

    Application date: 2020-06-23

    Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computation on zero values at the multiply-add circuit, wherein the processor is to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
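
To make the multiply-add step concrete, here is a self-contained sketch assuming the same illustrative int8-to-signed-nibble split as above: the full-precision product is reassembled from shifted partial products of the low-precision parts. This is an assumed decomposition for illustration, not the patent's circuit design:

```python
# Hypothetical sketch: perform an int8 x int8 multiply using only signed 4-bit
# operands, then recombine the shifted partial products into the full result.
def to_signed_nibbles(x: int) -> tuple[int, int]:
    # (values 120..127 would need a corner-case encoding; omitted here)
    lo = ((x + 8) % 16) - 8
    return (x - lo) // 16, lo        # (hi, lo), each in [-8, 7]

def mul_via_nibbles(a: int, b: int) -> int:
    a_hi, a_lo = to_signed_nibbles(a)
    b_hi, b_lo = to_signed_nibbles(b)
    return ((a_hi * b_hi) << 8) + ((a_hi * b_lo + a_lo * b_hi) << 4) + a_lo * b_lo

for a, b in [(100, -3), (-7, 5), (119, -128)]:
    assert mul_via_nibbles(a, b) == a * b
    print(f"{a} * {b} = {mul_via_nibbles(a, b)}")
```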

    Schedule-aware tensor distribution module

    Publication number: US12288153B2

    Publication date: 2025-04-29

    Application number: US18408716

    Application date: 2024-01-10

    Abstract: Methods and systems include a neural network system that includes a neural network accelerator. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the neural network system. The neural network accelerator also includes schedule-aware tensor data distribution circuitry or software that is configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized extracted output data to memory.
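
A toy sketch of the load / extract / reorganize / store flow described above, assuming a simple row-wise split of the tensor across processing engines; the function names and the element-wise work done by each engine are illustrative assumptions:

```python
import numpy as np

# Hypothetical load -> compute -> extract -> reorganize -> store flow across
# a handful of "processing engines", modeled here as plain Python functions.
NUM_PES = 4

def load_phase(tensor: np.ndarray) -> list[np.ndarray]:
    # Distribute tensor data to the processing engines (row-wise tiling here).
    return np.array_split(tensor, NUM_PES, axis=0)

def extract_phase(tiles: list[np.ndarray]) -> list[np.ndarray]:
    # Each engine produces an output tile; a trivial ReLU stands in for the real work.
    return [np.maximum(tile, 0) for tile in tiles]

def reorganize_and_store(outputs: list[np.ndarray], memory: dict) -> None:
    # Reassemble the per-engine outputs into the layout expected in memory.
    memory["activation"] = np.concatenate(outputs, axis=0)

memory = {}
x = np.random.randn(8, 16).astype(np.float32)
reorganize_and_store(extract_phase(load_phase(x)), memory)
assert np.array_equal(memory["activation"], np.maximum(x, 0))
```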

    Accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits

    Publication number: US11714998B2

    Publication date: 2023-08-01

    Application number: US16909295

    Application date: 2020-06-23

    CPC classification number: G06N3/063 G06N3/0454 G06N3/088

    Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computation on zero values at the multiply-add circuit, wherein the processor is to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
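
A small sketch of why the signed re-encoding exposes sparsity in the higher-order bits: for an int8 tensor whose values cluster near zero, most high nibbles are exactly zero, so their partial products could be skipped. The decomposition is the same illustrative one used above, and the synthetic weight distribution is an assumption for demonstration only:

```python
import numpy as np

# Hypothetical estimate of how many high-nibble partial products could be skipped.
def high_nibble_is_zero(x: int) -> bool:
    lo = ((x + 8) % 16) - 8
    return (x - lo) // 16 == 0       # hi == 0 exactly when x is in [-8, 7]

weights = np.clip(np.random.normal(0, 4, size=10_000), -128, 119).astype(int)
skippable = sum(high_nibble_is_zero(int(w)) for w in weights)
print(f"{skippable / weights.size:.1%} of high-nibble multiplies could be skipped")
```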

    ESTIMATION OF POWER PROFILES FOR NEURAL NETWORK MODELS RUNNING ON AI ACCELERATORS

    Publication number: US20230004430A1

    Publication date: 2023-01-05

    Application number: US17856968

    Application date: 2022-07-02

    Abstract: Technology for estimating neural network (NN) power profiles includes obtaining a plurality of workloads for a compiled NN model, the plurality of workloads determined for a hardware execution device, determining a hardware efficiency factor for the compiled NN model, and generating, based on the hardware efficiency factor, a power profile for the compiled NN model on a per-layer basis, a per-workload basis, or both. The hardware efficiency factor can be determined based on a hardware efficiency measurement and a hardware utilization measurement, and can be determined on a per-workload basis. A configuration file can be provided for generating the power profile, and an output visualization of the power profile can be generated. Further, feedback information can be generated to perform one or more of selecting a hardware device, optimizing a breakdown of workloads, optimizing a scheduling of tasks, or confirming a hardware device design.
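
A rough sketch of the per-layer profile idea, assuming for illustration that the hardware efficiency factor is simply the product of an efficiency measurement and a utilization measurement and that per-layer power scales with it; the formula, field names, and numbers are assumptions, not the patented estimation method:

```python
from dataclasses import dataclass

# Hypothetical per-layer power profile: scale a nominal device power by a
# hardware efficiency factor derived from measured efficiency and utilization.
@dataclass
class LayerWorkload:
    name: str
    efficiency: float   # measured fraction of peak throughput achieved
    utilization: float  # measured fraction of compute units kept busy

def power_profile(workloads: list[LayerWorkload], nominal_power_w: float) -> dict[str, float]:
    profile = {}
    for wl in workloads:
        hw_efficiency_factor = wl.efficiency * wl.utilization
        profile[wl.name] = nominal_power_w * hw_efficiency_factor
    return profile

layers = [LayerWorkload("conv1", 0.8, 0.9), LayerWorkload("fc", 0.5, 0.6)]
print(power_profile(layers, nominal_power_w=25.0))
```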
