MACHINE LEARNING ARCHITECTURE SUPPORT FOR BLOCK SPARSITY

    Publication No.: US20200311181A1

    Publication Date: 2020-10-01

    Application No.: US16370094

    Filing Date: 2019-03-29

    Inventor: Omid Azizi

    Abstract: This disclosure relates to matrix operation acceleration for different matrix sparsity patterns. A matrix operation accelerator may be designed to perform matrix operations more efficiently for a first matrix sparsity pattern than for a second matrix sparsity pattern. A matrix with the second sparsity pattern may be converted to a matrix with the first sparsity pattern and provided to the matrix operation accelerator. By rearranging the rows and/or columns of the matrix, the sparsity pattern of the matrix may be converted to a sparsity pattern that is suitable for computation with the matrix operation accelerator.
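The row and column rearrangement described in the abstract can be illustrated with a small sketch. This is not the patented implementation: the helper names (`permute`, `nonzero_blocks`, `matmul`), the 2x2 block size, and the example matrices are all assumptions chosen for illustration. The sketch shows that permuting a checkerboard-sparse matrix gathers its nonzeros into fewer dense blocks, and that the original product is recovered by permuting the other operand to match and undoing the row permutation on the result.

```python
def permute(matrix, row_perm, col_perm):
    """Return the matrix with rows and columns reordered by the given permutations."""
    return [[matrix[r][c] for c in col_perm] for r in row_perm]


def nonzero_blocks(matrix, bs):
    """Count the bs x bs blocks that contain at least one nonzero element."""
    rows, cols = len(matrix), len(matrix[0])
    count = 0
    for r0 in range(0, rows, bs):
        for c0 in range(0, cols, bs):
            if any(
                matrix[r][c] != 0
                for r in range(r0, min(r0 + bs, rows))
                for c in range(c0, min(c0 + bs, cols))
            ):
                count += 1
    return count


def matmul(a, b):
    """Plain dense matrix multiply, used here as the reference result."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical example: a checkerboard sparsity pattern (every 2x2 block is touched).
A = [
    [1, 0, 2, 0],
    [0, 3, 0, 4],
    [5, 0, 6, 0],
    [0, 7, 0, 8],
]
B = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Assumed permutations that group even-indexed and odd-indexed rows/columns together.
row_perm = [0, 2, 1, 3]
col_perm = [0, 2, 1, 3]

A_blocked = permute(A, row_perm, col_perm)   # nonzeros now form two dense 2x2 blocks
B_perm = [B[k] for k in col_perm]            # rows of B must follow A's column order

C_perm = matmul(A_blocked, B_perm)           # row i holds row row_perm[i] of A @ B

# Undo the row permutation to recover the product in the original ordering.
C_restored = [None] * len(row_perm)
for i, r in enumerate(row_perm):
    C_restored[r] = C_perm[i]
```

In this example the number of occupied 2x2 blocks drops from four to two, which is the kind of pattern a block-oriented accelerator could skip over more easily.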

    MACHINE LEARNING ARCHITECTURE SUPPORT FOR BLOCK SPARSITY

    Publication No.: US20240330402A1

    Publication Date: 2024-10-03

    Application No.: US18626599

    Filing Date: 2024-04-04

    Inventor: Omid Azizi

    Abstract: This disclosure relates to matrix operation acceleration for different matrix sparsity patterns. A matrix operation accelerator may be designed to perform matrix operations more efficiently for a first matrix sparsity pattern than for a second matrix sparsity pattern. A matrix with the second sparsity pattern may be converted to a matrix with the first sparsity pattern and provided to the matrix operation accelerator. By rearranging the rows and/or columns of the matrix, the sparsity pattern of the matrix may be converted to a sparsity pattern that is suitable for computation with the matrix operation accelerator.

    Apparatus and method for a masked multiply instruction to support neural network pruning operations

    Publication No.: US10929503B2

    Publication Date: 2021-02-23

    Application No.: US16230814

    Filing Date: 2018-12-21

    Abstract: An apparatus and method for a masked multiply instruction to support neural network pruning operations. For example, one embodiment of a processor comprises: a decoder to decode a matrix multiplication with masking (GEMM) instruction identifying a destination matrix register to store a result, and source registers storing an A-matrix, a B-matrix, and a matrix mask; execution circuitry to execute the GEMM instruction, the execution circuitry to multiply a plurality of B-matrix elements with a plurality of A-matrix elements, each of the B-matrix elements associated with a mask value in the matrix mask, wherein if the mask value is set to a first value, then the execution circuitry is to multiply the B-matrix element with one or more of the A-matrix elements to generate a first partial result, and if the mask value is set to a second value, then the execution circuitry is to multiply an alternate B-matrix element with one or more of the A-matrix elements to generate a second partial result.
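The masked multiply behavior in the abstract can be modeled in software as follows. This is a minimal sketch, not the instruction's actual semantics: the function name `masked_gemm`, the choice of 1 as the "first value" and anything else as the "second value", and the default alternate element of 0 (a pruned weight) are all assumptions made for illustration.

```python
def masked_gemm(a, b, mask, alt=0):
    """Multiply a (n x k) by b (k x m), substituting `alt` for masked-out b elements.

    Assumption: a mask value of 1 selects the stored B element; any other
    value selects the alternate element `alt` instead.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                # Pick the real B element or the alternate, depending on the mask.
                b_elem = b[p][j] if mask[p][j] == 1 else alt
                c[i][j] += a[i][p] * b_elem
    return c


# Hypothetical operands: the mask prunes the off-diagonal weights of B.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
MASK = [[1, 0], [0, 1]]

pruned = masked_gemm(A, B, MASK)            # masked elements contribute zero
substituted = masked_gemm(A, B, MASK, 1)    # masked elements replaced by 1
```

With a zero alternate, the result matches multiplying by a pruned copy of B without ever materializing that copy, which is the usual motivation for applying a mask at multiply time.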

    Machine learning architecture support for block sparsity

    Publication No.: US11977600B2

    Publication Date: 2024-05-07

    Application No.: US17481064

    Filing Date: 2021-09-21

    Inventor: Omid Azizi

    Abstract: This disclosure relates to matrix operation acceleration for different matrix sparsity patterns. A matrix operation accelerator may be designed to perform matrix operations more efficiently for a first matrix sparsity pattern than for a second matrix sparsity pattern. A matrix with the second sparsity pattern may be converted to a matrix with the first sparsity pattern and provided to the matrix operation accelerator. By rearranging the rows and/or columns of the matrix, the sparsity pattern of the matrix may be converted to a sparsity pattern that is suitable for computation with the matrix operation accelerator.

    RESERVATION ARCHITECTURE FOR OVERCOMMITTED MEMORY

    Publication No.: US20210240609A1

    Publication Date: 2021-08-05

    Application No.: US17119679

    Filing Date: 2020-12-11

    Abstract: Various systems and methods for computer memory overcommitment management are described herein. A system for computer memory management includes a memory device to store data and a mapping table; and memory overcommitment circuitry to: receive a signal to move data in a first block from a memory reduction area in the memory device to a non-memory reduction area in the memory device, the memory reduction area to store data using a memory reduction technique, and the non-memory reduction area to store data without any memory reduction techniques; allocate a second block in the non-memory reduction area; copy the data in the first block to the second block; and update the mapping table to revise a pointer to point to the second block, the mapping table used to store pointers to the memory device in the memory reduction area and the non-memory reduction area.
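The allocate, copy, and remap sequence in the abstract can be sketched as a toy software model. Everything here is an assumption for illustration: the class `OvercommittedMemory`, its method names, the use of `zlib` compression as the "memory reduction technique", and the decision to free the first block after the move are not stated in the source.

```python
import zlib


class OvercommittedMemory:
    """Toy model of a memory device split into a reduced and a non-reduced area."""

    def __init__(self):
        self.reduced = {}     # slot -> compressed bytes (memory reduction area)
        self.unreduced = {}   # slot -> raw bytes (non-memory reduction area)
        self.mapping = {}     # logical block id -> (area name, slot)
        self._next_slot = 0

    def _allocate(self):
        """Hand out a fresh slot number; stands in for real block allocation."""
        slot = self._next_slot
        self._next_slot += 1
        return slot

    def store_reduced(self, block_id, data):
        """Store data in the reduction area, compressed (assumed technique)."""
        slot = self._allocate()
        self.reduced[slot] = zlib.compress(data)
        self.mapping[block_id] = ("reduced", slot)

    def move_to_unreduced(self, block_id):
        """Handle a 'move' signal: allocate, copy, then revise the pointer."""
        area, first = self.mapping[block_id]
        if area != "reduced":
            return  # already in the non-reduction area
        second = self._allocate()                                     # allocate second block
        self.unreduced[second] = zlib.decompress(self.reduced[first])  # copy the data
        self.mapping[block_id] = ("unreduced", second)                # update mapping table
        del self.reduced[first]                                       # assumption: free first block

    def read(self, block_id):
        """Read through the mapping table, regardless of which area holds the data."""
        area, slot = self.mapping[block_id]
        if area == "reduced":
            return zlib.decompress(self.reduced[slot])
        return self.unreduced[slot]


mem = OvercommittedMemory()
mem.store_reduced("a", b"x" * 64)
before = mem.mapping["a"]
mem.move_to_unreduced("a")
after = mem.mapping["a"]
```

Because readers always go through the mapping table, the move is invisible to them: only the pointer for the logical block changes.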
