Systems and methods for generation of sparse code for convolutional neural networks
Abstract:
A system and method may generate code to be used when executing neural networks (NNs), for example convolutional neural networks (CNNs) which may include one or more convolutional layers. For at least one convolutional layer, for each non-zero element in a kernel tensor or matrix associated with the convolutional layer, instructions may be generated or issued. For example, for each non-zero element, a vector broadcast instruction may be generated, and a fused multiply-add (FMA) instruction may be generated, having as parameters a register representing a portion of the output for the convolutional layer, a register storing input data for the convolutional layer, and a register or reference to memory storing the non-zero element. The software or code produced may be executed during convolutional operations, for example as part of a larger application such as a NN inference application.
Information query
Patent Agency Ranking
0/0