Publication No.: US20220067513A1
Publication Date: 2022-03-03
Application No.: US17112795
Filing Date: 2020-12-04
Applicant: NVIDIA Corp.
Inventor: Jacob Robert Stevens, Rangharajan Venkatesan, Steve Haihang Dai, Brucek Khailany
Abstract: Solutions for improving the efficiency of Softmax computation, applied to efficient deep learning inference in transformers and other neural networks. The solutions use a reduced-precision implementation of various operations in Softmax, replace e^x with 2^x to reduce the instruction overhead associated with computing e^x, and replace the floating-point max computation with an integer max computation. Further described is a scalable implementation that decomposes Softmax into UnNormalized Softmax and Normalization operations.
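
The ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the claimed implementation: the function names are hypothetical, the arithmetic is ordinary double precision rather than reduced precision, and taking the integer max as the maximum of the ceilings of the inputs is an assumption made here for illustration. The sketch relies only on two known facts: Softmax is unchanged when the same constant is subtracted from every input, and e^x equals 2^(x * log2(e)), so a base-2 Softmax matches the standard one when the inputs are pre-scaled by log2(e).

```python
import math

def unnormalized_softmax_base2(scores):
    """Hypothetical helper: return the 2**x numerators and their sum."""
    # Assumption: the "integer max" is the max over ceil(score), so the
    # shift applied to every exponent is an integer and s - m is never positive.
    m = max(math.ceil(s) for s in scores)
    numerators = [2.0 ** (s - m) for s in scores]
    return numerators, sum(numerators)

def normalize(numerators, denominator):
    """Hypothetical helper: divide each numerator by the shared denominator."""
    return [n / denominator for n in numerators]

def softmax_base2(scores):
    """Base-2 Softmax as two separate stages: unnormalized, then normalization."""
    numerators, denominator = unnormalized_softmax_base2(scores)
    return normalize(numerators, denominator)

def softmax_reference(scores):
    """Standard base-e Softmax with a floating-point max, for comparison."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    scores = [1.5, -0.25, 3.0]
    # Pre-scaling by log2(e) makes the base-2 result match the base-e result.
    scaled = [s * math.log2(math.e) for s in scores]
    print(softmax_base2(scaled))
    print(softmax_reference(scores))
```

Splitting the computation into `unnormalized_softmax_base2` and `normalize` mirrors the decomposition described in the abstract: the first stage produces numerators and a running denominator, and the division can be deferred to a later step.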