-
公开(公告)号:US11966835B2
公开(公告)日:2024-04-23
申请号:US15929093
申请日:2019-01-23
Applicant: NVIDIA Corp.
Inventor: Ching-En Lee , Yakun Shao , Angshuman Parashar , Joel Emer , Stephen W. Keckler
Abstract: A sparse convolutional neural network accelerator system that dynamically and efficiently identifies fine-grained parallelism in sparse convolution operations. The system determines matching pairs of non-zero input activations and weights from the compacted input activation and weight arrays utilizing a scalable, dynamic parallelism discovery unit (PDU) that performs a parallel search on the input activation array and the weight array to identify reducible input activation and weight pairs.
-
公开(公告)号:US20220076110A1
公开(公告)日:2022-03-10
申请号:US17530852
申请日:2021-11-19
Applicant: NVIDIA Corp.
Inventor: Yakun Shao , Rangharajan Venkatesan , Miaorong Wang , Daniel Smith , William James Dally , Joel Emer , Stephen W. Keckler , Brucek Khailany
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
-
公开(公告)号:US20200293867A1
公开(公告)日:2020-09-17
申请号:US16672918
申请日:2019-11-04
Applicant: NVIDIA Corp.
Inventor: Yakun Shao , Rangharajan Venkatesan , Miaorong Wang , Daniel Smith , William James Dally , Joel Emer , Stephen W. Keckler , Brucek Khailany
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
-
公开(公告)号:US11270197B2
公开(公告)日:2022-03-08
申请号:US16672918
申请日:2019-11-04
Applicant: NVIDIA Corp.
Inventor: Yakun Shao , Rangharajan Venkatesan , Miaorong Wang , Daniel Smith , William James Dally , Joel Emer , Stephen W. Keckler , Brucek Khailany
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
-
-
-