-
Publication No.: US11966835B2
Publication Date: 2024-04-23
Application No.: US15929093
Filing Date: 2019-01-23
Applicant: NVIDIA Corp.
Inventor: Ching-En Lee, Yakun Shao, Angshuman Parashar, Joel Emer, Stephen W. Keckler
Abstract: A sparse convolutional neural network accelerator system that dynamically and efficiently identifies fine-grained parallelism in sparse convolution operations. The system determines matching pairs of non-zero input activations and weights from the compacted input activation and weight arrays utilizing a scalable, dynamic parallelism discovery unit (PDU) that performs a parallel search on the input activation array and the weight array to identify reducible input activation and weight pairs.
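The pair-matching idea in this abstract can be illustrated with a minimal software sketch. It assumes that a "matching pair" is a non-zero activation and a non-zero weight that share the same position in a 1-D dot product, and it uses a sequential two-pointer scan as a stand-in for the parallel hardware search. The function names (`compact`, `find_matching_pairs`, `sparse_dot`) are hypothetical and do not come from the patent.

```python
def compact(dense):
    """Return (indices, values) of the non-zero entries of a dense list."""
    indices = [i for i, v in enumerate(dense) if v != 0]
    values = [dense[i] for i in indices]
    return indices, values


def find_matching_pairs(act_idx, wt_idx):
    """Intersect two sorted index lists.

    Returns (a, w) tuples, where a and w are positions in the compacted
    activation and weight arrays that refer to the same original index.
    Assumption: this sequential scan only imitates the result of the
    parallel search the abstract describes, not its hardware structure.
    """
    pairs = []
    a = w = 0
    while a < len(act_idx) and w < len(wt_idx):
        if act_idx[a] == wt_idx[w]:
            pairs.append((a, w))
            a += 1
            w += 1
        elif act_idx[a] < wt_idx[w]:
            a += 1
        else:
            w += 1
    return pairs


def sparse_dot(activations, weights):
    """Dot product that multiplies only the matched non-zero pairs."""
    act_idx, act_val = compact(activations)
    wt_idx, wt_val = compact(weights)
    pairs = find_matching_pairs(act_idx, wt_idx)
    return sum(act_val[a] * wt_val[w] for a, w in pairs)
```

For example, `sparse_dot([0, 3, 0, 2, 5, 0], [1, 4, 0, 0, 2, 7])` performs two multiplications instead of six, because only positions 1 and 4 are non-zero in both arrays.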
-
Publication No.: US20200082246A1
Publication Date: 2020-03-12
Application No.: US16517431
Filing Date: 2019-07-19
Applicant: NVIDIA Corp.
Inventor: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R. Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.
-
Publication No.: US11270197B2
Publication Date: 2022-03-08
Application No.: US16672918
Filing Date: 2019-11-04
Applicant: NVIDIA Corp.
Inventor: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
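The processing element described in this abstract can be sketched in software as follows. The sketch assumes a weight-stationary flow, which is only one possible reading of "stationary data flows": weights are loaded once and stay resident while successive activation vectors pass through. The class and method names are hypothetical, and the Python loop only models the arithmetic of the parallel multiply-accumulate units, not their timing.

```python
class ProcessingElement:
    """Toy model of one processing element (hypothetical names)."""

    def __init__(self, weights):
        # Weight buffer: one row of weights per output lane. Under the
        # assumed weight-stationary flow it is loaded once and then reused.
        self.weight_buffer = [list(row) for row in weights]
        self.activation_buffer = []

    def load_activations(self, activations):
        """Replace the contents of the activation buffer."""
        self.activation_buffer = list(activations)

    def compute(self):
        """Each lane multiply-accumulates its weight row with the activations."""
        for row in self.weight_buffer:
            if len(row) != len(self.activation_buffer):
                raise ValueError("weight row and activation vector differ in length")
        return [
            sum(w * a for w, a in zip(row, self.activation_buffer))
            for row in self.weight_buffer
        ]


pe = ProcessingElement([[1, 2], [3, 4]])
pe.load_activations([5, 6])
first = pe.compute()   # weights stay resident
pe.load_activations([1, 0])
second = pe.compute()  # same weights, new activations
```

Reusing the resident weights across both calls is the point of the stationary flow in this sketch: only the activation buffer is rewritten between computations.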
-
Publication No.: US11769040B2
Publication Date: 2023-09-26
Application No.: US16517431
Filing Date: 2019-07-19
Applicant: NVIDIA Corp.
Inventor: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
CPC classification number: G06N3/049, G06F9/44505, G06F9/544, G06N3/082
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.
-
Publication No.: US20220076110A1
Publication Date: 2022-03-10
Application No.: US17530852
Filing Date: 2021-11-19
Applicant: NVIDIA Corp.
Inventor: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
-
Publication No.: US20200293867A1
Publication Date: 2020-09-17
Application No.: US16672918
Filing Date: 2019-11-04
Applicant: NVIDIA Corp.
Inventor: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.