-
Publication No.: US20200151288A1
Publication Date: 2020-05-14
Application No.: US16520688
Filing Date: 2019-07-24
Applicant: NVIDIA Corp.
Inventor: Yuzhe Ma, Haoxing Ren, Brucek Khailany, Harbinder Sikka, Lijuan Luo, Karthikeyan Natarajan
IPC: G06F17/50
Abstract: Techniques to improve the accuracy and speed of detecting and remediating difficult-to-test nodes in a circuit design netlist. The techniques utilize improved netlist representations, test point insertion, and trained neural networks.
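The flow the abstract describes can be sketched as scoring netlist nodes with a trained model and flagging candidates for test point insertion. The sketch below is a minimal illustration, not the patented method: the feature set (fan-in, fan-out, depth), the netlist dictionary layout, and the single logistic layer standing in for a trained neural network are all assumptions.

```python
import numpy as np

def node_features(netlist, node):
    # Hypothetical per-node features; real systems use richer netlist
    # representations, as the abstract notes.
    fan_in = len(netlist["fanin"][node])
    fan_out = len(netlist["fanout"][node])
    depth = netlist["depth"][node]
    return np.array([fan_in, fan_out, depth], dtype=float)

def flag_hard_nodes(netlist, weights, bias, threshold=0.5):
    # A trained model (here a placeholder logistic layer) scores each node;
    # nodes scoring above the threshold are flagged for test point insertion.
    flagged = []
    for node in netlist["depth"]:
        score = 1.0 / (1.0 + np.exp(-(weights @ node_features(netlist, node) + bias)))
        if score > threshold:
            flagged.append(node)
    return flagged
```

In practice the placeholder weights would come from training on labeled testability data rather than being chosen by hand.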
-
Publication No.: US20220067513A1
Publication Date: 2022-03-03
Application No.: US17112795
Filing Date: 2020-12-04
Applicant: NVIDIA Corp.
Inventor: Jacob Robert Stevens, Rangharajan Venkatesan, Steve Haihang Dai, Brucek Khailany
Abstract: Solutions for improving the efficiency of Softmax computation in deep learning inference for transformers and other neural networks. The solutions utilize a reduced-precision implementation of various operations in Softmax, replacing e^x with 2^x to reduce the instruction overhead associated with computing e^x, and replacing the floating-point max computation with an integer max computation. Further described is a scalable implementation that decomposes Softmax into UnNormalized Softmax and Normalization operations.
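The three ideas in the abstract can be illustrated in one small sketch: since e^x = 2^(x·log2 e), inputs are scaled once and exponentiated in base 2; the max reduction runs on fixed-point integers; and the computation splits into an UnNormalized Softmax pass and a separate Normalization pass. This is a minimal numerical sketch, assuming a hypothetical `frac_bits` fixed-point format, not the patented hardware implementation.

```python
import numpy as np

def softmax_base2(x, frac_bits=8):
    # e^x = 2^(x * log2(e)): scale once, then use base-2 exponentials,
    # which are cheap in hardware (shift plus small fraction lookup).
    LOG2E = 1.4426950408889634
    z = x * LOG2E
    # Integer max: quantize to fixed point so the max reduction is an
    # integer comparison rather than a floating-point one.
    zi = np.round(z * (1 << frac_bits)).astype(np.int64)
    m = zi.max()
    # UnNormalized Softmax: per-element 2^(z - max), computable independently
    # per tile, which is what makes the decomposition scalable.
    u = np.exp2((zi - m) / float(1 << frac_bits))
    # Normalization: a separate reduction-and-divide pass.
    return u / u.sum()
```

With 8 fractional bits the result tracks a full-precision softmax to within a fraction of a percent on typical inputs.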
-
Publication No.: US11769040B2
Publication Date: 2023-09-26
Application No.: US16517431
Filing Date: 2019-07-19
Applicant: NVIDIA Corp.
Inventor: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
CPC classification number: G06N3/049, G06F9/44505, G06F9/544, G06N3/082
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.
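The per-tile structure the abstract describes can be sketched in software: each processing element holds its own weight buffer and applies parallel multiply-accumulate lanes to an incoming activation vector, and a chip combines the lanes of its PEs. The class and function names below are illustrative assumptions, not the patent's terminology.

```python
import numpy as np

class ProcessingElement:
    """Hypothetical sketch of one tile PE: a local weight buffer plus
    parallel multiply-accumulate (MAC) lanes over a shared activation vector."""
    def __init__(self, weights):
        self.weight_buffer = np.asarray(weights)   # [out_lanes, in_width]

    def mac(self, activations):
        # Each output lane multiplies its weight row by the activation
        # vector and accumulates; in hardware all lanes run in parallel.
        return self.weight_buffer @ np.asarray(activations)

def chip_forward(pes, activations):
    # A chip aggregates the outputs of its PEs; on the package, partial
    # results move through the global memory buffer.
    return np.concatenate([pe.mac(activations) for pe in pes])
```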
-
Publication No.: US20220076110A1
Publication Date: 2022-03-10
Application No.: US17530852
Filing Date: 2021-11-19
Applicant: NVIDIA Corp.
Inventor: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
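The "stationary data flows" phrase refers to keeping one operand resident in a PE's local buffer while the other streams past it. The sketch below illustrates a weight-stationary variant under stated assumptions (the function name and stream shape are illustrative, and the patent family covers other stationary schemes as well).

```python
import numpy as np

def weight_stationary_matmul(weights, activation_stream):
    """Hypothetical sketch of a weight-stationary dataflow: the weight
    matrix is loaded into the PE's weight buffer once and stays resident
    while activation vectors stream through, amortizing the cost of
    moving weights from the global memory buffer."""
    weight_buffer = np.asarray(weights)          # loaded once, "stationary"
    outputs = []
    for act in activation_stream:                # activations stream in
        outputs.append(weight_buffer @ np.asarray(act))  # parallel vector MAC
    return np.stack(outputs)
```

The design choice matters because, for large layers, re-fetching weights per activation would dominate energy; keeping them stationary turns the inner loop into pure local MAC work.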
-
Publication No.: US20200293867A1
Publication Date: 2020-09-17
Application No.: US16672918
Filing Date: 2019-11-04
Applicant: NVIDIA Corp.
Inventor: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
-
Publication No.: US11270197B2
Publication Date: 2022-03-08
Application No.: US16672918
Filing Date: 2019-11-04
Applicant: NVIDIA Corp.
Inventor: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
-
Publication No.: US10657306B1
Publication Date: 2020-05-19
Application No.: US16520688
Filing Date: 2019-07-24
Applicant: NVIDIA Corp.
Inventor: Yuzhe Ma, Haoxing Ren, Brucek Khailany, Harbinder Sikka, Lijuan Luo, Karthikeyan Natarajan
IPC: G06F17/50, G06F30/327, G06F30/367, G06F30/3323, G06F30/323, G06F30/398, G06F30/3308
Abstract: Techniques to improve the accuracy and speed of detecting and remediating difficult-to-test nodes in a circuit design netlist. The techniques utilize improved netlist representations, test point insertion, and trained neural networks.
-
Publication No.: US11645533B2
Publication Date: 2023-05-09
Application No.: US15929242
Filing Date: 2020-03-17
Applicant: NVIDIA Corp.
Inventor: Zhiyao Xie, Haoxing Ren, Brucek Khailany, Sheng Ye
IPC: G06N3/084, G06F30/398, G06N3/04, G06F119/06
CPC classification number: G06N3/084, G06F30/398, G06N3/04, G06F2119/06
Abstract: IR drop predictions are obtained using a maximum convolutional neural network. A circuit structure is partitioned into a grid. For cells of the circuit structure in sub-intervals of a clock period, power consumption of the cell is amortized into a set of grid tiles that include portions of the cell, thus forming a set of power maps. The power maps are applied to a neural network to generate IR drop predictions for the circuit structure.
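The power-map construction step the abstract describes can be sketched directly: each cell's power in a sub-interval is spread over the grid tiles its footprint overlaps. This is a minimal sketch under assumptions (even amortization over the bounding-box tiles, a simple tuple cell format); the CNN that consumes the resulting maps is not shown.

```python
import numpy as np

def build_power_map(cells, grid_shape, tile_size):
    """Hypothetical sketch: amortize each cell's power over the grid tiles
    its footprint overlaps, producing one power map per time sub-interval.
    A cell is (x0, y0, x1, y1, power)."""
    pmap = np.zeros(grid_shape)
    for x0, y0, x1, y1, power in cells:
        # Tiles overlapped by the cell's bounding box.
        tx0, ty0 = int(x0 // tile_size), int(y0 // tile_size)
        tx1, ty1 = int(x1 // tile_size), int(y1 // tile_size)
        tiles = [(tx, ty) for tx in range(tx0, tx1 + 1)
                          for ty in range(ty0, ty1 + 1)]
        share = power / len(tiles)        # amortize evenly across tiles
        for tx, ty in tiles:
            pmap[tx, ty] += share
    return pmap
```

One such map per clock sub-interval yields the stack of power maps that the neural network takes as input channels.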
-
Publication No.: US20210158155A1
Publication Date: 2021-05-27
Application No.: US16992354
Filing Date: 2020-08-13
Applicant: NVIDIA Corp.
Inventor: Yanqing Zhang, Haoxing Ren, Brucek Khailany
Abstract: A graph neural network for average power estimation of netlists is trained with register toggle rates over a power window from an RTL simulation and gate-level netlists as input features. Combinational gate toggle rates are applied as labels. The trained graph neural network is then applied to infer combinational gate toggle rates over a different power window of interest and/or a different netlist.
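The inference pattern here is message passing on the netlist graph: known register toggle rates seed the graph, and combinational gates derive their rates from fan-in. The sketch below uses a single placeholder scalar weight where a trained GNN would apply learned per-gate transforms; the graph layout and the `w` parameter are assumptions for illustration only.

```python
import numpy as np

def propagate_toggle_rates(graph, reg_rates, n_layers=2, w=0.5):
    """Hypothetical message-passing sketch: register toggle rates (known
    from RTL simulation) seed the graph; each round, a combinational gate
    aggregates its fan-in rates through a placeholder learned weight w.
    `graph` maps each gate to its list of fan-in node names."""
    rates = dict(reg_rates)
    for _ in range(n_layers):
        updated = dict(rates)
        for gate, fanin in graph.items():
            if gate in reg_rates:          # registers keep simulated rates
                continue
            ins = [rates.get(f, 0.0) for f in fanin]
            updated[gate] = w * float(np.mean(ins)) if ins else 0.0
        rates = updated
    return rates
```

Multiple rounds let toggle information reach gates several logic levels away from any register, mirroring how GNN depth controls the receptive field.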
-
Publication No.: US20200327417A1
Publication Date: 2020-10-15
Application No.: US15929242
Filing Date: 2020-03-17
Applicant: NVIDIA Corp.
Inventor: Zhiyao Xie, Haoxing Ren, Brucek Khailany, Sheng Ye
IPC: G06N3/08, G06N3/04, G06F30/398
Abstract: IR drop predictions are obtained using a maximum convolutional neural network. A circuit structure is partitioned into a grid. For cells of the circuit structure in sub-intervals of a clock period, power consumption of the cell is amortized into a set of grid tiles that include portions of the cell, thus forming a set of power maps. The power maps are applied to a neural network to generate IR drop predictions for the circuit structure.