-
Publication No.: GB2582232A
Publication Date: 2020-09-16
Application No.: GB202009717
Application Date: 2018-11-30
Applicant: IBM
Inventor: CHIA-YU CHEN , ANKUR AGRAWAL , DANIEL BRAND , KAILASH GOPALAKRISHNAN , JUNGWOOK CHOI
Abstract: Embodiments of the present invention provide a computer-implemented method for adaptive residual gradient compression for training of a deep learning neural network (DNN). The method includes obtaining, by a first learner, a current gradient vector for a neural network layer of the DNN, in which the current gradient vector includes gradient weights of parameters of the neural network layer that are calculated from a mini-batch of training data. A current residue vector is generated that includes residual gradient weights for the mini-batch. A compressed current residue vector is generated based on dividing the residual gradient weights of the current residue vector into a plurality of bins of a uniform size and quantizing a subset of the residual gradient weights of one or more bins of the plurality of bins. The compressed current residue vector is then transmitted to a second learner of the plurality of learners or to a parameter server.
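The binning and quantization steps in this abstract can be illustrated with a minimal sketch. Several details are assumptions and are not taken from the abstract: the function name `compress_residue` is hypothetical, the residue is assumed to be the previous residue plus the current gradient, the quantized "subset" is assumed to be the largest-magnitude entry of each bin, and the quantized value is assumed to be the sign times a shared scale.

```python
def compress_residue(gradient, prev_residue, bin_size):
    """Hypothetical sketch of binned residual gradient compression."""
    # Assumption: the current residue accumulates the gradient onto
    # whatever was left unsent from earlier mini-batches.
    residue = [r + g for r, g in zip(prev_residue, gradient)]
    compressed = [0.0] * len(residue)

    # Divide the residue into bins of uniform size and pick, per bin,
    # the index of the entry with the largest magnitude (assumed rule).
    picks = []
    for start in range(0, len(residue), bin_size):
        stop = min(start + bin_size, len(residue))
        picks.append(max(range(start, stop), key=lambda i: abs(residue[i])))
    if not picks:
        return compressed, residue

    # Assumed quantizer: every selected entry becomes sign * scale,
    # where scale is the mean magnitude of the selected entries.
    scale = sum(abs(residue[i]) for i in picks) / len(picks)
    for i in picks:
        compressed[i] = scale if residue[i] >= 0 else -scale

    # Whatever was not transmitted stays in the residue for the next step.
    new_residue = [r - c for r, c in zip(residue, compressed)]
    return compressed, new_residue


compressed, new_residue = compress_residue(
    [0.1, -0.9, 0.2, 0.4, 0.05, -0.3], [0.0] * 6, bin_size=3
)
```

In this sketch only one entry per bin is non-zero in `compressed`, so a learner would need to transmit just the selected indices, their signs, and the shared scale. The untransmitted remainder is carried forward in `new_residue`.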
-
Publication No.: GB2600871A
Publication Date: 2022-05-11
Application No.: GB202201893
Application Date: 2020-08-17
Applicant: IBM
Inventor: XIAO SUN , JUNGWOOK CHOI , NAIGANG WANG , CHIA-YU CHEN , KAILASH GOPALAKRISHNAN
IPC: G06N3/08
Abstract: An apparatus for training and inferencing a neural network includes circuitry that is configured to generate a first weight having a first format including a first number of bits based at least in part on a second weight having a second format including a second number of bits and a residual having a third format including a third number of bits. The second number of bits and the third number of bits are each less than the first number of bits. The circuitry is further configured to update the second weight based at least in part on the first weight and to update the residual based at least in part on the updated second weight and the first weight. The circuitry is further configured to update the first weight based at least in part on the updated second weight and the updated residual.
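The update cycle described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the names `quantize` and `residual_update` are hypothetical, the low-precision formats are modeled as fixed-point grids with steps `low_step` and `res_step`, range clamping to a bit width is omitted, and the gradient step applied to the first weight is an assumption that the abstract does not spell out.

```python
def quantize(x, step):
    """Round x to the nearest multiple of step (a stand-in for a low-bit format)."""
    return round(x / step) * step


def residual_update(w_low, residual, grad, lr, low_step=2**-4, res_step=2**-8):
    """Hypothetical sketch of one weight update with a low-precision residual."""
    # First weight (higher precision) is generated from the second weight
    # and the residual.
    w_high = w_low + residual
    # Assumption: a plain SGD step is applied in the higher precision.
    w_high -= lr * grad
    # Update the second weight based on the first weight.
    w_low = quantize(w_high, low_step)
    # Update the residual based on the updated second weight and the first weight.
    residual = quantize(w_high - w_low, res_step)
    # Update the first weight from the updated second weight and residual.
    w_high = w_low + residual
    return w_high, w_low, residual


w_high, w_low, residual = residual_update(w_low=0.5, residual=0.0, grad=0.1, lr=0.1)
```

In this sketch a small update that would be lost when rounding to the coarse `low_step` grid survives in `residual`, so repeated small gradient steps can still accumulate.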
-
Publication No.: GB2582233A
Publication Date: 2020-09-16
Application No.: GB202009750
Application Date: 2018-11-30
Applicant: IBM
Inventor: JUNGWOOK CHOI , PRITISH NARAYANAN , CHIA-YU CHEN , KAILASH GOPALAKRISHNAN , SUYOG GUPTA
IPC: G06N3/08
Abstract: A system, having a memory that stores computer executable components, and a processor that executes the computer executable components, reduces data size in connection with training a neural network by exploiting spatial locality to weight matrices and effecting frequency transformation and compression. A receiving component receives neural network data in the form of a compressed frequency-domain weight matrix. A segmentation component segments the initial weight matrix into original sub-components, wherein respective original sub-components have spatial weights. A sampling component applies a generalized weight distribution to the respective original sub-components to generate respective normalized sub-components. A transform component applies a transform to the respective normalized sub-components. A cropping component crops high-frequency weights of the respective transformed normalized sub-components to yield a set of low-frequency normalized sub-components to generate a compressed representation of the original sub-components.
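The segment, normalize, transform, and crop pipeline in this abstract can be sketched in a few lines. The details here are assumptions and are not taken from the abstract: the function names are hypothetical, the transform is assumed to be an unnormalized 2D DCT-II, normalization is assumed to be division by the block's peak magnitude, and cropping is assumed to keep the top-left `keep` x `keep` low-frequency coefficients.

```python
import math


def segment(matrix, size):
    """Split a weight matrix into size x size sub-blocks, row-major order."""
    return [
        [row[c:c + size] for row in matrix[r:r + size]]
        for r in range(0, len(matrix), size)
        for c in range(0, len(matrix[0]), size)
    ]


def dct_1d(v):
    """Unnormalized DCT-II of a 1D sequence."""
    n = len(v)
    return [
        sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def dct_2d(block):
    """Separable 2D DCT-II: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def compress_block(block, keep):
    """Normalize a block, transform it, and crop high-frequency terms."""
    # Assumed normalization: scale by the peak magnitude (1.0 if all zero).
    peak = max(abs(x) for row in block for x in row) or 1.0
    normalized = [[x / peak for x in row] for row in block]
    freq = dct_2d(normalized)
    # Keep only the low-frequency corner of the coefficient matrix.
    return peak, [row[:keep] for row in freq[:keep]]


weights = [[2.0] * 4 for _ in range(4)]
blocks = segment(weights, 2)
peak, low_freq = compress_block(weights, keep=2)
```

A spatially smooth block concentrates its energy in the low-frequency corner, which is why cropping can shrink the representation with limited loss. Reconstruction would require an inverse transform and rescaling by `peak`, which this sketch leaves out.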
-
Publication No.: GB2549685A
Publication Date: 2017-10-25
Application No.: GB201713201
Application Date: 2016-02-17
Applicant: IBM
Inventor: ZUOGUANG LIU , CHIA-YU CHEN , TENKO YAMASHITA , MIAOMIAO WANG
IPC: H01L21/8238 , H01L27/092
Abstract: A technique for forming a semiconductor device is provided. Sacrificial mandrels are formed over a hardmask layer on a semiconductor layer. Spacers are formed on sidewalls of the sacrificial mandrels. The sacrificial mandrels are removed to leave the spacers. A masking process leaves exposed a first set of spacers with a second set protected. In response to the masking process, a first fin etch process forms a first set of fins in the semiconductor layer via the first set of spacers. The first set of fins has a vertical sidewall profile. Another masking process leaves exposed the second set of spacers with the first set of spacers and the first set of fins protected. In response to the other masking process, a second fin etch process forms a second set of fins in the semiconductor layer using the second set of spacers. The second set of fins has a trapezoidal sidewall profile.
-