LOSS-ERROR-AWARE QUANTIZATION OF A LOW-BIT NEURAL NETWORK

    Publication No.: US20250117639A1

    Publication Date: 2025-04-10

    Application No.: US18886625

    Filing Date: 2024-09-16

    Abstract: Methods, apparatus, systems and articles of manufacture for loss-error-aware quantization of a low-bit neural network are disclosed. An example apparatus includes a network weight partitioner to partition unquantized network weights of a first network model into a first group to be quantized and a second group to be retrained. The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights. In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss. The example apparatus includes a weight updater to update the second group of network weights based on the difference. The example apparatus includes a network model deployer to deploy a low-bit network model including the low-bit second network weights.
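The abstract describes a loop of partitioning, quantizing, and compensating. A minimal sketch of that flow on a toy linear model is below; the magnitude-based partition, 4-bit uniform quantizer, and gradient-based update of the retained group are all illustrative assumptions, not the claimed method.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, bits=4):
    """Uniform symmetric quantization to 2**(bits-1) - 1 positive levels (assumed scheme)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale

def loss(w, x, y):
    """Squared-error loss of a linear model, standing in for the network loss."""
    return np.mean((x @ w - y) ** 2)

# toy "network": a single weight vector fit to synthetic data
x = rng.normal(size=(64, 16))
w_true = rng.normal(size=16)
y = x @ w_true
w = w_true + 0.1 * rng.normal(size=16)

# 1) partition: quantize the larger-magnitude half, retain the rest full-precision
mask_q = np.abs(w) >= np.median(np.abs(w))
first_loss = loss(w, x, y)            # loss before quantization

# 2) quantize the first group to low-bit values
w_mixed = w.copy()
w_mixed[mask_q] = quantize(w[mask_q], bits=4)
second_loss = loss(w_mixed, x, y)     # loss after quantization

# 3) update only the retained group, driven by the loss difference
for _ in range(200):
    grad = 2 * x.T @ (x @ w_mixed - y) / len(y)
    w_mixed[~mask_q] -= 0.01 * grad[~mask_q]   # quantized weights stay fixed

final_loss = loss(w_mixed, x, y)
```

The retained full-precision weights absorb the quantization error, so the mixed model recovers accuracy while the quantized group keeps its low-bit values.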

    DYNAMIC NEURAL NETWORK SURGERY
    Invention Application

    Publication No.: US20250045582A1

    Publication Date: 2025-02-06

    Application No.: US18804720

    Filing Date: 2024-08-14

    Abstract: Techniques related to compressing a pre-trained dense deep neural network to a sparsely connected deep neural network for efficient implementation are discussed. Such techniques may include iteratively pruning and splicing available connections between adjacent layers of the deep neural network and updating weights corresponding to both currently disconnected and currently connected connections between the adjacent layers.
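The key idea in the abstract is that pruning is reversible: every weight keeps receiving gradient updates, so a pruned connection can be spliced back. A sketch of one prune/splice iteration follows; the magnitude thresholds, learning rate, and toy regression task are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def surgery_step(w, mask, grad, lr=0.05, prune_th=0.1, splice_th=0.2):
    """One prune/splice iteration (threshold values are illustrative assumptions)."""
    # update ALL weights, connected or not, so pruned connections can recover
    w = w - lr * grad
    mask = mask.copy()
    mask[np.abs(w) < prune_th] = 0    # prune weak connections
    mask[np.abs(w) > splice_th] = 1   # splice strong ones back in
    return w, mask                    # weights between thresholds keep their state

# toy task with a genuinely sparse target, so pruning has something to find
x = rng.normal(size=(128, 20))
w_true = np.zeros(20)
w_true[:5] = 2.0 * rng.normal(size=5)
y = x @ w_true
w = 0.1 * rng.normal(size=20)
mask = np.ones(20)

for _ in range(300):
    r = x @ (w * mask) - y                 # forward pass uses masked weights
    grad = 2 * x.T @ r / len(y)            # gradient flows to every weight
    w, mask = surgery_step(w, mask, grad)

final = np.mean((x @ (w * mask) - y) ** 2)
```

Because disconnected weights continue to train, a connection pruned too aggressively can grow back above the splice threshold and rejoin the network, which is the "surgery" the abstract contrasts with one-shot pruning.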

    SAMPLE-ADAPTIVE CROSS-LAYER NORM CALIBRATION AND RELAY NEURAL NETWORK

    Publication No.: US20240296668A1

    Publication Date: 2024-09-05

    Application No.: US18572510

    Filing Date: 2021-09-10

    CPC classification number: G06V10/82 G06V10/955

    Abstract: Technology to conduct image sequence/video analysis can include a processor, and a memory coupled to the processor, the memory storing a neural network, the neural network comprising a plurality of convolution layers, and a plurality of normalization layers arranged as a relay structure, wherein each normalization layer is coupled to and following a respective one of the plurality of convolution layers. The plurality of normalization layers can be arranged as a relay structure where a normalization layer for a layer (k) is coupled to and following a normalization layer for a preceding layer (k−1). The normalization layer for the layer (k) is coupled to the normalization layer for the preceding layer (k−1) via a hidden state signal and a cell state signal, each signal generated by the normalization layer for the preceding layer (k−1). Each normalization layer (k) can include a meta-gating unit (MGU) structure.
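The relay structure can be sketched as a normalization layer that, besides standardizing its convolution output, receives a hidden state and cell state from the previous layer's normalization and emits new ones via an MGU-style single forget gate. All parameter shapes, the state-summarization step, and the scale/shift coupling below are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relay_norm(x, h_prev, c_prev, params):
    """One relay normalization layer (k): standardizes the conv output x, then
    modulates it with an MGU-style cell driven by the (h, c) state relayed
    from layer (k-1). Parameter shapes are illustrative assumptions."""
    # standardize the conv output
    mu, sigma = x.mean(), x.std() + 1e-5
    x_hat = (x - mu) / sigma
    # summarize this layer's activations and concatenate the relayed hidden state
    z = np.concatenate([x_hat.mean(axis=0), h_prev])
    # MGU-style gating: one forget gate mixes the old cell state with a candidate
    f = sigmoid(params["Wf"] @ z + params["bf"])        # forget gate
    c_tilde = np.tanh(params["Wc"] @ z + params["bc"])  # candidate cell state
    c = f * c_prev + (1 - f) * c_tilde                  # new cell state
    h = np.tanh(c)                                      # new hidden state
    # hidden/cell states act as sample-adaptive scale and shift
    y = x_hat * (1 + h) + c
    return y, h, c

rng = np.random.default_rng(0)
C = 4
params = {
    "Wf": 0.1 * rng.normal(size=(C, 2 * C)), "bf": np.zeros(C),
    "Wc": 0.1 * rng.normal(size=(C, 2 * C)), "bc": np.zeros(C),
}
x1 = rng.normal(size=(8, C))                  # conv output of layer 1 (flattened)
y1, h1, c1 = relay_norm(x1, np.zeros(C), np.zeros(C), params)
x2 = rng.normal(size=(8, C))                  # conv output of layer 2
y2, h2, c2 = relay_norm(x2, h1, c1, params)   # relay the state forward
```

The relayed (h, c) pair makes each layer's normalization sample-adaptive: calibration for layer (k) depends on what earlier layers saw for this particular input, not on a fixed per-layer statistic.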
