Deep learning model training system

    Publication Number: US10949747B1

    Publication Date: 2021-03-16

    Application Number: US16950145

    Filing Date: 2020-11-17

    Abstract: A computer trains a neural network model. (A) A mini-batch of observation vectors is randomly selected from a plurality of observation vectors based on a mini-batch size value. (B) A forward and backward propagation of a neural network is executed to compute a gradient vector and a weight vector. (C) A search direction vector is computed. (D) A step size value is computed. (E) An updated weight vector is computed. (F) Based on a predefined progress check frequency value, second observation vectors are randomly selected, a progress check objective function value is computed given the weight vector, the step size value, the search direction vector, and the second observation vectors, and, based on an accuracy test, the mini-batch size value is updated. (G) (A) to (F) are repeated until a convergence parameter value indicates training of the neural network is complete. The weight vector for a next iteration is the computed updated weight vector.
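    The following is a minimal sketch of how steps (A) through (G) might be arranged in code. It stands in a toy single-layer (logistic) model for the neural network, uses steepest descent as the search direction, a fixed step size, and a simple batch-doubling rule for the accuracy test; these choices, and every name in the snippet (loss_and_grad, progress_check_freq, and so on), are illustrative assumptions rather than the method claimed in the patent.

```python
# Sketch of a mini-batch training loop with a periodic progress check that can
# grow the mini-batch size (assumed details; not the patented formulas).
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 1,000 observation vectors with 20 features and binary labels.
X = rng.normal(size=(1000, 20))
y = (X @ rng.normal(size=20) + 0.1 * rng.normal(size=1000) > 0).astype(float)

def loss_and_grad(w, batch_idx):
    """Forward and backward pass of a toy single-layer (logistic) network."""
    Xb, yb = X[batch_idx], y[batch_idx]
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))                    # forward propagation
    loss = -np.mean(yb * np.log(p + 1e-12) + (1 - yb) * np.log(1 - p + 1e-12))
    grad = Xb.T @ (p - yb) / len(batch_idx)                # backward propagation
    return loss, grad

w = np.zeros(20)                  # weight vector
batch_size = 32                   # mini-batch size value
progress_check_freq = 10          # predefined progress check frequency value
step_size = 0.5                   # step size value (fixed here for simplicity)
prev_check_obj = np.inf

for it in range(1, 501):
    # (A) randomly select observation vectors for the current mini-batch
    idx = rng.choice(len(X), size=batch_size, replace=False)
    # (B) forward/backward propagation -> gradient vector
    _, g = loss_and_grad(w, idx)
    # (C) search direction vector (plain steepest descent in this sketch)
    d = -g
    # (D)-(E) step size and updated weight vector
    w = w + step_size * d
    # (F) periodic progress check on a second, independently drawn batch
    if it % progress_check_freq == 0:
        idx2 = rng.choice(len(X), size=batch_size, replace=False)
        check_obj, _ = loss_and_grad(w, idx2)
        if check_obj > prev_check_obj:                     # accuracy test failed
            batch_size = min(2 * batch_size, len(X))       # update mini-batch size
        prev_check_obj = check_obj
    # (G) convergence parameter: stop when the mini-batch gradient is small
    if np.linalg.norm(g) < 1e-3:
        break

print(f"stopped at iteration {it}, final mini-batch size {batch_size}")
```

    The structural point the sketch tries to capture is that the progress check in step (F) is evaluated on a second random batch, so a stalled objective value leads to a larger mini-batch rather than only a smaller step.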

    Cubic regularization optimizer
    Invention Grant

    Publication Number: US11983631B1

    Publication Date: 2024-05-14

    Application Number: US18511092

    Filing Date: 2023-11-16

    CPC classification number: G06N3/08 G06F17/16

    Abstract: A computer determines a solution to a nonlinear optimization problem. A conjugate gradient (CG) iteration is performed with a first order derivative vector and a second order derivative matrix to update a CG residual vector, an H-conjugate vector, and a residual weight vector. A CG solution vector is updated using a previous CG solution vector, the H-conjugate vector, and the residual weight vector. An eigenvector of the second order derivative matrix having a smallest eigenvalue is computed. A basis matrix is defined that includes a cubic regularization (CR) solution vector, a CR residual vector, the CG solution vector, the CG residual vector, and the eigenvector. A CR iteration is performed to update the CR solution vector. The CR residual vector is updated using the first order derivative vector, the second order derivative matrix, and the updated CR solution vector. The process is repeated until a stop criterion is satisfied.
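    As a rough illustration of the structure described above, the sketch below alternates a conjugate-gradient (CG) iteration on g + Hs = 0 with a cubic-regularization (CR) iteration restricted to the span of the listed vectors. The fixed regularization weight sigma, the crude subspace minimizer (gradient descent on the reduced variable), the stopping test, and the name cubic_reg_step are illustrative assumptions; the patent defines its own CR update and stop criterion.

```python
# Sketch: CG step, smallest eigenvector, basis matrix, CR step in the subspace
# (assumed details; not the patented update rules).
import numpy as np

def cubic_reg_step(g, H, sigma=1.0, max_outer=20, tol=1e-8):
    n = g.size
    s_cr = np.zeros(n)                 # CR solution vector
    r_cr = -g.copy()                   # CR residual vector: -(g + H s_cr)
    s_cg = np.zeros(n)                 # CG solution vector
    r_cg = -g.copy()                   # CG residual vector
    p = r_cg.copy()                    # H-conjugate direction vector

    # eigenvector of the second order derivative matrix with smallest eigenvalue
    eigvals, eigvecs = np.linalg.eigh(H)
    v_min = eigvecs[:, 0]

    for _ in range(max_outer):
        # CG iteration: residual weight, H-conjugate vector, CG solution/residual
        Hp = H @ p
        pHp = p @ Hp
        if abs(pHp) < 1e-14:
            break
        alpha = (r_cg @ r_cg) / pHp                # residual weight
        s_cg = s_cg + alpha * p
        r_new = r_cg - alpha * Hp
        beta = (r_new @ r_new) / (r_cg @ r_cg)
        p = r_new + beta * p                       # next H-conjugate vector
        r_cg = r_new

        # basis matrix built from the vectors named in the abstract
        B = np.column_stack([s_cr, r_cr, s_cg, r_cg, v_min])
        B, _ = np.linalg.qr(B)                     # orthonormalize for stability

        # CR iteration: minimize g^T s + 0.5 s^T H s + (sigma/3)||s||^3 over
        # span(B) by crude gradient descent on the reduced variable z
        z = np.zeros(B.shape[1])
        for _ in range(200):
            s = B @ z
            grad_z = B.T @ (g + H @ s + sigma * np.linalg.norm(s) * s)
            z = z - 0.01 * grad_z
        s_cr = B @ z

        # update the CR residual vector and test the stop criterion
        r_cr = -(g + H @ s_cr)
        if np.linalg.norm(r_cr) < tol * max(1.0, np.linalg.norm(g)):
            break
    return s_cr

# usage: one cubic-regularized step for a small, possibly indefinite problem
rng = np.random.default_rng(1)
A = rng.normal(size=(8, 8))
H = A + A.T                            # symmetric second order derivative matrix
g = rng.normal(size=8)
print(cubic_reg_step(g, H))
```

    Keeping the eigenvector with the smallest eigenvalue in the basis is what lets the subspace step move along directions of negative curvature that a pure CG step would miss.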

    Deep learning model training system

    Publication Number: US11727274B1

    Publication Date: 2023-08-15

    Application Number: US17820342

    Filing Date: 2022-08-17

    CPC classification number: G06N3/08 G06F18/211

    Abstract: A computer trains a neural network. A neural network is executed with a weight vector to compute a gradient vector using a batch of observation vectors. Eigenvalues are computed from a Hessian approximation matrix. A regularization parameter value is computed using the gradient vector, the eigenvalues, and a step-size value. A search direction vector is computed using the eigenvalues, the gradient vector, the Hessian approximation matrix, and the regularization parameter value. A reduction ratio value is computed. An updated weight vector is computed from the weight vector, a learning rate value, and the search direction vector or the gradient vector based on the computed reduction ratio value. An updated Hessian approximation matrix is computed from the Hessian approximation matrix, the learning rate value, and the search direction vector or the gradient vector based on the reduction ratio value. The step-size value is updated using the search direction vector.
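    A minimal sketch of one way such a loop could look is given below, using a toy linear least-squares model in place of a neural network. The BFGS-style update of the Hessian approximation matrix, the particular regularization formula combining the step-size value, the gradient norm, and the smallest eigenvalue, and the 0.1 acceptance threshold on the reduction ratio value are illustrative assumptions, not the formulas defined in the patent.

```python
# Sketch: regularized quasi-Newton training step with a reduction-ratio test
# (assumed details; not the patented formulas).
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10))
y = X @ rng.normal(size=10) + 0.05 * rng.normal(size=500)

def batch_loss_grad(w, idx):
    """Execute the toy (linear) model on a batch: loss and gradient vector."""
    Xb, yb = X[idx], y[idx]
    r = Xb @ w - yb
    return 0.5 * np.mean(r ** 2), Xb.T @ r / len(idx)

w = np.zeros(10)                  # weight vector
B = np.eye(10)                    # Hessian approximation matrix
learning_rate = 1.0
step_size = 1.0

for it in range(200):
    idx = rng.choice(len(X), size=64, replace=False)
    loss, g = batch_loss_grad(w, idx)

    # eigenvalues of the Hessian approximation and a regularization parameter
    eigvals = np.linalg.eigvalsh(B)
    lam = step_size * np.linalg.norm(g) + max(0.0, -eigvals.min())

    # search direction vector from the regularized system (B + lam I) d = -g
    d = np.linalg.solve(B + lam * np.eye(10), -g)

    # reduction ratio: actual decrease / decrease predicted by the local model
    new_loss, new_g = batch_loss_grad(w + learning_rate * d, idx)
    predicted = -(learning_rate * (g @ d) + 0.5 * learning_rate ** 2 * (d @ B @ d))
    rho = (loss - new_loss) / max(predicted, 1e-12)

    if rho > 0.1:
        # accept: move along the search direction and refine B (BFGS-like)
        s = learning_rate * d
        yvec = new_g - g
        if s @ yvec > 1e-10:
            Bs = B @ s
            B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(yvec, yvec) / (s @ yvec)
        w = w + s
        step_size = max(0.5 * step_size, 1e-6)     # relax the regularization
    else:
        # reject: fall back to a small gradient step and regularize more strongly
        w = w - 0.01 * learning_rate * g
        step_size = 2.0 * step_size

    if np.linalg.norm(g) < 1e-4:
        break

print(f"final batch loss: {batch_loss_grad(w, rng.choice(len(X), 64, replace=False))[0]:.5f}")
```

    The reduction ratio value plays the role of a trust test: a good ratio accepts the quasi-Newton step and relaxes the regularization, while a poor ratio falls back to the gradient direction and tightens it.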
