Training action selection neural networks using off-policy actor critic reinforcement learning

Invention Grant

US10706352B2 Training action selection neural networks using off-policy actor critic reinforcement learning (Active)

Patent Title: Training action selection neural networks using off-policy actor critic reinforcement learning
Application No.: US16402687

Application Date: 2019-05-03
Publication No.: US10706352B2

Publication Date: 2020-07-07
Inventor: Ziyu Wang, Nicolas Manfred Otto Heess, Victor Constant Bapst
Applicant: DeepMind Technologies Limited
Applicant Address: GB London
Assignee: DeepMind Technologies Limited
Current Assignee: DeepMind Technologies Limited
Current Assignee Address: GB London
Agency: Fish & Richardson P.C.
Main IPC: G06N3/04
IPC: G06N3/04; G06N3/08; G06N3/00

Abstract:

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor critic reinforcement learning technique.
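The loop described in the abstract (store trajectories, sample one, update the policy parameters off-policy) can be illustrated with a small sketch. This is not the patented method: it is a generic tabular off-policy actor critic using truncated importance weights and a one-step expected-value critic target, chosen only for brevity. The names `ReplayMemory`, `update_on_trajectory`, the step layout `(state, action, reward, behavior_prob)`, and all hyperparameters are assumptions made for illustration.

```python
import math
import random
from collections import deque


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ReplayMemory:
    """Hypothetical replay memory that stores whole trajectories.

    Each trajectory is a list of steps (state, action, reward, behavior_prob),
    where behavior_prob is the probability the acting policy gave the action.
    The final step of a trajectory is assumed to be terminal.
    """

    def __init__(self, capacity):
        self.trajectories = deque(maxlen=capacity)  # oldest entries are evicted

    def add(self, trajectory):
        self.trajectories.append(trajectory)

    def sample(self, rng):
        return rng.choice(list(self.trajectories))


def update_on_trajectory(theta, q, trajectory, gamma=0.9,
                         lr_pi=0.1, lr_q=0.2, rho_max=1.0):
    """Adjust policy logits `theta` and critic table `q` in place.

    Assumed simplification: tabular softmax policy, tabular Q critic.
    """
    for t, (state, action, reward, behavior_prob) in enumerate(trajectory):
        # Critic: bootstrap from the next state's expected Q under the
        # current policy, which needs no importance weight.
        if t + 1 < len(trajectory):
            next_state = trajectory[t + 1][0]
            next_pi = softmax(theta[next_state])
            next_value = sum(p * v for p, v in zip(next_pi, q[next_state]))
        else:
            next_value = 0.0
        q[state][action] += lr_q * (reward + gamma * next_value - q[state][action])

        # Actor: policy gradient weighted by a truncated importance ratio,
        # correcting for the gap between current and behavior policy.
        pi = softmax(theta[state])
        value = sum(p * v for p, v in zip(pi, q[state]))
        advantage = q[state][action] - value
        rho = min(rho_max, pi[action] / behavior_prob)
        for a in range(len(pi)):
            grad_log = (1.0 if a == action else 0.0) - pi[a]
            theta[state][a] += lr_pi * rho * advantage * grad_log


# Usage: two states, two actions, hand-written trajectories from a
# uniform behavior policy (probability 0.5 per action).
memory = ReplayMemory(capacity=100)
memory.add([(0, 1, 1.0, 0.5), (1, 1, 1.0, 0.5)])
memory.add([(0, 0, 0.0, 0.5), (1, 1, 1.0, 0.5)])

theta = [[0.0, 0.0], [0.0, 0.0]]  # policy parameters (logits per state)
q = [[0.0, 0.0], [0.0, 0.0]]      # critic estimates
rng = random.Random(0)
for _ in range(50):
    update_on_trajectory(theta, q, memory.sample(rng))
```

Truncating the importance ratio at `rho_max` trades a small bias for lower variance when the current policy has drifted far from the one that generated the stored trajectory; other corrections are possible and the abstract does not specify which one is used.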

Public/Granted literature

US20190258918A1 TRAINING ACTION SELECTION NEURAL NETWORKS Public/Granted day: 2019-08-22

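IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computer systems based on specific computational models
G06N3/00	Computer systems based on biological models
G06N3/02	. using neural network models
G06N3/04	.. Architectures, e.g. interconnection topology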