Training action selection neural networks using a differentiable credit function

Invention Grant

US11651208B2 Training action selection neural networks using a differentiable credit function (Status: In force)

Patent Title: Training action selection neural networks using a differentiable credit function
Application No.: US16615042

Application Date: 2018-05-22
Publication No.: US11651208B2

Publication Date: 2023-05-16
Inventor: Zhongwen Xu, Hado Phillip van Hasselt, Joseph Varughese Modayil, Andre da Motta Salles Barreto, David Silver
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Applicant Address: GB London
Assignee: DeepMind Technologies Limited
Current Assignee: DeepMind Technologies Limited
Current Assignee Address: GB London
Agency: Fish & Richardson P.C.
International Application: PCT/EP2018/063279 (2018-05-22)
International Publication: WO2018/211139A (2018-11-22)
National Phase Entry Date: 2019-11-19
Main IPC: G06N3/08
IPC: G06N3/08; G06N3/04

Abstract:

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning. A reinforcement learning neural network selects actions to be performed by an agent interacting with an environment to perform a task in an attempt to achieve a specified result. The reinforcement learning neural network has at least one input to receive an input observation characterizing a state of the environment and at least one output for determining an action to be performed by the agent in response to the input observation. The system includes a reward function network coupled to the reinforcement learning neural network. The reward function network has an input to receive reward data characterizing a reward provided by one or more states of the environment and is configured to determine a reward function to provide one or more target values for training the reinforcement learning neural network.
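
The abstract describes a reward function that turns reward data into target values for training the reinforcement learning neural network. The following is a minimal sketch of one such target computation, assuming the function takes the form of a λ-return with a discount `gamma` and a bootstrapping weight `lam`. That form, the function name `lambda_return_targets`, and all parameter names are illustrative assumptions; the abstract does not specify them.

```python
def lambda_return_targets(rewards, next_values, gamma, lam):
    """Compute one training target per time step (illustrative sketch only).

    rewards[t]      reward received after the action taken at step t
    next_values[t]  value estimate for the state reached after step t
    gamma, lam      hypothetical parameters of the reward (credit) function:
                    discount factor and bootstrapping weight, both in [0, 1]
    """
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have the same length")

    targets = [0.0] * len(rewards)
    # Past the final step there is no further reward data, so the value
    # estimate of the last state stands in for the following target.
    following_target = next_values[-1] if next_values else 0.0

    # Work backwards so each target can reuse the target of the next step.
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap with the longer-horizon target.
        blended = (1.0 - lam) * next_values[t] + lam * following_target
        targets[t] = rewards[t] + gamma * blended
        following_target = targets[t]
    return targets


if __name__ == "__main__":
    print(lambda_return_targets([1.0, 1.0], [2.0, 4.0], gamma=0.5, lam=0.0))
    print(lambda_return_targets([1.0, 1.0], [2.0, 4.0], gamma=0.5, lam=1.0))
```

With `lam=0.0` each target relies only on the immediate reward and the next value estimate; with `lam=1.0` credit from later rewards propagates back to earlier steps. Because every operation is a sum or product of `gamma`, `lam`, and the inputs, the targets are differentiable with respect to those parameters when the same arithmetic is written in an automatic-differentiation framework.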

Published/Granted documents

US20200175364A1 TRAINING ACTION SELECTION NEURAL NETWORKS USING A DIFFERENTIABLE CREDIT FUNCTION Publication/Grant date: 2020-06-04

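IPC classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computer systems based on specific computational models
G06N3/00	Computer systems based on biological models
G06N3/02	. Using neural network models
G06N3/08	.. Learning methods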