Distributional reinforcement learning using quantile function neural networks

Invention Grant

US11887000B2 Distributional reinforcement learning using quantile function neural networks (Active)

Patent Title: Distributional reinforcement learning using quantile function neural networks
Application No.: US18169803

Application Date: 2023-02-15
Publication No.: US11887000B2

Publication Date: 2024-01-30
Inventor: Georg Ostrovski, William Clinton Dabney
Applicant: DeepMind Technologies Limited
Applicant Address: GB London
Assignee: DeepMind Technologies Limited
Current Assignee: DeepMind Technologies Limited
Current Assignee Address: GB London
Agency: Fish & Richardson P.C.
Main IPC: G06N3/08
IPC: G06N3/08; G06N3/04

Abstract:

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method comprises: receiving a current observation; for each action of a plurality of actions: randomly sampling one or more probability values; for each probability value: processing the action, the current observation, and the probability value using a quantile function network to generate an estimated quantile value for the probability value with respect to a probability distribution over possible returns that would result from the agent performing the action in response to the current observation; determining a measure of central tendency of the one or more estimated quantile values; and selecting an action to be performed by the agent in response to the current observation using the measures of central tendency for the actions.
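
The selection procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: `toy_quantile_network` is a hypothetical stand-in for a trained quantile function network, the mean is assumed as the measure of central tendency, and greedy selection over those means is assumed as the selection rule. The names `select_action`, `num_samples`, and `rng` are likewise introduced here for illustration only.

```python
import random
from statistics import mean

def toy_quantile_network(observation, action, tau):
    """Hypothetical stand-in for a trained quantile function network.

    Pretends the return for action `a` is uniform on [a, a + 1], so the
    tau-quantile is a + tau. The observation is ignored in this toy model.
    """
    return action + tau


def select_action(observation, actions, quantile_fn, num_samples=8, rng=None):
    """Return (best_action, scores) using sampled quantile estimates."""
    rng = rng or random.Random()
    scores = {}
    for action in actions:
        # Randomly sample one or more probability values tau in [0, 1).
        taus = [rng.random() for _ in range(num_samples)]
        # One estimated quantile value per sampled probability value.
        quantiles = [quantile_fn(observation, action, tau) for tau in taus]
        # Measure of central tendency: the mean (an assumption; the
        # abstract does not name a specific measure).
        scores[action] = mean(quantiles)
    # Greedy choice over the per-action measures (also an assumption).
    best_action = max(actions, key=lambda a: scores[a])
    return best_action, scores


best_action, scores = select_action(
    observation=[0.0, 1.0],
    actions=[0, 1, 2],
    quantile_fn=toy_quantile_network,
    rng=random.Random(0),
)
```

With a uniform sampling distribution over the probability values, the mean of the sampled quantiles is a Monte Carlo estimate of the expected return for each action. Sampling from a different distribution would instead weight some parts of the return distribution more heavily than others.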

Public/Granted literature

US20230196108A1 DISTRIBUTIONAL REINFORCEMENT LEARNING USING QUANTILE FUNCTION NEURAL NETWORKS Public/Granted day: 2023-06-22

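IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computer systems based on specific computational models
G06N3/00	Computer systems based on biological models
G06N3/02	. Using neural network models
G06N3/08	.. Learning methods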