Training policy neural networks using path consistency learning

Invention Grant

US10733502B2 Training policy neural networks using path consistency learning (in force)

Patent Title: Training policy neural networks using path consistency learning
Application No.: US16504934

Application Date: 2019-07-08
Publication No.: US10733502B2

Publication Date: 2020-08-04
Inventor: Ofir Nachum, Mohammad Norouzi, Dale Eric Schuurmans, Kelvin Xu
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: Fish & Richardson P.C.
Main IPC: G06N3/04
IPC: G06N3/04; G06N3/08

Abstract:

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network used to select actions to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes obtaining path data defining a path through the environment traversed by the agent. A consistency error is determined for the path from a combined reward, first and last soft-max state values, and a path likelihood. A value update for the current values of the policy neural network parameters is determined from at least the consistency error. The value update is used to adjust the current values of the policy neural network parameters.
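
The consistency error described in the abstract can be illustrated with a minimal sketch. The abstract names the ingredients (a combined reward, first and last soft-max state values, and a path likelihood) but not how they are combined, so the formula below is an assumption borrowed from the published path consistency learning formulation, not from the claims. The function name, the discount factor `gamma`, and the entropy temperature `tau` are hypothetical parameters introduced for illustration.

```python
def consistency_error(rewards, log_probs, v_first, v_last, gamma=0.99, tau=0.1):
    """Path consistency error for one path of length d = len(rewards).

    Assumed form (not stated in the abstract):
        C = -V(s_first) + gamma**d * V(s_last)
            + sum_j gamma**j * (r_j - tau * log pi(a_j | s_j))

    rewards   : rewards received along the path
    log_probs : log-probabilities the policy assigned to the actions taken
    v_first   : soft-max state value estimate for the first state of the path
    v_last    : soft-max state value estimate for the last state of the path
    """
    if len(rewards) != len(log_probs):
        raise ValueError("rewards and log_probs must have the same length")
    d = len(rewards)
    # Combined (discounted) reward over the path.
    combined_reward = sum(gamma ** j * r for j, r in enumerate(rewards))
    # Discounted log-likelihood of the path under the current policy.
    path_log_likelihood = sum(gamma ** j * lp for j, lp in enumerate(log_probs))
    return (-v_first + gamma ** d * v_last
            + combined_reward - tau * path_log_likelihood)


def squared_error_loss(error):
    """One common choice of training objective: half the squared error."""
    return 0.5 * error ** 2
```

Under these assumptions, a path whose error is zero is already consistent with the current policy and value estimates; a parameter update would typically follow the gradient of a loss such as `squared_error_loss` with respect to the policy (and value) parameters.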

Public/Granted literature

US20190332922A1 TRAINING POLICY NEURAL NETWORKS USING PATH CONSISTENCY LEARNING Public/Granted day: 2019-10-31

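IPC classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computer systems based on specific computational models
G06N3/00	Computer systems based on biological models
G06N3/02	.Using neural network models
G06N3/04	..Architectures, e.g. interconnection topology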