Method and system for training reinforcement learning agent using adversarial sampling

Invention Grant

US11994862B2 Method and system for training reinforcement learning agent using adversarial sampling (Active)

Patent Title: Method and system for training reinforcement learning agent using adversarial sampling
Application No.: US16920598

Application Date: 2020-07-03
Publication No.: US11994862B2

Publication Date: 2024-05-28
Inventor: Elmira Amirloo Abolfathi, Jun Luo, Peyman Yadmellat
Applicant: Elmira Amirloo Abolfathi, Jun Luo, Peyman Yadmellat
Applicant Address: CA North York
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Current Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Current Assignee Address: CN Shenzhen
Main IPC: G06N3/08
IPC: G06N3/08; G05D1/00; G06F18/21; G06F18/214; G06N3/047

Abstract:

Methods and systems for training a reinforcement learning (RL) agent for autonomous operation of a vehicle are described. The RL agent is first trained on uniformly sampled training samples to learn a policy. After the RL agent has achieved a predetermined performance goal, data is collected that includes sequences of sampled states and, for each sequence, the agent parameters and an indication of whether the RL agent failed on that sequence. A failure predictor is trained, using samples from the collected data, to predict the probability that the RL agent fails on a given sequence of states. Further sequences of states are collected by simulating interaction of the vehicle with the environment. A sequence of states is selected based on the probability of failure output by the failure predictor, and the RL agent is further trained on the selected sequence.
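
The failure-predictor and selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`train_failure_predictor`, `select_adversarial_sequence`, the scalar "difficulty" feature) is hypothetical, the predictor is a one-feature logistic regression swapped in purely for illustration, and picking the single highest-probability candidate is an assumed selection rule; the abstract only says the selection is based on the predicted probability. The initial uniform-sampling training and the further RL training on the selected sequence are left out.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical representation: each state is one scalar "difficulty" value,
# and a sequence of states is a list of such values.
StateSequence = Sequence[float]


def _feature(seq: StateSequence) -> float:
    """Summarise a sequence as its mean difficulty, centred around 0.5."""
    return sum(seq) / len(seq) - 0.5


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_failure_predictor(
    sequences: List[StateSequence],
    failed: List[int],
    epochs: int = 200,
    lr: float = 0.5,
) -> Callable[[StateSequence], float]:
    """Fit a logistic regression mapping a sequence to P(agent fails).

    `failed[i]` is 1 if the agent failed on `sequences[i]`, else 0.
    This stands in for whatever model the failure predictor really uses.
    """
    w, b = 0.0, 0.0
    feats = [_feature(s) for s in sequences]
    n = len(feats)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(feats, failed):
            p = _sigmoid(w * x + b)
            grad_w += (p - y) * x  # gradient of the log loss w.r.t. w
            grad_b += p - y        # gradient of the log loss w.r.t. b
        w -= lr * grad_w / n
        b -= lr * grad_b / n

    def predict(seq: StateSequence) -> float:
        return _sigmoid(w * _feature(seq) + b)

    return predict


def select_adversarial_sequence(
    candidates: List[StateSequence],
    predictor: Callable[[StateSequence], float],
) -> StateSequence:
    """Assumed rule: pick the candidate with the highest predicted failure probability."""
    return max(candidates, key=predictor)


if __name__ == "__main__":
    # Toy "collected data": the agent is assumed to fail when difficulty > 0.6.
    collected = [[d / 10.0] * 5 for d in range(11)]
    outcomes = [1 if d > 6 else 0 for d in range(11)]
    predictor = train_failure_predictor(collected, outcomes)

    # Candidate sequences, e.g. gathered from further simulation.
    candidates = [[0.1] * 5, [0.5] * 5, [0.9] * 5]
    chosen = select_adversarial_sequence(candidates, predictor)
    print("selected sequence for further training:", chosen)
```

In a fuller pipeline, the chosen sequence would be handed back to the RL training loop. Sampling candidates in proportion to their predicted failure probability, instead of taking the maximum, would be an equally plausible reading of the abstract.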

Public/Granted literature

US20210004647A1 METHOD AND SYSTEM FOR TRAINING REINFORCEMENT LEARNING AGENT USING ADVERSARIAL SAMPLING Public/Granted day: 2021-01-07

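IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computer systems based on specific computational models
G06N3/00	Computer systems based on biological models
G06N3/02	.Using neural network models
G06N3/08	..Learning methods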