Invention Grant
- Patent Title: Secure exploration for reinforcement learning
- Application No.: US16554525
- Application Date: 2019-08-28
- Publication No.: US11616813B2
- Publication Date: 2023-03-28
- Inventor: Harm Hendrik Van Seijen, Seyed Mehdi Fatemi Booshehri
- Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
- Applicant Address: US WA Redmond
- Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
- Current Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
- Current Assignee Address: US WA Redmond
- Agency: Shook, Hardy & Bacon, LLP
- Main IPC: H04L9/40
- IPC: H04L9/40; G06N20/00; G06N5/043

Abstract:
A secured exploration agent for reinforcement learning (RL) is provided. Securing an exploration agent includes training the exploration agent to avoid dead-end states and dead-end trajectories. During training, the exploration agent “learns” to identify and avoid dead-end states of a Markov Decision Process (MDP). The secured exploration agent is utilized to explore the environment safely and efficiently, while significantly reducing the training time, as well as the cost and safety concerns, associated with conventional RL. The secured exploration agent is employed to guide the behavior of a corresponding exploitation agent. During training, a policy of the exploration agent is iteratively updated to reflect an estimated probability that a state is a dead-end state. Under the updated exploration policy, the probability that the exploration agent chooses an action resulting in a transition to a dead-end state is reduced in line with that estimated probability.
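
The iterative policy update described in the abstract can be illustrated with a minimal sketch. The abstract does not give the update rule, so everything below is an assumption made for illustration: the two-state toy MDP, the names `estimate_dead_end_values` and `secure_policy`, the reward of -1 on entering a dead-end with no discounting, and the choice to weight each action by one plus its estimated value. It is not presented as the claimed method.

```python
# Hypothetical toy MDP: P[state][action] -> list of (probability, next_state).
# DEAD is a dead-end state, GOAL is a desired terminal state (both assumed).
DEAD, GOAL = "dead", "goal"
P = {
    0: {0: [(1.0, 1)], 1: [(1.0, DEAD)]},
    1: {0: [(1.0, GOAL)], 1: [(0.5, DEAD), (0.5, 0)]},
}


def estimate_dead_end_values(P, sweeps=50):
    """Iteratively estimate q(s, a) in [-1, 0].

    Assumed convention: reward -1 on a transition into DEAD, 0 otherwise,
    no discounting. Under it, -q(s, a) is the probability of eventually
    reaching a dead-end after taking a in s and acting as safely as possible.
    """
    q = {s: {a: 0.0 for a in acts} for s, acts in P.items()}
    for _ in range(sweeps):
        for s, acts in P.items():
            for a, outcomes in acts.items():
                total = 0.0
                for prob, nxt in outcomes:
                    reward = -1.0 if nxt == DEAD else 0.0
                    # Terminal states (DEAD, GOAL) have no future value.
                    future = max(q[nxt].values()) if nxt in q else 0.0
                    total += prob * (reward + future)
                q[s][a] = total
    return q


def secure_policy(q):
    """Weight each action by 1 + q(s, a), so riskier actions are chosen less."""
    policy = {}
    for s, acts in q.items():
        weights = {a: 1.0 + v for a, v in acts.items()}
        norm = sum(weights.values())
        if norm <= 0.0:
            # Every action leads to a dead-end for certain: fall back to uniform.
            policy[s] = {a: 1.0 / len(acts) for a in acts}
        else:
            policy[s] = {a: w / norm for a, w in weights.items()}
    return policy


q = estimate_dead_end_values(P)
policy = secure_policy(q)
for s in sorted(policy):
    print(s, q[s], policy[s])
```

In this toy setting, the action in state 0 that always leads to the dead-end receives probability 0, and the action in state 1 that leads there half the time is chosen less often than the safe action. With this weighting, no action is selected with probability greater than 1 + q(s, a).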
Public/Granted literature
- US20200076857A1 SECURE EXPLORATION FOR REINFORCEMENT LEARNING Public/Granted day: 2020-03-05