Method and system for training reinforcement learning agent using adversarial sampling
Abstract:
Methods and systems for training a reinforcement learning (RL) agent for autonomous operation of a vehicle are described. The RL agent is trained using uniformly sampled training samples to learn a policy. After the RL agent has achieved a predetermined performance goal, data is collected that includes sequences of sampled states and, for each sequence of sampled states, the agent parameters and an indication of whether the RL agent failed for that sequence. A failure predictor is trained, using samples from the collected data, to predict the probability of failure of the RL agent for a given sequence of states. Sequences of states are collected by simulating interaction of the vehicle with the environment. A sequence of states is then selected based on the probability of failure output by the failure predictor, and the RL agent is further trained on the selected sequence of states.
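The selection step can be illustrated with a minimal Python sketch. It is not the claimed implementation: the logistic failure predictor, its `weights` and `bias`, and the names `predict_failure` and `select_adversarial` are all hypothetical, and the abstract does not say whether selection is greedy or probabilistic, so both variants are shown as assumptions. The agent parameters, which the abstract lists among the collected data, are omitted here for brevity.

```python
import math
import random

def predict_failure(sequence, weights, bias):
    """Hypothetical failure predictor: logistic model on the mean state.

    `sequence` is a list of state vectors; the return value is an
    estimated probability that the agent fails on this sequence.
    """
    n = len(sequence)
    mean_state = [sum(s[i] for s in sequence) / n for i in range(len(weights))]
    z = bias + sum(w * x for w, x in zip(weights, mean_state))
    return 1.0 / (1.0 + math.exp(-z))

def select_adversarial(candidates, weights, bias, rng, greedy=False):
    """Pick one candidate sequence based on predicted failure probability.

    greedy=True takes the most failure-prone sequence; otherwise a sequence
    is drawn with probability proportional to its predicted failure
    probability (both rules are assumptions, not taken from the source).
    """
    probs = [predict_failure(c, weights, bias) for c in candidates]
    indices = range(len(candidates))
    if greedy:
        idx = max(indices, key=probs.__getitem__)
    else:
        idx = rng.choices(indices, weights=probs, k=1)[0]
    return idx, probs

# Toy candidates, standing in for sequences collected from a simulator.
candidates = [
    [(0.0, 0.0), (0.0, 0.0)],
    [(2.0, 0.0), (2.0, 0.0)],
    [(0.0, 2.0)],
]
weights, bias = [1.0, -1.0], 0.0  # illustrative predictor parameters
rng = random.Random(0)

greedy_idx, probs = select_adversarial(candidates, weights, bias, rng, greedy=True)
sampled_idx, _ = select_adversarial(candidates, weights, bias, rng)
```

In this sketch the selected sequence would then be passed to the RL training loop as an additional training scenario, so that further training concentrates on situations where the agent is predicted to fail.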