Learning method and learning device for supporting reinforcement learning by using human driving data as training data to thereby perform personalized path planning

Invention Grant

US11074480B2 Learning method and learning device for supporting reinforcement learning by using human driving data as training data to thereby perform personalized path planning (Active)

Patent Title: Learning method and learning device for supporting reinforcement learning by using human driving data as training data to thereby perform personalized path planning
Application No.: US16740135

Application Date: 2020-01-10
Publication No.: US11074480B2

Publication Date: 2021-07-27
Inventor: Kye-Hyeon Kim, Yongjoong Kim, Hak-Kyoung Kim, Woonhyun Nam, SukHoon Boo, Myungchul Sung, Dongsoo Shin, Donghun Yeo, Wooju Ryu, Myeong-Chun Lee, Hyungsoo Lee, Taewoong Jang, Kyungjoong Jeong, Hongmo Je, Hojin Cho
Applicant: Stradvision, Inc.
Applicant Address: KR Pohang-si
Assignee: Stradvision, Inc.
Current Assignee: Stradvision, Inc.
Current Assignee Address: KR Pohang-si
Agency: Kaplan Breyer Schwarz, LLP
Main IPC: G06K9/62
IPC: G06K9/62; G05D1/00; G05D1/02; G06T15/20; G06T17/05

Abstract:

A learning method is provided for acquiring at least one personalized reward function, used for performing a Reinforcement Learning (RL) algorithm, corresponding to a personalized optimal policy for a subject driver. The method includes the steps of: (a) a learning device performing a process of instructing an adjustment reward network to generate first adjustment rewards by referring to information on actual actions and actual circumstance vectors in driving trajectories, a process of instructing a common reward module to generate first common rewards by referring to the actual actions and the actual circumstance vectors, and a process of instructing an estimation network to generate actual prospective values by referring to the actual circumstance vectors; and (b) the learning device instructing a first loss layer to generate an adjustment reward loss and to perform backpropagation to learn parameters of the adjustment reward network.
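
Steps (a) and (b) of the abstract can be illustrated with a minimal Python sketch. The abstract does not give the network architectures, the form of the common reward, or the loss formula, so everything concrete here is an assumption made for illustration: the function names, the linear models, treating the personalized reward as the sum of the common and adjustment rewards, and a squared-error loss between the discounted sum of personalized rewards and the prospective value. It is not the patented implementation.

```python
from typing import List, Sequence, Tuple

# One trajectory step: (actual circumstance vector, actual action).
Step = Tuple[List[float], float]


def common_reward(state: Sequence[float], action: float) -> float:
    """Stand-in common reward module (assumed rule: penalize large actions)."""
    return -abs(action)


def estimate_value(state: Sequence[float], v_weights: Sequence[float]) -> float:
    """Stand-in estimation network: fixed linear map to a prospective value."""
    return sum(w * s for w, s in zip(v_weights, state))


def adjustment_reward(state: Sequence[float], action: float,
                      weights: Sequence[float]) -> float:
    """Stand-in adjustment reward network: linear in (state, action)."""
    features = list(state) + [action]
    return sum(w * f for w, f in zip(weights, features))


def train_step(trajectory: Sequence[Step], weights: Sequence[float],
               v_weights: Sequence[float], gamma: float = 0.9,
               lr: float = 0.01) -> Tuple[List[float], float]:
    """One gradient step on the assumed loss.

    Returns the updated adjustment weights and the mean loss measured
    at the weights passed in (before the update).
    """
    n = len(trajectory)
    grads = [0.0] * len(weights)
    loss = 0.0
    for t in range(n):
        ret = 0.0                          # discounted personalized return from t
        ret_grad = [0.0] * len(weights)    # d(ret) / d(weights)
        for k in range(t, n):
            state, action = trajectory[k]
            discount = gamma ** (k - t)
            # Assumption: personalized reward = common + adjustment reward.
            ret += discount * (common_reward(state, action)
                               + adjustment_reward(state, action, weights))
            for i, f in enumerate(list(state) + [action]):
                ret_grad[i] += discount * f
        error = ret - estimate_value(trajectory[t][0], v_weights)
        loss += error ** 2
        for i in range(len(weights)):
            grads[i] += 2.0 * error * ret_grad[i]
    # Only the adjustment reward parameters are updated in this step.
    new_weights = [w - lr * g / n for w, g in zip(weights, grads)]
    return new_weights, loss / n


# Hypothetical driving trajectory with 2-dimensional circumstance vectors.
demo_trajectory: List[Step] = [([1.0, 0.5], 0.2), ([0.8, 0.4], -0.1), ([0.5, 0.1], 0.3)]
demo_weights, demo_loss = train_step(demo_trajectory, [0.0, 0.0, 0.0], [0.5, -0.2])
```

In this sketch the estimation network and the common reward module are held fixed, which matches the abstract only in that step (b) learns the parameters of the adjustment reward network; whether and how the other components are trained is not stated there.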

Public/Granted literature

US20200250486A1 LEARNING METHOD AND LEARNING DEVICE FOR SUPPORTING REINFORCEMENT LEARNING BY USING HUMAN DRIVING DATA AS TRAINING DATA TO THEREBY PERFORM PERSONALIZED PATH PLANNING
Public/Granted day: 2020-08-06

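IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graph-reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)
G06K9/62	. Methods or arrangements for recognition using electronic means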