Value function representation method of reinforcement learning and apparatus using this

Invention Grant

US08175982B2 Value function representation method of reinforcement learning and apparatus using this (Expired)

Patent Title: Value function representation method of reinforcement learning and apparatus using this
Application No.: US12065558

Application Date: 2006-08-18
Publication No.: US08175982B2

Publication Date: 2012-05-08
Inventor: Tomoki Hamagami, Takesi Shibuya
Applicant: Tomoki Hamagami, Takesi Shibuya
Applicant Address: JP Yokohama
Assignee: Nat'l University Corp. Yokohama Nat'l University
Current Assignee: Nat'l University Corp. Yokohama Nat'l University
Current Assignee Address: JP Yokohama
Agency: Westerman, Hattori, Daniels & Adrian, LLP
Priority: JP2005-254763 20050902
International Application: PCT/JP2006/316659 WO 20060818
International Announcement: WO2007/029516 WO 20070315
Main IPC: G06F15/18
IPC: G06F15/18; G06E1/00

Abstract:

Reinforcement learning is one of the intelligent control techniques applied to autonomous mobile robots and similar systems. It has notable strengths, such as enabling operation in unknown environments. However, it suffers from a fundamental problem known as the “incomplete perception problem”. A variety of solutions have been proposed, but none has been decisive, and the resulting systems tend to become complex. A simple and effective solution has therefore been desired.

A complex value function, which defines a state-action value by a complex number, is introduced. Time series information is encoded in the phase part of the complex value. As a result, the time series information enters the value function without the use of a complicated algorithm, so the incomplete perception problem is effectively solved by a simple implementation of the method.
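
The idea of carrying time series information in the phase of a complex state-action value can be illustrated with a small sketch. The abstract does not give the update rule or the action-selection rule, so everything below is an assumption made for illustration: the function names (`rotate`, `effective_value`, `select_action`), the use of a unit-magnitude reference phasor to stand for the position in the time series, and the scoring rule (real part of the value times the conjugate of the reference) are hypothetical and are not taken from the patent.

```python
import cmath
import math


def rotate(value: complex, angle: float) -> complex:
    """Rotate a complex number by `angle` radians; the magnitude is unchanged."""
    return value * cmath.exp(1j * angle)


def effective_value(q: complex, reference: complex) -> float:
    """Assumed scoring rule: real part of q times the conjugate of the reference.

    The score is largest when the phase of q matches the phase of the
    reference, and its size scales with the magnitude of q.
    """
    return (q * reference.conjugate()).real


def select_action(q_row: dict, reference: complex) -> str:
    """Pick the action whose complex value best agrees with the reference phase."""
    return max(q_row, key=lambda action: effective_value(q_row[action], reference))


# One observation with two actions of equal magnitude but different phase.
# With ordinary real-valued state-action values these two entries would tie.
q_row = {
    "left": cmath.rect(1.0, 0.0),
    "right": cmath.rect(1.0, math.pi / 2),
}

# The reference phasor is a hypothetical stand-in for "where the agent is
# in the time series": the same observation seen early versus seen later.
early = cmath.rect(1.0, 0.0)
late = rotate(early, math.pi / 2)

choice_early = select_action(q_row, early)
choice_late = select_action(q_row, late)
print(choice_early, choice_late)
```

Under these assumptions, the same observation yields different actions depending on the reference phase, which is the kind of disambiguation the incomplete perception problem calls for. A real-valued table would have no way to separate the two cases.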

Public/Granted literature

US20090234783A1 Value function representation method of reinforcement learning and apparatus using this. Public/Granted day: 2009-09-17
