Apparatus, method and recording medium for controlling system using temporal difference error

Invention Grant

US11573537B2 Apparatus, method and recording medium for controlling system using temporal difference error (Active)

Patent Title: Apparatus, method and recording medium for controlling system using temporal difference error
Application No.: US16130469

Application Date: 2018-09-13
Publication No.: US11573537B2

Publication Date: 2023-02-07
Inventor: Tomotake Sasaki, Eiji Uchibe, Kenji Doya, Hirokazu Anai, Hitoshi Yanami, Hidenao Iwane
Applicant: FUJITSU LIMITED, Okinawa Institute of Science and Technology School Corporation
Applicant Address: JP Kawasaki; JP Okinawa
Assignee: FUJITSU LIMITED, Okinawa Institute of Science and Technology School Corporation
Current Assignee: FUJITSU LIMITED, Okinawa Institute of Science and Technology School Corporation
Current Assignee Address: JP Kawasaki; JP Okinawa
Agency: Staas & Halsey LLP
Priority: JP2017-177985 2017-09-15
Main IPC: G05B13/02
IPC: G05B13/02; G06F17/16; G06N20/00; G06N3/00; G05B13/04

Abstract:

A non-transitory, computer-readable recording medium stores a program of reinforcement learning by a state-value function. The program causes a computer to execute a process including: calculating a temporal difference (TD) error based on an estimated state-value function, the TD error being calculated by giving a perturbation to each component of a feedback coefficient matrix that provides a policy; calculating, based on the TD error and the perturbation, an estimated gradient function matrix acquired by estimating a gradient function matrix of the state-value function with respect to the feedback coefficient matrix for a state of a controlled object, when state variation of the controlled object in the reinforcement learning is described by a linear difference equation and an immediate cost or an immediate reward of the controlled object is described in a quadratic form of the state and an input; and updating the feedback coefficient matrix using the estimated gradient function matrix.
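
The three steps named in the abstract (perturb each component of the feedback coefficient matrix, turn the TD error into a gradient estimate, update the matrix) can be illustrated with a small numerical sketch. This is not the procedure claimed in the patent, only a simplified illustration under stated assumptions: the controlled object follows x[t+1] = A x[t] + B u[t] with input u = F x, the immediate cost is x'Qx + u'Ru, the state-value function has the form x'Px, and the gradient component for F[i, j] is approximated by a one-step TD error divided by the perturbation size. For brevity, P is computed here from the model rather than estimated from observed data. All function names, the discount rate `gamma`, the perturbation size `eps`, and the step size `step` are hypothetical choices made for this example.

```python
import numpy as np

def value_matrix(A, B, F, Q, R, gamma, iters=500):
    """Evaluate the policy u = F x, returning P such that V(x) = x' P x.
    Assumption: P is computed from the model, not estimated from data."""
    Acl = A + B @ F
    P = np.zeros_like(Q)
    for _ in range(iters):
        P = Q + F.T @ R @ F + gamma * Acl.T @ P @ Acl
    return P

def td_error(A, B, F_pert, P, Q, R, gamma, x):
    """One-step TD error when the perturbed matrix F_pert generates the input."""
    u = F_pert @ x
    cost = x @ Q @ x + u @ R @ u          # quadratic immediate cost
    x_next = A @ x + B @ u                # linear difference equation
    return cost + gamma * x_next @ P @ x_next - x @ P @ x

def estimate_gradient(A, B, F, P, Q, R, gamma, x, eps=1e-4):
    """Perturb each component of F and divide the TD error by the perturbation."""
    G = np.zeros_like(F)
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            F_pert = F.copy()
            F_pert[i, j] += eps
            G[i, j] = td_error(A, B, F_pert, P, Q, R, gamma, x) / eps
    return G

def improve_policy(A, B, F, Q, R, gamma, states, step=0.05):
    """Update F along the negative of the gradient estimates summed over states."""
    P = value_matrix(A, B, F, Q, R, gamma)
    G = sum(estimate_gradient(A, B, F, P, Q, R, gamma, x) for x in states)
    return F - step * G

# Hypothetical two-state, one-input example (values chosen for illustration only).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
gamma = 0.95
F = np.zeros((1, 2))
states = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

F_new = improve_policy(A, B, F, Q, R, gamma, states)
```

Comparing `np.trace(value_matrix(A, B, F, Q, R, gamma))` with the same quantity for `F_new` shows whether the update lowered the total cost over the sampled initial states. Repeating `improve_policy` continues the improvement as long as the step size stays small enough.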

Public/Granted literature

US20190086876A1 RECORDING MEDIUM, POLICY IMPROVING METHOD, AND POLICY IMPROVING APPARATUS Public/Granted day: 2019-03-21

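IPC Classification:

G	Physics
G05	Controlling; regulating
G05B	Control or regulating systems in general; functional elements of such systems; monitoring or testing arrangements for such systems or elements (fluid-pressure actuators or systems acting by means of fluids in general: F15B; valves per se: F16K; characterised by mechanical features only: G05G; sensitive elements: see the appropriate subclasses, e.g. G12B, subclasses of G01, H01; correcting units: see the appropriate subclasses, e.g. H02K)
G05B13/00	Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion (G05B19/00 takes precedence; machine learning: G06N 20/00)
G05B13/02	. electric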