Invention Grant
- Patent Title: Method and apparatus for improved reward-based learning using nonlinear dimensionality reduction
- Patent Title (中): 使用非线性维数降低改进奖励学习的方法和装置
- Application No.: US11870698
- Application Date: 2007-10-11
- Publication No.: US08060454B2
- Publication Date: 2011-11-15
- Inventor: Rajarshi Das, Gerald J. Tesauro, Kilian Q. Weinberger
- Applicant: Rajarshi Das, Gerald J. Tesauro, Kilian Q. Weinberger
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Main IPC: G06F15/18
- IPC: G06F15/18

Abstract:
The present invention is a method and an apparatus for reward-based learning of management policies. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance measure between pairs of exemplars is used to compute a Non-Linear Dimensionality Reduction (NLDR) mapping of (state, action) pairs into a lower-dimensional representation, thereby producing embedded exemplars, wherein one or more parameters of the NLDR are tuned to minimize a cross-validation Bellman error on a holdout set taken from the set of one or more exemplars. The mapping is then applied to the set of exemplars, and reward-based learning is applied to the embedded exemplars to obtain a learned management policy.
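
The pipeline in the abstract can be illustrated with a small, self-contained sketch. It is not the patented implementation. Several choices here are assumptions made only for illustration: Isomap stands in for the unspecified NLDR method, with its neighbor count `k` as the tuned parameter; the embedding is one-dimensional; a 1-nearest-neighbor fitted value iteration stands in for the reward-based learner; a single holdout split replaces full cross-validation; and the exemplars form one synthetic trajectory in which exemplar `i + 1` is the successor of exemplar `i`. All function names are hypothetical.

```python
import math
import random

def isomap_1d(points, k):
    """Embed points into 1-D with a small Isomap: k-NN graph on pairwise
    distances, shortest paths, then classical MDS via power iteration."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    g = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        g[i][i] = 0.0
        others = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        for j in others[:k]:
            g[i][j] = g[j][i] = d[i][j]
    for m in range(n):  # Floyd-Warshall: geodesic distances along the graph
        for i in range(n):
            for j in range(n):
                if g[i][m] + g[m][j] < g[i][j]:
                    g[i][j] = g[i][m] + g[m][j]
    # Cap unreachable pairs at the largest finite distance (a simplification).
    cap = max(v for row in g for v in row if v < math.inf)
    sq = [[min(v, cap) ** 2 for v in row] for row in g]
    row_mean = [sum(r) / n for r in sq]
    total = sum(row_mean) / n
    # Double-centre the squared distances to get the MDS Gram matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total)
          for j in range(n)] for i in range(n)]
    v = [math.sin(i + 1.0) for i in range(n)]  # deterministic start vector
    for _ in range(200):  # power iteration for the dominant eigenvector
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(max(lam, 0.0)) * x for x in v]

def fit_q(z, rewards, train, gamma, sweeps=50):
    """Fitted value iteration on embedded training exemplars, using a
    1-nearest-neighbour lookup as the value-function approximator."""
    q = {i: 0.0 for i in train}

    def predict(x):
        return q[min(train, key=lambda i: abs(z[i] - x))]

    for _ in range(sweeps):
        q = {i: rewards[i] + gamma * predict(z[i + 1]) for i in train}
    return predict

def bellman_error(predict, z, rewards, idx, gamma):
    """Mean squared Bellman residual over the transitions listed in idx."""
    errs = [(rewards[i] + gamma * predict(z[i + 1]) - predict(z[i])) ** 2
            for i in idx]
    return sum(errs) / len(errs)

def tune_k(points, rewards, candidates, gamma=0.9, seed=0):
    """Pick the neighbour count whose embedding gives the lowest holdout
    Bellman error; returns (best_k, {k: error})."""
    idx = list(range(len(points) - 1))  # transition i leads to exemplar i + 1
    random.Random(seed).shuffle(idx)
    cut = len(idx) // 4
    holdout, train = idx[:cut], idx[cut:]
    scores = {}
    for k in candidates:
        z = isomap_1d(points, k)
        predict = fit_q(z, rewards, train, gamma)
        scores[k] = bellman_error(predict, z, rewards, holdout, gamma)
    return min(scores, key=scores.get), scores

# Synthetic trajectory: two state features and one action value per exemplar.
rng = random.Random(1)
points = [(math.cos(0.15 * t), math.sin(0.15 * t), 0.05 * t + 0.01 * rng.random())
          for t in range(40)]
rewards = [-abs(p[2] - 1.0) for p in points]
best_k, scores = tune_k(points, rewards, [2, 4, 8])
print(best_k, scores)
```

In this sketch the embedding is computed over all exemplars at once, and only the value-function fit is restricted to the training transitions. Whether the NLDR itself should see the holdout exemplars is a design choice the abstract does not settle.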
Public/Granted literature
- US20090098515A1 METHOD AND APPARATUS FOR IMPROVED REWARD-BASED LEARNING USING NONLINEAR DIMENSIONALITY REDUCTION Public/Granted day: 2009-04-16