Invention Grant
US08060454B2 Method and apparatus for improved reward-based learning using nonlinear dimensionality reduction (Expired)

Method and apparatus for improved reward-based learning using nonlinear dimensionality reduction
Abstract:
The present invention is a method and an apparatus for reward-based learning of management policies. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance measure between pairs of exemplars is used to compute a Non-Linear Dimensionality Reduction (NLDR) mapping of (state, action) pairs into a lower-dimensional representation, thereby producing embedded exemplars, wherein one or more parameters of the NLDR are tuned to minimize a cross-validation Bellman error on a holdout set taken from the set of one or more exemplars. The mapping is then applied to the set of exemplars, and reward-based learning is applied to the embedded exemplars to obtain a learned management policy.
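The pipeline in the abstract (embed the (state, action) pairs, tune the embedding against a holdout Bellman error, then learn on the embedded exemplars) can be illustrated with a minimal sketch. It is not the patented implementation, and everything specific in it is an assumption: the abstract names no particular NLDR technique, so an uncentred RBF kernel PCA built on Euclidean distances stands in for it; a linear LSTD-Q fit stands in for the reward-based learner; each exemplar is assumed to also carry a successor (state, action) pair so that a Bellman residual can be computed; and all function names, the toy data, and the candidate kernel widths are hypothetical.

```python
import numpy as np

def toy_exemplars(n=80, seed=0):
    # Hypothetical data: 2-D state, 1-D action, reward penalising distance from the origin.
    rng = np.random.default_rng(seed)
    s = rng.uniform(-1, 1, size=(n, 2))
    a = rng.uniform(-1, 1, size=(n, 1))
    s_next = np.clip(s + 0.1 * a, -1, 1)
    a_next = rng.uniform(-1, 1, size=(n, 1))
    r = -np.sum(s ** 2, axis=1)
    return np.hstack([s, a]), r, np.hstack([s_next, a_next])

def rbf_kernel(A, B, width):
    # Distance measure: squared Euclidean distance between every pair of rows.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_embedding(X, width, dim):
    # Assumed NLDR stand-in: kernel PCA (uncentred for brevity).
    K = rbf_kernel(X, X, width)
    vals, vecs = np.linalg.eigh(K)
    top = np.argsort(vals)[::-1][:dim]
    proj = vecs[:, top] / np.sqrt(np.maximum(vals[top], 1e-12))
    # The returned function maps new (state, action) rows into the embedding.
    return lambda Y: rbf_kernel(Y, X, width) @ proj

def features(Z):
    # Linear features on the embedded coordinates plus a bias term.
    return np.hstack([Z, np.ones((len(Z), 1))])

def fit_q(Z, r, Z_next, discount, ridge=1e-3):
    # Assumed learner: LSTD-Q with a small ridge term for numerical stability.
    P, Pn = features(Z), features(Z_next)
    A = P.T @ (P - discount * Pn) + ridge * np.eye(P.shape[1])
    return np.linalg.solve(A, P.T @ r)

def bellman_error(w, Z, r, Z_next, discount):
    # Mean squared Bellman residual of the linear Q-function.
    q, q_next = features(Z) @ w, features(Z_next) @ w
    return float(np.mean((r + discount * q_next - q) ** 2))

def tune_width(X, r, X_next, widths, dim=2, discount=0.9, holdout=0.25, seed=0):
    # Split the exemplars, then pick the kernel width with the lowest holdout Bellman error.
    perm = np.random.default_rng(seed).permutation(len(X))
    n_hold = int(len(X) * holdout)
    hold, train = perm[:n_hold], perm[n_hold:]
    scores = {}
    for width in widths:
        embed = fit_embedding(X[train], width, dim)
        w = fit_q(embed(X[train]), r[train], embed(X_next[train]), discount)
        scores[width] = bellman_error(
            w, embed(X[hold]), r[hold], embed(X_next[hold]), discount)
    return min(scores, key=scores.get), scores
```

Calling `tune_width(*toy_exemplars(), widths=[0.1, 0.5, 2.0])` returns the selected width and the per-width holdout errors. In a fuller version the chosen embedding would then be refit on all exemplars before the final policy is learned, as the last sentence of the abstract describes.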