Intelligent Locomotive Operation Method and System Based on Deep Reinforcement Learning

Invention Grant

CN106842925B Intelligent Locomotive Operation Method and System Based on Deep Reinforcement Learning (in force)

Patent Title: Intelligent Locomotive Operation Method and System Based on Deep Reinforcement Learning
Application No.: CN201710045758.0

Application Date: 2017-01-20
Publication No.: CN106842925B

Publication Date: 2019-10-11
Inventor: 赵曦滨, 夏雅楠, 黄晋, 卢莎, 任育琦, 顾明, 孙家广
Applicant: 清华大学, 中车信息技术有限公司, 中车大连机车研究所有限公司
Applicant Address: Qinghuayuan, Haidian District, Beijing
Assignee: 清华大学, 中车信息技术有限公司, 中车大连机车研究所有限公司
Current Assignee: 清华大学, 中车信息技术有限公司, 中车大连机车研究所有限公司
Current Assignee Address: Qinghuayuan, Haidian District, Beijing
Agency: 北京律谱知识产权代理事务所
Agent: 罗建书
Main IPC: G05B13/04
IPC: G05B13/04

Abstract:

The invention relates to an intelligent locomotive operation method and system based on deep reinforcement learning. The system comprises a data source module, a locomotive operating environment learning module, an evaluation mechanism learning module, and a control strategy learning module. The data source module supplies the required input data to the locomotive operating environment learning module and the evaluation mechanism learning module, which respectively output the learned operating environment and the reward function value to the control strategy learning module. Based on a deep reinforcement learning algorithm, the locomotive operating environment model uses the real-time evaluation of each locomotive operating action as feedback: by rewarding or penalizing the current operating action, it returns a reward function value to the control strategy as a reward evaluation, and the control strategy combines this value with the operating state to iteratively update and optimize itself. The invention achieves better intelligent, optimized locomotive operation and greatly reduces manual involvement.
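The data flow described in the abstract (environment model and evaluation mechanism feeding a control strategy that is updated iteratively) can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in and not the patented implementation: the speed levels, actions, reward terms, and hyperparameters are invented for illustration, the two "learned" modules are replaced by hand-written functions, and tabular Q-learning is substituted for the deep reinforcement learning algorithm the patent names.

```python
import random

# Hypothetical toy setup (not from the patent): five discrete speed levels
# as the operating state and three handle actions.
ACTIONS = (-1, 0, 1)   # brake, coast, apply traction
N_SPEEDS = 5
TARGET_SPEED = 3

def environment_model(speed, action):
    """Stand-in for the operating environment learning module: next speed level."""
    return min(N_SPEEDS - 1, max(0, speed + action))

def reward_model(speed, action, next_speed):
    """Stand-in for the evaluation mechanism learning module: reward or penalty."""
    tracking_penalty = -abs(next_speed - TARGET_SPEED)
    energy_penalty = -0.1 if action == 1 else 0.0
    return tracking_penalty + energy_penalty

def train_policy(episodes=500, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Control strategy learning, here as tabular Q-learning (a simplification)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_SPEEDS) for a in ACTIONS}
    for _ in range(episodes):
        speed = rng.randrange(N_SPEEDS)
        for _ in range(steps):
            # Epsilon-greedy choice of the operating action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(speed, a)])
            next_speed = environment_model(speed, action)
            reward = reward_model(speed, action, next_speed)
            # Feed the reward back into the strategy (Q-learning update).
            best_next = max(q[(next_speed, a)] for a in ACTIONS)
            q[(speed, action)] += alpha * (reward + gamma * best_next - q[(speed, action)])
            speed = next_speed
    return q

def greedy_action(q, speed):
    """Action the trained strategy prefers at the given speed level."""
    return max(ACTIONS, key=lambda a: q[(speed, a)])

if __name__ == "__main__":
    q_table = train_policy()
    for s in range(N_SPEEDS):
        print(s, greedy_action(q_table, s))
```

In this toy setting the trained strategy accelerates below the target speed, coasts at it, and brakes above it. A deep variant would replace the Q-table with a neural network and the two stand-in functions with models fitted to the data source.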

Public/Granted literature

CN106842925A Intelligent Locomotive Operation Method and System Based on Deep Reinforcement Learning Public/Granted day: 2017-06-13

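IPC Classification:

G	Physics
G05	Controlling; Regulating
G05B	Control or regulating systems in general; functional elements of such systems; monitoring or testing arrangements for such systems or elements (fluid-pressure actuators or systems acting by means of fluids in general, see F15B; valves per se, see F16K; characterised by mechanical features only, see G05G; sensitive elements, see the appropriate subclasses, e.g. G12B, subclasses of G01, H01; correcting units, see the appropriate subclasses, e.g. H02K)
G05B13/00	Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion (G05B19/00 takes precedence; machine learning, see G06N 20/00)
G05B13/02	. electric
G05B13/04	.. involving the use of models or simulators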