Online machine learning with immediate rewards when real rewards are delayed

Invention Grant

US12056584B2 Online machine learning with immediate rewards when real rewards are delayed (Active)

Patent Title: Online machine learning with immediate rewards when real rewards are delayed
Application No.: US17098829

Application Date: 2020-11-16
Publication No.: US12056584B2

Publication Date: 2024-08-06
Inventor: Oznur Alkan, Djallel Bouneffouf, Bei Chen, Elizabeth Daly
Applicant: International Business Machines Corporation
Applicant Address: US NY Armonk
Assignee: International Business Machines Corporation
Current Assignee: International Business Machines Corporation
Current Assignee Address: US NY Armonk
Agency: Scully, Scott, Murphy & Presser, P.C.
Agent: Yuanmin Cai
Main IPC: G06G7/48
IPC: G06G7/48; G06F16/951; G06F18/21; G06F18/214; G06N20/00; G16H50/20

Abstract:

An online machine learning model such as an autonomous agent predicts an action. A processor associated with, or running, the online machine learning model observes an environment for an interval of time for a real reward associated with the action. Responsive to determining that the real reward is not received within the interval of time, the processor determines based on a criterion whether to allocate an immediate reward received within the interval of time to the online machine learning model, where the immediate reward is an approximation of the real reward. Responsive to determining that the immediate reward is to be allocated, the processor allocates the immediate reward to the online machine learning model. The online machine learning model further learns or retrains itself based on the immediate reward.
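The decision flow in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: `RunningMeanAgent` is a toy stand-in for the online model, `select_reward` stands in for the processor's decision after the observation interval, and the threshold criterion is only one assumed example of "a criterion", which the abstract does not specify.

```python
from typing import Callable, List, Optional


class RunningMeanAgent:
    """Toy online learner (assumption): tracks a running mean reward per action."""

    def __init__(self, n_actions: int) -> None:
        self.counts: List[int] = [0] * n_actions
        self.values: List[float] = [0.0] * n_actions

    def predict(self) -> int:
        # Greedy choice over current estimates; ties go to the lowest index.
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, action: int, reward: float) -> None:
        # Incremental mean update, so the model learns one reward at a time.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def select_reward(
    real_reward: Optional[float],
    immediate_reward: Optional[float],
    criterion: Callable[[float], bool],
) -> Optional[float]:
    """Pick the reward to allocate once the observation interval has ended.

    real_reward is None when no real reward arrived within the interval.
    """
    if real_reward is not None:
        return real_reward
    # Real reward is delayed: fall back to the immediate reward (an
    # approximation of the real one) only if the criterion accepts it.
    if immediate_reward is not None and criterion(immediate_reward):
        return immediate_reward
    return None


# Usage: the real reward did not arrive, but an immediate proxy did.
agent = RunningMeanAgent(n_actions=2)
action = agent.predict()
reward = select_reward(None, 0.6, criterion=lambda r: r >= 0.5)  # assumed threshold
if reward is not None:
    agent.update(action, reward)
```

When neither reward is allocated, the sketch simply skips the update; how a real system treats that case is not stated in the abstract.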

Public/Granted literature

US20220156637A1 ONLINE MACHINE LEARNING WITH IMMEDIATE REWARDS WHEN REAL REWARDS ARE DELAYED Public/Granted day: 2022-05-19

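IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06G	Analogue computers (analogue optical computing devices: see G06E3/00; computer systems based on specific computational models: see G06N)
G06G7/00	Devices in which the computing operation is performed by varying electric or magnetic quantities (neural networks for image data processing: see G06T; speech analysis or synthesis: see G10L)
G06G7/48	. Analogue computers for specific processes, systems or devices, e.g. simulators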