Reinforcement learning for active sequence processing

Invention Grant

US12175737B2 Reinforcement learning for active sequence processing (In force)

Patent Title: Reinforcement learning for active sequence processing
Application No.: US17773789

Application Date: 2020-11-13
Publication No.: US12175737B2

Publication Date: 2024-12-24
Inventor: Viorica Patraucean, Bilal Piot, Joao Carreira, Volodymyr Mnih, Simon Osindero
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Applicant Address: GB London
Assignee: DEEPMIND TECHNOLOGIES LIMITED
Current Assignee: DEEPMIND TECHNOLOGIES LIMITED
Current Assignee Address: GB London
Agency: Fish & Richardson P.C.
International Application: PCT/EP2020/082041 WO 2020-11-13
International Publication: WO2021/094522 WO 2021-05-20
Main IPC: G06V10/82
IPC: G06V10/82 ; G06N3/045 ; G06N3/048

Abstract:

A system that is configured to receive a sequence of task inputs and to perform a machine learning task is described. The system includes a reinforcement learning (RL) neural network and a task neural network. The RL neural network is configured to: generate, for each task input of the sequence of task inputs, a respective decision that determines whether to encode the task input or to skip the task input, and provide the respective decision of each task input to the task neural network. The task neural network is configured to: receive the sequence of task inputs, receive, from the RL neural network, for each task input of the sequence of task inputs, a respective decision that determines whether to encode the task input or to skip the task input, process each of the un-skipped task inputs in the sequence of task inputs to generate a respective accumulated feature for the un-skipped task input, wherein the respective accumulated feature characterizes features of the un-skipped task input and of previous un-skipped task inputs in the sequence, and generate a machine learning task output for the machine learning task based on the last accumulated feature generated for the last un-skipped task input in the sequence.
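
The data flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the two neural networks are replaced by plain functions, and the names `encode`, `accumulate`, `process_sequence`, `decide` and `head`, the blending rule, and the error raised when every input is skipped are all assumptions made for illustration. The abstract does not say what the RL neural network conditions its decision on, so the stand-in policy here simply sees the position and the input.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def encode(task_input: Vector) -> Vector:
    """Hypothetical stand-in for the task network's per-input encoder."""
    return [2.0 * x for x in task_input]


def accumulate(prev: Optional[Vector], feature: Vector) -> Vector:
    """Hypothetical accumulator: averages the new feature with the previous
    accumulated feature, so the result reflects all earlier un-skipped inputs."""
    if prev is None:
        return list(feature)
    return [0.5 * p + 0.5 * f for p, f in zip(prev, feature)]


def process_sequence(
    task_inputs: Sequence[Vector],
    decide: Callable[[int, Vector], bool],  # stand-in for the RL neural network
    head: Callable[[Vector], float],        # stand-in for the task output layer
) -> Tuple[float, List[bool]]:
    """Encode or skip each input, then produce the task output from the
    last accumulated feature."""
    accumulated: Optional[Vector] = None
    decisions: List[bool] = []
    for position, task_input in enumerate(task_inputs):
        should_encode = decide(position, task_input)
        decisions.append(should_encode)
        if should_encode:
            # Only un-skipped inputs are encoded and folded into the state.
            accumulated = accumulate(accumulated, encode(task_input))
    if accumulated is None:
        # Assumption: the abstract does not cover the all-skipped case.
        raise ValueError("every task input was skipped")
    return head(accumulated), decisions


# Toy policy (assumption): encode even positions, skip odd ones.
inputs = [[1.0], [2.0], [3.0], [4.0]]
output, decisions = process_sequence(
    inputs,
    decide=lambda position, _input: position % 2 == 0,
    head=sum,
)
print(output, decisions)  # 4.0 [True, False, True, False]
```

In this toy run only the first and third inputs are encoded, so the skipped inputs incur no encoding cost, which is the saving that the skip decision is meant to provide.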

Published/Granted literature

US20220392206A1 REINFORCEMENT LEARNING FOR ACTIVE SEQUENCE PROCESSING Publication date: 2022-12-08

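IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06V	Image or video recognition or understanding
G06V10/00	Arrangements for image or video recognition or understanding (character recognition in images or video G06V30/10)
G06V10/70	.using pattern recognition or machine learning (optical pattern recognition or electronic computing G06V10/88)
G06V10/82	..using neural networks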