EXTRACTING FINE-GRAINED TOPICS FROM TEXT CONTENT

Invention Publication

US20230161964A1 EXTRACTING FINE-GRAINED TOPICS FROM TEXT CONTENT Under examination (published)

Patent Title: EXTRACTING FINE-GRAINED TOPICS FROM TEXT CONTENT
Application No.: US17534502

Application Date: 2021-11-24
Publication No.: US20230161964A1

Publication Date: 2023-05-25
Inventor: Deven Santosh SHAH, Sukanya MOORTHY, Topojoy BISWAS
Applicant: YAHOO AD TECH LLC
Applicant Address: US VA Dulles
Assignee: YAHOO AD TECH LLC
Current Assignee: YAHOO AD TECH LLC
Current Assignee Address: US VA Dulles
Main IPC: G06F40/30
IPC: G06F40/30; G06F40/166; G06F40/40; G06N3/08

Abstract:

The example embodiments are directed toward improvements in document classification. In an embodiment, a method is disclosed comprising: generating a set of sentences based on a document; predicting a set of labels for each sentence using a multi-label classifier, the multi-label classifier including a self-attended contextual word embedding backbone layer, a bank of trainable unigram convolutions, a bank of trainable bigram convolutions, and a fully connected layer, the multi-label classifier being trained using a weakly labeled data set; and labeling the document based on the set of labels. Using a single solution, the various embodiments can target multiple use cases, such as identifying related entities, trending related entities, creating ephemeral timelines of entities, and others. Further, the various embodiments provide a weakly supervised framework for training a model when a labeled golden set does not contain a sufficient number of examples.
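
The method in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the label set, the dimensions, the naive sentence splitter, the hash-seeded embed() stand-in for the contextual embedding backbone, the max-pooling step, the sigmoid threshold, and the union rule for labeling the document are all assumptions made for illustration, and the weights are untrained random placeholders rather than weights learned from a weakly labeled data set.

```python
import math
import random
import re

LABELS = ["sports", "finance", "politics"]  # hypothetical label set
EMB_DIM = 8     # assumed embedding width
N_FILTERS = 4   # assumed number of filters per convolution bank

rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Untrained placeholder weights; the abstract describes these as trainable.
UNIGRAM_BANK = rand_matrix(N_FILTERS, EMB_DIM)      # each filter spans 1 token
BIGRAM_BANK = rand_matrix(N_FILTERS, 2 * EMB_DIM)   # each filter spans 2 tokens
FC_WEIGHTS = rand_matrix(len(LABELS), 2 * N_FILTERS)
FC_BIAS = [0.0] * len(LABELS)


def split_sentences(document):
    """Naive splitter on ., ! and ? followed by whitespace (an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def embed(sentence):
    """Stand-in for the self-attended contextual word embedding backbone.

    Returns one deterministic pseudo-random vector per token. A real
    backbone would return context-dependent vectors instead.
    """
    vectors = []
    for token in sentence.lower().split():
        token_rng = random.Random(token)
        vectors.append([token_rng.uniform(-1, 1) for _ in range(EMB_DIM)])
    return vectors


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conv_max_pool(windows, bank):
    """Apply every filter to every window, then max-pool over positions."""
    return [max(dot(f, w) for w in windows) for f in bank]


def predict_proba(sentence):
    tokens = embed(sentence)
    if not tokens:
        return [0.0] * len(LABELS)
    # Bigram windows are concatenations of adjacent token vectors.
    bigrams = [tokens[i] + tokens[i + 1] for i in range(len(tokens) - 1)]
    if not bigrams:  # single-token sentence: pad with a zero vector
        bigrams = [tokens[0] + [0.0] * EMB_DIM]
    features = conv_max_pool(tokens, UNIGRAM_BANK) + conv_max_pool(bigrams, BIGRAM_BANK)
    # Fully connected layer with one independent sigmoid output per label.
    logits = [dot(w, features) + b for w, b in zip(FC_WEIGHTS, FC_BIAS)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def predict_labels(sentence, threshold=0.5):
    return {lab for lab, p in zip(LABELS, predict_proba(sentence)) if p >= threshold}


def label_document(document, threshold=0.5):
    """Label the document with the union of its sentence labels (an assumption)."""
    labels = set()
    for sentence in split_sentences(document):
        labels |= predict_labels(sentence, threshold)
    return labels
```

Calling label_document("Stocks rose today. The team won!") returns a subset of LABELS. Because the weights are random, the labels carry no meaning until the convolution banks and the fully connected layer are trained.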

Published/Granted Literature

US11983502B2 Extracting fine-grained topics from text content, Publication/Grant date: 2024-05-14

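IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F40/00	Handling natural language data (speech analysis or synthesis, speech recognition: see G10L)
G06F40/30	. Semantic analysis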