Autonomous open schema construction from unstructured text
Abstract:
Disclosed is a natural language processing pipeline that processes a corpus of textual data to automatically create a knowledge graph containing the entities in the corpus, such as subjects and objects, and the relationships between them, such as predicates or verbs. The pipeline is configured as an end-to-end neural Open Schema Construction pipeline having a coreference resolution module, an open information extraction (OIE) module, and an entity canonicalization module. The processed textual data is input to a graph database to create the knowledge graph, which is displayable through a graphical user interface. In operation, the pipeline modules replace all mentions in the corpus that reference the same entity with a single term through coreference resolution, extract all subject-predicate-object triplets from the coreference-resolved corpus through OIE, and then canonicalize the corpus by clustering each entity mention to a canonical form for mapping to the knowledge graph and display.
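The three stages described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the abstract describes neural modules and a graph database, whereas this sketch substitutes hand-written lookups and an in-memory dictionary so that the data flow is visible. All function names, the `antecedents` map, the `predicates` list, and the `aliases` table are hypothetical and supplied by hand; in particular, the alias lookup stands in for the clustering step.

```python
from collections import defaultdict

# Toy stand-ins for the three modules named in the abstract.
# Every name below is hypothetical; none comes from the disclosure.

def resolve_coreferences(sentences, antecedents):
    """Replace each listed mention with the entity it refers to.

    `antecedents` maps (sentence_index, token) -> entity string and is
    assumed to be given; a real module would predict it.
    """
    resolved = []
    for i, sentence in enumerate(sentences):
        tokens = [antecedents.get((i, tok), tok) for tok in sentence.split()]
        resolved.append(" ".join(tokens))
    return resolved

def extract_triples(sentences, predicates):
    """Naive OIE: split each sentence at the first known predicate."""
    triples = []
    for sentence in sentences:
        for pred in predicates:
            marker = f" {pred} "
            if marker in sentence:
                subj, obj = sentence.split(marker, 1)
                triples.append((subj.strip(), pred, obj.strip(" .")))
                break
    return triples

def canonicalize(triples, aliases):
    """Map each entity mention to a canonical form via an alias table
    (a stand-in for clustering mentions)."""
    def canon(mention):
        return aliases.get(mention.lower(), mention)
    return [(canon(s), p, canon(o)) for s, p, o in triples]

def build_knowledge_graph(sentences, antecedents, predicates, aliases):
    """Run the three stages and return an adjacency map:
    subject -> list of (predicate, object) edges."""
    resolved = resolve_coreferences(sentences, antecedents)
    triples = canonicalize(extract_triples(resolved, predicates), aliases)
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return dict(graph)

# Hand-made example inputs (illustrative only).
sentences = [
    "Marie Curie discovered polonium.",
    "She won the Nobel Prize.",
    "M. Curie studied radioactivity.",
]
antecedents = {(1, "She"): "Marie Curie"}
predicates = ["discovered", "won", "studied"]
aliases = {"m. curie": "Marie Curie", "the nobel prize": "Nobel Prize"}

graph = build_knowledge_graph(sentences, antecedents, predicates, aliases)
for subject, edges in graph.items():
    print(subject, edges)
```

In this sketch the pronoun "She" and the variant "M. Curie" both end up on the single node "Marie Curie", which is the effect the coreference resolution and canonicalization modules are described as achieving before the triplets are loaded into the graph database.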