Ontology-based document analysis and annotation generation
Abstract:
Techniques for cognitive annotation are provided. An electronic document including textual data is received. A plurality of importance scores are generated for a plurality of words included in the electronic document by processing the electronic document using a trained passage encoder. A plurality of important words are identified based on the plurality of importance scores. One or more clusters of words are generated, where each of the one or more clusters of words includes at least one of the plurality of important words. A representative word is selected for a first cluster, and the representative word is mapped to one or more concepts from a predefined list of concepts. The one or more concepts are disambiguated to identify a set of relevant concepts for the electronic document. An annotated version of the electronic document is generated based at least in part on the set of relevant concepts.
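The pipeline summarized above (score words, pick important words, cluster them, choose a representative per cluster, map it to concepts, disambiguate, annotate) can be illustrated with the minimal sketch below. The abstract does not specify the encoder, the clustering method, the concept list, or the disambiguation rule, so every one of these is a hypothetical stand-in: normalized term frequency replaces the trained passage encoder, greedy character-trigram Jaccard clustering groups related words, a toy CONCEPTS dictionary plays the predefined list of concepts, and a Lesk-style context-overlap rule performs disambiguation. The names STOPWORDS, CONCEPTS, importance_scores, cluster_words, and annotate are illustrative only.

```python
import re
from collections import Counter

# Hypothetical stopword list (assumption, not from the source).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "with", "by", "is", "it"}

# Hypothetical predefined concept list: surface terms a word can map from, plus
# context words used to choose between competing senses (assumption).
CONCEPTS = {
    "Bank (finance)": {"terms": {"bank", "banks"}, "context": {"loan", "money", "deposit", "interest"}},
    "Bank (geography)": {"terms": {"bank", "banks"}, "context": {"river", "water", "shore", "erosion"}},
    "Interest (finance)": {"terms": {"interest"}, "context": {"loan", "rate", "rates", "bank"}},
    "Interest (curiosity)": {"terms": {"interest"}, "context": {"hobby", "curious", "attention"}},
}


def tokenize(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def importance_scores(tokens):
    """Stand-in for the trained passage encoder (assumption): term frequency scaled
    so the most frequent non-stopword scores 1.0. Stopwords receive no score."""
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    top = max(counts.values(), default=1)
    return {w: c / top for w, c in counts.items()}


def trigrams(word):
    """Character trigrams of the word padded with '#' boundary markers."""
    padded = f"#{word}#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Jaccard similarity of the two words' character-trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def cluster_words(words, threshold=0.4):
    """Greedy clustering: a word joins the first cluster whose first member is at
    least `threshold` similar to it; otherwise it starts a new cluster."""
    clusters = []
    for w in words:
        for cluster in clusters:
            if similarity(w, cluster[0]) >= threshold:
                cluster.append(w)
                break
        else:
            clusters.append([w])
    return clusters


def annotate(text, top_k=3):
    """Return (annotated_text, sorted list of relevant concepts)."""
    tokens = tokenize(text)
    doc = set(tokens)
    scores = importance_scores(tokens)
    # Important words: the top_k highest-scoring words (ties broken alphabetically).
    important = sorted(scores, key=lambda w: (-scores[w], w))[:top_k]

    word_to_concept = {}
    for cluster in cluster_words(important):
        # Representative word: highest-scoring member (ties go to the alphabetically last word).
        rep = max(cluster, key=lambda w: (scores[w], w))
        # Map the representative word to every concept that lists it as a term.
        candidates = [name for name, c in CONCEPTS.items() if rep in c["terms"]]
        if not candidates:
            continue
        # Disambiguate (Lesk-style): keep the candidate whose context words overlap
        # most with the document's words; the first candidate wins ties.
        best = max(candidates, key=lambda n: len(CONCEPTS[n]["context"] & doc))
        for w in cluster:
            word_to_concept[w] = best

    def tag(match):
        word = match.group(0)
        concept = word_to_concept.get(word.lower())
        return f"{word}[{concept}]" if concept else word

    annotated = re.sub(r"[A-Za-z]+", tag, text)
    return annotated, sorted(set(word_to_concept.values()))


if __name__ == "__main__":
    sample = ("Banks set interest rates. The bank lends money, and banks earn "
              "interest on each loan the bank makes.")
    annotated, concepts = annotate(sample)
    print(concepts)  # ['Bank (finance)', 'Interest (finance)']
    print(annotated)
```

In this sketch "bank" and "banks" share three of six padded trigrams (similarity 0.5), so they fall into one cluster, and the financial context words in the sample outvote the geographic sense. Replacing the stand-in with a real passage encoder would only change importance_scores, since the rest of the pipeline consumes a plain word-to-score dictionary.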