Contextual interestingness ranking of documents for due diligence in the banking industry with entity grouping
Abstract:
Documents that need to be analyzed for various reasons, such as financial-crime investigations, are ranked by examining the topicality and sentiment present in each document for a given subject of interest. In one approach, a given document is classified to determine its category, and entity recognition is used to identify the subject of interest. Passages from the document that relate to the entity are grouped and analyzed for sentiment to generate a sentiment score. Documents are then ranked based on the sentiment scores. In another approach, a classification probability score is computed for each passage, representing the likelihood that the passage relates to a category of interest, and the document is ranked based on both the sentiment scores and the classification probability scores. The category classification uses an ensemble of natural language text classifiers. One of the classifiers is a naïve Bayes classifier with feature vectors generated using Word2Vec modeling.
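As a rough illustration of the second approach, the following minimal Python sketch groups the passages that mention an entity, scores each for sentiment and category likelihood, and ranks documents on the combination. Everything in it is an assumption made for illustration: the word lists, the function names, the substring match standing in for entity recognition, the keyword count standing in for the classifier ensemble (no naïve Bayes or Word2Vec model is used here), and the choice to multiply adverse sentiment by category probability. The abstract does not state how the two scores are combined.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical toy lexicons; a real system would use trained models.
NEGATIVE = {"fraud", "laundering", "indicted", "fined", "sanctions"}
POSITIVE = {"cleared", "acquitted", "compliant", "awarded"}
# Stand-in for the classifier ensemble: terms hinting at the category of interest.
CATEGORY_TERMS = {"fraud", "laundering", "bribery", "sanctions"}


@dataclass
class Document:
    doc_id: str
    passages: List[str]


def tokens(text: str) -> List[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [w.strip(".,;:!?()\"'").lower() for w in text.split()]


def sentiment(passage: str) -> float:
    """Lexicon score in [-1, 1]; negative values mean adverse sentiment."""
    toks = tokens(passage)
    pos = sum(t in POSITIVE for t in toks)
    neg = sum(t in NEGATIVE for t in toks)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def category_probability(passage: str) -> float:
    """Crude pseudo-probability: distinct category terms found, divided by 2, capped at 1."""
    hits = set(tokens(passage)) & CATEGORY_TERMS
    return min(1.0, len(hits) / 2)


def score_document(doc: Document, entity: str) -> float:
    # Entity grouping (assumed: case-insensitive substring match).
    group = [p for p in doc.passages if entity.lower() in p.lower()]
    if not group:
        return 0.0
    # Assumed combination: adverse sentiment weighted by category probability,
    # averaged over the passages in the entity group.
    total = sum(max(0.0, -sentiment(p)) * category_probability(p) for p in group)
    return total / len(group)


def rank(docs: List[Document], entity: str) -> List[Document]:
    """Return documents ordered from most to least interesting for the entity."""
    return sorted(docs, key=lambda d: score_document(d, entity), reverse=True)
```

For example, given one document saying that a hypothetical "Acme Bank" was fined for money laundering and another saying it was cleared of all charges, `rank(docs, "Acme Bank")` places the first document ahead of the second, because only the first carries adverse sentiment in a passage tied to the entity.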