Systems and methods for text analytics processor
Abstract:
A hardware-based programmable text analytics processor (TAP) has a plurality of components including at least a tokenizer, a tagger, a parser, and a classifier. The tokenizer processes an input stream of unstructured text data and identifies a sequence of tokens along with their associated token IDs. The tagger assigns a tag to each token in the sequence from the tokenizer using a trained machine learning model. The parser parses the tagged tokens from the tagger and creates a parse tree for the tagged tokens via a plurality of shift, reduce and/or finalize transitions based on a trained machine learning model. The classifier performs classification for both tagging and parsing: it accepts features extracted by the tagger and the parser, classifies those features, and returns the resulting classes to the tagger and the parser, respectively. The TAP then outputs structured data to be processed by various text analytics applications.
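The data flow described in the abstract can be illustrated with a minimal software sketch. This is not the claimed hardware design: every name below (`tokenize`, `tag`, `parse`, `ToyClassifier`, `Node`, `leaves`) is hypothetical, and the hand-written rules in `ToyClassifier` merely stand in for the trained machine learning model, which the abstract does not specify. The sketch only shows how a single shared classifier could receive features from both the tagger and the parser and return a class to each.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Parse-tree node: a leaf has no children."""
    label: str
    children: list = field(default_factory=list)


def tokenize(text, vocab):
    """Split raw text into tokens and pair each with a token ID.

    New tokens get the next free ID in `vocab` (an assumed ID scheme).
    """
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [(tok, vocab.setdefault(tok.lower(), len(vocab))) for tok in tokens]


class ToyClassifier:
    """Hypothetical stand-in for the trained model: features in, class out."""

    def classify(self, features):
        if features["task"] == "tag":
            word = features["word"]
            if word in {"the", "a"}:
                return "DET"
            if word in {"runs", "sees"}:
                return "VERB"
            if not word.isalnum():
                return "PUNCT"
            return "NOUN"
        # Parsing: placeholder policy that reduces whenever two items are stacked.
        if features["stack_size"] >= 2:
            return "reduce"
        if features["buffer_size"] > 0:
            return "shift"
        return "finalize"


def tag(tokens, clf):
    """Tagger: extract features per token and ask the classifier for a tag."""
    return [(tok, clf.classify({"task": "tag", "word": tok.lower()}))
            for tok, _token_id in tokens]


def parse(tagged, clf):
    """Parser: build a tree via shift, reduce and finalize transitions."""
    stack, buffer = [], list(tagged)
    while True:
        action = clf.classify({"task": "parse",
                               "stack_size": len(stack),
                               "buffer_size": len(buffer)})
        if action == "shift":
            word, tag_name = buffer.pop(0)
            stack.append(Node(f"{tag_name}:{word}"))
        elif action == "reduce":
            right, left = stack.pop(), stack.pop()
            stack.append(Node("PHRASE", [left, right]))
        else:  # finalize: the remaining stack item is the root
            return stack[-1] if stack else None


def leaves(node):
    """Collect leaf labels left to right."""
    if not node.children:
        return [node.label]
    return [leaf for child in node.children for leaf in leaves(child)]
```

Calling `tokenize`, then `tag`, then `parse` on a sentence yields a tree whose leaves are the tagged tokens in their original order. The placeholder reduce policy always produces a left-branching tree; a trained model would choose transitions from richer features.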