Method and system for determining structural blocks of a document
Abstract:
This disclosure relates to a method and system for determining structural blocks of a document. The method may include extracting text lines from the document, generating a feature vector for each text line by determining feature values for a set of features in that text line, and determining, for each structural class, at least one dominant feature from among the set of features and at least one corresponding dominance factor, based on the feature vector for each text line. The method may further include deriving a set of rules for classifying the text lines into respective structural classes, and determining a structural block tag for each text line based on the set of rules. Each rule in the set corresponds to one of the structural classes and is based on the at least one dominant feature and the at least one corresponding dominance factor for that class.
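The abstract does not say which features are used, how dominance is measured, or what form the rules take. The sketch below illustrates the general pipeline (feature vectors, then a dominant feature and dominance factor per class, then per-class rules, then tagging) under purely illustrative assumptions: a small set of labeled sample lines is available; the features are the hypothetical `font_size`, `is_bold`, `indent` and `word_count`; the dominance factor is the ratio of a class's mean feature value to the overall mean; and each rule is a threshold at the midpoint between the two means. All names (`TextLine`, `Rule`, `derive_rules`, `tag_line`) are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical feature set; the abstract does not name specific features.
FEATURES = ("font_size", "is_bold", "indent", "word_count")


@dataclass
class TextLine:
    text: str
    font_size: float
    is_bold: bool
    indent: float


def feature_vector(line: TextLine) -> dict[str, float]:
    """Determine a value for each feature in FEATURES for one text line."""
    return {
        "font_size": line.font_size,
        "is_bold": 1.0 if line.is_bold else 0.0,
        "indent": line.indent,
        "word_count": float(len(line.text.split())),
    }


@dataclass
class Rule:
    tag: str          # structural block tag the rule assigns
    feature: str      # dominant feature chosen for this class
    factor: float     # assumed dominance factor: class mean / overall mean
    threshold: float  # assumed cut-off: midpoint of class mean and overall mean


def derive_rules(vectors: list[dict[str, float]], labels: list[str]) -> list[Rule]:
    """Pick the feature whose class mean deviates most from the overall mean
    (in ratio terms) as each class's dominant feature, then build one rule."""
    overall = {f: mean(v[f] for v in vectors) for f in FEATURES}
    rules = []
    for tag in sorted(set(labels)):
        members = [v for v, y in zip(vectors, labels) if y == tag]
        best = None
        for f in FEATURES:
            if overall[f] == 0:
                continue  # feature is zero everywhere in the sample: no signal
            class_mean = mean(m[f] for m in members)
            factor = class_mean / overall[f]
            if best is None or abs(factor - 1) > abs(best[1] - 1):
                best = (f, factor, (class_mean + overall[f]) / 2)
        if best is not None:
            rules.append(Rule(tag, *best))
    # Try the most strongly dominant rules first (stable sort keeps ties in order).
    return sorted(rules, key=lambda r: abs(r.factor - 1), reverse=True)


def tag_line(vec: dict[str, float], rules: list[Rule], default: str) -> str:
    """Return the tag of the first rule the line satisfies, else `default`."""
    for r in rules:
        above = vec[r.feature] >= r.threshold
        # factor > 1: class values sit above the overall mean; < 1: below it.
        if (r.factor > 1 and above) or (r.factor < 1 and not above):
            return r.tag
    return default
```

For example, with two bold 16-pt lines labeled `heading` and two non-bold 11-pt lines labeled `paragraph`, `derive_rules` selects `is_bold` as the dominant feature for both classes, and `tag_line` then tags a new bold line as `heading` and a non-bold line as `paragraph`. A real system would likely keep several dominant features per class rather than the single one used here for brevity.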