Font family and size aware character segmentation
Abstract:
A method clusters each character on a document into one of a plurality of clusters based on the widths of at least a portion of the characters on the document, and measures distances between characters on the document. A threshold is calculated for each of the plurality of clusters based on at least a portion of the distances between characters in that cluster. The method then segments the characters into units using the thresholds for the plurality of clusters: the distance between two characters in the document is compared with the threshold for a cluster, and the two characters are classified as part of a unit when the distance is less than the threshold and as not part of the unit when the distance is greater than the threshold. The method then performs a recognition process on the document using the units.
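The segmentation steps above can be illustrated with a minimal Python sketch. The abstract does not specify the clustering algorithm, the threshold formula, or which cluster a gap belongs to, so everything concrete here is an assumption: characters lie on one horizontal line, widths are clustered with a plain 1-D k-means, each gap is attributed to the cluster of its left-hand character, and a cluster's threshold is taken as `ratio` times the median gap in that cluster. Ties (distance equal to the threshold) are joined, a case the abstract leaves open. The names `Char`, `kmeans_1d`, `segment`, `k` and `ratio` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Char:
    text: str     # glyph label (hypothetical; stands in for the character image)
    x: float      # left edge of the character's bounding box
    width: float  # bounding-box width


def kmeans_1d(values, k, iters=20):
    """Plain 1-D k-means (assumed clustering method); one label per value."""
    distinct = sorted(set(values))
    k = min(k, len(distinct))
    # Spread the initial centroids across the sorted distinct values.
    step = max(k - 1, 1)
    centroids = [distinct[i * (len(distinct) - 1) // step] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


def segment(chars, k=2, ratio=2.0):
    """Group characters into units using one gap threshold per width cluster."""
    if not chars:
        return []
    chars = sorted(chars, key=lambda ch: ch.x)
    # Cluster characters by width (a proxy for font family and size).
    labels = kmeans_1d([ch.width for ch in chars], k)
    # Distance = left edge of the next character minus right edge of this one.
    gaps = [nxt.x - (cur.x + cur.width) for cur, nxt in zip(chars, chars[1:])]
    # Assumption: each gap belongs to the cluster of its left-hand character.
    by_cluster = {}
    for gap, lab in zip(gaps, labels):
        by_cluster.setdefault(lab, []).append(gap)
    # Assumed threshold rule: ratio * median gap within the cluster.
    thresholds = {lab: ratio * median(g) for lab, g in by_cluster.items()}
    # Join when the gap is within the threshold (ties join), otherwise split.
    units, current = [], [chars[0]]
    for i, gap in enumerate(gaps):
        if gap <= thresholds[labels[i]]:
            current.append(chars[i + 1])
        else:
            units.append(current)
            current = [chars[i + 1]]
    units.append(current)
    return ["".join(ch.text for ch in unit) for unit in units]


# Small font (width 5): gaps of 1 inside words, 4 between words.
# Large font (width 14): gaps of 5 inside words, 16 between words.
line = [Char("a", 0, 5), Char("b", 6, 5), Char("c", 12, 5),
        Char("d", 21, 5), Char("e", 27, 5), Char("f", 33, 5),
        Char("G", 68, 14), Char("H", 87, 14), Char("I", 117, 14), Char("J", 136, 14)]
print(segment(line))  # ['abc', 'def', 'GH', 'IJ']
```

In this made-up line, no single global threshold works: splitting the 4-unit gap between small-font words would also split the 5-unit gap inside large-font words. The per-cluster thresholds (2.0 for the narrow cluster, 10.0 for the wide one) separate both correctly, which is the motivation for making segmentation font family and size aware.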