Unsupervised removal of text from images using linear programming for optimal filter design
Abstract:
Techniques are disclosed for removing text from an image of a form document. The text is removed by determining a spectral-domain representation of the image and applying a filter that removes the high-frequency components corresponding to the text in the form. An image is then reconstructed from the filtered spectral-domain representation; the reconstruction retains the low-frequency components while de-emphasizing or removing the high-frequency components. The shape of the filter applied to the spectral-domain representation is determined based on a similarity measure between the image of the form and the reconstructed image.
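The pipeline described in the abstract (transform, filter, reconstruct, compare) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the Gaussian filter shape, the use of normalized correlation as the similarity measure, the rule "pick the narrowest filter that stays above a similarity threshold", and the hypothetical names `gaussian_lowpass`, `reconstruct`, `similarity`, and `pick_sigma`. In particular, the simple search below stands in for the linear-programming filter design named in the title, which the abstract does not detail.

```python
import numpy as np


def gaussian_lowpass(shape, sigma):
    """Centered Gaussian low-pass mask (assumed filter shape).

    sigma is expressed in normalized frequency units (cycles per pixel).
    """
    fy = np.fft.fftshift(np.fft.fftfreq(shape[0]))
    fx = np.fft.fftshift(np.fft.fftfreq(shape[1]))
    r2 = fy[:, None] ** 2 + fx[None, :] ** 2  # squared radial frequency
    return np.exp(-r2 / (2.0 * sigma ** 2))


def reconstruct(image, sigma):
    """Filter the 2-D spectrum of a grayscale image and transform back."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    filtered = spectrum * gaussian_lowpass(image.shape, sigma)
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))


def similarity(a, b):
    """Normalized correlation between two images (assumed similarity measure)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def pick_sigma(image, sigmas, min_similarity=0.8):
    """Return the narrowest candidate sigma whose reconstruction is still
    at least min_similarity similar to the input (assumed selection rule)."""
    for sigma in sorted(sigmas):
        if similarity(image, reconstruct(image, sigma)) >= min_similarity:
            return sigma
    return max(sigmas)


# Synthetic "form": a smooth background plus a pixel-level checkerboard
# standing in for high-frequency text strokes.
y, x = np.mgrid[0:64, 0:64]
background = np.sin(2 * np.pi * x / 64)
text = 0.5 * ((x + y) % 2)
form = background + text

sigma = pick_sigma(form, [0.01, 0.05, 0.2, 1.0])
cleaned = reconstruct(form, 0.05)
```

In this toy setup the checkerboard sits at the highest representable frequency, so a narrow filter suppresses it almost entirely while the slowly varying background passes nearly unchanged. Real form images would need a similarity measure and filter family chosen to separate printed text from the form layout, which this sketch does not attempt.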