Removal of underlines and table lines in document images while preserving intersecting character strokes
    1.
    Granted invention patent
    Legal status: in force

    Publication number: US09235755B2

    Publication date: 2016-01-12

    Application number: US13968251

    Filing date: 2013-08-15

    Inventor: Chaohong Wu

    CPC classification number: G06K9/00449 G06F17/245 G06K9/346

    Abstract: A method for removing horizontal and vertical lines in a document image while preserving the integrity of the character strokes that intersect the lines. For each detected horizontal line, a vertical run length profile is calculated. Areas of the run length profile having two adjacent peaks with a valley in between are detected; these correspond to intersections of the horizontal line with non-vertical lines. A first derivative curve may be used to detect such peaks and valleys. Areas of the run length profile with large run length values at consecutive pixel locations are also detected; these correspond to intersections of the horizontal line with near-vertical lines. The horizontal line is removed in areas outside of the intersection areas, while pixels within the intersection areas are preserved. Vertical line removal may be done similarly. This template-free method can remove lines in tables and forms, as well as underlines, and extract handwritten or printed characters.
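
The idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `margin` and `max_valley` parameters, and the list-of-lists binary image (1 = ink) are all assumptions made for illustration. In place of the first-derivative peak and valley detection mentioned in the abstract, the sketch uses a simpler stand-in: it thresholds the vertical run length profile and bridges short valleys that lie between two peaks.

```python
def vertical_run_profile(img, row, x0, x1):
    """For each column in [x0, x1], length of the vertical ink run through `row`."""
    height = len(img)
    profile = []
    for x in range(x0, x1 + 1):
        if not img[row][x]:
            profile.append(0)
            continue
        top = row
        while top > 0 and img[top - 1][x]:
            top -= 1
        bottom = row
        while bottom < height - 1 and img[bottom + 1][x]:
            bottom += 1
        profile.append(bottom - top + 1)
    return profile


def intersection_mask(profile, thickness, margin=1, max_valley=3):
    """Mark columns to preserve (hypothetical thresholds, not from the patent).

    A column is a "peak" when its run is clearly longer than the line is thick.
    A short valley enclosed by two peaks is treated as part of the same crossing.
    """
    keep = [p > thickness + margin for p in profile]
    n = len(keep)
    i = 0
    while i < n:
        if keep[i]:
            i += 1
            continue
        j = i
        while j < n and not keep[j]:
            j += 1
        # The gap [i, j) is bridged only if a peak lies on both sides of it.
        if i > 0 and j < n and (j - i) <= max_valley:
            for k in range(i, j):
                keep[k] = True
        i = j
    return keep


def remove_horizontal_line(img, top, bottom, x0, x1, margin=1, max_valley=3):
    """Erase rows top..bottom over columns x0..x1, except at intersections."""
    thickness = bottom - top + 1
    mid = (top + bottom) // 2
    profile = vertical_run_profile(img, mid, x0, x1)
    keep = intersection_mask(profile, thickness, margin, max_valley)
    out = [r[:] for r in img]  # leave the input image untouched
    for offset, preserved in enumerate(keep):
        if not preserved:
            for y in range(top, bottom + 1):
                out[y][x0 + offset] = 0
    return out
```

The sketch assumes the horizontal line has already been detected and is described by its row range and column range. Shallow diagonal strokes that add only a pixel or two to the run length would need a smaller `margin`, or the derivative-based detection the abstract describes.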

    Line segmentation method applicable to document images containing handwriting and printed text characters or skewed text lines
    2.
    Granted invention patent
    Legal status: in force

    Publication number: US09104940B2

    Publication date: 2015-08-11

    Application number: US14015048

    Filing date: 2013-08-30

    Inventor: Chaohong Wu

    CPC classification number: G06K9/344 G06K9/3283 G06K9/342 G06K2209/01

    Abstract: A text line segmentation method for a document image containing printed text and handwriting, or a document image containing skewed lines of printed text. Connected components (CCs) are obtained for the document, and their bounding boxes and centroids are calculated. The CCs are categorized into three categories based on bounding box sizes: small objects, regular text objects, and large objects involving handwriting. The centroids of regular text objects are used in a cluster analysis to find the vertical centers of the N text lines. Then, each CC is classified into one of the N lines based on the vertical distance between its centroid and the vertical centers of the text lines, and copied into a corresponding object board. Extra spaces are removed from the object boards to obtain the line segments. A large object involving handwriting is classified into one of the lines and is absent from the other lines.
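
The classification step in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: the abstract does not name the clustering algorithm, so a plain 1-D k-means with a known line count N is assumed here, and the `Box` type, the size thresholds `small_max` and `large_min`, and all function names are hypothetical. Connected component extraction, the object boards, and the removal of extra space are omitted.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of one connected component (hypothetical helper type)."""
    x: int
    y: int
    w: int
    h: int

    @property
    def cy(self):
        # Vertical center of the box, used here as the centroid's y coordinate.
        return self.y + self.h / 2


def categorize(box, small_max=4, large_min=40):
    """Three size categories; the thresholds are illustrative assumptions."""
    if box.w <= small_max and box.h <= small_max:
        return "small"
    if box.h >= large_min:
        return "large"
    return "regular"


def kmeans_1d(values, n, iters=20):
    """Plain 1-D k-means, assumed here as the cluster analysis."""
    if not values:
        raise ValueError("no regular text objects to cluster")
    vs = sorted(values)
    # Start from evenly spaced quantiles of the sorted values.
    centers = [vs[int((i + 0.5) * len(vs) / n)] for i in range(n)]
    for _ in range(iters):
        groups = [[] for _ in range(n)]
        for v in vs:
            k = min(range(n), key=lambda i: abs(v - centers[i]))
            groups[k].append(v)
        new = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return sorted(centers)


def assign_lines(boxes, n_lines):
    """Find line centers from regular objects, then assign every box to a line."""
    regular = [b.cy for b in boxes if categorize(b) == "regular"]
    centers = kmeans_1d(regular, n_lines)
    lines = [[] for _ in centers]
    for b in boxes:
        k = min(range(len(centers)), key=lambda i: abs(b.cy - centers[i]))
        lines[k].append(b)  # each box lands in exactly one line
    return centers, lines
```

Only regular text objects influence the line centers, so a tall handwritten stroke or a stray dot cannot pull a center away from its text line, yet both are still assigned to exactly one line afterwards.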

    LINE SEGMENTATION METHOD APPLICABLE TO DOCUMENT IMAGES CONTAINING HANDWRITING AND PRINTED TEXT CHARACTERS OR SKEWED TEXT LINES
    3.
    Invention patent application
    Legal status: in force

    Publication number: US20150063699A1

    Publication date: 2015-03-05

    Application number: US14015048

    Filing date: 2013-08-30

    Inventor: Chaohong Wu

    CPC classification number: G06K9/344 G06K9/3283 G06K9/342 G06K2209/01

    Abstract: A text line segmentation method for a document image containing printed text and handwriting, or a document image containing skewed lines of printed text. Connected components (CCs) are obtained for the document, and their bounding boxes and centroids are calculated. The CCs are categorized into three categories based on bounding box sizes: small objects, regular text objects, and large objects involving handwriting. The centroids of regular text objects are used in a cluster analysis to find the vertical centers of the N text lines. Then, each CC is classified into one of the N lines based on the vertical distance between its centroid and the vertical centers of the text lines, and copied into a corresponding object board. Extra spaces are removed from the object boards to obtain the line segments. A large object involving handwriting is classified into one of the lines and is absent from the other lines.

    REMOVAL OF UNDERLINES AND TABLE LINES IN DOCUMENT IMAGES WHILE PRESERVING INTERSECTING CHARACTER STROKES
    4.
    Invention patent application
    Legal status: in force

    Publication number: US20150052426A1

    Publication date: 2015-02-19

    Application number: US13968251

    Filing date: 2013-08-15

    Inventor: Chaohong Wu

    CPC classification number: G06K9/00449 G06F17/245 G06K9/346

    Abstract: A method for removing horizontal and vertical lines in a document image while preserving the integrity of the character strokes that intersect the lines. For each detected horizontal line, a vertical run length profile is calculated. Areas of the run length profile having two adjacent peaks with a valley in between are detected; these correspond to intersections of the horizontal line with non-vertical lines. A first derivative curve may be used to detect such peaks and valleys. Areas of the run length profile with large run length values at consecutive pixel locations are also detected; these correspond to intersections of the horizontal line with near-vertical lines. The horizontal line is removed in areas outside of the intersection areas, while pixels within the intersection areas are preserved. Vertical line removal may be done similarly. This template-free method can remove lines in tables and forms, as well as underlines, and extract handwritten or printed characters.
