-
1.
Publication Number: US10318803B1
Publication Date: 2019-06-11
Application Number: US15828110
Filing Date: 2017-11-30
Applicant: KONICA MINOLTA LABORATORY U.S.A., INC.
Inventor: Shubham Agarwal, Yongmian Zhang
Abstract: In a text line segmentation process, connected components (CCs) in a document image are categorized into three subsets (normal, large, small) based on their sizes. The centroids of the normal size CCs are used to perform line detection using a Hough transform. Among the detected candidate lines, those with line bounding box heights greater than a certain height are removed. For each normal size CC, if its bounding box does not overlap the bounding box of any line with an overlap area greater than a predefined fraction of the CC bounding box, a new line is added for this CC, which passes through the centroid of the CC and has an average slant angle. Each large size CC is broken into two or more CCs. All CCs are then assigned to the nearest lines. A refinement method is also described, which can take any text line segmentation result and refine it.
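As a rough illustration of the first two steps of this process (size-based CC categorization and line detection by a Hough vote over the centroids of normal size CCs), a Python sketch follows. The 0.5×/2× median-area thresholds, the accumulator resolution, and the OpenCV helpers are illustrative assumptions, not values or routines taken from the patent.

# Minimal sketch: categorize connected components by size, then vote their
# centroids into a (rho, theta) Hough accumulator to find candidate text lines.
# Thresholds and resolution below are illustrative assumptions.
import numpy as np
import cv2

def categorize_components(binary_img):
    """Split CCs into small / normal / large subsets by area relative to the median."""
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary_img, 8, cv2.CV_32S)
    areas = stats[1:, cv2.CC_STAT_AREA]            # label 0 is the background
    median_area = np.median(areas)
    small, normal, large = [], [], []
    for label, area in enumerate(areas, start=1):
        if area < 0.5 * median_area:
            small.append(label)
        elif area > 2.0 * median_area:
            large.append(label)
        else:
            normal.append(label)
    return centroids, small, normal, large

def hough_vote_centroids(centroids, normal_labels, img_shape, n_theta=180):
    """Accumulate line votes (rho, theta) from the centroids of normal size CCs."""
    h, w = img_shape
    diag = int(np.hypot(h, w)) + 1
    thetas = np.deg2rad(np.arange(n_theta) - 90)   # -90 .. 89 degrees
    acc = np.zeros((2 * diag, n_theta), dtype=np.int32)
    for label in normal_labels:
        x, y = centroids[label]
        rho = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int) + diag
        acc[rho, np.arange(n_theta)] += 1             # one vote per (rho, theta) cell
    return acc, thetas

Peaks of the accumulator give the candidate lines; the remaining steps (discarding too-tall lines, adding lines for unassigned CCs, splitting large CCs, and nearest-line assignment) would follow as described in the abstract.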
-
2.
Publication Number: US20190102653A1
Publication Date: 2019-04-04
Application Number: US15721610
Filing Date: 2017-09-29
Applicant: KONICA MINOLTA LABORATORY U.S.A., INC.
Inventor: Shubham Agarwal, Maral Mesmakhosroshahi, Yongmian Zhang
Abstract: A local connectivity feature transform (LCFT) is applied to binary document images containing text characters to generate transformed document images, which are then input into a bi-directional Long Short-Term Memory (LSTM) neural network to perform character/word recognition. The LCFT-transformed image is a grayscale image in which the pixel values encode the local pixel connectivity of the corresponding pixels in the original binary image. The transform provides a unique transform score for every possible shape represented as a 3×3 block. In one example, the transform is computed using a 3×3 weight matrix that combines bit coding with a zigzag pattern to assign a weight to each element of the 3×3 block, and by summing the weights of the block's non-zero elements.
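A minimal Python sketch of the bit-coded zigzag weighting described in the example above follows: each position of the 3×3 neighborhood receives a distinct power-of-two weight in an assumed zigzag order, and the weights of the non-zero neighbors are summed, so every 3×3 shape maps to a unique score. The specific zigzag order and the rescaling to a grayscale image are assumptions for illustration, not the patent's exact arrangement.

import numpy as np

# Assumed zigzag traversal of the 3x3 neighborhood; position k gets weight 2**k,
# so every subset of "on" neighbors sums to a distinct score (bit coding).
ZIGZAG_ORDER = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (1, 2), (2, 1), (2, 2)]
WEIGHTS = np.zeros((3, 3))
for k, (r, c) in enumerate(ZIGZAG_ORDER):
    WEIGHTS[r, c] = 2 ** k

def lcft(binary_img):
    """Replace each pixel by the sum of weights over the non-zero pixels of its
    3x3 neighborhood; the scores can be linearly rescaled to a grayscale image."""
    h, w = binary_img.shape
    padded = np.pad((binary_img > 0).astype(float), 1)   # zero-pad the border
    scores = np.zeros((h, w))
    for dr in range(3):
        for dc in range(3):
            scores += WEIGHTS[dr, dc] * padded[dr:dr + h, dc:dc + w]
    return scores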
-
3.
Publication Number: US20190065817A1
Publication Date: 2019-02-28
Application Number: US15690037
Filing Date: 2017-08-29
Applicant: KONICA MINOLTA LABORATORY U.S.A., INC.
Inventor: Maral Mesmakhosroshahi, Shubham Agarwal, Yongmian Zhang
Abstract: An artificial neural network system implemented on a computer performs cell segmentation and classification of biological images. It includes a deep convolutional neural network as a feature extraction network, a first branch network connected to the feature extraction network to perform cell segmentation, and a second branch network connected to the feature extraction network to perform cell classification using the cell segmentation map generated by the first branch network. The feature extraction network is a modified VGG network in which each convolutional layer uses multiple kernels of different sizes. The second branch network takes feature maps from two levels of the feature extraction network and has multiple fully connected layers that independently process multiple cropped patches of the feature maps, the cropped patches being located at a centered position and at multiple shifted positions relative to the cell being classified; a voting method is used to determine the final cell classification.
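The idea of a convolutional layer that uses multiple kernels of different sizes can be sketched as parallel convolutions whose outputs are concatenated along the channel dimension, as in the hypothetical PyTorch fragment below; the kernel sizes, channel split, and stage layout are assumptions for illustration, not the patented network's configuration.

import torch
import torch.nn as nn

class MultiKernelConv(nn.Module):
    """One 'layer' built from parallel convolutions with different kernel sizes,
    concatenated along the channel dimension (an assumed arrangement)."""
    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 5)):
        super().__init__()
        per_branch = out_ch // len(kernel_sizes)
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, per_branch, k, padding=k // 2) for k in kernel_sizes
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(torch.cat([branch(x) for branch in self.branches], dim=1))

# A VGG-style stage assembled from such layers, followed by pooling.
stage = nn.Sequential(MultiKernelConv(3, 64), MultiKernelConv(64, 64), nn.MaxPool2d(2))
out = stage(torch.randn(1, 3, 128, 128))   # output shape: (1, 64, 64, 64)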
-
4.
Publication Number: US20190163971A1
Publication Date: 2019-05-30
Application Number: US15828110
Filing Date: 2017-11-30
Applicant: KONICA MINOLTA LABORATORY U.S.A., INC.
Inventor: Shubham Agarwal, Yongmian Zhang
Abstract: In a text line segmentation process, connected components (CCs) in a document image are categorized into three subsets (normal, large, small) based on their sizes. The centroids of the normal size CCs are used to perform line detection using a Hough transform. Among the detected candidate lines, those with line bounding box heights greater than a certain height are removed. For each normal size CC, if its bounding box does not overlap the bounding box of any line with an overlap area greater than a predefined fraction of the CC bounding box, a new line is added for this CC, which passes through the centroid of the CC and has an average slant angle. Each large size CC is broken into two or more CCs. All CCs are then assigned to the nearest lines. A refinement method is also described, which can take any text line segmentation result and refine it.
-
5.
Publication Number: US10282589B2
Publication Date: 2019-05-07
Application Number: US15690037
Filing Date: 2017-08-29
Applicant: KONICA MINOLTA LABORATORY U.S.A., INC.
Inventor: Maral Mesmakhosroshahi, Shubham Agarwal, Yongmian Zhang
Abstract: An artificial neural network system implemented on a computer performs cell segmentation and classification of biological images. It includes a deep convolutional neural network as a feature extraction network, a first branch network connected to the feature extraction network to perform cell segmentation, and a second branch network connected to the feature extraction network to perform cell classification using the cell segmentation map generated by the first branch network. The feature extraction network is a modified VGG network in which each convolutional layer uses multiple kernels of different sizes. The second branch network takes feature maps from two levels of the feature extraction network and has multiple fully connected layers that independently process multiple cropped patches of the feature maps, the cropped patches being located at a centered position and at multiple shifted positions relative to the cell being classified; a voting method is used to determine the final cell classification.
-
6.
Publication Number: US10521697B2
Publication Date: 2019-12-31
Application Number: US15721610
Filing Date: 2017-09-29
Applicant: KONICA MINOLTA LABORATORY U.S.A., INC.
Inventor: Shubham Agarwal, Maral Mesmakhosroshahi, Yongmian Zhang
Abstract: A local connectivity feature transform (LCFT) is applied to binary document images containing text characters to generate transformed document images, which are then input into a bi-directional Long Short-Term Memory (LSTM) neural network to perform character/word recognition. The LCFT-transformed image is a grayscale image in which the pixel values encode the local pixel connectivity of the corresponding pixels in the original binary image. The transform provides a unique transform score for every possible shape represented as a 3×3 block. In one example, the transform is computed using a 3×3 weight matrix that combines bit coding with a zigzag pattern to assign a weight to each element of the 3×3 block, and by summing the weights of the block's non-zero elements.
-
7.
Publication Number: US20190266443A1
Publication Date: 2019-08-29
Application Number: US15908714
Filing Date: 2018-02-28
Applicant: KONICA MINOLTA LABORATORY U.S.A., INC.
Inventor: Yongmian Zhang, Shubham Agarwal
Abstract: In an optical character recognition (OCR) method for digitizing printed text images using a long short-term memory (LSTM) network, text images are pre-processed with a stroke-aware max-min pooling method before being fed into the network, for both network training and OCR prediction. During training, an average stroke thickness is computed from the training dataset. Stroke-aware max-min pooling is then applied to each text line image: minimum pooling is applied if the stroke thickness of the line is greater than the average stroke thickness, while max pooling is applied if the stroke thickness is less than or equal to the average stroke thickness. The pooled images are used for network training. During prediction, stroke-aware max-min pooling is applied to each input text line image, and the pooled image is fed to the trained LSTM network to perform character recognition.
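A minimal Python sketch of the stroke-aware max-min pooling rule follows, assuming a binary text line image with non-zero (white) text on a black background and using the distance transform as a stand-in for measuring stroke thickness; both the thickness estimate and the 2×2 pooling window are illustrative assumptions rather than the patent's exact procedure.

import numpy as np
import cv2

def stroke_thickness(binary_line):
    """Rough stroke-thickness estimate: twice the mean distance-to-background
    over foreground pixels (binary_line is uint8, text pixels non-zero)."""
    dist = cv2.distanceTransform(binary_line, cv2.DIST_L2, 3)
    fg = dist[binary_line > 0]
    return 2.0 * fg.mean() if fg.size else 0.0

def stroke_aware_pool(line_img, avg_thickness, k=2):
    """Min-pool lines whose strokes are thicker than the training-set average
    (thinning them); max-pool the rest (thickening them)."""
    h, w = line_img.shape
    h, w = h - h % k, w - w % k                    # crop so k x k blocks tile evenly
    blocks = line_img[:h, :w].reshape(h // k, k, w // k, k)
    if stroke_thickness(line_img) > avg_thickness:
        return blocks.min(axis=(1, 3))
    return blocks.max(axis=(1, 3))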
-
8.
Publication Number: US10373022B1
Publication Date: 2019-08-06
Application Number: US15908714
Filing Date: 2018-02-28
Applicant: KONICA MINOLTA LABORATORY U.S.A., INC.
Inventor: Yongmian Zhang, Shubham Agarwal
Abstract: In an optical character recognition (OCR) method for digitizing printed text images using a long short-term memory (LSTM) network, text images are pre-processed with a stroke-aware max-min pooling method before being fed into the network, for both network training and OCR prediction. During training, an average stroke thickness is computed from the training dataset. Stroke-aware max-min pooling is then applied to each text line image: minimum pooling is applied if the stroke thickness of the line is greater than the average stroke thickness, while max pooling is applied if the stroke thickness is less than or equal to the average stroke thickness. The pooled images are used for network training. During prediction, stroke-aware max-min pooling is applied to each input text line image, and the pooled image is fed to the trained LSTM network to perform character recognition.