METHOD AND APPARATUS FOR ESTABLISHING PIXEL COLOR PROBABILITIES FOR USE IN OCR LOGIC

    Publication No.: CA1313266C

    Publication date: 1993-01-26

    Application No.: CA600083

    Filing date: 1989-05-18

    Applicant: IBM

    Abstract: A method of creating a decision tree to enable a character recognition device to recognize characters in a new unknown font includes scanning a document printed in the unknown font to generate an array of pixels, with each pixel having a neighborhood state based on the colors of the surrounding pixels. A plurality of clusters of pixels, each representing a printed character, are identified. A pixel in one cluster is selected, and its neighborhood state determined. The neighborhood state is used to address a memory that has stored in it a probability table providing the probability, in a second font different from the new font being taught, that a pixel is black as a function of its neighborhood state. The stored probability of black associated with the neighborhood state of the selected pixel is read from the memory and assigned as the probability of the selected pixel. The decision tree for the new font is generated using that assigned probability of black.
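
The table lookup described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the 8-pixel neighborhood, the bit ordering, the 0.5 default for unseen states, and the names `neighborhood_state`, `build_table`, and `assign_probability` are all assumptions made for illustration, since the abstract does not fix any of them.

```python
from typing import List, Sequence

Image = List[List[int]]  # 1 = black pixel, 0 = white pixel

# Offsets of the 8 surrounding pixels (assumed neighborhood shape).
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def neighborhood_state(img: Image, r: int, c: int) -> int:
    """Pack the colors of the 8 surrounding pixels into an integer in 0..255."""
    state = 0
    for dr, dc in NEIGHBORS:
        rr, cc = r + dr, c + dc
        inside = 0 <= rr < len(img) and 0 <= cc < len(img[0])
        bit = img[rr][cc] if inside else 0  # assumption: off-image pixels are white
        state = (state << 1) | bit
    return state


def build_table(reference: Image) -> List[float]:
    """Estimate P(black | state) by counting over an image in the known font."""
    black = [0] * 256
    total = [0] * 256
    for r, row in enumerate(reference):
        for c, color in enumerate(row):
            s = neighborhood_state(reference, r, c)
            total[s] += 1
            black[s] += color
    # Assumption: states never observed get an uninformative 0.5.
    return [black[s] / total[s] if total[s] else 0.5 for s in range(256)]


def assign_probability(img: Image, r: int, c: int, table: Sequence[float]) -> float:
    """Use the pixel's neighborhood state as the address into the stored table."""
    return table[neighborhood_state(img, r, c)]
```

In this sketch the table plays the role of the memory addressed by the neighborhood state: it is built once from the known font, then read for each selected pixel of the new font.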

    CURSOR LOCATION ERROR MINIMIZING SYSTEM

    Publication No.: CA1103377A

    Publication date: 1981-06-16

    Application No.: CA307336

    Filing date: 1978-07-13

    Applicant: IBM

    Abstract: There is disclosed a method and means for increasing the positional accuracy of operator controlled cursors engaged in the digitized encoding of graphic information such as line drawings. The method steps comprise those of digitizing the instantaneous contact position between the cursor and the data entry surface as reference coordinates; detecting any segment of a colored object upon the surface within a predetermined area about the cursor; ascertaining the location within the area of the centroid or the like of the detected segment; and digitizing said ascertained location as a displacement from the reference coordinates. Apparatus for practicing the method comprises an independently actuable cursor formed from a position encoder and an image scanner, the scanner generating a Boolean coded array of points counterpart to a preselected surface area. The scanned array is first buffered and then used to actuate an operator viewable LED display of the scanned array on one hand and said array is sent to a CPU on the other hand. In turn, the CPU calculates the coordinates of the centroid of that array area of contiguous points having the same designated Boolean (color) 0 or 1 value. Signals representing the calculated coordinates generated by the CPU then cause the LED display to differentially indicate the calculated centroid location to the operator such as by way of a flashing display element.
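
The centroid step attributed to the CPU can be sketched in Python as follows. This is an illustration under stated assumptions, not the disclosed apparatus: 4-connectivity, choosing the largest contiguous region, measuring the displacement from the center of the scanned array, and the function names `largest_region` and `centroid_offset` are all hypothetical choices.

```python
from collections import deque
from typing import List, Optional, Tuple

Grid = List[List[int]]  # Boolean coded array: each point is 0 or 1


def largest_region(grid: Grid, color: int = 1) -> List[Tuple[int, int]]:
    """Return the largest 4-connected set of points having the designated color."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    best: List[Tuple[int, int]] = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != color or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited point of the right color.
            region = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    return best


def centroid_offset(grid: Grid, color: int = 1) -> Optional[Tuple[float, float]]:
    """Centroid of the largest region as a (row, column) displacement from the array center."""
    region = largest_region(grid, color)
    if not region:
        return None  # no point of the designated color was scanned
    cy = sum(y for y, _ in region) / len(region)
    cx = sum(x for _, x in region) / len(region)
    center_y = (len(grid) - 1) / 2
    center_x = (len(grid[0]) - 1) / 2
    return (cy - center_y, cx - center_x)
```

Adding the returned displacement to the cursor's reference coordinates would give the corrected position; how the real device scales array points to surface units is not stated in the abstract.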

    3.
    Invention patent
    Unknown

    Publication No.: DE69529015D1

    Publication date: 2003-01-16

    Application No.: DE69529015

    Filing date: 1995-04-04

    Applicant: IBM

    Abstract: An automated optical character recognition method is provided for use in conjunction with a programmable digital processing device. The method inputs a sequence of values representing one or more characters in an array of characters to be optically recognized. The values define one or more dimensional characteristics of the characters. From the input values, a standard dimensional value is determined from a frequency distribution of a selected one of the character dimensional characteristics. For each of the input characters, a set of normalized values is determined from the standard dimensional value. The normalized values correspond to the one or more character dimensional characteristics. Optical character recognition is thereafter performed using the normalized values.
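
A minimal Python sketch of the normalization idea follows. The abstract says only that the standard value comes from a frequency distribution of one dimensional characteristic; taking the most frequent character height (the mode), breaking ties toward the smallest value, and dividing every dimension by it are assumptions, as are the names `standard_value` and `normalize`.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def standard_value(values: Sequence[int]) -> int:
    """Pick the most frequent value of the selected characteristic (assumed: the mode)."""
    if not values:
        raise ValueError("at least one character dimension is required")
    counts = Counter(values)
    top = max(counts.values())
    # Assumption: ties are broken toward the smallest value.
    return min(v for v, n in counts.items() if n == top)


def normalize(boxes: Sequence[Tuple[int, int]]) -> List[Tuple[float, float]]:
    """Scale each (width, height) pair by the standard height; heights assumed positive."""
    std = standard_value([h for _, h in boxes])
    return [(w / std, h / std) for w, h in boxes]
```

For example, with character boxes of heights 20, 20, 10, and 40, the standard height is 20, so a 10-by-20 character normalizes to (0.5, 1.0) regardless of the scan resolution or point size.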

    COMPUTER-IMPLEMENTED METHOD FOR AUTOMATIC EXTRACTION OF DATA FROM PRINTED FORMS

    Publication No.: CA2000012C

    Publication date: 1995-03-21

    Application No.: CA2000012

    Filing date: 1989-10-02

    Applicant: IBM

    Abstract: A computer-implemented method operable with conventional OCR scanning equipment and software, extracts character data from printed forms. A blank master form is scanned and its digital image stored. Clusters of ON bits of the master form image are first recognized as part of a line and then connected to form lines. All of the lines in the master form image are then identified by row and column start position and column end position, thereby creating a master-form-description. The resulting image, which consists only of lines in the master form, can then be displayed. Regions or masks in the displayed image of master form lines are then created, each mask corresponding to a field where data would be located in a filled-in form. Each data mask is spaced from nearby lines by a predetermined data margin, referred to as D. A filled-in or data form is then scanned and lines are also recognized and identified in a similar manner to create a data-form-description. The data-form-description is compared with the master-form-description by computing the horizontal and vertical offsets and skew of the two forms relative to one another. The created data masks, whose orientation with respect to the master form has been previously determined, are then transposed into the data form image using the computed values of horizontal and vertical offsets and skew. In this manner, the data masks are correctly located on the data form so that the actual data values in the data form reside within the corresponding data masks. Routines are then implemented for detecting extraneous data intruding into the data masks and for growing the masks, i.e. enlarging the masks to capture data which may extend beyond the perimeter of the masks. Thus, the data masks are adaptive in that they are grown if data does not lie entirely within the perimeter of the masks. 
During the mask growth routine, lines which are part of the background form are detected and removed by line removal algorithms. Following the removal of extraneous data from the masks, the growth of the masks to capture data, and any subsequent line removal, the remaining data from the masks is extracted and transferred to a new file. The new file then contains only data comprising characters of the data values in the desired regions, which can then be operated on by conventional OCR software to identify the specific character values.
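
The step of transposing data masks from the master form into the data form can be sketched in Python as below. The abstract does not say how offsets and skew are represented; this sketch assumes the skew is a rotation angle in radians about the image origin, the offsets are pixel shifts applied after rotation, and a mask is a `(left, top, right, bottom)` rectangle. The names `transpose_point` and `transpose_mask` are hypothetical.

```python
import math
from typing import List, Tuple

Mask = Tuple[float, float, float, float]  # (left, top, right, bottom) in master-form pixels


def transpose_point(x: float, y: float, dx: float, dy: float, skew: float) -> Tuple[float, float]:
    """Rotate a master-form point by the skew angle, then shift by the offsets."""
    cos_s, sin_s = math.cos(skew), math.sin(skew)
    return (x * cos_s - y * sin_s + dx, x * sin_s + y * cos_s + dy)


def transpose_mask(mask: Mask, dx: float, dy: float, skew: float) -> List[Tuple[float, float]]:
    """Map the four corners of a mask into data-form coordinates."""
    left, top, right, bottom = mask
    corners = [(left, top), (right, top), (right, bottom), (left, bottom)]
    return [transpose_point(x, y, dx, dy, skew) for x, y in corners]
```

With zero skew the mask is simply shifted; with a nonzero skew the result is a rotated quadrilateral, which is why the sketch returns four corners rather than a rectangle. The mask growth and line removal routines are not modeled here.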

    5.
    Invention patent
    Unknown

    Publication No.: DE69529015T2

    Publication date: 2003-10-09

    Application No.: DE69529015

    Filing date: 1995-04-04

    Applicant: IBM

    Abstract: An automated optical character recognition method is provided for use in conjunction with a programmable digital processing device. The method inputs a sequence of values representing one or more characters in an array of characters to be optically recognized. The values define one or more dimensional characteristics of the characters. From the input values, a standard dimensional value is determined from a frequency distribution of a selected one of the character dimensional characteristics. For each of the input characters, a set of normalized values is determined from the standard dimensional value. The normalized values correspond to the one or more character dimensional characteristics. Optical character recognition is thereafter performed using the normalized values.

    IMAGE RECOGNITION APPARATUS
    6.
    Invention patent

    Publication No.: CA1317377C

    Publication date: 1993-05-04

    Application No.: CA595251

    Filing date: 1989-03-30

    Applicant: IBM

    Abstract: In a document scanning device, a pattern or character recognition algorithm, employed to generate image data, differentiates between high and low probability data and enables modification of the recognition procedure for handling recognition errors.

    7.
    Invention patent
    Unknown

    Publication No.: FR2409554B1

    Publication date: 1986-04-11

    Application No.: FR7830410

    Filing date: 1978-10-19

    Applicant: IBM

    Abstract: There is disclosed a method and means for increasing the positional accuracy of operator controlled cursors engaged in the digitized encoding of graphic information such as line drawings. The method steps comprise those of digitizing the instantaneous contact position between the cursor and the data entry surface as reference coordinates; detecting any segment of a colored object upon the surface within a predetermined area about the cursor; ascertaining the location within the area of the centroid or the like of the detected segment; and digitizing said ascertained location as a displacement from the reference coordinates. Apparatus for practicing the method comprises an independently actuable cursor formed from a position encoder and an image scanner, the scanner generating a Boolean coded array of points counterpart to a preselected surface area. The scanned array is first buffered and then used to actuate an operator viewable LED display of the scanned array on one hand and said array is sent to a CPU on the other hand. In turn, the CPU calculates the coordinates of the centroid of that array area of contiguous points having the same designated Boolean (color) 0 or 1 value. Signals representing the calculated coordinates generated by the CPU then cause the LED display to differentially indicate the calculated centroid location to the operator such as by way of a flashing display element.

    8.
    Invention patent
    Unknown

    Publication No.: FR2409554A1

    Publication date: 1979-06-15

    Application No.: FR7830410

    Filing date: 1978-10-19

    Applicant: IBM

    Abstract: There is disclosed a method and means for increasing the positional accuracy of operator controlled cursors engaged in the digitized encoding of graphic information such as line drawings. The method steps comprise those of digitizing the instantaneous contact position between the cursor and the data entry surface as reference coordinates; detecting any segment of a colored object upon the surface within a predetermined area about the cursor; ascertaining the location within the area of the centroid or the like of the detected segment; and digitizing said ascertained location as a displacement from the reference coordinates. Apparatus for practicing the method comprises an independently actuable cursor formed from a position encoder and an image scanner, the scanner generating a Boolean coded array of points counterpart to a preselected surface area. The scanned array is first buffered and then used to actuate an operator viewable LED display of the scanned array on one hand and said array is sent to a CPU on the other hand. In turn, the CPU calculates the coordinates of the centroid of that array area of contiguous points having the same designated Boolean (color) 0 or 1 value. Signals representing the calculated coordinates generated by the CPU then cause the LED display to differentially indicate the calculated centroid location to the operator such as by way of a flashing display element.
