SYSTEM FOR DETECTING AND CORRECTING CONTEXTUAL ERRORS IN A TEXT PROCESSING SYSTEM

    Publication No.: CA1182570A

    Publication Date: 1985-02-12

    Application No.: CA422316

    Filing Date: 1983-02-24

    Applicant: IBM

    Abstract: A system for automatically proofreading a document for word use validation in a text processing system is provided by coupling a specialized dictionary of sets of homophones and confusable words to sets of di-gram and N-gram conditions whereby proper usage of the words can be statistically determined. A text document is reviewed word-by-word against a dictionary of homophones and confusable words. When a match occurs, the related list of syntactic rules is examined relative to the context of the subject homophone or confusable word. If the syntax in the immediate context of the homophone or confusable word conflicts with the prestored syntax rules, the homophone or confusable word is highlighted on the system display. The system then displays the definition of the highlighted word along with possible intended alternative forms and their respective definitions. The operator can examine the word used and the possible alternatives and make a determination as to whether an error has been made and if a correction of the text is required. If correction is required, the operator may cause the error word to be replaced by the desired word by positioning the display cursor under the desired word and depressing an appropriate key on the system keyboard.
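
    The word-by-word review described in this abstract can be illustrated with a minimal Python sketch. The confusable sets, the digram rules, and the names CONFUSABLES, BAD_DIGRAMS and flag_confusables are all hypothetical; the abstract does not list the actual dictionary or N-gram conditions, and this sketch only flags candidates, leaving the replacement decision to the operator.

```python
# Hypothetical confusable sets; the patent's real dictionary is not given.
CONFUSABLES = {
    "their": ["there", "they're"],
    "there": ["their", "they're"],
    "to": ["too", "two"],
    "too": ["to", "two"],
}

# Hypothetical digram conditions: (word, following word) pairs whose
# combination conflicts with the expected usage of the word.
BAD_DIGRAMS = {
    ("there", "car"),
    ("their", "is"),
    ("to", "much"),
}


def flag_confusables(text):
    """Return (position, word, alternatives) for each word to highlight."""
    words = text.lower().split()
    flagged = []
    for i, word in enumerate(words):
        if word not in CONFUSABLES:
            continue  # not a homophone or confusable word
        following = words[i + 1] if i + 1 < len(words) else None
        if (word, following) in BAD_DIGRAMS:
            flagged.append((i, word, CONFUSABLES[word]))
    return flagged
```

    A call such as flag_confusables("I saw there car") would highlight "there" and offer "their" and "they're" as alternatives for the operator to choose from.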

    CLUSTER STORAGE APPARATUS FOR POST PROCESSING ERROR CORRECTION OF A CHARACTER RECOGNITION MACHINE

    Publication No.: CA1062811A

    Publication Date: 1979-09-18

    Application No.: CA230886

    Filing Date: 1975-07-07

    Applicant: IBM

    Abstract: A cluster storage apparatus is disclosed for outputting groups of valid alpha words as potential candidates for the correct form of an alpha word misrecognized by a character recognition machine. Groups of alpha words are arranged in the cluster storage apparatus such that adjacent locations contain alpha words having similar character recognition misread propensities. Alpha words which have been determined to be misrecognized are input to the cluster storage apparatus. Numerical values assigned to the characters of which the input word is composed are used to calculate the address of that group of valid alpha words having similar character recognition misread propensities. The cluster storage apparatus then outputs the accessed groups of alpha words for subsequent processing. The organization of the cluster storage apparatus minimizes the difference in address between alpha words with similar character recognition misread propensities by assigning high numeric values to highly reliable characters, as determined by measuring the character transfer function of the character recognition machine.
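
    The addressing idea in this abstract can be sketched in Python under stated assumptions. The confusion groups, the numeric values, the additive address function and the names CONFUSION_GROUPS, address and ClusterStore are illustrative only; a real system would derive the values from the measured character transfer function, and the abstract does not say how the address is calculated from them.

```python
import string
from bisect import bisect_left, bisect_right

# Assumed groups of characters the recognizer tends to confuse.
CONFUSION_GROUPS = ["ceo", "il", "hb", "uv", "mn"]


def build_char_values():
    values = {}
    # Unreliable characters: small values, equal within a confusion group,
    # so a misread inside a group barely moves the address.
    for g, group in enumerate(CONFUSION_GROUPS):
        for ch in group:
            values[ch] = g + 1
    # Remaining characters, assumed reliable: high, well-separated values.
    next_value = 100
    for ch in string.ascii_lowercase:
        if ch not in values:
            values[ch] = next_value
            next_value += 100
    return values


CHAR_VALUE = build_char_values()


def address(word):
    """Address of an alphabetic word: the sum of its character values."""
    return sum(CHAR_VALUE[ch] for ch in word.lower())


class ClusterStore:
    def __init__(self, dictionary):
        # Words sorted by address, so neighbors share misread propensities.
        self.entries = sorted((address(w), w) for w in dictionary)
        self.addresses = [a for a, _ in self.entries]

    def candidates(self, misread, window=5):
        """Return valid words stored within `window` of the misread word."""
        a = address(misread)
        lo = bisect_left(self.addresses, a - window)
        hi = bisect_right(self.addresses, a + window)
        return [w for _, w in self.entries[lo:hi]]
```

    With a store built from "cat", "eat", "dog", "bat" and "hat", the misread input "oat" lands next to "cat", "eat", "bat" and "hat", while "dog" sits far away and is not returned.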

    4.
    Invention patent
    Unknown

    Publication No.: DE69213532T2

    Publication Date: 1997-03-20

    Application No.: DE69213532

    Filing Date: 1992-03-25

    Applicant: IBM

    Abstract: A system and method are disclosed for enabling the technique of deferred processing of OCR scanned mail to be compatible with existing techniques for mechanical sortation of mail that use standard sort barcode formats which are common to a given destination postal system. This enables deferred OCR processed mail to be sorted on an unsegregated basis along with other types of mail which have not been processed by the deferred OCR technique. This allows the OCR encoded mail to be processed along with other types of encoded mail during standard sort barcode that has been imprinted using prior technology such as OCR or manual code desks.

    NON-TEXT OBJECT STORAGE AND RETRIEVAL

    Publication No.: CA2066559A1

    Publication Date: 1993-01-30

    Application No.: CA2066559

    Filing Date: 1992-04-21

    Applicant: IBM

    Abstract: (BT9-91-039) A program, method and system are disclosed which sense the presence of a non-text object in a mixed object document to be archived in an information retrieval system. In addition to text objects, a mixed object document can contain non-text objects such as image objects, graphics objects, formatted objects, font objects, voice objects, video objects and animation objects. The invention enables the creation of key words which characterize the non-text object, for incorporation in the inverted file index of the data base, thereby enabling the later retrieval of either the entire document or the independent retrieval of the non-text object through the use of such key words.
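
    A minimal Python sketch of an inverted file index that covers both text and non-text objects follows. The class name Archive, the object dictionaries and their fields are hypothetical, and the key words for non-text objects are assumed to be supplied by the caller, since the abstract does not say how they are created.

```python
from collections import defaultdict


class Archive:
    def __init__(self):
        # keyword -> set of (document id, object id or None for the document)
        self.index = defaultdict(set)

    def add_document(self, doc_id, objects):
        """objects: dicts with 'id', 'type', and 'text' or 'keywords'."""
        for obj in objects:
            if obj["type"] == "text":
                words = obj["text"].lower().split()
                target = (doc_id, None)       # retrieves the whole document
            else:
                words = [k.lower() for k in obj["keywords"]]
                target = (doc_id, obj["id"])  # retrieves the object itself
            for word in words:
                self.index[word].add(target)

    def search(self, keyword):
        hits = self.index.get(keyword.lower(), set())
        return sorted(hits, key=lambda t: (t[0], t[1] or ""))
```

    Searching for a key word attached to an image returns the image's own identifier, so the object can be retrieved independently of the document that contains it.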

    OFFICE CORRESPONDENCE STORAGE AND RETRIEVAL SYSTEM

    Publication No.: CA1241122A

    Publication Date: 1988-08-23

    Application No.: CA363345

    Filing Date: 1980-10-27

    Applicant: IBM

    Abstract: A system that intelligently abstracts and archives a document for storage and interprets a free form user retrieval query to recall the document from the storage file. The system includes a method for automatically selecting keywords from the document using a parts of speech directory. A method is given for weighing the importance or centrality of each keyword with respect to the document of its origin. Using the same logic paths on a free form query that describes the document in the same manner that it would have to be described to a secretary to "find" it in a filing cabinet, the system automatically determines the key matching terms and finds the archived document(s) with the greatest affinity.
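
    The keyword selection, weighting and affinity matching described here can be sketched in Python. This is an illustration under assumptions: a small function-word list stands in for the directory the abstract mentions, the weight is simple relative frequency, and the names FUNCTION_WORDS, keywords, affinity and retrieve are hypothetical.

```python
from collections import Counter

# Stand-in for the speech directory: words never selected as keywords.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "to", "and", "in", "on",
    "for", "about", "from", "is", "was",
}


def keywords(text):
    """Map each keyword to a weight: its share of all keywords in the text."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    kept = [w for w in words if w and w not in FUNCTION_WORDS]
    counts = Counter(kept)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def affinity(query, doc_keywords):
    """Score a free-form query by the weights of the keywords it shares."""
    return sum(doc_keywords.get(w, 0.0) for w in keywords(query))


def retrieve(query, archive):
    """archive: name -> keyword weights. Return the best match or None."""
    scored = {name: affinity(query, kw) for name, kw in archive.items()}
    best = max(scored, key=scored.get, default=None)
    return best if best is not None and scored[best] > 0 else None
```

    The same keywords function processes both the archived document and the query, mirroring the abstract's "same logic paths"; a query that shares no keyword with any document returns None.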

    ALPHABETIC CHARACTER WORD UPPER/LOWER CASE PRINT CONVENTION APPARATUS AND METHOD

    Publication No.: CA1066418A

    Publication Date: 1979-11-13

    Application No.: CA268693

    Filing Date: 1976-12-23

    Applicant: IBM

    Abstract: The apparatus and method disclosed herein determines whether an alphabetic character field output from an optical character reader (OCR) is related to the OCR scan of an upper case or a lower case inscription on the document scanned. Each word output by the OCR corresponds to a field (i.e., word) of characters on the scanned document. The signals representative of the upper and lower case alphabetic characters and rejects, including conflicts, outputted from the OCR are applied to a character occurrence probability storage apparatus having precomputed empirical probabilities therein that a given character recognition is the result of the scan of an upper or lower case character, as the case may be. The storage apparatus includes probability values for character conflicts and rejects. As the series of signals from the OCR output are applied to the character occurrence probability storage apparatus, a running sum of the respective probabilities for the upper case and lower case print conventions is developed so that, following the input of the final character, reject or conflict within a word to the aforesaid apparatus, an appropriate upper or lower case determination can be made for all of the characters within the word. This determination corresponds with the print convention of the word inscribed on the scanned document. A corresponding upper or lower case flag is generated with the print convention determination for further text processing.
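
    The running-sum decision can be illustrated with a short Python sketch. The probability table is invented for illustration (the patent uses precomputed empirical values that the abstract does not list), "*" is assumed to denote a reject, and the names CASE_PROB and word_case are hypothetical.

```python
# Hypothetical table: OCR output symbol -> (score for the upper case print
# convention, score for the lower case print convention). "*" is a reject.
CASE_PROB = {
    "O": (0.6, 0.4),   # an "O" is often a misread lower case "o"
    "o": (0.3, 0.7),
    "C": (0.6, 0.4),
    "c": (0.3, 0.7),
    "A": (0.95, 0.05),
    "a": (0.05, 0.95),
    "T": (0.9, 0.1),
    "t": (0.1, 0.9),
    "*": (0.5, 0.5),
}
DEFAULT = (0.5, 0.5)  # symbols missing from the table carry no evidence


def word_case(symbols):
    """Decide the print convention of one word from its OCR output."""
    upper_sum = lower_sum = 0.0
    for symbol in symbols:
        p_upper, p_lower = CASE_PROB.get(symbol, DEFAULT)
        upper_sum += p_upper   # running sum for the upper case convention
        lower_sum += p_lower   # running sum for the lower case convention
    # The decision is made only after the final symbol of the word.
    return "upper" if upper_sum > lower_sum else "lower"
```

    Because the decision uses the whole word, a single ambiguous recognition such as "C" at the start of "Cat" does not override the strong lower case evidence from the remaining characters.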

    BINARY REFERENCE MATRIX FOR A CHARACTER RECOGNITION MACHINE

    Publication No.: CA1048155A

    Publication Date: 1979-02-06

    Application No.: CA223701

    Filing Date: 1975-04-02

    Applicant: IBM

    Abstract: A binary reference matrix apparatus is disclosed for verifying input alpha words from a character recognition machine as valid linguistic expressions. The organization of the binary reference matrix is based upon the character transfer function of the character recognition machine. The alphabetic character stream for each word scanned by the character recognition machine is mapped into a vector representation through the assignment of a unique numeric value for each letter in the alphabet. The vector magnitude and angle so calculated constitute the address data for accessing the binary reference matrix. The point accessed in the matrix will have a binary value of 1 if the scanned word is valid and will have a binary value of 0 if the scanned word is invalid. The organization of the binary reference matrix minimizes the size of the array needed for accurate verification by choosing numerical values for the alphabetic characters in an inverse proportion to the character's read reliability in the character recognition machine, as determined by the empirical measurement of the character recognition machine's character transfer function.
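
    A rough Python sketch of the magnitude-and-angle addressing follows. Several details are assumptions not stated in the abstract: letters take the values a=1 through z=26 rather than reliability-derived values, the angle is measured against an all-ones reference vector, both coordinates are quantized into bins, and a set of addresses stands in for the matrix cells holding a 1. The names matrix_address and BinaryReferenceMatrix are hypothetical.

```python
import math


def letter_value(ch):
    # Assumed values a=1 ... z=26; the patent derives them from reliability.
    return ord(ch) - ord("a") + 1


def matrix_address(word, mag_scale=10, angle_bins=1000):
    """Map a non-empty alphabetic word to a (magnitude, angle) bin pair."""
    v = [letter_value(ch) for ch in word.lower()]
    magnitude = math.sqrt(sum(x * x for x in v))
    # Angle between the word vector and the all-ones vector (an assumption;
    # the abstract does not define the reference axis).
    cos_theta = sum(v) / (magnitude * math.sqrt(len(v)))
    theta = math.acos(min(1.0, cos_theta))
    return (
        round(magnitude * mag_scale),
        round(theta / (math.pi / 2) * angle_bins),
    )


class BinaryReferenceMatrix:
    def __init__(self, valid_words):
        # Sparse stand-in for the matrix: addresses whose cell holds a 1.
        self.ones = {matrix_address(w) for w in valid_words}

    def is_valid(self, word):
        """Return 1 if the addressed cell is set, otherwise 0."""
        return 1 if matrix_address(word) in self.ones else 0
```

    In this simplified form, words that are permutations of the same letters share an address, so "act" would be accepted whenever "cat" is stored; the patent's choice of character values is aimed at keeping the array small while still verifying accurately, which this sketch does not attempt.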

    9.
    Invention patent
    Unknown

    Publication No.: DE69016572T3

    Publication Date: 1998-06-25

    Application No.: DE69016572

    Filing Date: 1990-10-10

    Applicant: IBM

    Abstract: The invention is characterized as a data processing architecture and method for multi-stage processing of mail, using knowledge based techniques. The system includes OCR-scanning a multipart address field of a mail piece at a sending location, the address field including at least two portions, a first stage routing portion (destination city, state, country, zip code) and a second stage routing portion (destination street address, building floor, corporate addressee internal routing). At the sending location, the image of the entire address field is captured by an OCR head and stored in memory. A serial number is printed on the mail piece. The first routing portion is then converted into sorting signals to sort the mail piece to a truck at the sending location. While the mail piece is in transit on the truck, the knowledge processor completes its analysis and is able to transmit by electronic communications link to the destination location, the information that the mail piece is on its way and the second stage routing information needed to automatically sort and deliver the mail piece to its corporate addressee.
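
    The two-stage flow in this abstract can be sketched in Python. The field names, the zip-prefix rule for choosing a truck, and the names MailPiece, scan_at_sender, truck_for and advance_notice are all hypothetical; the sketch starts from an already-recognized address and does not model the OCR or the knowledge processor.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class MailPiece:
    serial: int          # serial number printed on the mail piece
    first_stage: dict    # e.g. city, state, zip
    second_stage: dict   # e.g. street, floor, addressee

_serials = count(1)


def scan_at_sender(address):
    """Split a recognized address field into its two routing portions."""
    first = {k: address[k] for k in ("city", "state", "zip") if k in address}
    second = {k: v for k, v in address.items() if k not in first}
    return MailPiece(next(_serials), first, second)


def truck_for(piece):
    """First-stage sort: pick an outbound truck (assumed zip-prefix rule)."""
    return "truck-" + piece.first_stage["zip"][:3]


def advance_notice(piece):
    """Message sent ahead to the destination while the piece is in transit."""
    return {"serial": piece.serial, "routing": piece.second_stage}
```

    The serial number is what ties the physical mail piece to the second-stage routing information that reaches the destination electronically before the truck does.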

    10.
    Invention patent
    Unknown

    Publication No.: DE69016572D1

    Publication Date: 1995-03-16

    Application No.: DE69016572

    Filing Date: 1990-10-10

    Applicant: IBM

    Abstract: The invention is characterized as a data processing architecture and method for multi-stage processing of mail, using knowledge based techniques. The system includes OCR-scanning a multipart address field of a mail piece at a sending location, the address field including at least two portions, a first stage routing portion (destination city, state, country, zip code) and a second stage routing portion (destination street address, building floor, corporate addressee internal routing). At the sending location, the image of the entire address field is captured by an OCR head and stored in memory. A serial number is printed on the mail piece. The first routing portion is then converted into sorting signals to sort the mail piece to a truck at the sending location. While the mail piece is in transit on the truck, the knowledge processor completes its analysis and is able to transmit by electronic communications link to the destination location, the information that the mail piece is on its way and the second stage routing information needed to automatically sort and deliver the mail piece to its corporate addressee.
