-
Publication No.: FR2376467A1
Publication date: 1978-07-28
Application No.: FR7735653
Application date: 1977-11-18
Applicant: IBM
Inventor: KOLPEK ROBERT A , MACDUFFEE DAVID L , ROSENBAUM WALTER S
IPC: B41J5/30 , B41J7/96 , G06F3/09 , G06F3/12 , G06F11/00 , G06F17/22 , G06F17/27 , G06K9/72 , G06K15/00 , G06F15/40 , B41B27/48 , B41J5/44
Abstract: SYSTEM FOR AUTOMATICALLY PROOFREADING A DOCUMENT: Spelling errors in a word processing system are detected and presented to the operator for correction at the end of a document page. A dictionary memory contains representations of the correct spellings of the most frequently used words. As each word is typed, it is stored in a word queue, where it is compared with the contents of the dictionary memory. If no match is found, the word and its location on the page are stored in an error memory. When an end-of-page indicator is set, the printer automatically repositions the print head at the ending character of the first word in the error list. When the operator keys in the correct spelling, the printer removes the misspelled word from the page and types the correct spelling. The corresponding word in the error memory is also corrected. As each misspelled word in the error memory is corrected, the remainder of the memory is scanned and repetitions of the same spelling error are automatically corrected.
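The detect-then-correct flow in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the names `ErrorEntry`, `find_errors` and `apply_correction` are hypothetical, a word index stands in for the print-head position, and a Python set stands in for the dictionary memory.

```python
from dataclasses import dataclass

@dataclass
class ErrorEntry:
    word: str
    position: int  # word index on the page (stand-in for print-head coordinates)

def find_errors(page_words, dictionary):
    """Build the error memory: every word not found in the dictionary."""
    return [ErrorEntry(w, i) for i, w in enumerate(page_words)
            if w.lower() not in dictionary]

def apply_correction(page_words, errors, wrong, right):
    """Correct one misspelling and every repetition of it in the error memory."""
    remaining = []
    for entry in errors:
        if entry.word == wrong:
            page_words[entry.position] = right  # "retype" the word on the page
        else:
            remaining.append(entry)
    return remaining

dictionary = {"the", "quick", "brown", "fox", "jumps"}
page = ["the", "quikc", "brown", "fox", "jumps", "the", "quikc", "fox"]
errors = find_errors(page, dictionary)
errors = apply_correction(page, errors, "quikc", "quick")
```

A single operator correction clears both occurrences of the misspelling, which mirrors the automatic correction of repeated errors described above.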
-
Publication No.: FR2318462A1
Publication date: 1977-02-11
Application No.: FR7618343
Application date: 1976-06-09
Applicant: IBM
Inventor: ROSENBAUM WALTER S
Abstract: A digital reference matrix apparatus is disclosed for verifying input alpha words from a keyboard, character recognition machine, or voice analyzer as valid linguistic expressions. The organization of the digital reference matrix is based upon the character transfer function of the input apparatus. The digital reference matrix contains a vector representation for each dictionary word in the form of a calculated vector magnitude and unique vector angle. The set of magnitudes and angles is stored in the digital reference matrix using a form of run length coding by storing a single magnitude pointer followed by the chain of unique angles for words having the same magnitude. The vector magnitude so calculated constitutes the address data for accessing the digital reference matrix. When an input word is received for verification, the word's magnitude and angle attributes are calculated and the digital reference matrix is accessed at the magnitude of the input word and the corresponding angles are searched for a match. An output signal is generated indicating whether or not the input word is valid. The organization of the digital reference matrix minimizes the size of the array needed for accurate word verification representation through the use of the combination of digital angle representation and run length compaction of the magnitude/angle verification syntax.
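A minimal sketch of the magnitude/angle lookup may help make the storage scheme concrete. The abstract does not give the formulas, so everything numeric here is an assumption: letters are valued a=1 to z=26 (the patent derives values from the input device's character transfer function), the magnitude is the sum of squared letter values, and the angle is measured against an assumed positional reference vector (1, 2, 3, ...). The function names are hypothetical.

```python
import math

# Assumed letter values; the abstract bases them on the character transfer function.
LETTER_VALUE = {ch: i + 1 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}

def magnitude_and_angle(word):
    """Return (squared magnitude, angle to the assumed reference vector)."""
    v = [LETTER_VALUE[ch] for ch in word.lower()]
    ref = range(1, len(v) + 1)
    mag = sum(x * x for x in v)
    dot = sum(x * y for x, y in zip(v, ref))
    cos = dot / math.sqrt(mag * sum(y * y for y in ref))
    return mag, round(math.acos(min(1.0, cos)), 6)

def build_matrix(words):
    """Store one magnitude key followed by the chain of angles sharing it."""
    matrix = {}
    for w in words:
        mag, angle = magnitude_and_angle(w)
        matrix.setdefault(mag, set()).add(angle)
    return {mag: sorted(angles) for mag, angles in matrix.items()}

def is_valid(word, matrix):
    """Access the matrix at the word's magnitude and search its angle chain."""
    mag, angle = magnitude_and_angle(word)
    return angle in matrix.get(mag, [])

matrix = build_matrix(["form", "from", "word"])
```

Anagrams such as "form" and "from" share a magnitude and are told apart only by angle, which is why a single magnitude entry can be followed by a chain of angles.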
-
Publication No.: FR2280936A1
Publication date: 1976-02-27
Application No.: FR7519824
Application date: 1975-06-19
Applicant: IBM
Inventor: CHAIRES ANNE-MARIE , CICONTE JEAN-MARIE , ETT ALLEN H , HILLIARD JOHN J , ROSENBAUM WALTER S
Abstract: A binary reference matrix apparatus is disclosed for verifying input alpha words from a character recognition machine as valid linguistic expressions. The organization of the binary reference matrix is based upon the character transfer function of the character recognition machine. The alphabetic character stream for each word scanned by the character recognition machine is mapped into a vector representation through the assignment of a unique numeric value for each letter in the alphabet. The vector magnitude and angle so calculated constitute the address data for accessing the binary reference matrix. The point accessed in the matrix will have a binary value of 1 if the scanned word is valid and will have a binary value of 0 if the scanned word is invalid. The organization of the binary reference matrix minimizes the size of the array needed for accurate verification by choosing numerical values for the alphabetic characters in inverse proportion to each character's read reliability in the character recognition machine, as determined by empirical measurement of the machine's character transfer function.
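The two ideas in this abstract, reliability-driven letter values and a 1/0 lookup addressed by magnitude and angle, can be illustrated as follows. All figures are hypothetical: the `RELIABILITY` table is invented, a simple rank ordering stands in for the "inverse proportion" assignment, the angle is taken against an assumed positional reference vector, and a set of addresses stands in for the binary matrix.

```python
import math

# Hypothetical read reliabilities: probability the OCR reads each letter correctly.
RELIABILITY = {"a": 0.99, "t": 0.97, "e": 0.92, "c": 0.90, "o": 0.88}

def letter_values(reliability):
    """Give the least reliable letter the largest value (rank-based simplification)."""
    ranked = sorted(reliability, key=reliability.get, reverse=True)
    return {ch: rank + 1 for rank, ch in enumerate(ranked)}

VALUES = letter_values(RELIABILITY)

def address(word):
    """Map a word to a (magnitude, angle in degrees) matrix address."""
    v = [VALUES[ch] for ch in word]
    ref = range(1, len(v) + 1)  # assumed positional reference vector
    mag = sum(x * x for x in v)
    dot = sum(x * y for x, y in zip(v, ref))
    cos = dot / math.sqrt(mag * sum(y * y for y in ref))
    return mag, round(math.degrees(math.acos(min(1.0, cos))), 3)

# Sparse stand-in for the binary matrix: addresses stored here hold a 1.
MATRIX = {address(w) for w in ("cat", "coat", "tea")}

def lookup(word):
    """Return 1 if the scanned word addresses a valid point, else 0."""
    return 1 if address(word) in MATRIX else 0
```

With these values, misreading "cat" as "cot" moves the address to an empty point, so `lookup("cot")` yields 0.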
-
Publication No.: CA2021664C
Publication date: 1998-03-24
Application No.: CA2021664
Application date: 1990-07-20
Applicant: IBM
Inventor: ROSENBAUM WALTER S , HILLIARD JOHN J
IPC: B07C3/14 , B07C3/00 , B07C3/20 , G06K9/03 , G06K9/72 , G06K13/02 , B07C3/08 , G06F7/08 , G06K7/10
Abstract: The invention is characterized as a data processing architecture and method for multi-stage processing of mail, using knowledge based techniques. The system includes OCR-scanning a multipart address field of a mail piece at a sending location, the address field including at least two portions, a first stage routing portion (destination city, state, country, zip code) and a second stage routing portion (destination street address, building floor, corporate addressee internal routing). At the sending location, the image of the entire address field is captured by an OCR head and stored in memory. A serial number is printed on the mail piece. The first routing portion is then converted into sorting signals to sort the mail piece to a truck at the sending location which is to be dispatched to the city, state and country indicated in the first stage routing portion. Then, while the mail piece is in transit by truck to the destination city, the image of the second stage routing portion is analyzed by a knowledge base processor to resolve street addresses, building floor, corporate addressee internal routing information and addressee name. The deferred execution of the analysis by the knowledge base processor is available because of the sporadic volume of mail pieces submitted to the system. While the mail piece is in transit on the truck, the knowledge processor completes its analysis and is able to transmit by electronic communications link to the destination location, the information that the mail piece is on its way and the second stage routing information needed to automatically sort and deliver the mail piece to its corporate addressee. 
In addition, the knowledge base processor analyzes the aggregate volume of mail flowing through the postal system and transmits to each destination location, inventory and resource allocation information necessary to plan for the equipment and manpower needed in the following days to sort and deliver the mail at each destination location.
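The two-stage flow described in this abstract, immediate sorting on the first routing portion and deferred analysis of the second, can be sketched as a queue. This is a toy model under stated assumptions: `MailPiece`, `KNOWLEDGE_BASE`, `sort_at_origin` and `analyze_in_transit` are hypothetical names, a text string stands in for the captured address image, and a dictionary lookup stands in for the knowledge base processor.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class MailPiece:
    serial: int              # serial number printed on the mail piece
    first_stage: str         # destination city/state/country/zip, read at the origin
    second_stage_image: str  # stand-in for the stored image of street/floor/addressee

# Hypothetical knowledge base mapping raw second-stage text to a delivery point.
KNOWLEDGE_BASE = {"12 main st fl 3 j smith": "Main St bldg 12 / floor 3 / J. Smith"}

def sort_at_origin(piece, deferred):
    """Stage 1: route to a truck at once and queue the image for later analysis."""
    deferred.append(piece)
    return f"truck:{piece.first_stage}"

def analyze_in_transit(deferred):
    """Stage 2: resolve queued images; return messages keyed by serial number."""
    messages = {}
    while deferred:
        piece = deferred.popleft()
        key = piece.second_stage_image.lower()
        messages[piece.serial] = KNOWLEDGE_BASE.get(key, "manual sort")
    return messages

deferred = deque()
piece = MailPiece(1001, "Austin TX USA 78701", "12 Main St Fl 3 J Smith")
truck = sort_at_origin(piece, deferred)
messages = analyze_in_transit(deferred)
```

The serial number is what ties the message sent ahead to the destination back to the physical mail piece when it arrives.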
-
Publication No.: CA2021664A1
Publication date: 1991-04-25
Application No.: CA2021664
Application date: 1990-07-20
Applicant: IBM
Inventor: ROSENBAUM WALTER S , HILLIARD JOHN J
IPC: B07C3/14 , B07C3/00 , B07C3/20 , G06K9/03 , G06K9/72 , B07C3/08 , G06F7/08 , G06K7/10 , G06K13/02
Abstract: The invention is characterized as a data processing architecture and method for multi-stage processing of mail, using knowledge based techniques. The system includes OCR-scanning a multipart address field of a mail piece at a sending location, the address field including at least two portions, a first stage routing portion (destination city, state, country, zip code) and a second stage routing portion (destination street address, building floor, corporate addressee internal routing). At the sending location, the image of the entire address field is captured by an OCR head and stored in memory. A serial number is printed on the mail piece. The first routing portion is then converted into sorting signals to sort the mail piece to a truck at the sending location. While the mail piece is in transit on the truck, the knowledge processor completes its analysis and is able to transmit by electronic communications link to the destination location, the information that the mail piece is on its way and the second stage routing information needed to automatically sort and deliver the mail piece to its corporate addressee.
-
Publication No.: CA1153471A
Publication date: 1983-09-06
Application No.: CA362811
Application date: 1980-10-20
Applicant: IBM
Inventor: GLICKMAN DAVID , ROSENBAUM WALTER S
Abstract: ALPHA CONTENT MATCH PRESCAN METHOD FOR AUTOMATIC SPELLING ERROR CORRECTIONS: A system for reducing the computation required to match a misspelled word against various candidates from a dictionary to find one or more words that represent the best match to the misspelled word. The major facility offered is the ability to computationally discern the degree of apparent match that exists between words that do not perfectly match a given target word, without requiring the computationally tedious procedure of character-by-character positional matching, which necessitates shifting and realignment to accommodate differences between the candidate and target words due to character differences or added and dropped syllables. The system includes a method for storing and retrieving words from the dictionary based on their likelihood of being the correct version of a misspelled word and then reviewing those words further using the Prescan Alpha Content Match to reduce the number of candidates that must then be examined in a high resolution positional match to find the candidate(s) which matches the misspelled word with the greatest character affinity. The Prescan Alpha Content Match reduces the number of candidates in contention so as to make a high resolution match computationally feasible on a real-time basis. AT9-79-027
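The idea of a position-independent prescan can be illustrated by comparing letter counts only. The abstract does not state the scoring rule, so the score below (shared letter count divided by the longer word's length) and the 0.7 threshold are assumptions chosen for illustration, and the function names are hypothetical.

```python
from collections import Counter

def alpha_content_score(target, candidate):
    """Fraction of letters shared by the two words, ignoring letter positions."""
    shared = sum((Counter(target) & Counter(candidate)).values())
    return shared / (max(len(target), len(candidate)) or 1)

def prescan(target, candidates, threshold=0.7):
    """Keep only candidates worth passing to an expensive positional match."""
    return [w for w in candidates if alpha_content_score(target, w) >= threshold]

survivors = prescan("recieve", ["receive", "recipe", "relieve", "machine"])
```

Because no shifting or realignment is needed, the prescan costs one pass over each word; only the survivors would go on to the high resolution positional match.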
-
Publication No.: CA1111564A
Publication date: 1981-10-27
Application No.: CA300208
Application date: 1978-03-31
Applicant: IBM
Inventor: KETTLER HOWARD G , KOLPEK ROBERT A , ROSENBAUM WALTER S
IPC: B41J19/32 , B41J19/58 , B41J19/64 , B41J25/12 , B41J25/22 , G06F3/12 , G06F17/21 , G06K15/00 , G06K15/08
Abstract: VARIABLE CHARACTER SPACING MATRIX FOR PROPORTIONAL SPACING PRINTING SYSTEMS: The aesthetic characteristics of adjacent characters are used to enhance the quality of output in a proportional spacing printer and to provide right margin justification for composing. Spacing between characters is determined not only on the basis of the character being printed, but also on the preceding character already printed on the page. An intercharacter displacement memory is provided which contains a list of ideal spacings for all combinations of characters to be printed. As each character is typed, it and the previously stored preceding character address the intercharacter displacement memory. The output of the intercharacter displacement memory is the ideal value of escapement for this particular combination of characters and font style. The printer automatically repositions the print head prior to printing the next character, rather than positioning the print head after the previous character is printed. Line ending decisions for composing are eliminated during initial and final typing of a document by adding to the intercharacter displacement memory recommendations for altering the ideal spacing between characters, where aesthetically possible, to eliminate the need for line ending hyphenation. During initial keying of each line, escapements for each adjacent pair of characters are totaled in a memory for ideal, shortest (tight), and longest (loose) recommended escapements. The line is automatically terminated within the justification range by a carrier return function based on the escapement totals and the selected right margin. Final playout of the page from memory alters the intercharacter escapements from the ideal values to either longer or shorter escapements depending on whether the line is to be lengthened or shortened.
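A pair-addressed spacing table with running tight, ideal and loose totals can be sketched as below. The escapement numbers, the `DEFAULT` fallback and the function names are hypothetical; the units are arbitrary and the font-style dimension mentioned in the abstract is left out.

```python
# Hypothetical intercharacter displacement memory:
# (previous char, current char) -> (tight, ideal, loose) escapement units.
DEFAULT = (5, 6, 7)
DISPLACEMENT = {
    ("A", "V"): (4, 5, 6),
    ("V", "A"): (4, 5, 6),
    ("T", "o"): (4, 5, 7),
}

def line_totals(text):
    """Total the tight, ideal and loose escapements over adjacent character pairs."""
    tight = ideal = loose = 0
    for prev, cur in zip(text, text[1:]):
        t, i, l = DISPLACEMENT.get((prev, cur), DEFAULT)
        tight, ideal, loose = tight + t, ideal + i, loose + l
    return tight, ideal, loose

def can_justify(text, line_width):
    """True if the line can be stretched or squeezed to end at line_width."""
    tight, _, loose = line_totals(text)
    return tight <= line_width <= loose

totals = line_totals("AVA")
```

A line whose target width falls between the tight and loose totals can be justified by adjusting intercharacter escapements alone, without hyphenation.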
-
Publication No.: CA1092243A
Publication date: 1980-12-23
Application No.: CA288062
Application date: 1977-10-04
Applicant: IBM
Inventor: ROSENBAUM WALTER S , TANNER HOWARD C
Abstract: IMPROVED APPARATUS FOR AUTOMATICALLY FORMING HYPHENATED WORDS: Improved hyphenation apparatus is combined with word verification apparatus to automatically provide hyphenation points for input words from a keyboard or other input device. The spelling of each word input to the system is verified by the digital reference matrix section of the apparatus by calculating a vector magnitude and angle for the word, which are compared to the contents of a storage dictionary of words. Each cell of storage in the storage dictionary, in addition to containing a unique angle representation of the input word, contains a byte of data representing the valid hyphenation points for the input word. When an input word is verified to be correctly spelled, the hyphenation byte is read out of the dictionary and used by the hyphenation section to reassemble the word in hyphenated form. The hyphenated word is then displayed to the operator for appropriate action.
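One way a single byte could encode hyphenation points is as a bit mask over character boundaries. The abstract does not specify the bit layout, so the convention below is an assumption: bit i (counting from the least significant bit) set means a hyphen is allowed after character i+1. The function name is hypothetical.

```python
def hyphenate(word, hyphen_byte):
    """Reassemble a word with hyphens at the boundaries flagged in hyphen_byte."""
    parts, start = [], 0
    # One byte covers at most the first eight character boundaries.
    for i in range(min(8, len(word) - 1)):
        if hyphen_byte & (1 << i):
            parts.append(word[start:i + 1])
            start = i + 1
    parts.append(word[start:])
    return "-".join(parts)

# Bits 2 and 6 set: breaks after the 3rd and 7th characters.
result = hyphenate("dictionary", 0b01000100)
```

Under this layout the byte stored with a verified word is enough to rebuild its hyphenated form without a separate hyphenation rule set.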
-
Publication No.: CA1062810A
Publication date: 1979-09-18
Application No.: CA221755
Application date: 1975-03-10
Applicant: IBM
Inventor: BOLLINGER ELLEN W , CHAIRES ANNE M , CICONTE JEAN M , ETT ALLEN H , HILLIARD JOHN J , ROSENBAUM WALTER S
Abstract: A data processing system is disclosed for selecting the correct form of a garbled input word misread by an optical character reader so as to change the number of characters in the word by character splitting or concatenation. Dictionary words are stored in the system, having characters which are flagged for segmentation or concatenation OCR misread propensity. The OCR word and a dictionary word are loaded into a pair of associated shift registers, aligning their letters on one end. The dictionary word characters are inspected for error propensity flags. When a splitting propensity, for example, is found for a character, special conditional probability values are accessed from a storage and a calculation is performed of the probability that the first character of the dictionary word was split by the OCR into the first and second characters of the OCR word. This regional context probability is compared with the probability of a simple substitution error for the characters. If the probability of segmentation is larger, the OCR characters in the first shift register are shifted one space with respect to the dictionary word characters in the second shift register so that subsequent character pairs to be compared are properly matched. The greater calculated probability is combined in a running product. The dictionary word with the largest running product is output by the system as the most likely correct form of the garbled OCR input word. In addition to optical character recognition, the system disclosed may be applied to correcting segmentation errors in phoneme characters output from a speech analyzer.
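The split-versus-substitution comparison and the running product can be sketched as follows. All probabilities are invented for illustration, the `SPLIT` table doubles as the propensity flag (a dictionary character is "flagged" if it has an entry), and only splitting is modeled; concatenation, which the abstract also covers, is omitted. The function names are hypothetical.

```python
# Hypothetical probability that the OCR splits a dictionary character into two.
SPLIT = {("m", "rn"): 0.3, ("w", "vv"): 0.25}

def p_substitution(dict_char, ocr_char):
    """Hypothetical substitution model: 0.9 for a correct read, 0.01 otherwise."""
    return 0.9 if dict_char == ocr_char else 0.01

def running_product(ocr_word, dict_word):
    """Align the two words, shifting the OCR word by one extra place on a split."""
    i, product = 0, 1.0
    for d in dict_word:
        if i >= len(ocr_word):
            return 0.0  # dictionary word is longer than the alignment allows
        p_sub = p_substitution(d, ocr_word[i])
        p_seg = SPLIT.get((d, ocr_word[i:i + 2]), 0.0)
        if p_seg > p_sub:
            product *= p_seg
            i += 2  # the split consumed two OCR characters
        else:
            product *= p_sub
            i += 1
    return product if i == len(ocr_word) else 0.0

def best_match(ocr_word, dictionary):
    """Return the dictionary word with the largest running product."""
    return max(dictionary, key=lambda w: running_product(ocr_word, w))

choice = best_match("cornb", ["corns", "comb", "combs"])
```

Here "cornb" is resolved to "comb" because treating "rn" as a split "m" scores higher than accepting a substitution in "corns".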
-
Publication No.: CA1061000A
Publication date: 1979-08-21
Application No.: CA264592
Application date: 1976-10-25
Applicant: IBM
Inventor: MULLAN PHILIP J , ROSENBAUM WALTER S
Abstract: MULTI-CHANNEL RECOGNITION DISCRIMINATOR: A multi-channel multi-genre character recognition discriminator is disclosed which performs the decision making process between strings of characters coming from a multi-channel (i.e., three or more channels) alpha-numeric output optical character reader (OCR) system for use in such applications as, for example, text processing and mail processing. The multi-channel output OCR uses separate recognition processes for each genre or character set indicative of a distinct group with respect to style (i.e., font) or form, and attempts to recognize each character independently as belonging to each respective genre. For example, in a three channel output OCR for reading mixed numeric, English and Russian Cyrillic character sets, the English alphabetic interpretation of a scanned word is output as an English alphabetic subfield on a first OCR output line, the Cyrillic interpretation of the scanned word is output as a Cyrillic subfield on a second OCR output line, and the numeric interpretation of the scanned word is output as a numeric subfield on a third OCR output line. A multi-channel multi-genre character recognition discriminator analyzes these three subfield character streams by calculating a first conditional probability that, given the OCR has scanned and recognized an English alphabetic character Ei, numeric NK and Cyrillic CJ characters were respectively misrecognized by their recognition channels; a second conditional probability that, given the OCR has scanned and recognized a Cyrillic character CJ, numeric NK and English Ei characters were respectively misrecognized by their recognition channels; and a third conditional probability that, given the OCR scanned and recognized a numeric character NK, English Ei and Cyrillic CJ characters were respectively misrecognized by their recognition channels.
These conditional probabilities are developed character by character for each character within a string thereof or a word. A first product of all the first type conditional probabilities is calculated for all of the characters in a word (which may, of course, contain only a single character); similarly second and third products are calculated for the second and third conditional probabilities, respectively. The magnitudes of the products of these conditional probabilities are then compared in an N-channel comparator, and the highest probability subfield is selected as the most probable interpretation of the word scanned by the OCR.
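The final selection step, multiplying per-character conditional probabilities for each channel and picking the largest product, can be sketched as below. The probability values are invented and are assumed to have been looked up already for each character; the channel names and the `discriminate` function are hypothetical.

```python
import math

def discriminate(channel_probs):
    """Multiply each channel's per-character probabilities and pick the largest."""
    products = {name: math.prod(probs) for name, probs in channel_probs.items()}
    best = max(products, key=products.get)
    return best, products

# Hypothetical per-character conditional probabilities for a three-character word.
word_probs = {
    "english":  [0.9, 0.8, 0.9],
    "cyrillic": [0.3, 0.4, 0.2],
    "numeric":  [0.1, 0.1, 0.1],
}
best, products = discriminate(word_probs)
```

Nothing in the comparison is specific to three channels, so the same function serves as an N-channel comparator for any number of subfields.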
-