Abstract:
A system and a method for converting microfilm data into a digital format for publishing over a network such as the Internet. First, an image of the microfilm is created, preferably in TIFF format. Next, the words in the image are recognized through OCR (optical character recognition), with an associated probability of error. The image data can then be converted into a digital format for publication, for example XML. Preferably, the user is able to perform a keyword search on the digital format data. More preferably, the keyword search is an adaptive search.
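The recognition and conversion steps described above can be illustrated with a minimal sketch. It assumes the OCR stage has already produced words with an estimated error probability; the names `OcrWord`, `words_to_xml`, `keyword_search`, the XML element names and the `max_error` threshold are all hypothetical, and the plain exact-match search shown here is not the adaptive search mentioned in the abstract.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str
    error_prob: float  # estimated probability that the recognition is wrong


def words_to_xml(words, page_id):
    """Build a <page> element holding one <word> per recognized word."""
    page = ET.Element("page", id=page_id)
    for w in words:
        el = ET.SubElement(page, "word", err=f"{w.error_prob:.2f}")
        el.text = w.text
    return page


def keyword_search(page, keyword, max_error=0.5):
    """Return matching words whose error probability is at most max_error."""
    key = keyword.lower()
    return [
        el.text
        for el in page.iter("word")
        if el.text.lower() == key and float(el.get("err")) <= max_error
    ]


# Hypothetical OCR output for one microfilm frame.
words = [OcrWord("Daily", 0.02), OcrWord("Gazette", 0.10), OcrWord("Gazctte", 0.80)]
page = words_to_xml(words, "reel1-frame1")
print(ET.tostring(page, encoding="unicode"))
print(keyword_search(page, "gazette"))
```

Keeping the error probability alongside each word lets a search layer decide how much uncertain text to trust, which is one plausible use of the value the abstract describes.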
Abstract:
A system and a method for publishing a newspaper page or other data through a Web page, such that the information can be made available more easily through a network such as the Internet. The data is automatically converted to the Web page format by first rendering the newspaper page into a digital format; converting the digital format to a basic internal publishing format; and then publishing the data in any one of a number of possible publishing formats, including but not limited to a markup language document such as a Web page. By analyzing the newspaper page as a plurality of objects, the present invention supports advanced features such as arranging the content of the newspaper according to relationships within the information of the content and/or according to the preference(s) of the user. Each newspaper object may optionally be a title, an article, a picture and/or other graphic, an advertisement, and so forth.
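The object-based view of a page described above can be sketched as follows. This is only an illustration under stated assumptions: the `PageObject` type, its `topic` field, and the `render_html` function are hypothetical, and ordering by a single preferred topic stands in for whatever arrangement logic the invention actually uses.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class PageObject:
    kind: str        # "title", "article", "picture", or "advertisement"
    content: str     # text, or an image reference for pictures and ads
    topic: str = ""  # hypothetical label used to arrange content


def render_html(objects, preferred_topic=None):
    """Publish page objects as HTML, listing the preferred topic first."""
    if preferred_topic is not None:
        # Stable sort: objects on the preferred topic keep their order and move up.
        ordered = sorted(objects, key=lambda o: o.topic != preferred_topic)
    else:
        ordered = list(objects)
    parts = []
    for o in ordered:
        if o.kind == "title":
            parts.append(f"<h1>{escape(o.content)}</h1>")
        elif o.kind == "article":
            parts.append(f"<p>{escape(o.content)}</p>")
        else:
            parts.append(f'<img src="{escape(o.content)}" alt="{escape(o.kind)}">')
    return "\n".join(parts)


objs = [
    PageObject("article", "Rain expected", "weather"),
    PageObject("article", "Late goal wins match", "sports"),
]
print(render_html(objs, preferred_topic="sports"))
```

Because the internal representation is a list of typed objects rather than a finished page, the same data could be passed to a different renderer for another publishing format.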
Abstract:
A system and a method for the conversion of archived documents to a digital format (102) and storage of the extracted data in repositories from which it may be easily retrieved and searched by a user over a network such as the Internet (116).
Abstract:
A system and a method for the conversion of archived documents to a digital format and storage of the extracted data in repositories from which it may be easily retrieved and searched by a user over a network such as the Internet. The archived documents are preferably stored in the form of microfilm, although optionally the present invention could be operative with other types of physical media, such as microfiche, paper and any type of printed material. The microfilm data is preferably divided and/or grouped into at least one file. Optionally and preferably, each file undergoes the following automatic processing stages: combining files; analyzing image layout; segmentation; OCR; optional segmentation improvement; and output to XML, or another suitable output data format and/or language. In the last stage, the data contained in the files is preferably extracted and then more preferably transmitted to the relevant repository unit.
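The processing stages listed above can be sketched as a simple chain of functions. This is a minimal sketch, not the patented implementation: every function name is hypothetical, the layout and segmentation stages are placeholders, the optional segmentation improvement stage is omitted, and the OCR engine is passed in as a `recognize` callable rather than tied to any real library.

```python
import xml.etree.ElementTree as ET


def combine_files(files):
    # Merge the frames of several scanned files into one ordered list.
    return [frame for f in files for frame in f]


def analyze_layout(frames):
    # Placeholder: treat every frame as a single-column page.
    return [{"frame": fr, "columns": 1} for fr in frames]


def segment(pages):
    # Placeholder: one segment per page, carrying the frame image along.
    return [{"page": p, "image": p["frame"]} for p in pages]


def run_ocr(segments, recognize):
    # Attach recognized text to each segment using the supplied OCR callable.
    return [{**s, "text": recognize(s["image"])} for s in segments]


def to_xml(segments):
    # Emit one <segment> element per segment, in order.
    root = ET.Element("document")
    for i, s in enumerate(segments):
        ET.SubElement(root, "segment", index=str(i)).text = s["text"]
    return ET.tostring(root, encoding="unicode")


def process(files, recognize):
    """Run the stages in the order given in the abstract."""
    pages = analyze_layout(combine_files(files))
    return to_xml(run_ocr(segment(pages), recognize))


# Stand-in OCR: upper-cases a string that represents an image.
print(process([["img-a"], ["img-b"]], recognize=lambda img: img.upper()))
```

Passing the stages plain lists and dictionaries keeps each step replaceable, so a real layout analyzer or OCR engine could be substituted without changing the others.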