Invention Grant
US09122898B2 Systems and methods for processing documents of unknown or unspecified format
Status: In force
- Patent Title: Systems and methods for processing documents of unknown or unspecified format
- Application No.: US13427506
- Application Date: 2012-03-22
- Publication No.: US09122898B2
- Publication Date: 2015-09-01
- Inventor: Scott Coles, Derek Murphy, Ben Truscott, Ian Davies
- Applicant: Scott Coles, Derek Murphy, Ben Truscott, Ian Davies
- Applicant Address: CH Meyrin
- Assignee: Lexmark International Technology SA
- Current Assignee: Lexmark International Technology SA
- Current Assignee Address: CH Meyrin
- Main IPC: G06K9/00
- IPC: G06K9/00 ; G06F17/27

Abstract:
A computer implemented method for extracting meaningful text from a document of unknown or unspecified format. In a particular embodiment, the method includes reading the document, thereby to extract raw encoded text, analysing the raw encoded text, thereby to identify one or more text chunks, and for a given chunk, performing compression identification analysis to determine whether compression is likely. The method can further include performing a decompression process, performing an encoding identification process thereby to identify a likely character encoding protocol, and converting the chunk using the identified likely character encoding protocol, thereby to output the chunk as readable text.
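The per-chunk steps named in the abstract (compression identification, decompression, encoding identification, conversion to readable text) can be illustrated with a minimal Python sketch. This is not the patented method: the abstract does not say which compression formats, encodings, or heuristics are used, so the zlib header check, the `CANDIDATE_ENCODINGS` list, and the printable-character score below are all assumptions chosen for illustration, as are the function names. Chunk identification is left out; the sketch starts from a single chunk of raw bytes.

```python
import zlib

# Hypothetical candidate list; the abstract does not name any encodings.
CANDIDATE_ENCODINGS = ("utf-8", "utf-16", "cp1252")


def looks_compressed(chunk: bytes) -> bool:
    """Guess whether a chunk is a zlib stream from its two-byte header."""
    if len(chunk) < 2:
        return False
    cmf, flg = chunk[0], chunk[1]
    # RFC 1950: the low nibble of CMF is 8 for deflate, and the 16-bit
    # value CMF*256 + FLG must be a multiple of 31.
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0


def maybe_decompress(chunk: bytes) -> bytes:
    """Decompress only when compression looks likely; otherwise pass through."""
    if not looks_compressed(chunk):
        return chunk
    try:
        return zlib.decompress(chunk)
    except zlib.error:
        # The header matched by coincidence; treat the chunk as uncompressed.
        return chunk


def printable_score(text: str) -> float:
    """Fraction of characters that are printable or common whitespace."""
    good = sum(ch.isprintable() or ch in "\r\n\t" for ch in text)
    return good / max(len(text), 1)


def guess_encoding(chunk: bytes) -> str:
    """Pick the candidate that decodes cleanly with the highest score."""
    best, best_score = "latin-1", -1.0  # latin-1 can decode any byte string
    for enc in CANDIDATE_ENCODINGS:
        try:
            text = chunk.decode(enc)
        except UnicodeDecodeError:
            continue
        score = printable_score(text)
        if score > best_score:  # ties keep the earlier candidate
            best, best_score = enc, score
    return best


def chunk_to_text(chunk: bytes) -> str:
    """Run the decompress, identify-encoding, convert steps on one chunk."""
    data = maybe_decompress(chunk)
    return data.decode(guess_encoding(data), errors="replace")
```

For example, `chunk_to_text(zlib.compress("héllo wörld".encode("utf-8")))` passes through all three steps, while an uncompressed chunk skips decompression. A real system would likely need stronger evidence than a two-byte header and a printable-character ratio, since short or binary chunks can satisfy both by chance.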
Public/Granted literature
- US20130077855A1 SYSTEMS AND METHODS FOR PROCESSING DOCUMENTS OF UNKNOWN OR UNSPECIFIED FORMAT (Public/Granted day: 2013-03-28)