Invention Grant
- Patent Title: Detection of unknown code page indexing tokens
- Application No.: US15166498
- Application Date: 2016-05-27
- Publication No.: US11239858B2
- Publication Date: 2022-02-01
- Inventor: Michael Baessler, Thomas A. P. Hampp-Bahnmueller, Peng Hui Jiang
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Anthony Curro
- Main IPC: H03M7/34
- IPC: H03M7/34; H03M7/38; H03M7/00; G06F16/81; H03M7/30

Abstract:
A method for determining an encoding used for a sequence of bytes may be provided. The method comprises providing a set of candidate code pages and transforming them into different groups of sequences of bytes, wherein each group of sequences of bytes corresponds to one of the candidate code pages. Thereby each code point is transformed by applying a transformation from one of the candidate code pages to a reference code point value relating to a reference encoding for each code point. The method comprises further separating each of the transformed sequences of bytes into groups of tokens, wherein each group of tokens relates to one candidate code page, and providing an index relating to a text corpus. Furthermore, the method comprises selecting a code page from the set of candidate code pages at least partially based on how many tokens are found in the index.
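The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes Unicode as the reference encoding, a regex word split as the tokenizer, a plain set of known words standing in for the corpus index, and a simple hit count as the selection score. The function name `detect_code_page` and its parameters are hypothetical.

```python
import re


def detect_code_page(data: bytes, candidates: list[str], index: set[str]) -> str:
    """Pick the candidate code page whose decoding yields the most indexed tokens.

    Assumptions (not from the source): Unicode is the reference encoding,
    tokens are runs of word characters, and `index` is a set of lowercase words.
    """
    best_page, best_hits = candidates[0], -1
    for page in candidates:
        # Transform the byte sequence to reference code points (Unicode).
        # Undecodable bytes are replaced rather than raising an error.
        text = data.decode(page, errors="replace")
        # Separate the transformed sequence into tokens.
        tokens = re.findall(r"\w+", text.lower())
        # Count how many tokens are found in the index.
        hits = sum(1 for token in tokens if token in index)
        # Ties keep the earlier candidate, so list order acts as a prior.
        if hits > best_hits:
            best_page, best_hits = page, hits
    return best_page


# Usage: a tiny stand-in index and a byte sequence of unknown encoding.
corpus_index = {"grüße", "aus", "münchen"}
unknown_bytes = "Grüße aus München".encode("cp1252")
print(detect_code_page(unknown_bytes, ["cp437", "cp1251", "cp1252"], corpus_index))
```

In this toy input only the `cp1252` decoding reproduces all three indexed words, so it scores highest. A real index would be built from a text corpus, and the score could weight tokens differently, since the abstract says the selection is only "at least partially" based on the count.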
Public/Granted literature
- US20170047943A1 DETECTION OF UNKNOWN CODE PAGE INDEXING TOKENS, Public/Granted day: 2017-02-16