Invention Grant
- Patent Title: Document classification using multiscale text fingerprints
- Application No.: US14558079
- Application Date: 2014-12-02
- Publication No.: US09203852B2
- Publication Date: 2015-12-01
- Inventor: Adrian Toma, Marius Tibeica
- Applicant: Bitdefender IPR Management Ltd.
- Applicant Address: CY Nicosia
- Assignee: Bitdefender IPR Management Ltd.
- Current Assignee: Bitdefender IPR Management Ltd.
- Current Assignee Address: CY Nicosia
- Agency: Law Office of Andrei D Popovici, PC
- Main IPC: H04L29/06
- IPC: H04L29/06; H04L12/58; G06Q50/26

Abstract:
Described systems and methods allow a classification of electronic documents such as email messages and HTML documents, according to a document-specific text fingerprint. The text fingerprint is calculated for a text block of each target document, and comprises a sequence of characters determined according to a plurality of text tokens of the respective text block. In some embodiments, the length of the text fingerprint is forced within a pre-determined range of lengths (e.g. between 129 and 256 characters) irrespective of the length of the text block, by zooming in for short text blocks, and zooming out for long ones. Classification may include, for instance, determining whether an electronic document represents unsolicited communication (spam) or online fraud such as phishing.
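The length-normalization idea in the abstract can be illustrated with a minimal sketch. This is not the patented method: the tokenizer, the 36-character alphabet, the SHAKE-256 hash, and the function names `fingerprint` and `_chars_for` are all assumptions chosen for illustration. Only the 129 to 256 character range comes from the abstract. The sketch "zooms in" on short text blocks by emitting several characters per token, and "zooms out" on long ones by emitting one character per group of tokens.

```python
import hashlib
import re

# Hypothetical output alphabet; the abstract does not specify one.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
# Example range of fingerprint lengths quoted in the abstract.
MIN_LEN, MAX_LEN = 129, 256


def _chars_for(piece: str, count: int) -> str:
    """Map a string to `count` alphabet characters using SHAKE-256 (an assumption)."""
    digest = hashlib.shake_256(piece.encode("utf-8")).digest(count)
    return "".join(ALPHABET[b % len(ALPHABET)] for b in digest)


def fingerprint(text: str) -> str:
    """Return a fingerprint of MIN_LEN..MAX_LEN characters for any non-empty text."""
    tokens = re.findall(r"\w+", text.lower())  # assumed tokenizer
    n = len(tokens)
    if n == 0:
        return ""
    if n > MAX_LEN:
        # Zoom out: each character summarizes a group of consecutive tokens.
        group = -(-n // MAX_LEN)  # ceiling division
        pieces = [" ".join(tokens[i:i + group]) for i in range(0, n, group)]
        return "".join(_chars_for(p, 1) for p in pieces)
    if n < MIN_LEN:
        # Zoom in: each token contributes several characters.
        per_token = -(-MIN_LEN // n)  # ceiling division
        return "".join(_chars_for(t, per_token) for t in tokens)
    # Token count already within range: one character per token.
    return "".join(_chars_for(t, 1) for t in tokens)


if __name__ == "__main__":
    short = "Claim your prize now"
    long_text = " ".join(f"word{i}" for i in range(5000))
    print(len(fingerprint(short)), len(fingerprint(long_text)))
```

Under these assumptions, both a four-word message and a 5000-word document yield fingerprints whose lengths fall inside the same range, so they can be compared or looked up with the same machinery regardless of the size of the original text block.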
Public/Granted literature
- US20150089644A1 Document Classification Using Multiscale Text Fingerprints, Public/Granted day: 2015-03-26