Invention Grant
- Patent Title: Document data classification using a noise-to-content ratio
- Application No.: US13614858
- Application Date: 2012-09-13
- Publication No.: US09773182B1
- Publication Date: 2017-09-26
- Inventor: Bernhard Wolkerstorfer, Lei Li, Narendra S. Parihar
- Applicant: Bernhard Wolkerstorfer, Lei Li, Narendra S. Parihar
- Applicant Address: US NV Reno
- Assignee: Amazon Technologies, Inc.
- Current Assignee: Amazon Technologies, Inc.
- Current Assignee Address: US NV Reno
- Agency: Lowenstein Sandler LLP
- Main IPC: G06K9/00
- IPC: G06K9/00 ; G06K9/32

Abstract:
A method and system for classifying document data is described. An exemplary method includes identifying a markup language document having a plurality of portions, determining a set of substantive content metrics and a set of noise metrics for each of the plurality of portions, calculating a noise-to-content ratio for each of the plurality of portions based on a corresponding set of substantive content metrics and a corresponding set of noise metrics, and removing noise from the markup language document using the noise-to-content ratio.
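
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say which metrics are used, so the word count (content metric), the tag and link counts (noise metrics), the regex-based parsing, the threshold of 1.0, and all function names here are assumptions chosen only for illustration.

```python
import re

# Assumption: portions are plain HTML strings; a real system would likely
# walk a parsed DOM instead of using regular expressions.
TAG_RE = re.compile(r"<[^>]+>")
LINK_RE = re.compile(r"<a\b", re.IGNORECASE)


def content_metrics(portion: str) -> dict:
    """Hypothetical substantive content metrics: visible word count."""
    text = TAG_RE.sub(" ", portion)
    return {"words": len(text.split())}


def noise_metrics(portion: str) -> dict:
    """Hypothetical noise metrics: number of tags and of opening link tags."""
    return {
        "tags": len(TAG_RE.findall(portion)),
        "links": len(LINK_RE.findall(portion)),
    }


def noise_to_content_ratio(portion: str) -> float:
    """Sum of noise metrics divided by sum of content metrics."""
    noise = sum(noise_metrics(portion).values())
    content = sum(content_metrics(portion).values())
    # Guard against division by zero for portions with no visible text.
    return noise / max(content, 1)


def remove_noise(portions: list, threshold: float = 1.0) -> list:
    """Keep only portions whose ratio does not exceed the assumed threshold."""
    return [p for p in portions if noise_to_content_ratio(p) <= threshold]


if __name__ == "__main__":
    portions = [
        "<p>This is a long paragraph of real article text here.</p>",
        '<div><a href="/a">Home</a><a href="/b">About</a></div>',
    ]
    for p in portions:
        print(round(noise_to_content_ratio(p), 2), p)
    print(remove_noise(portions))
```

In this sketch a navigation block made mostly of links scores a high ratio and is dropped, while a text-heavy paragraph scores low and is kept. How the actual claims weight or combine the metrics is not stated in the abstract.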