Invention Grant
US07937338B2 System and method for identifying document structure and associated metainformation
In force
- Patent Title: System and method for identifying document structure and associated metainformation
- Application No.: US12112944
- Application Date: 2008-04-30
- Publication No.: US07937338B2
- Publication Date: 2011-05-03
- Inventor: Branimir K. Boguraev, Roy J. Byrd, Keh-Shin F. Cheng, Anni R. Coden, Michael A. Tanenblatt, Wilfried Teiken
- Applicant: Branimir K. Boguraev, Roy J. Byrd, Keh-Shin F. Cheng, Anni R. Coden, Michael A. Tanenblatt, Wilfried Teiken
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Cahn & Samuels, LLP
- Main IPC: G06F15/18
- IPC: G06F15/18 ; G06F17/20

Abstract:
A system and method for processing documents by utilizing the textual content and layout of the documents, including visual indicators, to more efficiently and reliably process the documents across various document types. The system and method identifies visually distinguishable elements within the document, such as section and sub-section boundary indicators, to mark, divide and label the boundaries and content type such that the sections are more clearly identifiable and easily processed. The system and method uses known elements, including section heading types, keywords, section type classifiers, sub-section heading constructs, stop words, and the like to adaptively identify and process a broad range of document types. The system and method continually refines and updates these known elements and allows users to discover and define new elements for further refinement and updating.
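The abstract describes heading-based segmentation only at a high level. The Python sketch below shows one simple way such segmentation could work. It is an illustration under stated assumptions, not the patented method. The visual indicators (a short line ending in a colon, or an all-caps line), the `KNOWN_HEADINGS` table, the `STOP_WORDS` set, the helper names, and the sample document are all hypothetical choices made for this example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Section:
    heading: str  # heading text as written ("" for text before any heading)
    label: str    # section type from KNOWN_HEADINGS, "UNKNOWN", or "PREAMBLE"
    lines: list[str] = field(default_factory=list)


# Hypothetical known elements: normalized heading text -> section type.
KNOWN_HEADINGS = {
    "summary": "SUMMARY",
    "background": "BACKGROUND",
    "findings": "FINDINGS",
    "recommendations": "RECOMMENDATIONS",
}

# Hypothetical stop words: heading-shaped lines that are never headings.
STOP_WORDS = {"note", "see"}

# Assumed visual indicators: a short line ending in a colon, or an all-caps line.
COLON_HEADING = re.compile(r"^([A-Za-z][A-Za-z ]{0,40}):$")
CAPS_HEADING = re.compile(r"^[A-Z][A-Z ]{2,40}$")


def heading_text(line: str) -> str | None:
    """Return the heading text if the line looks like a heading, else None."""
    stripped = line.strip()
    match = COLON_HEADING.match(stripped)
    if match:
        text = match.group(1)
    elif CAPS_HEADING.match(stripped):
        text = stripped
    else:
        return None
    return None if text.lower() in STOP_WORDS else text


def segment(document: str) -> tuple[list[Section], set[str]]:
    """Split a document into labeled sections.

    Also returns unrecognized heading texts, which a user could review and
    add to KNOWN_HEADINGS to refine the known elements.
    """
    sections: list[Section] = []
    discovered: set[str] = set()
    current = Section(heading="", label="PREAMBLE")
    for line in document.splitlines():
        text = heading_text(line)
        if text is None:
            if line.strip():
                current.lines.append(line.strip())
            continue
        # A heading closes the current section (if non-empty) and opens a new one.
        if current.heading or current.lines:
            sections.append(current)
        key = text.lower()
        label = KNOWN_HEADINGS.get(key, "UNKNOWN")
        if label == "UNKNOWN":
            discovered.add(key)
        current = Section(heading=text, label=label)
    if current.heading or current.lines:
        sections.append(current)
    return sections, discovered


sample = """Prepared for internal review.
SUMMARY
Two defects were found.
Background:
The audit covered three sites.
Open questions:
Scope of phase two.
Note:
Dates are provisional.
RECOMMENDATIONS
Repeat the audit in June."""

sections, discovered = segment(sample)
for s in sections:
    print(s.label, s.heading, s.lines)
print("new heading candidates:", discovered)
```

In this sketch, "Note:" is kept as body text because it is listed as a stop word, and the unrecognized heading "Open questions" is returned as a candidate instead of being discarded, loosely mirroring the abstract's idea of letting users discover and define new elements.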