Semantic normalization in document digitization

    Publication No.: GB2581461A

    Publication Date: 2020-08-19

    Application No.: GB202009248

    Filing Date: 2018-11-30

    Applicant: IBM

    Abstract: A method for normalizing a key in a document image includes identifying a candidate key, corresponding to an object in the document image, with a key in key ontology data on the basis that the candidate key is semantically interchangeable with that key. The context, position, and style of each object of the document image are represented in document metadata. The candidate key is normalized into a normal form. A key class corresponding to the normal form is determined, and a confidence score indicating the likelihood that the key class is representative of the candidate key is assessed. Upon verification, a semantic database is updated with the key class to enhance processing of future documents.
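
The flow described in the abstract (normalize a candidate key, determine a key class, assess a confidence score, update a semantic database upon verification) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `KEY_ONTOLOGY` contents, the function names, and in particular the scoring, which uses string similarity from Python's `difflib` as a stand-in because the abstract does not say how confidence is computed. Document metadata (context, position, style) is left out for brevity.

```python
import re
from difflib import SequenceMatcher

# Hypothetical key ontology: key class -> normal-form aliases that are
# treated as semantically interchangeable. Contents are illustrative only.
KEY_ONTOLOGY = {
    "invoice_number": {"invoice number", "invoice no", "inv no"},
    "due_date": {"due date", "payment due"},
}


def normalize(candidate: str) -> str:
    """Reduce a candidate key to a normal form.

    Assumed normal form: lowercase, runs of non-alphanumeric characters
    replaced by a single space, no leading or trailing whitespace.
    """
    return re.sub(r"[^a-z0-9]+", " ", candidate.lower()).strip()


def classify(candidate: str, ontology: dict) -> tuple:
    """Return (key_class, confidence) for the closest alias in the ontology.

    Confidence is the SequenceMatcher similarity ratio in [0, 1]; this is
    a swapped-in measure, not the scoring method of the patent.
    """
    normal = normalize(candidate)
    best_class, best_score = None, 0.0
    for key_class, aliases in ontology.items():
        for alias in aliases:
            score = SequenceMatcher(None, normal, alias).ratio()
            if score > best_score:
                best_class, best_score = key_class, score
    return best_class, best_score


def update_semantic_db(db: dict, candidate: str, key_class: str, verified: bool) -> dict:
    """Store the normal form under its key class, but only once verified."""
    if verified:
        db.setdefault(key_class, set()).add(normalize(candidate))
    return db


if __name__ == "__main__":
    key_class, confidence = classify("Invoice No.:", KEY_ONTOLOGY)
    print(key_class, confidence)
    print(update_semantic_db({}, "Invoice No.:", key_class, verified=True))
```

In this sketch an exact alias match yields the maximum score, and near matches such as "Invoice Num" still resolve to a key class with a lower score, which a reviewer could verify before the database is updated.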
