-
Publication number: GB2582722B
Publication date: 2021-03-03
Application number: GB202009558
Filing date: 2018-11-23
Applicant: IBM
Inventor: KEVIN NORTHRUP , CRAIG TRIM , BADR KHAMIS , KARAN SEHGAL , CHANDRASHEKHAR PADOLE , ABISOLA ADENIRAN
Abstract: Methods, computer program products, and systems are presented. The methods include, for instance: obtaining a document image with objects and identifying microblocks corresponding to each object; analyzing a position of a microblock for collinearity with another microblock based on respective positional characteristics and adjustable collinearity parameters; grouping collinear microblocks into a macroblock and creating computational data of a key-value pair from the macroblock; and associating a heuristic confidence level with the key-value pair. Also, based on data cluster formation, a table may be classified and data extracted.
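
The collinearity step in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the `Microblock` bounding-box representation, the parameters `y_tol` and `max_gap` standing in for the "adjustable collinearity parameters", the greedy left-to-right grouping, and the toy confidence rule.

```python
from dataclasses import dataclass


@dataclass
class Microblock:
    """Hypothetical bounding box for one object found in the document image."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def center_y(self) -> float:
        return self.y + self.h / 2

    @property
    def right(self) -> float:
        return self.x + self.w


def collinear(a: Microblock, b: Microblock, y_tol: float = 3.0, max_gap: float = 50.0) -> bool:
    """Assumed test: same text row (within y_tol) and close enough horizontally."""
    left, right = (a, b) if a.x <= b.x else (b, a)
    same_row = abs(a.center_y - b.center_y) <= y_tol
    gap = right.x - left.right  # negative when the boxes overlap
    return same_row and gap <= max_gap


def group_macroblocks(blocks, y_tol: float = 3.0, max_gap: float = 50.0):
    """Greedily attach each block to the first macroblock whose rightmost member is collinear."""
    macroblocks = []
    for block in sorted(blocks, key=lambda b: b.x):
        for macro in macroblocks:
            if collinear(macro[-1], block, y_tol, max_gap):
                macro.append(block)
                break
        else:
            macroblocks.append([block])
    return macroblocks


def to_key_value(macro):
    """Treat the leftmost block as the key and the rest as the value (illustrative only)."""
    if len(macro) < 2:
        return None
    key_text = macro[0].text.strip()
    value = " ".join(b.text for b in macro[1:])
    # Toy heuristic confidence: a trailing colon suggests a label.
    confidence = 0.9 if key_text.endswith(":") else 0.5
    return key_text.rstrip(":").strip(), value, confidence


blocks = [
    Microblock("Invoice No:", 10, 10, 60, 12),
    Microblock("12345", 80, 11, 40, 12),
    Microblock("Date:", 10, 40, 30, 12),
    Microblock("2018-11-23", 50, 40, 70, 12),
]
macros = group_macroblocks(blocks)
pairs = [to_key_value(m) for m in macros]
```

Loosening `y_tol` or `max_gap` merges more microblocks into one macroblock, which is one plausible reading of why the abstract calls the collinearity parameters adjustable.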
-
Publication number: GB2583290A
Publication date: 2020-10-21
Application number: GB202009894
Filing date: 2018-11-23
Applicant: IBM
Inventor: KEVIN NORTHRUP , CRAIG TRIM , THOZAMILE JAVU , TERRY HICKEY
IPC: G06K9/00
Abstract: Methods, computer program products, and systems are presented. The methods include, for instance: obtaining a document image, wherein the document image includes a plurality of objects; identifying a plurality of macroblocks within the document image; performing microblock processing within macroblocks of the plurality of macroblocks, wherein the microblock processing includes examining content of microblocks within a macroblock for extraction of key-value pairs, the examining content including performing an ontological analysis of microblocks, wherein the microblock processing includes associating confidence levels to the extracted key-value pairs; and outputting metadata based on the performing microblock processing within macroblocks of the plurality of macroblocks.
-
Publication number: GB2583290B
Publication date: 2022-03-16
Application number: GB202009894
Filing date: 2018-11-23
Applicant: IBM
Inventor: KEVIN NORTHRUP , CRAIG TRIM , THOZAMILE JAVU , TERRY HICKEY
IPC: G06V30/413 , G06V30/412 , G06V30/414
Abstract: Methods, computer program products, and systems are presented. The methods include, for instance: obtaining a document image, wherein the document image includes a plurality of objects; identifying a plurality of macroblocks within the document image; performing microblock processing within macroblocks of the plurality of macroblocks, wherein the microblock processing includes examining content of microblocks within a macroblock for extraction of key-value pairs, the examining content including performing an ontological analysis of microblocks, wherein the microblock processing includes associating confidence levels to the extracted key-value pairs; and outputting metadata based on the performing microblock processing within macroblocks of the plurality of macroblocks.
-
Publication number: GB2582722A
Publication date: 2020-09-30
Application number: GB202009558
Filing date: 2018-11-23
Applicant: IBM
Inventor: KEVIN NORTHRUP , CRAIG TRIM , BADR KHAMIS , KARAN SEHGAL , CHANDRASHEKHAR PADOLE , ABISOLA ADENIRAN
Abstract: Methods, computer program products, and systems are presented. The methods include, for instance: obtaining a document image with objects and identifying microblocks corresponding to each object; analyzing a position of a microblock for collinearity with another microblock based on respective positional characteristics and adjustable collinearity parameters; grouping collinear microblocks into a macroblock and creating computational data of a key-value pair from the macroblock; and associating a heuristic confidence level with the key-value pair. Also, based on data cluster formation, a table may be classified and data extracted.
-
Publication number: GB2581461A
Publication date: 2020-08-19
Application number: GB202009248
Filing date: 2018-11-30
Applicant: IBM
Inventor: KEVIN NORTHRUP , CRAIG TRIM , TERRY HICKEY , ABISOLA ADENIRAN , KENJI NORTHRUP
Abstract: A method for normalizing a key in a document image includes identifying a candidate key corresponding to an object in a document image with a key in key ontology data, on the basis that the candidate key is semantically interchangeable with the key. The context, position, and style of each object of the document image are represented in the document metadata. The candidate key is normalized into a normal form. A key class corresponding to the normal form is determined, and a confidence score indicating the likelihood that the key class is representative of the candidate key is assessed. A semantic database is updated with the key class upon verification for enhanced processing of future documents.
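
The normalize, classify, and update loop described in this abstract might look roughly like the sketch below. It is illustrative only: the `KEY_ONTOLOGY` dictionary, the function names, and the use of string similarity from Python's `difflib` as the confidence score are all assumptions. The patent's notion of semantic interchangeability is likely richer than character-level similarity.

```python
import re
from difflib import SequenceMatcher

# Hypothetical key ontology: key class -> known normal-form aliases.
KEY_ONTOLOGY = {
    "invoice_number": {"invoice number", "invoice no", "inv no"},
    "due_date": {"due date", "payment due"},
}


def normalize(candidate: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", candidate.lower())
    return " ".join(text.split())


def classify_key(candidate: str, ontology=KEY_ONTOLOGY):
    """Return (normal form, best key class, confidence score in [0, 1])."""
    normal = normalize(candidate)
    best_class, best_score = None, 0.0
    for key_class, aliases in ontology.items():
        for alias in aliases:
            # String similarity stands in for a semantic comparison here.
            score = SequenceMatcher(None, normal, alias).ratio()
            if score > best_score:
                best_class, best_score = key_class, score
    return normal, best_class, best_score


def learn_alias(ontology, key_class: str, normal: str) -> None:
    """After a human verifies the match, remember the normal form for future documents."""
    ontology.setdefault(key_class, set()).add(normal)


# Work on a copy so the example ontology above stays unchanged.
ontology = {k: set(v) for k, v in KEY_ONTOLOGY.items()}
normal, key_class, score = classify_key("Inv. #", ontology)
learn_alias(ontology, key_class, normal)  # assumes the match was verified
_, _, score_after = classify_key("Inv. #", ontology)
```

Once the verified alias is stored, the same candidate key matches exactly on the next document, which mirrors the abstract's point about updating the semantic database to improve future processing.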
-