Invention Grant
- Patent Title: Image processing of webpages
- Application No.: US16172646
- Application Date: 2018-10-26
- Publication No.: US10713545B2
- Publication Date: 2020-07-14
- Inventor: David B. Hurry, David J. Tabacco
- Applicant: Merck Sharp & Dohme Corp.
- Applicant Address: US NJ Rahway
- Assignee: Merck Sharp & Dohme Corp.
- Current Assignee: Merck Sharp & Dohme Corp.
- Current Assignee Address: US NJ Rahway
- Agency: Fenwick & West LLP
- Main IPC: G06K9/78
- IPC: G06K9/78; G06K9/34; G06K9/62; G06F16/95

Abstract:
A web detection system processes webpage information and performs automated feature extraction of webpages including machine processable information. In an embodiment, the web detection system determines a subset of webpages having a target characteristic by processing markup language. For a webpage of the subset, the web detection system determines that a first image overlaps at least a portion of a second image in the webpage. The web detection system generates an image of the webpage such that the portion of the second image is obscured by the first image. The web detection system determines a graphical feature of the webpage by processing the image, e.g., using optical character recognition. Responsive to determining that the graphical feature corresponds to graphical features of images of a different set of webpages associated with a target entity, the web detection system determines that the webpage is also associated with the target entity.
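The overlap and rendering steps described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the `Box` layout model, the character-grid "rendering", and the Jaccard-based `matches_target` comparison are all assumptions introduced here for illustration. A real system would render actual pixels and extract features with OCR or another image-processing step, which this sketch replaces with plain string sets.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle for an image on a page (hypothetical layout model)."""
    left: int
    top: int
    width: int
    height: int

    @property
    def right(self) -> int:
        return self.left + self.width

    @property
    def bottom(self) -> int:
        return self.top + self.height


def overlap(a: Box, b: Box) -> Optional[Box]:
    """Return the intersection of two boxes, or None if they do not overlap."""
    left, top = max(a.left, b.left), max(a.top, b.top)
    right, bottom = min(a.right, b.right), min(a.bottom, b.bottom)
    if right <= left or bottom <= top:
        return None
    return Box(left, top, right - left, bottom - top)


def render(page_w: int, page_h: int, layers: List[Tuple[str, Box]]) -> List[str]:
    """Paint layers back to front on a character grid; later layers obscure earlier ones."""
    canvas = [["." for _ in range(page_w)] for _ in range(page_h)]
    for label, box in layers:
        for y in range(max(box.top, 0), min(box.bottom, page_h)):
            for x in range(max(box.left, 0), min(box.right, page_w)):
                canvas[y][x] = label
    return ["".join(row) for row in canvas]


def matches_target(features: Set[str], known: List[Set[str]], threshold: float = 0.5) -> bool:
    """Assumed comparison rule: Jaccard similarity against any known feature set."""
    for other in known:
        union = features | other
        if union and len(features & other) / len(union) >= threshold:
            return True
    return False


# "2" is the second image; "1" is the first image, drawn on top of it.
second = Box(left=0, top=0, width=4, height=2)
first = Box(left=2, top=1, width=4, height=2)

shared = overlap(first, second)          # Box(left=2, top=1, width=2, height=1)
page = render(6, 3, [("2", second), ("1", first)])
# page == ["2222..", "221111", "..1111"]: the shared region shows only "1".

# Stand-in for OCR output: features are plain strings in this sketch.
is_match = matches_target({"acme", "logo", "pharmacy"}, [{"acme", "logo", "pills"}])
```

Painting the layers back to front means the rendered page reflects what a visitor would see, so any later feature extraction operates on the visible content rather than on the obscured portion of the second image.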
Public/Granted literature
- US20200134401A1 IMAGE PROCESSING OF WEBPAGES Public/Granted day: 2020-04-30