Invention Grant
- Patent Title: Optimizing web crawling through web page pruning
- Application No.: US15244427
- Application Date: 2016-08-23
- Publication No.: US09754033B2
- Publication Date: 2017-09-05
- Inventor: Shahar Sperling, Omer Tripp, Omri Weisman
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee Address: US NY Armonk
- Agency: Cantor Colburn LLP
- Main IPC: G06F17/30
- IPC: G06F17/30; G06F17/22

Abstract:
Crawling computer-based documents by performing static analysis on a computer-based document to identify within the computer-based document one or more execution vectors, where each execution vector includes a computer program segment including a call to an entity that is external to the computer-based document, and one or more additional computer program segments whose execution precedes and leads ultimately to execution of the computer program segment that includes the call to the entity, and causing any of the computer program segments in any of the execution vectors to be executed during a crawling of the computer-based document, and any computer program segment within the computer-based document that is excluded from the execution vectors to be excluded from execution during the crawling of the computer-based document.
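
The pruning idea in the abstract can be illustrated with a minimal sketch. It assumes the static analysis has already produced a call graph of the page's program segments and a set of segments that call an external entity; it is not the patented implementation. The names `segments_to_execute`, `page`, and `external` are hypothetical and chosen only for illustration. A segment is kept if it belongs to some execution vector, that is, if it makes an external call or its execution leads to a segment that does.

```python
from collections import deque


def segments_to_execute(calls, external):
    """Return every segment that lies on some execution vector.

    calls:    dict mapping a segment name to the set of segments it triggers
              (a hypothetical, pre-computed call graph of the page).
    external: set of segment names that call an entity outside the page.
    """
    # Reverse the call graph so we can walk from each external call
    # back to the segments whose execution precedes it.
    preds = {}
    for src, dsts in calls.items():
        for dst in dsts:
            preds.setdefault(dst, set()).add(src)

    keep = set(external)
    queue = deque(external)
    while queue:
        seg = queue.popleft()
        for pred in preds.get(seg, ()):
            if pred not in keep:
                keep.add(pred)
                queue.append(pred)
    return keep


# Toy page: only fetchItems talks to an external entity.
page = {
    "onLoad": {"initMenu", "loadFeed"},
    "initMenu": {"animate"},
    "loadFeed": {"fetchItems"},
    "fetchItems": set(),
    "animate": set(),
    "onHover": {"animate"},
}
external = {"fetchItems"}

keep = segments_to_execute(page, external)
pruned = set(page) - keep

print(sorted(keep))    # ['fetchItems', 'loadFeed', 'onLoad']
print(sorted(pruned))  # ['animate', 'initMenu', 'onHover']
```

In this toy page the crawler would execute only `onLoad`, `loadFeed`, and `fetchItems`, and would skip the purely cosmetic segments, which is the saving the abstract describes.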
Public/Granted literature:
- US20160350423A1 OPTIMIZING WEB CRAWLING THROUGH WEB PAGE PRUNING, Public/Granted day: 2016-12-01