Invention Grant
- Patent Title: System and method for automated domain-extensible web scraping
- Application No.: US15085948
- Application Date: 2016-03-30
- Publication No.: US10423675B2
- Publication Date: 2019-09-24
- Inventor: Soumendra Daas, Nanjangud C. Narendra, Sekar Udayamurthy
- Applicant: Intuit Inc.
- Applicant Address: US CA Mountain View
- Assignee: Intuit Inc.
- Current Assignee: Intuit Inc.
- Current Assignee Address: US CA Mountain View
- Agency: Hawley Troxell Ennis & Hawley LLP
- Agent: Philip McKay
- Priority: IN201631003317 20160129
- Main IPC: G06F16/00
- IPC: G06F16/00; G06F16/951; H04L29/08; G06F17/24

Abstract:
An automated, extensible scraping script is generated for web scraping across a plurality of domains. Web sites are classified based on common extracted domain data, and the data is further clustered based on common navigation structures. These commonalities are used to automate the generation of scraping code from predefined, reusable code snippets for specific parts of the web sites. Scraping services include a mapper module and a script generator module. Building blocks include a data model updater, a navigation model generator, and a navigation model matcher. An administrative module includes domain clustering and configuration file maintenance.
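The flow described in the abstract (cluster sites that share a navigation structure, then assemble a script from reusable snippets) can be illustrated with a minimal sketch. This is not the patented implementation: the snippet table `SNIPPETS`, the functions `cluster_by_navigation` and `generate_script`, and the representation of a navigation structure as an ordered list of step names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical library of reusable code snippets, keyed by the part of a
# web site each one handles. The snippet bodies are placeholder strings.
SNIPPETS = {
    "login": "session.post(site + '/login', data=credentials)",
    "statement_list": "rows = parse_table(session.get(site + '/statements'))",
    "logout": "session.get(site + '/logout')",
}


def cluster_by_navigation(sites):
    """Group site names by their navigation structure.

    `sites` maps a site name to an ordered list of navigation step names.
    Sites with an identical step sequence land in the same cluster.
    """
    clusters = defaultdict(list)
    for name, steps in sites.items():
        clusters[tuple(steps)].append(name)
    return dict(clusters)


def generate_script(steps):
    """Assemble a scraping script by concatenating one snippet per step."""
    return "\n".join(SNIPPETS[step] for step in steps)
```

Under these assumptions, two sites with the same step sequence fall into one cluster and can share a single generated script, which is the kind of reuse the abstract describes.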
Public/Granted literature
- US20170220681A1 SYSTEM AND METHOD FOR AUTOMATED DOMAIN-EXTENSIBLE WEB SCRAPING Public/Granted day: 2017-08-03