Invention Grant
- Patent Title: Method and apparatus for retrieving and indexing hidden pages
- Patent Title (Chinese, translated): Method and apparatus for retrieving and indexing hidden web pages
- Application No.: US11570330
- Application Date: 2005-05-27
- Publication No.: US07685112B2
- Publication Date: 2010-03-23
- Inventor: Alexandros Ntoulas, Junghoo Cho, Petros Zerfos
- Applicant: Alexandros Ntoulas, Junghoo Cho, Petros Zerfos
- Applicant Address: US CA Oakland
- Assignee: The Regents of the University of California
- Current Assignee: The Regents of the University of California
- Current Assignee Address: US CA Oakland
- Agency: Vista IP Law Group LLP
- International Application: PCT/US2005/018849 WO 20050527
- International Announcement: WO2006/007229 WO 20060119
- Main IPC: G06F17/30
- IPC: G06F17/30; G06F15/173

Abstract:
A method and system for autonomously downloading and indexing Hidden Web pages from websites includes the steps of selecting a query term and issuing a query to a site-specific search interface containing Hidden Web pages. A results index is then acquired and the Hidden Web pages are downloaded from the results index. A plurality of potential query terms are then identified from the downloaded Hidden Web pages. The efficiency of each potential query term is then estimated and a next query term is selected from the plurality of potential query terms, wherein the next selected query term has the greatest efficiency. The next selected query term is then issued to the site-specific search interface. The process is repeated until all or most of the Hidden Web pages are discovered.
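
The loop described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the in-memory `SITE` collection, the `search` function, and the `crawl` function are hypothetical stand-ins, and the efficiency of a candidate term is approximated here simply by the number of already-downloaded pages that contain it, which is an assumption made for illustration only.

```python
from collections import Counter

# Hypothetical stand-in for a site's Hidden Web pages: page id -> page text.
SITE = {
    1: "apple banana",
    2: "banana cherry",
    3: "cherry date",
    4: "date elderberry",
    5: "fig",
}


def search(term):
    """Simulated site-specific search interface: ids of pages containing term."""
    return [pid for pid, text in SITE.items() if term in text.split()]


def crawl(seed_term, max_queries=10):
    """Greedy query loop: issue a term, download results, pick the next term."""
    downloaded = {}  # page id -> text of pages retrieved so far
    issued = []      # query terms already sent to the search interface
    term = seed_term
    while term is not None and len(issued) < max_queries:
        issued.append(term)
        for pid in search(term):                   # acquire the results index
            downloaded.setdefault(pid, SITE[pid])  # download each listed page

        # Identify potential query terms from the downloaded pages.
        counts = Counter()
        for text in downloaded.values():
            counts.update(set(text.split()))
        candidates = {t: c for t, c in counts.items() if t not in issued}

        # Assumed efficiency proxy: number of downloaded pages containing the
        # term. Sorting first makes tie-breaking deterministic (alphabetical).
        term = max(sorted(candidates), key=candidates.get) if candidates else None
    return downloaded, issued


pages, issued = crawl("apple")
print(sorted(pages), issued)
```

In this toy collection, page 5 shares no term with the other pages, so the loop ends without finding it, which mirrors the abstract's "all or most" wording. A real efficiency estimate would also need to weigh the cost of issuing each query and the number of new pages it is expected to return.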
Public/Granted literature
- US20080097958A1 Method and Apparatus for Retrieving and Indexing Hidden Pages Public/Granted day: 2008-04-24