Invention Grant
- Patent Title: Identifying unvisited portions of visited information
- Application No.: US15202224
- Application Date: 2016-07-05
- Publication No.: US09916337B2
- Publication Date: 2018-03-13
- Inventor: Eugenia Kondratova, Paul Ionescu, Obidul Islam, Iosif Viorel Onut
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee Address: US NY Armonk
- Agency: Cuenot, Forsythe & Kim, LLC
- Priority: CA2779235 20120606
- Main IPC: G06F17/30
- IPC: G06F17/30; H04L29/08

Abstract:
Identifying unvisited portions of visited information to visit includes receiving information to crawl, wherein the information is representative of one of web-based information and non-web-based information, computing a locality sensitive hash (LSH) value for the received information, and identifying the most similar information visited thus far. Identifying unvisited portions of visited information further includes determining whether the LSH value of the received information is equivalent to that of the most similar information visited thus far and, responsive to a determination that the two are not equivalent, identifying a visited portion of the received information using information for the most similar information visited thus far and crawling only the unvisited portions of the received information.
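
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not name an LSH scheme or say how portions are delimited, so the sketch assumes SimHash over whitespace-separated tokens, treats each non-empty line as a "portion", and uses Hamming distance to pick the most similar visited item. The names `simhash`, `hamming`, and `Crawler` are hypothetical.

```python
import hashlib

BITS = 64  # fingerprint width (an assumption; the abstract gives no size)


def simhash(text):
    """SimHash fingerprint of the text; one possible LSH, chosen for illustration."""
    counts = [0] * BITS
    for token in text.split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit hash of the token
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A bit is set when more tokens voted 1 than 0 at that position.
    return sum(1 << i for i in range(BITS) if counts[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


class Crawler:
    def __init__(self):
        # LSH value -> set of portions seen in that visited item
        self.visited = {}

    def portions_to_crawl(self, info):
        """Return only the portions of `info` that still need crawling."""
        portions = [p for p in info.split("\n") if p.strip()]
        value = simhash(info)
        if not self.visited:
            todo = portions  # nothing visited yet: crawl everything
        else:
            # Most similar information visited thus far (smallest Hamming distance).
            nearest = min(self.visited, key=lambda v: hamming(v, value))
            if nearest == value:
                return []  # equivalent LSH value: treat as already visited
            seen = self.visited[nearest]
            todo = [p for p in portions if p not in seen]
        self.visited[value] = set(portions)
        return todo
```

Under these assumptions, submitting the same page twice yields an empty list the second time, while a page that shares only its header and footer with an earlier page yields just its new lines. Items with equal fingerprints are treated as equivalent, so an LSH collision would cause a page to be skipped; a real crawler would need to decide whether that trade-off is acceptable.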
Public/Granted literature
- US20160314119A1 IDENTIFYING UNVISITED PORTIONS OF VISITED INFORMATION, Public/Granted day: 2016-10-27