CRAWLING RICH INTERNET APPLICATIONS

    Publication No.: CA2790379A1

    Publication Date: 2014-03-20

    Application No.: CA2790379

    Application Date: 2012-09-20

    Applicant: IBM CANADA

    Abstract: An illustrative embodiment of a computer-implemented process for crawling rich Internet applications executes the sets of events discovered in a state exploration phase according to a predetermined priority assigned to each set, so that events of a higher priority are exhausted before any event of a lower priority is executed. Responsive to a determination that transitions remain, the process executes a set of events in a transition exploration phase. The computer-implemented process further determines whether a new state results from executing an event in the set of events and, responsive to a determination that a new state exists, returns to the state exploration phase.
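
    The two-phase loop in this abstract can be illustrated with a minimal Python sketch. It is not IBM's implementation: the application model (a dict mapping each state to `(event_id, priority, target_state)` tuples), the function name `crawl`, and the `cutoff` parameter that decides which events belong to the state exploration phase are all hypothetical. Executing an event is simulated by a lookup; a real crawler would replay events in a browser.

```python
import heapq
from itertools import count

# Hypothetical application model: each state lists the events it exposes as
# (event_id, priority, target_state). A lower number means a higher priority.
App = dict[str, list[tuple[str, int, str]]]


def crawl(app: App, initial: str, cutoff: int) -> tuple[set[str], list[str]]:
    """Two-phase crawl sketch.

    Assumption: events with priority < cutoff belong to the state exploration
    phase; all others wait for the transition exploration phase. Returns the
    discovered states and the order in which events were executed.
    """
    seq = count()  # tie-breaker: FIFO order among events of equal priority
    states = {initial}
    heap: list[tuple[int, int, str, str, str]] = []  # state exploration queue
    remaining: list[tuple[str, str, str]] = []  # transition exploration queue
    log: list[str] = []

    def discover(state: str) -> None:
        # Queue every event exposed by a newly found state in the right phase.
        for event, prio, target in app.get(state, []):
            if prio < cutoff:
                heapq.heappush(heap, (prio, next(seq), state, event, target))
            else:
                remaining.append((state, event, target))

    def execute(src: str, event: str, target: str) -> bool:
        # "Executing" an event here just records it and looks up its target;
        # a real crawler would drive a browser. Returns True for a new state.
        log.append(f"{src}:{event}")
        if target in states:
            return False
        states.add(target)
        discover(target)
        return True

    discover(initial)
    while heap or remaining:
        # State exploration: the heap always yields the highest-priority
        # pending event, so each priority level is exhausted first.
        while heap:
            _, _, src, event, target = heapq.heappop(heap)
            execute(src, event, target)
        # Transition exploration: run leftover events until one reveals a
        # new state, then go back to the state exploration phase.
        while remaining:
            if execute(*remaining.pop(0)):
                break
    return states, log


app: App = {
    "S0": [("menu", 0, "S1"), ("footer", 2, "S0")],
    "S1": [("tab", 1, "S2"), ("back", 2, "S0"), ("hidden", 3, "S3")],
    "S2": [],
    "S3": [],
}
states, log = crawl(app, "S0", cutoff=2)
print(sorted(states))  # ['S0', 'S1', 'S2', 'S3']
print(log)  # ['S0:menu', 'S1:tab', 'S0:footer', 'S1:back', 'S1:hidden']
```

    Because pending events sit in a priority queue, events exposed by a newly discovered state can still run ahead of lower-priority events queued earlier, which keeps the higher-priority-first rule global rather than per state.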

    IDENTIFYING EQUIVALENT JAVASCRIPT EVENTS

    Publication No.: CA2786418A1

    Publication Date: 2014-02-16

    Application No.: CA2786418

    Application Date: 2012-08-16

    Applicant: IBM CANADA

    Abstract: An illustrative embodiment of a computer-implemented process for identifying equivalent JavaScript events receives source code containing two JavaScript events for equivalency analysis, extracts from each JavaScript event the HTML element that contains it, and analyzes the extracted HTML elements. Responsive to a determination that the HTML elements are of the same type according to equivalency criterion B, and that the HTML elements have the same number of attributes according to equivalency criterion C, the process determines whether the JavaScript function calls of the two events are similar according to equivalency criterion A. Responsive to a determination that the function calls are similar according to criterion A, and that the other attributes of the HTML elements satisfy equivalency criterion D, the process identifies the JavaScript events as equivalent.
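
    The abstract does not define criteria A to D. The Python sketch below assumes simple stand-ins for each: B compares tag names, C compares attribute counts, A compares the function name called by each `on*` handler while ignoring its arguments, and D requires the remaining attributes to have the same names. All function names (`equivalent`, `event_calls`, and so on) are hypothetical; only the standard-library `html.parser` and `re` modules are real.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser

# Assumption: a handler's "function call" is the leading identifier before "(".
CALL_RE = re.compile(r"\s*([A-Za-z_$][\w$.]*)\s*\(")


class _FirstTag(HTMLParser):
    """Records the tag name and attributes of the first start tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.tag: str | None = None
        self.attrs: dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if self.tag is None:
            self.tag = tag  # HTMLParser already lower-cases names
            self.attrs = {name: value or "" for name, value in attrs}


def parse_element(markup: str) -> tuple[str, dict[str, str]]:
    parser = _FirstTag()
    parser.feed(markup)
    parser.close()
    if parser.tag is None:
        raise ValueError("no HTML element found")
    return parser.tag, parser.attrs


def event_calls(attrs: dict[str, str]) -> dict[str, str | None]:
    # Assumption: attributes starting with "on" are JavaScript event handlers.
    calls: dict[str, str | None] = {}
    for name, value in attrs.items():
        if name.startswith("on"):
            match = CALL_RE.match(value)
            calls[name] = match.group(1) if match else None
    return calls


def other_names(attrs: dict[str, str]) -> set[str]:
    return {name for name in attrs if not name.startswith("on")}


def equivalent(markup_a: str, markup_b: str) -> bool:
    tag_a, attrs_a = parse_element(markup_a)
    tag_b, attrs_b = parse_element(markup_b)
    if tag_a != tag_b:  # criterion B (assumed): same element type
        return False
    if len(attrs_a) != len(attrs_b):  # criterion C (assumed): same attribute count
        return False
    # Criterion A (assumed): the same handlers call the same function names;
    # argument values such as item indices are ignored.
    if event_calls(attrs_a) != event_calls(attrs_b):
        return False
    # Criterion D (assumed): the remaining attributes have the same names.
    return other_names(attrs_a) == other_names(attrs_b)


row1 = '<li id="item1" class="row" onclick="showItem(1)">'
row2 = '<li id="item2" class="row" onclick="showItem(2)">'
row3 = '<li id="item3" class="row" onclick="deleteItem(3)">'
print(equivalent(row1, row2))  # True
print(equivalent(row1, row3))  # False
```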

    EXCLUSION OF IRRELEVANT DATA FROM A DOM EQUIVALENCE

    Publication No.: CA2738290A1

    Publication Date: 2012-10-28

    Application No.: CA2738290

    Application Date: 2011-04-28

    Applicant: IBM CANADA

    Abstract: An illustrative embodiment of a computer-implemented process for computing excluded data identifies a web page of interest to form an identified page and loads the identified page a first time to form a first load. Responsive to a determination that a delta has not been computed for the identified page, the process loads the identified page a second time to form a second load and determines whether portions of the first load differ from portions of the second load. Responsive to a determination that portions of the first load differ from portions of the second load, the computer-implemented process identifies the differing portions to form a delta, stores the delta to form a stored delta, and excludes the stored delta from a document object model associated with the identified page to form a modified document object model.
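
    A minimal Python sketch of the double-load delta follows, assuming the page is well-formed markup that `xml.etree.ElementTree` can parse and that a DOM can be approximated by a map from positional element paths to their text. The `DeltaStore` class, the `fake_load` function, and the URL are hypothetical; a real crawler would load pages in a browser and diff the live DOM.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from itertools import count
from typing import Callable


def text_nodes(markup: str) -> dict[str, str]:
    """Flatten a well-formed (XHTML-like) page into {positional path: text}."""
    root = ET.fromstring(markup)
    nodes: dict[str, str] = {}

    def walk(element: ET.Element, path: str) -> None:
        nodes[path] = (element.text or "").strip()
        for index, child in enumerate(element):
            walk(child, f"{path}/{index}")

    walk(root, root.tag)
    return nodes


class DeltaStore:
    """Caches, per URL, the node paths whose content changed between two loads."""

    def __init__(self, load: Callable[[str], str]) -> None:
        self._load = load  # hypothetical page loader: URL -> markup
        self._deltas: dict[str, set[str]] = {}

    def modified_dom(self, url: str) -> dict[str, str]:
        first = text_nodes(self._load(url))
        if url not in self._deltas:
            # No delta yet: load the page a second time and diff the two loads.
            second = text_nodes(self._load(url))
            self._deltas[url] = {
                path
                for path in first.keys() | second.keys()
                if first.get(path) != second.get(path)
            }
        delta = self._deltas[url]
        # Exclude the stored delta to form the modified DOM.
        return {path: text for path, text in first.items() if path not in delta}


ticks = count()


def fake_load(url: str) -> str:
    # Stand-in for a browser fetch: the <p> carries a value that changes per load.
    return f"<html><body><h1>Catalog</h1><p>Served at {next(ticks)}</p></body></html>"


store = DeltaStore(fake_load)
dom_a = store.modified_dom("https://example.com/catalog")
dom_b = store.modified_dom("https://example.com/catalog")
print(dom_a == dom_b)  # True: the changing <p> is excluded
print(dom_a)  # {'html': '', 'html/0': '', 'html/0/0': 'Catalog'}
```

    The second call reuses the cached delta and loads the page only once, matching the abstract's condition that the second load happens only when no delta has yet been computed for the page.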

    PARTITIONING A SEARCH SPACE FOR DISTRIBUTED CRAWLING

    Publication No.: CA2790479A1

    Publication Date: 2014-03-24

    Application No.: CA2790479

    Application Date: 2012-09-24

    Applicant: IBM CANADA

    Abstract: An illustrative embodiment of a computer-implemented process for partitioning a crawling space computes an event identifier for each event in a set of events to form an identified set of events, segments the identified set of events into a number of partitions, assigns a partition to each node in a set of nodes, and executes each event in each assigned partition by the respective node. Responsive to a determination that a new state is discovered, the other nodes are notified of the new state, and information associated with the new state is added to the respective assigned set of event IDs at each node. Responsive to a determination that no more notifications exist, the computer-implemented process determines whether more events remain to be processed and, responsive to a determination that none remain, terminates.
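
    The partitioning scheme can be simulated in a single Python process. In this sketch, which is hypothetical rather than the patented implementation, an event ID is a SHA-256 hash of the `(state, event)` pair, the partitions are the ID values modulo the number of nodes, and each node is represented by a work queue and a notification inbox. The application model and all function names are assumptions.

```python
import hashlib
from collections import deque

# Hypothetical application model: state -> {event name: target state}.
App = dict[str, dict[str, str]]


def event_id(state: str, event: str) -> int:
    # Assumption: an event ID is a stable hash of its (state, event) pair.
    digest = hashlib.sha256(f"{state}|{event}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def partition_of(eid: int, n_nodes: int) -> int:
    # Segment the ID space into n_nodes partitions; node k owns partition k.
    return eid % n_nodes


def distributed_crawl(
    app: App, initial: str, n_nodes: int
) -> tuple[dict[tuple[str, str], int], list[set[str]]]:
    """Single-process simulation of n_nodes crawlers exchanging notifications."""
    queues = [deque() for _ in range(n_nodes)]  # events assigned to each node
    inboxes = [deque() for _ in range(n_nodes)]  # new-state notifications
    known = [{initial} for _ in range(n_nodes)]  # each node's view of states
    executed_by: dict[tuple[str, str], int] = {}

    def assign(node: int, state: str) -> None:
        # Add the events of `state` whose IDs fall in this node's partition.
        for event in app.get(state, {}):
            if partition_of(event_id(state, event), n_nodes) == node:
                queues[node].append((state, event))

    for node in range(n_nodes):
        assign(node, initial)

    while True:
        progressed = False
        for node in range(n_nodes):
            while inboxes[node]:  # handle pending notifications first
                state = inboxes[node].popleft()
                if state not in known[node]:
                    known[node].add(state)
                    assign(node, state)
            if queues[node]:  # execute one event from this node's partition
                state, event = queues[node].popleft()
                executed_by[(state, event)] = node
                target = app[state][event]
                if target not in known[node]:
                    known[node].add(target)
                    assign(node, target)
                    for other in range(n_nodes):  # notify the other nodes
                        if other != node:
                            inboxes[other].append(target)
                progressed = True
        # Terminate once no notifications are pending and no events remain.
        if not progressed and not any(inboxes):
            break
    return executed_by, known


app: App = {
    "S0": {"a": "S1", "b": "S2"},
    "S1": {"c": "S2", "d": "S3"},
    "S2": {"e": "S0"},
    "S3": {},
}
executed_by, known = distributed_crawl(app, "S0", n_nodes=3)
print(sorted(executed_by))  # [('S0', 'a'), ('S0', 'b'), ('S1', 'c'), ('S1', 'd'), ('S2', 'e')]
```

    Because every node applies the same `partition_of` rule to the events of each newly announced state, each event is executed by exactly one node and no coordinator has to hand out work.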
