String processing of clickstream data
Abstract:
A method includes assigning unique symbols to pages of a website, respectively. The method includes obtaining page symbol sequences of browsing sessions, respectively. Each browsing session corresponds to a visitor of the website. For each browsing session, the page symbol sequence of the browsing session is a sequence of symbols that corresponds, respectively, to a sequence of pages of the website visited during the browsing session by the corresponding visitor. The method includes generating a master string including the page symbol sequences, generating a suffix array corresponding to the master string, and generating a longest common prefix (LCP) array corresponding to the suffix array. The method includes, based on the suffix array and LCP array, determining one or more most common n-step subsequences of pages (n is an integer greater than 1).
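The pipeline in the abstract (symbols, master string, suffix array, LCP array, most common n-step subsequences) can be illustrated with a small Python sketch. The abstract does not specify these details, so several are assumptions. Pages are encoded as non-negative integers in first-seen order. Each session is terminated by its own unique negative separator, so that no n-step path spans two sessions. The suffix array is built by naively sorting suffixes. The LCP array uses Kasai's algorithm, a standard technique chosen here and not named in the abstract. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_master_string(sessions: Sequence[Sequence[str]]) -> Tuple[List[int], Dict[str, int]]:
    """Assign each page a unique symbol and concatenate the session sequences.

    Assumption: every session ends with its own negative separator symbol,
    so separators never match each other and no path crosses sessions.
    """
    symbol_of: Dict[str, int] = {}
    master: List[int] = []
    for k, session in enumerate(sessions):
        for page in session:
            # New pages get the next symbol (0, 1, 2, ...) in first-seen order.
            master.append(symbol_of.setdefault(page, len(symbol_of)))
        master.append(-(k + 1))  # unique separator for session k
    return master, symbol_of


def suffix_array(s: List[int]) -> List[int]:
    # Naive construction: sort suffix start positions by the suffixes themselves.
    # O(m^2 log m); fine for illustration, not for large clickstreams.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s: List[int], sa: List[int]) -> List[int]:
    # Kasai's algorithm: lcp[r] is the length of the common prefix of the
    # suffixes at sa[r - 1] and sa[r]; lcp[0] is 0.
    m = len(s)
    rank = [0] * m
    for r, p in enumerate(sa):
        rank[p] = r
    lcp = [0] * m
    h = 0
    for p in range(m):
        r = rank[p]
        if r == 0:
            h = 0
            continue
        q = sa[r - 1]
        while p + h < m and q + h < m and s[p + h] == s[q + h]:
            h += 1
        lcp[r] = h
        if h > 0:
            h -= 1
    return lcp


def most_common_n_step(s: List[int], sa: List[int], lcp: List[int], n: int, top: int = 1):
    # Suffixes that share their first n symbols are adjacent in the suffix
    # array, so each maximal run with lcp >= n is one distinct n-step path
    # and the run length is its number of occurrences.
    results = []
    i = 0
    while i < len(sa):
        j = i + 1
        while j < len(sa) and lcp[j] >= n:
            j += 1
        gram = s[sa[i]:sa[i] + n]
        # Skip suffixes that are too short or that run into a separator.
        if len(gram) == n and all(x >= 0 for x in gram):
            results.append((j - i, tuple(gram)))
        i = j
    # Highest count first; ties broken by symbol order.
    results.sort(key=lambda item: (-item[0], item[1]))
    return results[:top]


def most_common_paths(sessions: Sequence[Sequence[str]], n: int, top: int = 1):
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    master, symbol_of = build_master_string(sessions)
    page_of = {sym: page for page, sym in symbol_of.items()}
    sa = suffix_array(master)
    lcp = lcp_array(master, sa)
    return [
        (count, tuple(page_of[x] for x in gram))
        for count, gram in most_common_n_step(master, sa, lcp, n, top)
    ]


if __name__ == "__main__":
    sessions = [
        ["home", "search", "product", "cart"],
        ["home", "search", "product"],
        ["search", "product", "cart"],
    ]
    print(most_common_paths(sessions, 2))  # → [(3, ('search', 'product'))]
```

Unique per-session separators are the key design choice in this sketch. A separator appears only once, so two different suffixes can never match on it. As a result, an LCP value of at least n always describes a real in-session path of n pages. For large logs, a linear-time suffix array construction such as SA-IS would replace the naive sort.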