Statistical stemming
    Granted invention patent

    Publication number: US08554543B2

    Publication date: 2013-10-08

    Application number: US13710055

    Filing date: 2012-12-10

    Applicant: Google Inc.

    CPC classification number: G06F17/2872 G06F17/2755

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating suffix-rewriting rules. A method includes obtaining a plurality of canonical suffix-rewriting rules each associated with one or more words, generating a suffix tree from the words, selecting a minimum colored subset of the nodes and leaves in the suffix tree, and generating a plurality of final suffix-rewriting rules from the nodes in the minimum colored subset. Another method includes receiving applicable and non-applicable words for a suffix-rewriting rule, generating a suffix tree from the applicable words and the non-applicable words, selecting a minimum colored subset of the nodes and leaves in the suffix tree, and generating a plurality of suffix-rewriting rules, wherein each rule corresponds to a node in the minimum colored subset with a valid status.
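
The second method in the abstract can be illustrated with a minimal sketch. The abstract does not define "minimum colored subset", so the code below substitutes a simplified stand-in and should not be read as the patented procedure: it builds a trie over reversed words (so that shared suffixes share a path), labels each node with the status of the words passing through it, and keeps the shallowest nodes whose words all share one status. The names `Node`, `build_suffix_trie`, `pure_suffixes`, and `specialise_rule` are hypothetical, as are the example words. The sketch also assumes that every word ends with the rule's old suffix and that no word appears in both lists.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)
    # Statuses (True = applicable, False = non-applicable) of all words
    # whose reversed spelling passes through this node.
    labels: set = field(default_factory=set)


def build_suffix_trie(applicable, non_applicable):
    """Build a trie over reversed words so that common suffixes share a path."""
    root = Node()
    for words, label in ((applicable, True), (non_applicable, False)):
        for word in words:
            node = root
            node.labels.add(label)
            # "^" marks the start of the word, so a whole word can be
            # told apart from a longer word that merely ends with it.
            for ch in reversed("^" + word):
                node = node.children.setdefault(ch, Node())
                node.labels.add(label)
    return root


def pure_suffixes(root):
    """Map each shallowest single-status suffix to its status.

    Simplified stand-in for the "minimum colored subset" selection.
    """
    result = {}
    stack = [(root, "")]
    while stack:
        node, suffix = stack.pop()
        if len(node.labels) == 1:
            # Every word below this node has the same status: stop here.
            result[suffix] = next(iter(node.labels))
        else:
            for ch, child in node.children.items():
                stack.append((child, ch + suffix))
    return result


def specialise_rule(old, new, applicable, non_applicable):
    """Split the rule old -> new into suffix rules that cover only valid nodes."""
    root = build_suffix_trie(applicable, non_applicable)
    rules = set()
    for suffix, valid in pure_suffixes(root).items():
        if not valid:
            continue
        if len(suffix) < len(old):
            # The node sits above the old suffix, so the original rule stands.
            suffix = old
        rules.add((suffix, suffix[: len(suffix) - len(old)] + new))
    return sorted(rules)


rules = specialise_rule(
    "ies", "y",
    applicable=["ponies", "parties", "cities"],
    non_applicable=["species", "series"],
)
print(rules)
```

With these example words, the general rule "ies" to "y" is narrowed to "nies" to "ny" and "ties" to "ty", while the suffixes "cies" and "ries" are marked invalid and produce no rule. Stopping at the shallowest single-status node keeps each rule as general as the supplied words allow.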
