Rare topic detection using hierarchical clustering

    Publication number: GB2604276A

    Publication date: 2022-08-31

    Application number: GB202206094

    Filing date: 2020-09-29

    Applicant: IBM

    Abstract: A hierarchical topic model may be learned from one or more data sources. Using the hierarchical topic model, one or more dominant words in a selected cluster may be iteratively removed. The dominant words may relate to one or more primary topics of the cluster. The learned hierarchical topic model may be seeded with one or more words, n-grams, phrases, text snippets, or a combination thereof to evolve the hierarchical topic model, wherein the removed dominant words are reinstated upon completion of the seeding.
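The remove, seed, and reinstate cycle described in the abstract can be illustrated with a minimal sketch. This is not the claimed method: the abstract does not say how dominance is measured or how the topic model is evolved, so the sketch assumes document frequency as a stand-in for dominance and treats seeding as simply adding seed words to the cluster vocabulary. The function names `remove_dominant_words` and `reinstate`, the parameters `rounds` and `top_k`, and the sample cluster are all hypothetical.

```python
from collections import Counter


def remove_dominant_words(docs, rounds=2, top_k=1):
    """Iteratively strip the most dominant words from one selected cluster.

    docs is a list of token lists. Dominance is approximated by document
    frequency, which is an assumption of this sketch, not the patent's rule.
    Returns (stripped_docs, removed_words).
    """
    removed = []
    current = [list(doc) for doc in docs]
    for _ in range(rounds):
        # Count each word once per document.
        counts = Counter(word for doc in current for word in set(doc))
        if not counts:
            break
        # Highest count first; ties broken alphabetically for determinism.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        dominant = [word for word, _ in ranked[:top_k]]
        removed.extend(dominant)
        current = [[w for w in doc if w not in dominant] for doc in current]
    return current, removed


def reinstate(vocabulary, removed):
    """Return the vocabulary with the previously removed words added back."""
    return sorted(set(vocabulary) | set(removed))


# Hypothetical cluster whose primary topic is "printer".
cluster = [
    ["printer", "jam", "paper"],
    ["printer", "driver", "install"],
    ["printer", "paper", "tray"],
    ["printer", "firmware", "brick"],
]

stripped, removed = remove_dominant_words(cluster, rounds=2, top_k=1)

# Seeding stand-in: add seed words to the vocabulary of the stripped cluster.
seeds = ["firmware"]
seeded_vocab = {w for doc in stripped for w in doc} | set(seeds)

# After seeding completes, the dominant words are put back.
final_vocab = reinstate(seeded_vocab, removed)
```

With the primary-topic words stripped out, less frequent terms such as `firmware` and `brick` are no longer overshadowed, which is the point at which a rare topic could be seeded before the dominant words are restored.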
