SYSTEM AND METHOD OF IDENTIFYING A VARIATION OF A PHRASE IN A TEXTUAL PASSAGE

    Publication number: US20250117586A1

    Publication date: 2025-04-10

    Application number: US18378509

    Filing date: 2023-10-10

    Abstract: A system and method of identifying an occurrence of a semantic variation of a phrase in a passage by at least one processor may include calculating a phrase embedding vector, representing a semantic meaning of the phrase; extracting, from a textual representation of the passage, at least one hierarchical set of nested sequences of words; for each sequence, calculating a corresponding sequence embedding vector, representing a semantic meaning of the sequence; for one or more sequence embedding vectors, calculating a corresponding vector similarity value, representing similarity of the sequence embedding vector to the phrase embedding vector; identifying a sequence corresponding to a maximal vector similarity value of the one or more vector similarity values; and determining the identified sequence as a semantic variation of the phrase, based on the maximal vector similarity value.
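
The steps in this abstract (embed the phrase, enumerate nested word sequences, embed each sequence, keep the most similar one) can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything in it is an assumption made for illustration: the names `embed`, `cosine`, `nested_sequences` and `find_variation` are hypothetical, the bag-of-words `embed` is a toy stand-in for a real semantic embedding model, the "hierarchical set of nested sequences" is approximated by all contiguous word spans of the passage, and the 0.5 threshold is arbitrary.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a semantic embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nested_sequences(words):
    """Yield every contiguous word span, longest first.

    Each span contains the shorter spans inside it, which is used here as a
    simple (assumed) reading of "hierarchical set of nested sequences".
    """
    n = len(words)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            yield " ".join(words[start:start + length])


def find_variation(phrase, passage, threshold=0.5):
    """Return (best_sequence, similarity), or (None, similarity) if below threshold."""
    phrase_vec = embed(phrase)
    best_seq, best_sim = None, -1.0
    for seq in nested_sequences(passage.split()):
        sim = cosine(embed(seq), phrase_vec)
        if sim > best_sim:  # keep the sequence with the maximal similarity
            best_seq, best_sim = seq, sim
    if best_sim >= threshold:
        return best_seq, best_sim
    return None, best_sim


match, score = find_variation(
    "cancel my subscription",
    "i would like to cancel this subscription today",
)
print(match, round(score, 3))
```

With a real embedding model in place of the toy `embed`, the same loop would score paraphrases that share no words with the phrase; the bag-of-words version only rewards word overlap.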

    System and method of automatic topic detection in text

    Publication number: US12001797B2

    Publication date: 2024-06-04

    Application number: US17318524

    Filing date: 2021-05-12

    CPC classification number: G06F40/289 G06F16/2468 G06F16/248 G06N3/04

    Abstract: A method and system for automatic topic detection in text may include receiving a text document of a corpus of documents and extracting one or more phrases from the document, based on one or more syntactic patterns. For each phrase, embodiments of the invention may: apply a word embedding neural network on one or more words of the phrase, to obtain one or more respective word embedding vectors; calculate a weighted phrase embedding vector, and compute a phrase saliency score, based on the weighted phrase embedding vector. Embodiments of the invention may subsequently produce one or more topic labels, representing one or more respective topics in the document, based on the computed phrase saliency scores, and may select one or more topic labels according to their relevance to the business domain of the corpus.
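
The pipeline in this abstract (pattern-based phrase extraction, word embeddings, a weighted phrase embedding, a saliency score, ranked topic labels) can be sketched in Python as follows. This is a hedged illustration, not the patented method. The `POS`, `EMB` and `WEIGHT` tables are hypothetical toy data standing in for a part-of-speech tagger, a word-embedding neural network and word weights. The syntactic pattern (a run of adjectives and nouns ending in a noun) and the saliency definition (cosine similarity to the centroid of the document's phrase vectors) are assumptions, since the abstract does not specify them. The final selection by relevance to the business domain is not covered.

```python
import math

# Hypothetical toy resources, for illustration only.
POS = {"slow": "ADJ", "internet": "NOUN", "connection": "NOUN",
       "billing": "NOUN", "error": "NOUN",
       "the": "DET", "a": "DET", "is": "VERB", "caused": "VERB"}
EMB = {"slow": (0.9, 0.1), "internet": (1.0, 0.0), "connection": (0.8, 0.2),
       "billing": (0.0, 1.0), "error": (0.2, 0.8)}
WEIGHT = {"slow": 0.5, "internet": 1.0, "connection": 1.0,
          "billing": 1.0, "error": 1.0}


def extract_phrases(tokens):
    """Extract runs of adjectives/nouns that end in a noun (assumed pattern)."""
    phrases, run = [], []
    for tok in tokens + [None]:  # None is a sentinel that flushes the last run
        if POS.get(tok) in ("ADJ", "NOUN"):
            run.append(tok)
            continue
        while run and POS[run[-1]] != "NOUN":  # trim trailing adjectives
            run.pop()
        if run:
            phrases.append(" ".join(run))
        run = []
    return phrases


def phrase_vector(phrase):
    """Weighted average of the word embedding vectors of the phrase."""
    words = phrase.split()
    total = sum(WEIGHT[w] for w in words)
    dim = len(EMB[words[0]])
    return tuple(sum(WEIGHT[w] * EMB[w][i] for w in words) / total
                 for i in range(dim))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_labels(text, top_k=2):
    """Rank extracted phrases by saliency and return the top_k as topic labels."""
    phrases = list(dict.fromkeys(extract_phrases(text.lower().split())))
    if not phrases:
        return []
    vectors = [phrase_vector(p) for p in phrases]
    centroid = tuple(sum(v[i] for v in vectors) / len(vectors)
                     for i in range(len(vectors[0])))
    scored = [(p, cosine(v, centroid)) for p, v in zip(phrases, vectors)]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


doc = ("the slow internet connection caused a billing error "
       "the internet connection is slow")
for label, saliency in topic_labels(doc, top_k=3):
    print(f"{label}: {saliency:.3f}")
```

In this toy document the connectivity phrases dominate the centroid, so they score higher than "billing error"; a different saliency definition would rank the phrases differently.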
