-
Publication (Announcement) No.: US11984116B2
Publication (Announcement) Date: 2024-05-14
Application No.: US17520816
Filing Date: 2021-11-08
Applicant: GENESYS CLOUD SERVICES, INC.
Inventor: Lev Haikin, Arnon Mazza, Eyal Orbach, Avraham Faizakof
IPC: G10L15/197, G06N20/00, G10L15/06, G10L15/10, G10L15/22
CPC classification number: G10L15/197, G06N20/00, G10L15/063, G10L15/10, G10L15/22, G10L2015/0635
Abstract: A system and method of automatically discovering unigrams in a speech data element may include receiving a language model that includes a plurality of n-grams, where each n-gram includes one or more unigrams; applying an acoustic machine-learning (ML) model on one or more speech data elements to obtain a character distribution function; applying a greedy decoder on the character distribution function, to predict an initial corpus of unigrams; filtering out one or more unigrams of the initial corpus to obtain a corpus of candidate unigrams, where the candidate unigrams are not included in the language model; analyzing the one or more first speech data elements, to extract at least one n-gram that comprises a candidate unigram; and updating the language model to include the extracted at least one n-gram.
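The abstract does not describe the decoder or the filtering criteria in detail. The Python sketch below shows one common reading of the pipeline and is not the patented implementation. It assumes CTC-style frame posteriors over a hypothetical alphabet with "_" as the blank symbol and a space as the word delimiter. It uses set difference against the language-model vocabulary as the only filter and represents the language model as a hypothetical dictionary of n-gram counts. All function names are illustrative.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Assumption: the acoustic model emits one probability vector per audio frame
# over `alphabet`, CTC-style, with "_" as the blank symbol and " " between words.
BLANK = "_"


def greedy_decode(frames: Sequence[Sequence[float]], alphabet: Sequence[str],
                  blank: str = BLANK) -> str:
    """Greedy (best-path) decoding: argmax per frame, collapse repeats, drop blanks."""
    chars: List[str] = []
    prev: Optional[str] = None
    for probs in frames:
        best = max(range(len(probs)), key=lambda i: probs[i])
        ch = alphabet[best]
        if ch != prev and ch != blank:
            chars.append(ch)
        prev = ch  # blanks also update prev, so "o _ o" decodes to "oo"
    return "".join(chars)


def candidate_unigrams(transcripts: Sequence[str], lm_vocab: Set[str]) -> Set[str]:
    """Initial unigram corpus minus words the language model already contains."""
    initial = {word for text in transcripts for word in text.split()}
    return initial - lm_vocab


def extract_ngrams(transcript: str, candidates: Set[str],
                   n: int = 2) -> List[Tuple[str, ...]]:
    """Return every n-gram of the transcript that contains at least one candidate."""
    words = transcript.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)
            if any(w in candidates for w in words[i:i + n])]


def update_language_model(lm_counts: Dict[Tuple[str, ...], int],
                          ngrams: Sequence[Tuple[str, ...]]) -> None:
    """Add extracted n-grams to a hypothetical count-based language model."""
    for gram in ngrams:
        lm_counts[gram] = lm_counts.get(gram, 0) + 1


if __name__ == "__main__":
    # Toy one-hot frames standing in for real acoustic-model output.
    alphabet = ["_", " ", "h", "i", "z", "o", "m"]
    frames = [[1.0 if a == c else 0.0 for a in alphabet] for c in "hhi zo_omm"]
    text = greedy_decode(frames, alphabet)
    new_words = candidate_unigrams([text], lm_vocab={"hi"})
    lm = {("hi",): 3}
    update_language_model(lm, extract_ngrams(text, new_words))
    print(text, new_words, lm)
```

Greedy decoding is fast but ignores the language model, so it can surface out-of-vocabulary words that a language-model-constrained beam search would suppress. This plausibly explains why it suits vocabulary discovery.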
-
Publication (Announcement) No.: US12001797B2
Publication (Announcement) Date: 2024-06-04
Application No.: US17318524
Filing Date: 2021-05-12
Applicant: GENESYS CLOUD SERVICES, INC.
Inventor: Eyal Orbach, Avraham Faizakof, Arnon Mazza, Lev Haikin
IPC: G06F40/289, G06F16/2458, G06F16/248, G06N3/04
CPC classification number: G06F40/289, G06F16/2468, G06F16/248, G06N3/04
Abstract: A method and system for automatic topic detection in text may include receiving a text document of a corpus of documents and extracting one or more phrases from the document, based on one or more syntactic patterns. For each phrase, embodiments of the invention may: apply a word embedding neural network on one or more words of the phrase, to obtain one or more respective word embedding vectors; calculate a weighted phrase embedding vector, and compute a phrase saliency score, based on the weighted phrase embedding vector. Embodiments of the invention may subsequently produce one or more topic labels, representing one or more respective topics in the document, based on the computed phrase saliency scores, and may select one or more topic labels according to their relevance to the business domain of the corpus.
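The abstract does not specify the syntactic patterns, the word weights, or the saliency formula. The Python sketch below fills those gaps with assumptions and is not the patented method:

- Phrases match a hypothetical "ADJ* NOUN+" pattern over pre-tagged tokens.
- Each phrase embedding is a weighted average of word vectors, with per-word weights supplied by the caller (for example, IDF-like values; the default is 1.0).
- Saliency is the cosine similarity between a phrase vector and the document's weighted centroid.
- Domain relevance is the cosine similarity to a hypothetical business-domain vector, tested against a threshold.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def extract_phrases(tagged: Sequence[Tuple[str, str]]) -> List[List[str]]:
    """Hypothetical syntactic pattern: zero or more ADJ tokens, then one or more NOUN tokens."""
    phrases: List[List[str]] = []
    current: List[str] = []
    seen_noun = False
    for word, tag in tagged:
        if tag == "NOUN":
            current.append(word)
            seen_noun = True
        elif tag == "ADJ" and not seen_noun:
            current.append(word)
        else:
            if seen_noun:
                phrases.append(current)
            # An adjective directly after a noun run opens a new candidate phrase.
            current = [word] if tag == "ADJ" else []
            seen_noun = False
    if seen_noun:
        phrases.append(current)
    return phrases


def weighted_embedding(words: Sequence[str], embeddings: Dict[str, Vector],
                       weights: Dict[str, float]) -> Vector:
    """Weighted average of word vectors; skips unknown words (embeddings must be non-empty)."""
    dim = len(next(iter(embeddings.values())))
    acc = [0.0] * dim
    total = 0.0
    for w in words:
        if w not in embeddings:
            continue
        wt = weights.get(w, 1.0)
        acc = [a + wt * x for a, x in zip(acc, embeddings[w])]
        total += wt
    return [a / total for a in acc] if total > 0 else acc


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def topic_labels(tagged: Sequence[Tuple[str, str]], embeddings: Dict[str, Vector],
                 weights: Dict[str, float], top_k: int = 3) -> List[str]:
    """Rank phrases by saliency, assumed here to be similarity to the document centroid."""
    doc_vec = weighted_embedding([w for w, _ in tagged], embeddings, weights)
    scores: Dict[str, float] = {}
    for phrase in extract_phrases(tagged):
        vec = weighted_embedding(phrase, embeddings, weights)
        scores[" ".join(phrase)] = cosine(vec, doc_vec)
    return sorted(scores, key=lambda label: scores[label], reverse=True)[:top_k]


def select_domain_relevant(labels: Sequence[str], embeddings: Dict[str, Vector],
                           weights: Dict[str, float], domain_vec: Vector,
                           threshold: float = 0.5) -> List[str]:
    """Keep labels whose embedding is close to a hypothetical business-domain vector."""
    return [label for label in labels
            if cosine(weighted_embedding(label.split(), embeddings, weights),
                      domain_vec) >= threshold]
```

With toy 2-D vectors, a customer-service sentence such as "customer sent refund request about nice weather" yields the phrases "customer", "refund request", and "nice weather". Filtering against a support-domain vector then drops "nice weather".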
-