-
Publication number: US11704351B1
Publication date: 2023-07-18
Application number: US17976461
Application date: 2022-10-28
Applicant: SAS Institute Inc.
Inventor: Reza Soleimani, Samuel Leeman-Munk, David Blake Styles
IPC: G06F16/34, G06F40/40, G06F40/284
CPC classification number: G06F16/34, G06F40/284, G06F40/40
Abstract: In one example, a system can receive a set of text samples and generate a set of summaries based on the set of text samples. The system can then generate a training dataset by iteratively executing a training-sample generation process. Each iteration can involve selecting multiple text samples from the set of text samples, combining the multiple text samples together into a training sample, determining a text category and a summary corresponding to a selected one of the multiple text samples, and including the text category and the summary in the training sample. After generating the training dataset, the system can use it to train a model. The trained model can then receive a target textual dataset and a target category as input, identify a portion of the target textual dataset corresponding to the target category, and generate a summarization of the portion of that target textual dataset.
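The training-sample generation loop described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `TextSample` record, the function names, the space-joined concatenation, and the uniform random selection are all assumptions made for illustration, and each sample is assumed to already carry a category and a summary.

```python
import random
from dataclasses import dataclass


@dataclass
class TextSample:
    """Hypothetical record: a text with a precomputed category and summary."""
    text: str
    category: str
    summary: str


def make_training_sample(samples, k, rng):
    """One iteration: pick k samples, combine them, label with one of them."""
    chosen = rng.sample(samples, k)   # select multiple text samples
    target = rng.choice(chosen)       # the "selected one" of those samples
    return {
        # Assumption: samples are combined by simple concatenation.
        "input_text": " ".join(s.text for s in chosen),
        "category": target.category,
        "summary": target.summary,
    }


def build_training_dataset(samples, n_iterations, k=2, seed=0):
    """Run the generation process n_iterations times."""
    rng = random.Random(seed)
    return [make_training_sample(samples, k, rng) for _ in range(n_iterations)]


# Usage with made-up data.
samples = [
    TextSample("The battery drains within two hours.", "battery", "Battery life is short."),
    TextSample("The screen is bright and sharp.", "display", "Display quality is good."),
    TextSample("Support never answered my email.", "service", "Customer support was unresponsive."),
]
dataset = build_training_dataset(samples, n_iterations=5, k=2, seed=1)
```

Each record pairs a mixed-topic input with the category and summary of only one of its constituents, which is what would let a model trained on such data learn to summarize just the portion of a text that matches a requested category.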
-
2.
Publication number: US11501084B1
Publication date: 2022-11-15
Application number: US17747139
Application date: 2022-05-18
Applicant: SAS Institute Inc.
Inventor: Reza Soleimani, Samuel Paul Leeman-Munk, James Allen Cox, David Blake Styles
IPC: G06F40/30, G06F40/284, G06F40/205, G06N20/20
Abstract: In one example, a system can execute a first machine-learning model to determine an overall classification for a textual dataset. The system can also determine classification scores indicating the level of influence that each token in the textual dataset had on the overall classification. The system can select a first subset of the tokens based on their classification scores. The system can also execute a second machine-learning model to determine probabilities that the textual dataset falls into various categories. The system can determine category scores indicating the level of influence that each token had on a most-likely category determination. The system can select a second subset of the tokens based on their category scores. The system can then generate a first visualization depicting the first subset of tokens color-coded to indicate their classification scores and a second visualization depicting the second subset of tokens color-coded to indicate their category scores.
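The token selection and color-coding step in this abstract can be sketched as follows. The abstract does not say how the influence scores are computed or how the subset is chosen, so this sketch takes the scores as given, assumes a top-n selection by absolute score, and assumes a red/blue color scale. All function names are hypothetical.

```python
import html


def select_top_tokens(tokens, scores, n):
    """Keep the n tokens with the largest absolute influence score (assumed rule)."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:n]


def score_to_hex(score, max_abs):
    """Map a score to a color: red for positive, blue for negative, white at zero."""
    intensity = 0.0 if max_abs == 0 else abs(score) / max_abs
    faded = round(255 * (1 - intensity))  # weaker influence gives a paler color
    if score >= 0:
        return f"#ff{faded:02x}{faded:02x}"
    return f"#{faded:02x}{faded:02x}ff"


def render_html(selected):
    """Render the selected (token, score) pairs as color-coded HTML spans."""
    max_abs = max((abs(score) for _, score in selected), default=0)
    return " ".join(
        f'<span style="background:{score_to_hex(score, max_abs)}">{html.escape(token)}</span>'
        for token, score in selected
    )


# Usage with made-up tokens and scores.
tokens = ["the", "refund", "was", "denied"]
scores = [0.02, 0.61, -0.05, -0.88]
markup = render_html(select_top_tokens(tokens, scores, n=2))
```

The same helpers could be called twice, once with the classification scores and once with the category scores, to produce the two visualizations the abstract mentions.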
-