-
Publication No.: US11990134B2
Publication date: 2024-05-21
Application No.: US18220632
Filing date: 2023-07-11
Applicant: SAS INSTITUTE INC.
Inventor: Xiaolong Li, Xiaozhuo Cheng, Xu Yang
Abstract: A system, method, and computer-program product includes constructing a transcript adaptation training data corpus that includes a plurality of transcript normalization training data samples, wherein each of the plurality of transcript normalization training data samples includes: a predicted audio transcript that includes at least one numerical expression, an adapted audio transcript that includes an alphabetic representation of the at least one numerical expression, and a transcript normalization identifier that, when applied to a model input comprising a target audio transcript, defines a text-to-text transformation objective causing a numeric-to-alphabetic expression machine learning model to predict an alphabetic-equivalent audio transcript that represents each numerical expression included in the target audio transcript in one or more alphabetic tokens; configuring the numeric-to-alphabetic expression machine learning model based on a training of a machine learning text-to-text transformer model using the transcript adaptation training data corpus; and executing the numeric-to-alphabetic expression machine learning model.
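The structure of a transcript normalization training sample can be illustrated with a minimal sketch. The prefix string, the class name, and the example sentences below are hypothetical; the abstract does not give the text of the transcript normalization identifier or name any specific transformer model.

```python
from dataclasses import dataclass

# Hypothetical task prefix standing in for the "transcript normalization
# identifier"; the abstract does not disclose its actual text.
NORMALIZATION_PREFIX = "normalize numbers: "


@dataclass(frozen=True)
class NormalizationSample:
    predicted: str  # predicted transcript containing numerical expressions
    adapted: str    # same transcript with the numbers spelled out


def to_model_pair(sample: NormalizationSample) -> tuple[str, str]:
    """Return (model_input, target) for text-to-text training."""
    return NORMALIZATION_PREFIX + sample.predicted, sample.adapted


# Illustrative samples only, not data from the patent.
corpus = [
    NormalizationSample("the meeting is at 3 pm", "the meeting is at three pm"),
    NormalizationSample("she paid 20 dollars", "she paid twenty dollars"),
]
pairs = [to_model_pair(s) for s in corpus]
```

Pairs of this shape are what a text-to-text transformer would typically be fine-tuned on; the training step itself is omitted here.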
-
Publication No.: US11810572B2
Publication date: 2023-11-07
Application No.: US18207433
Filing date: 2023-06-08
Applicant: SAS INSTITUTE INC.
Inventor: Xiaozhuo Cheng, Xiaolong Li, Xu Yang
CPC classification number: G10L15/26, G10L15/02, G10L15/04, G10L17/00, G10L25/30, G10L25/78, G10L2025/783
Abstract: A system, method, and computer-program product includes distributing a plurality of audio data files of a speech data corpus to a plurality of computing nodes that each implement a plurality of audio processing threads, executing the plurality of audio processing threads associated with each of the plurality of computing nodes to detect a plurality of tentative speakers participating in each of the plurality of audio data files, generating, via a clustering algorithm, a plurality of clusters of embedding signatures based on a plurality of embedding signatures associated with the plurality of tentative speakers in each of the plurality of audio data files, and detecting a plurality of global speakers associated with the speech data corpus based on the plurality of clusters of embedding signatures.
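As a rough illustration of the final step, per-file tentative speakers can be merged into corpus-wide speakers by clustering their embedding signatures. The abstract does not name the clustering algorithm; the sketch below swaps in a simple greedy cosine-similarity clustering, and the function names, threshold, and two-dimensional embeddings are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_global_speakers(signatures, threshold=0.9):
    """Greedy clustering (an assumed stand-in for the patent's algorithm).

    signatures: list of (file_id, tentative_speaker_id, embedding).
    Returns one member list per detected global speaker.
    """
    clusters = []  # each cluster: {"centroid": [...], "members": [...]}
    for file_id, speaker_id, emb in signatures:
        best = max(clusters, key=lambda c: cosine(c["centroid"], emb), default=None)
        if best is not None and cosine(best["centroid"], emb) >= threshold:
            n = len(best["members"])
            # Update the running mean of the cluster's embeddings.
            best["centroid"] = [(c * n + e) / (n + 1) for c, e in zip(best["centroid"], emb)]
            best["members"].append((file_id, speaker_id))
        else:
            clusters.append({"centroid": list(emb), "members": [(file_id, speaker_id)]})
    return [c["members"] for c in clusters]


# Toy embeddings for two files with two tentative speakers each.
signatures = [
    ("a.wav", 0, [1.0, 0.0]),
    ("a.wav", 1, [0.0, 1.0]),
    ("b.wav", 0, [0.98, 0.05]),
    ("b.wav", 1, [0.1, 0.95]),
]
global_speakers = cluster_global_speakers(signatures)
```

In this toy input, the first tentative speaker of each file ends up in one cluster and the second in another, giving two global speakers.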
-
Publication No.: US11776545B2
Publication date: 2023-10-03
Application No.: US17994554
Filing date: 2022-11-28
Applicant: SAS Institute Inc.
Inventor: Xiaolong Li, Xiaozhuo Cheng, Samuel Norris Henderson, Xu Yang
Abstract: An apparatus includes a processor to: receive a request to perform speech-to-text conversion of a speech data set; perform pause detection to identify a set of likely sentence pauses and/or a speaker diarization technique to identify a set of likely speaker changes; based on the set of likely sentence pauses and/or the set of likely speaker changes, divide the speech data set into data segments representing speech segments; use an acoustic model with the data segments to derive sets of probabilities of speech sounds uttered; store the sets of probabilities in temporal order within a buffer queue; distribute the sets of probabilities from the buffer queue in temporal order among threads of a thread pool; and within each thread, and based on set(s) of probabilities, derive one candidate word and select either the candidate word or an alternate candidate word derived from a language model as the next word most likely spoken.
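The buffer-queue and thread-pool arrangement might look roughly like the sketch below. It is an assumption-laden illustration: `decode` is a placeholder that simply picks the most probable label, and `transcribe`, the worker count, and the toy probability sets are hypothetical names and values not taken from the patent.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def decode(prob_set):
    """Placeholder decoder: pick the highest-probability label."""
    return max(prob_set, key=prob_set.get)


def transcribe(prob_sets, workers=2):
    """Queue probability sets in temporal order and decode them in a thread pool."""
    buffer = queue.Queue()
    for index, prob_set in enumerate(prob_sets):  # enqueue in temporal order
        buffer.put((index, prob_set))
    results = [None] * len(prob_sets)

    def worker():
        # Each thread takes the next item from the queue until it is empty.
        while True:
            try:
                index, prob_set = buffer.get_nowait()
            except queue.Empty:
                return
            results[index] = decode(prob_set)  # index preserves temporal order

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(workers):
            pool.submit(worker)
    return results


words = transcribe([{"hello": 0.9, "yellow": 0.1}, {"word": 0.4, "world": 0.6}])
```

Storing each result at its original index keeps the output in temporal order even when threads finish out of order.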
-
Publication No.: US20230098063A1
Publication date: 2023-03-30
Application No.: US17994554
Filing date: 2022-11-28
Applicant: SAS Institute Inc.
Inventor: Xiaolong Li, Xiaozhuo Cheng, Samuel Norris Henderson, Xu Yang
Abstract: An apparatus includes a processor to: receive a request to perform speech-to-text conversion of a speech data set; perform pause detection to identify a set of likely sentence pauses and/or a speaker diarization technique to identify a set of likely speaker changes; based on the set of likely sentence pauses and/or the set of likely speaker changes, divide the speech data set into data segments representing speech segments; use an acoustic model with the data segments to derive sets of probabilities of speech sounds uttered; store the sets of probabilities in temporal order within a buffer queue; distribute the sets of probabilities from the buffer queue in temporal order among threads of a thread pool; and within each thread, and based on set(s) of probabilities, derive one candidate word and select either the candidate word or an alternate candidate word derived from a language model as the next word most likely spoken.
-
Publication No.: US11538481B2
Publication date: 2022-12-27
Application No.: US17851264
Filing date: 2022-06-28
Applicant: SAS Institute Inc.
Inventor: Xiaolong Li, Samuel Norris Henderson, Xiaozhuo Cheng, Xu Yang
Abstract: An apparatus includes at least one processor to, in response to a request to perform speech-to-text conversion: perform a pause detection technique including analyzing speech audio to identify pauses, and analyzing lengths of the pauses to identify likely sentence pauses; perform a speaker diarization technique including dividing the speech audio into fragments, analyzing vocal characteristics of speech sounds of each fragment to identify a speaker of a set of speakers, and identifying instances of a change in speakers between each temporally consecutive pair of fragments to identify likely speaker changes; and perform speech-to-text operations including dividing the speech audio into segments based on at least the likely sentence pauses and likely speaker changes, using at least an acoustic model with each segment to identify likely speech sounds in the speech audio, and generating a transcript of the speech audio based at least on the likely speech sounds.
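The speaker-change step and its combination with sentence pauses can be sketched as follows. This assumes each fragment has already been labelled with a speaker and that pauses are expressed as fragment indices; both function names and the sample labels are hypothetical.

```python
def speaker_changes(fragment_speakers):
    """Indices i where fragment i has a different speaker than fragment i - 1."""
    return [
        i
        for i in range(1, len(fragment_speakers))
        if fragment_speakers[i] != fragment_speakers[i - 1]
    ]


def segment_boundaries(fragment_speakers, pause_indices):
    """Merge likely speaker changes with likely sentence pauses (assumed to be indices)."""
    return sorted(set(speaker_changes(fragment_speakers)) | set(pause_indices))


# Toy labels for five consecutive fragments.
labels = ["A", "A", "B", "B", "A"]
changes = speaker_changes(labels)
boundaries = segment_boundaries(labels, pause_indices=[3])
```

Here the speaker changes fall at fragments 2 and 4, and adding a pause at fragment 3 yields segment boundaries at 2, 3, and 4.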
-
Publication No.: US11373655B2
Publication date: 2022-06-28
Application No.: US17498811
Filing date: 2021-10-12
Applicant: SAS Institute Inc.
Inventor: Xiaolong Li, Xiaozhuo Cheng, Xu Yang
Abstract: An apparatus includes processor(s) to: perform preprocessing operations of a segmentation technique including divide a speech data set into data chunks representing chunks of speech audio, use an acoustic model with each data chunk to identify pauses in the speech audio, and analyze a length of time of each identified pause to identify a candidate set of likely sentence pauses in the speech audio; and perform speech-to-text operations including divide the speech data set into data segments that each represent segments of the speech audio based on the candidate set of likely sentence pauses, use the acoustic model with each data segment to identify likely speech sounds in the speech audio, analyze the identified likely speech sounds to identify candidate sets of words likely spoken in the speech audio, and generate a transcript of the speech data set based at least on the candidate sets of words likely spoken.
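The pause-length analysis can be illustrated with a small sketch. It assumes the acoustic model has already produced a per-frame silence flag; the 10 ms frame size, the 300 ms minimum, and the function name are illustrative assumptions, not values from the patent.

```python
def find_sentence_pauses(is_silence, frame_ms=10, min_pause_ms=300):
    """Return (start_ms, end_ms) for silent runs lasting at least min_pause_ms."""
    pauses = []
    run_start = None
    # A trailing False acts as a sentinel that closes a silent run at the end.
    for i, silent in enumerate(list(is_silence) + [False]):
        if silent and run_start is None:
            run_start = i
        elif not silent and run_start is not None:
            if (i - run_start) * frame_ms >= min_pause_ms:
                pauses.append((run_start * frame_ms, i * frame_ms))
            run_start = None
    return pauses


# 50 ms speech, 400 ms silence, 50 ms speech, 100 ms silence, 50 ms speech.
frames = [False] * 5 + [True] * 40 + [False] * 5 + [True] * 10 + [False] * 5
pauses = find_sentence_pauses(frames)
```

Only the 400 ms silence qualifies as a likely sentence pause; the 100 ms gap is treated as an ordinary within-sentence pause.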
-
Publication No.: US11217233B1
Publication date: 2022-01-04
Application No.: US17370441
Filing date: 2021-07-08
Applicant: SAS Institute Inc.
Inventor: Xiaozhuo Cheng, Xu Yang, Xiaolong Li, Biljana Belamaric Wilsey, Haipeng Liu, Jared Peterson
Abstract: An apparatus includes processor(s) to: generate a set of candidate n-grams based on probability distributions from an acoustic model for candidate graphemes of a next word most likely spoken following at least one preceding word spoken within speech audio; provide the set of candidate n-grams to multiple devices; provide, to each node device, an indication of which candidate n-grams are to be searched for within the n-gram corpus by each node device to enable searches for multiple candidate n-grams to be performed, independently and at least partially in parallel, across the node devices; receive, from each node device, an indication of a probability of occurrence of at least one candidate n-gram within the speech audio; based on the received probabilities of occurrence, identify the next word most likely spoken within the speech audio; and add the next word most likely spoken to a transcript of the speech audio.
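A simplified, single-machine sketch of the parallel n-gram search is given below. Threads stand in for node devices, every "node" is assumed to be able to search the whole n-gram corpus, and the round-robin assignment, function names, and counts are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search_node(corpus_counts, ngrams):
    """One simulated node: relative frequency of each assigned n-gram."""
    total = sum(corpus_counts.values())
    return {ng: corpus_counts.get(ng, 0) / total for ng in ngrams}


def next_word(candidates, corpus_counts, node_count=2):
    """Split candidates across simulated nodes and pick the most probable one."""
    # Round-robin assignment of candidate n-grams to nodes (an assumption).
    assignments = [candidates[i::node_count] for i in range(node_count)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(lambda a: search_node(corpus_counts, a), assignments))
    scores = {ng: p for part in partials for ng, p in part.items()}
    best = max(scores, key=scores.get)
    return best[-1]  # the last token of the n-gram is the next word


# Toy bigram counts, not real corpus data.
corpus_counts = {("good", "morning"): 8, ("good", "mourning"): 1, ("good", "warning"): 1}
word = next_word(list(corpus_counts), corpus_counts)
```

Each simulated node reports a probability of occurrence for its assigned candidates, and the coordinator keeps the candidate with the highest value.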
-
Publication No.: US11145309B1
Publication date: 2021-10-12
Application No.: US17205871
Filing date: 2021-03-18
Applicant: SAS Institute Inc.
Inventor: Xu Yang
Abstract: An apparatus includes processor(s) to: use an acoustic model to generate a first set of probabilities of speech sounds uttered within speech audio; derive at least a first candidate word most likely spoken in the speech audio using the first set; analyze the first set to derive a degree of uncertainty therefor; compare the degree of uncertainty to a threshold; in response to at least the degree of uncertainty being less than the threshold, select the first candidate word as a next word most likely spoken in the speech audio; in response to at least the degree of uncertainty being greater than the threshold, select, as the next word most likely spoken in the speech audio, a second candidate word indicated as being most likely spoken based on a second set of probabilities generated by a language model; and add the next word most likely spoken to a transcript.
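The threshold-based choice between the acoustic model and the language model can be sketched as follows. The abstract does not say how the degree of uncertainty is computed; Shannon entropy is used here purely as an assumption, and the threshold value, function names, and probabilities are illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, used here as an assumed uncertainty measure."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_next_word(acoustic_probs, language_probs, threshold=1.0):
    """Keep the acoustic candidate when uncertainty is below the threshold."""
    if entropy(acoustic_probs.values()) < threshold:
        return max(acoustic_probs, key=acoustic_probs.get)
    # The abstract leaves the equal case unspecified; this sketch falls back
    # to the language model whenever uncertainty is not below the threshold.
    return max(language_probs, key=language_probs.get)


language = {"there": 0.7, "their": 0.2, "they're": 0.1}
confident = select_next_word({"their": 0.9, "there": 0.1}, language)
uncertain = select_next_word({"their": 0.34, "there": 0.33, "they're": 0.33}, language)
```

With a sharply peaked acoustic distribution the acoustic candidate is kept; with a nearly uniform one the language model's candidate is used instead.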
-
Publication No.: US10467344B1
Publication date: 2019-11-05
Application No.: US16380353
Filing date: 2019-04-10
Applicant: SAS Institute Inc.
Inventor: Teresa S. Jade, Wei-shan Chiang, Aaron Douglas Arthur, Seng Lee, Qin Yang, Xu Yang
IPC: G06F17/27
Abstract: A human language analyzer receives, at the human language analyzer, text data representing information in a human language. The human language analyzer receives a computer command for identifying a text data component of the text data. The computer command comprises at least two requirements for the text data component. The human language analyzer, responsive to identifying that the first requirement and the second requirement are met, locates the text data component from one of two clauses. A clause analyzer receives a clause request to locate clauses within text data representing information in a human language. The clause analyzer receives, responsive to a dependency request, token information in a token data set. The clause analyzer determines a location for each clause of the sentence portion in a hierarchy of clauses. The clause analyzer generates and outputs a new data set based on the token data set and the hierarchy of clauses.
-
Publication No.: US12165650B2
Publication date: 2024-12-10
Application No.: US18634155
Filing date: 2024-04-12
Applicant: SAS Institute Inc.
Inventor: Xiaolong Li, Xiaozhuo Cheng, Xu Yang
Abstract: A system, method, and computer-program product includes receiving speech audio of a multi-turn conversation, generating, via a speech-to-text process, a transcript of the speech audio, wherein the transcript of the speech audio textually segments speech spoken during the multi-turn conversation into a plurality of utterances, generating a speaker diarization prompt that includes contextual information about a plurality of speakers participating in the multi-turn conversation, inputting, to a large language model, the speaker diarization prompt and the transcript of the speech audio, and obtaining, from the large language model, an output comprising an enhanced transcript of the speech audio, wherein the enhanced transcript of the speech audio textually segments the speech spoken during the multi-turn conversation into a plurality of refined utterances and associates a speaker identification value with each of the plurality of refined utterances.
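A speaker diarization prompt of the kind described might be assembled roughly as below. The prompt wording, the function name, and the sample speakers and utterances are all hypothetical; the abstract does not disclose the prompt text, and the call to the large language model is deliberately omitted.

```python
def build_diarization_prompt(speakers, utterances):
    """Combine speaker context and transcript utterances into one prompt string."""
    lines = [
        "Assign a speaker to each utterance and correct utterance boundaries.",
        "Speakers:",
    ]
    lines += [f"- {name}: {role}" for name, role in speakers.items()]
    lines.append("Transcript:")
    lines += [f"{i}. {text}" for i, text in enumerate(utterances, start=1)]
    return "\n".join(lines)


# Illustrative context and utterances, not data from the patent.
prompt = build_diarization_prompt(
    {"Agent": "support representative", "Caller": "customer"},
    ["hello how can I help", "my order is late"],
)
```

The resulting string would be sent to the model together with, or in place of, the raw transcript, and the model's reply would be parsed into refined utterances with speaker identification values.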