Patent search ap:("SAS INSTITUTE INC.") AND inv:"Xiaozhuo Cheng" Page 2

11.

Granted invention patent

Systems and methods for configuring and using an audio transcript correction machine learning model

Legal status: In force

Publication number: US11922947B2

Publication date: 2024-03-05

Application number: US18214336

Application date: 2023-06-26

Applicant: SAS INSTITUTE INC.

Inventor: Xiaolong Li, Xiaozhuo Cheng, Xu Yang

IPC: G10L15/22, G10L15/02, G10L15/04, G10L15/26, G10L25/30, G10L25/78

CPC classification number: G10L15/26, G10L15/02, G10L15/04, G10L25/30, G10L25/78, G10L2025/783

Abstract: A system, method, and computer-program product includes constructing a transcript correction training data corpus that includes a plurality of labeled audio transcription training data samples, wherein each of the plurality of labeled audio transcription training data samples includes: an incorrect audio transcription of a target piece of audio data; a correct audio transcription of the target piece of audio data; and a transcript correction identifier that, when applied to a model input that includes a likely incorrect audio transcript, defines a text-to-text transformation objective causing an audio transcript correction machine learning model to predict a corrected audio transcript based on the likely incorrect audio transcript; configuring the audio transcript correction machine learning model based on a training of a machine learning text-to-text transformer model using the transcript correction training data corpus; and executing the audio transcript correction machine learning model within a speech-to-text post-processing sequence of a speech-to-text service.
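
The training-sample layout described in this abstract can be illustrated with a minimal sketch. It assumes the "transcript correction identifier" is a plain text prefix prepended to the model input, as is common for text-to-text transformer models; the prefix string, the class name, and the example sentences are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass

# Hypothetical task prefix; the abstract only states that an identifier exists.
CORRECTION_PREFIX = "correct transcript: "


@dataclass(frozen=True)
class CorrectionSample:
    """One labeled sample: an incorrect transcript and its corrected form."""
    incorrect: str
    correct: str


def to_text_pair(sample, prefix=CORRECTION_PREFIX):
    """Return (model_input, target) for training a text-to-text model."""
    return prefix + sample.incorrect, sample.correct


# Made-up samples for illustration only.
corpus = [
    CorrectionSample("i red the book", "I read the book"),
    CorrectionSample("their going home", "They're going home"),
]
pairs = [to_text_pair(s) for s in corpus]
```

Each pair could then be fed to whatever text-to-text training loop is in use; at inference time the same prefix would be applied to a likely incorrect transcript.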

12.

Published invention application

Method for Configuring and Using a Numeric-to-Alphabetic Expression Machine Learning Model

Legal status: Under examination (published)

Publication number: US20230386473A1

Publication date: 2023-11-30

Application number: US18220632

Application date: 2023-07-11

Applicant: SAS INSTITUTE INC.

Inventor: Xiaolong Li, Xiaozhuo Cheng, Xu Yang

IPC: G10L15/26, G10L25/30, G10L15/04, G10L15/02, G10L25/78

CPC classification number: G10L15/26, G10L25/30, G10L15/04, G10L15/02, G10L25/78, G10L2025/783

Abstract: A system, method, and computer-program product includes constructing a transcript adaptation training data corpus that includes a plurality of transcript normalization training data samples, wherein each of the plurality of transcript normalization training data samples includes: a predicted audio transcript that includes at least one numerical expression, an adapted audio transcript that includes an alphabetic representation of the at least one numerical expression, and a transcript normalization identifier that, when applied to a model input comprising a target audio transcript, defines a text-to-text transformation objective causing a numeric-to-alphabetic expression machine learning model to predict an alphabetic-equivalent audio transcript that represents each numerical expression included in the target audio transcript in one or more alphabetic tokens; configuring the numeric-to-alphabetic expression machine learning model based on a training of a machine learning text-to-text transformer model using the transcript adaptation training data corpus; and executing the numeric-to-alphabetic expression machine learning model.

13.

Published invention application

MULTI-THREADED SPEAKER IDENTIFICATION

Legal status: Under examination (published)

Publication number: US20230317083A1

Publication date: 2023-10-05

Application number: US18207433

Application date: 2023-06-08

Applicant: SAS INSTITUTE INC.

Inventor: Xiaozhuo Cheng, Xiaolong Li, Xu Yang

IPC: G10L15/26, G10L15/04, G10L25/78, G10L25/30, G10L15/02

CPC classification number: G10L15/26, G10L15/04, G10L25/78, G10L25/30, G10L15/02, G10L2025/783

Abstract: A system, method, and computer-program product includes distributing a plurality of audio data files of a speech data corpus to a plurality of computing nodes that each implement a plurality of audio processing threads, executing the plurality of audio processing threads associated with each of the plurality of computing nodes to detect a plurality of tentative speakers participating in each of the plurality of audio data files, generating, via a clustering algorithm, a plurality of clusters of embedding signatures based on a plurality of embedding signatures associated with the plurality of tentative speakers in each of the plurality of audio data files, and detecting a plurality of global speakers associated with the speech data corpus based on the plurality of clusters of embedding signatures.
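
The step of merging per-file tentative speakers into global speakers can be sketched as follows. The abstract does not name the clustering algorithm, so this example swaps in a simple greedy cosine-similarity clustering as an assumption; the threshold, the function names, and the two-dimensional embeddings are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster_speakers(signatures, threshold=0.9):
    """Greedily group (file_id, local_speaker, embedding) triples.

    A signature joins the first cluster whose representative (the first
    embedding placed in it) is at least `threshold` similar; otherwise it
    starts a new cluster. Each resulting cluster is one global speaker.
    """
    clusters = []
    for file_id, local_speaker, emb in signatures:
        for cluster in clusters:
            if cosine(cluster["rep"], emb) >= threshold:
                cluster["members"].append((file_id, local_speaker))
                break
        else:
            clusters.append({"rep": emb, "members": [(file_id, local_speaker)]})
    return [c["members"] for c in clusters]


# Made-up embeddings: two files, two tentative speakers each.
signatures = [
    ("a.wav", "spk0", [1.0, 0.0]),
    ("a.wav", "spk1", [0.0, 1.0]),
    ("b.wav", "spk0", [0.1, 0.99]),
    ("b.wav", "spk1", [0.98, 0.05]),
]
global_speakers = cluster_speakers(signatures)
```

In this toy input the tentative speaker labels differ between files, and the clustering still pairs the matching voices across files.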

14.

Granted invention patent

Speech audio pre-processing segmentation

Legal status: In force

Publication number: US11049502B1

Publication date: 2021-06-29

Application number: US17138521

Application date: 2020-12-30

Applicant: SAS Institute Inc.

Inventor: Xiaozhuo Cheng, Xu Yang, Xiaolong Li

IPC: G10L15/26, G10L15/16, G10L15/04, G10L25/78, G06N3/08, G06N3/04, G10L25/30

Abstract: An apparatus includes processor(s) to: divide a speech data set into multiple data chunks that each represent a chunk of speech audio; configure a neural network to implement an acoustic model that includes a CTC output; provide each data chunk to the neural network and monitor the CTC output for a string of blank symbols; designate each string of blank symbols from the CTC output that is at least as long as a predetermined blank threshold length as a likely sentence pause of a candidate set of likely sentence pauses; based on at least the candidate set, divide the speech data set into multiple data segments that each represent a speech segment of the speech audio; and perform speech-to-text conversion, to identify a sentence spoken in a selected language in each speech segment.
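
The blank-run test in this abstract can be sketched in a few lines. The example assumes the CTC output has already been reduced to one symbol per frame; the blank symbol `"_"`, the threshold of 3 frames, and the function name are hypothetical choices made for illustration.

```python
BLANK = "_"  # hypothetical blank symbol for the CTC output


def find_likely_pauses(ctc_symbols, blank=BLANK, min_run=3):
    """Return (start, end) frame index pairs, end exclusive, for every run
    of blank symbols that is at least `min_run` frames long."""
    pauses = []
    run_start = None
    for i, sym in enumerate(ctc_symbols):
        if sym == blank:
            if run_start is None:
                run_start = i  # a new run of blanks begins here
        else:
            if run_start is not None and i - run_start >= min_run:
                pauses.append((run_start, i))
            run_start = None
    # Handle a run of blanks that reaches the end of the chunk.
    if run_start is not None and len(ctc_symbols) - run_start >= min_run:
        pauses.append((run_start, len(ctc_symbols)))
    return pauses


frames = list("hi__there____ok___")
pauses = find_likely_pauses(frames)
```

The two-frame blank run after "hi" falls below the threshold and is ignored; the longer runs become the candidate set of likely sentence pauses.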

15.

Granted invention patent

Method for configuring and using a numeric-to-alphabetic expression machine learning model

Legal status: In force

Publication number: US11990134B2

Publication date: 2024-05-21

Application number: US18220632

Application date: 2023-07-11

Applicant: SAS INSTITUTE INC.

Inventor: Xiaolong Li, Xiaozhuo Cheng, Xu Yang

IPC: G10L15/22, G10L15/02, G10L15/04, G10L15/26, G10L25/30, G10L25/78

CPC classification number: G10L15/26, G10L15/02, G10L15/04, G10L25/30, G10L25/78, G10L2025/783

Abstract: A system, method, and computer-program product includes constructing a transcript adaptation training data corpus that includes a plurality of transcript normalization training data samples, wherein each of the plurality of transcript normalization training data samples includes: a predicted audio transcript that includes at least one numerical expression, an adapted audio transcript that includes an alphabetic representation of the at least one numerical expression, and a transcript normalization identifier that, when applied to a model input comprising a target audio transcript, defines a text-to-text transformation objective causing a numeric-to-alphabetic expression machine learning model to predict an alphabetic-equivalent audio transcript that represents each numerical expression included in the target audio transcript in one or more alphabetic tokens; configuring the numeric-to-alphabetic expression machine learning model based on a training of a machine learning text-to-text transformer model using the transcript adaptation training data corpus; and executing the numeric-to-alphabetic expression machine learning model.

16.

Granted invention patent

Multi-threaded speaker identification

Legal status: In force

Publication number: US11810572B2

Publication date: 2023-11-07

Application number: US18207433

Application date: 2023-06-08

Applicant: SAS INSTITUTE INC.

Inventor: Xiaozhuo Cheng, Xiaolong Li, Xu Yang

IPC: G10L17/00, G10L15/16, G10L15/26, G10L15/04, G10L25/78, G10L25/30, G10L15/02

CPC classification number: G10L15/26, G10L15/02, G10L15/04, G10L17/00, G10L25/30, G10L25/78, G10L2025/783

Abstract: A system, method, and computer-program product includes distributing a plurality of audio data files of a speech data corpus to a plurality of computing nodes that each implement a plurality of audio processing threads, executing the plurality of audio processing threads associated with each of the plurality of computing nodes to detect a plurality of tentative speakers participating in each of the plurality of audio data files, generating, via a clustering algorithm, a plurality of clusters of embedding signatures based on a plurality of embedding signatures associated with the plurality of tentative speakers in each of the plurality of audio data files, and detecting a plurality of global speakers associated with the speech data corpus based on the plurality of clusters of embedding signatures.

17.

Granted invention patent

Multithreaded speech-to-text processing

Legal status: In force

Publication number: US11776545B2

Publication date: 2023-10-03

Application number: US17994554

Application date: 2022-11-28

Applicant: SAS Institute Inc.

Inventor: Xiaolong Li, Xiaozhuo Cheng, Samuel Norris Henderson, Xu Yang

IPC: G10L15/26, G10L15/22, G10L15/02, G10L15/04, G10L25/78, G10L25/30

CPC classification number: G10L15/26, G10L15/02, G10L15/04, G10L25/30, G10L25/78, G10L2025/783

Abstract: An apparatus includes a processor to: receive a request to perform speech-to-text conversion of a speech data set; perform a pause detection technique to identify a set of likely sentence pauses and/or a speaker diarization technique to identify a set of likely speaker changes; based on the set of likely sentence pauses and/or the set of likely speaker changes, divide the speech data set into data segments representing speech segments; use an acoustic model with the data segments to derive sets of probabilities of speech sounds uttered; store the sets of probabilities in temporal order within a buffer queue; distribute the sets of probabilities from the buffer queue in temporal order among threads of a thread pool; and within each thread, and based on set(s) of probabilities, derive one candidate word and select either the candidate word or an alternate candidate word derived from a language model as the next word most likely spoken.
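
The buffer-queue and thread-pool arrangement in this abstract can be sketched with Python's standard library. This is a simplified illustration under stated assumptions: each "set of probabilities" is modeled as a dictionary from candidate word to probability, and the per-thread work is reduced to picking the most probable word, with no language model involved. The names `decode` and `drain` and the sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue


def decode(prob_set):
    """Stand-in for per-thread decoding: pick the most probable candidate.
    The patent's threads also consult a language model, omitted here."""
    return max(prob_set, key=prob_set.get)


def drain(q):
    """Yield items from the queue in FIFO (temporal) order until empty."""
    while not q.empty():
        yield q.get()


# Made-up probability sets, stored in temporal order.
buffer_queue = Queue()
for prob_set in [
    {"the": 0.7, "a": 0.3},
    {"cat": 0.6, "cap": 0.4},
    {"sat": 0.9, "set": 0.1},
]:
    buffer_queue.put(prob_set)

# Executor.map returns results in submission order, so the transcript
# stays in temporal order even if the threads finish out of order.
with ThreadPoolExecutor(max_workers=2) as pool:
    words = list(pool.map(decode, drain(buffer_queue)))
```

Using `Executor.map` is one way to keep the output ordered without extra bookkeeping; an implementation that needs streaming results might tag each item with a sequence number instead.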

18.

Invention application

Multithreaded Speech-to-Text Processing

Legal status: In force

Publication number: US20230098063A1

Publication date: 2023-03-30

Application number: US17994554

Application date: 2022-11-28

Applicant: SAS Institute Inc.

Inventor: Xiaolong Li, Xiaozhuo Cheng, Samuel Norris Henderson, Xu Yang

IPC: G10L15/26, G10L15/02, G10L25/30, G10L15/04, G10L25/78

Abstract: An apparatus includes a processor to: receive a request to perform speech-to-text conversion of a speech data set; perform a pause detection technique to identify a set of likely sentence pauses and/or a speaker diarization technique to identify a set of likely speaker changes; based on the set of likely sentence pauses and/or the set of likely speaker changes, divide the speech data set into data segments representing speech segments; use an acoustic model with the data segments to derive sets of probabilities of speech sounds uttered; store the sets of probabilities in temporal order within a buffer queue; distribute the sets of probabilities from the buffer queue in temporal order among threads of a thread pool; and within each thread, and based on set(s) of probabilities, derive one candidate word and select either the candidate word or an alternate candidate word derived from a language model as the next word most likely spoken.

19.

Granted invention patent

Speech segmentation based on combination of pause detection and speaker diarization

Legal status: In force

Publication number: US11538481B2

Publication date: 2022-12-27

Application number: US17851264

Application date: 2022-06-28

Applicant: SAS Institute Inc.

Inventor: Xiaolong Li, Samuel Norris Henderson, Xiaozhuo Cheng, Xu Yang

IPC: G10L15/04, G10L15/16, G10L15/26, G10L25/78, G10L25/30, G10L15/02

Abstract: An apparatus includes at least one processor to, in response to a request to perform speech-to-text conversion: perform a pause detection technique including analyzing speech audio to identify pauses, and analyzing lengths of the pauses to identify likely sentence pauses; perform a speaker diarization technique including dividing the speech audio into fragments, analyzing vocal characteristics of speech sounds of each fragment to identify a speaker of a set of speakers, and identifying instances of a change in speakers between each temporally consecutive pair of fragments to identify likely speaker changes; and perform speech-to-text operations including dividing the speech audio into segments based on at least the likely sentence pauses and likely speaker changes, using at least an acoustic model with each segment to identify likely speech sounds in the speech audio, and generating a transcript of the speech audio based at least on the likely speech sounds.
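
Combining the two boundary sources into segments can be sketched as below. The abstract does not say how nearby boundaries are reconciled, so the minimum segment length used here to drop near-duplicate cuts is an assumption, as are the function name and the timestamps (in seconds).

```python
def segment_bounds(duration, pauses, speaker_changes, min_len=0.5):
    """Merge likely sentence pauses and likely speaker changes into
    (start, end) segments covering [0, duration].

    A cut is skipped when it would create a segment shorter than
    `min_len` seconds (hypothetical rule, not from the patent). The final
    segment is kept even if it is shorter than `min_len`.
    """
    cuts = sorted(set(pauses) | set(speaker_changes))
    bounds = []
    start = 0.0
    for t in cuts:
        if 0.0 < t < duration and t - start >= min_len:
            bounds.append((start, t))
            start = t
    bounds.append((start, duration))
    return bounds


segments = segment_bounds(10.0, pauses=[2.0, 6.1], speaker_changes=[6.0, 8.5])
```

Here the pause at 6.1 s is absorbed by the speaker change at 6.0 s, so the two detectors yield a single boundary rather than a 0.1 s segment.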
