Patent search ap:("SAS INSTITUTE INC.") AND inv:"Xiaozhuo Cheng" Page 1

1.

Granted invention patent
Dual use of acoustic model in speech-to-text framework [In force]

Publication number: US11373655B2

Publication date: 2022-06-28

Application number: US17498811

Filing date: 2021-10-12

Applicant: SAS Institute Inc.

Inventor: Xiaolong Li, Xiaozhuo Cheng, Xu Yang

IPC: G10L15/16, G10L25/78, G10L15/26, G10L15/04, G06N3/08, G06N3/04, G10L25/30, G10L15/02

Abstract: An apparatus includes processor(s) to: perform preprocessing operations of a segmentation technique including divide a speech data set into data chunks representing chunks of speech audio, use an acoustic model with each data chunk to identify pauses in the speech audio, and analyze a length of time of each identified pause to identify a candidate set of likely sentence pauses in the speech audio; and perform speech-to-text operations including divide the speech data set into data segments that each represent segments of the speech audio based on the candidate set of likely sentence pauses, use the acoustic model with each data segment to identify likely speech sounds in the speech audio, analyze the identified likely speech sounds to identify candidate sets of words likely spoken in the speech audio, and generate a transcript of the speech data set based at least on the candidate sets of words likely spoken.

2.

Granted invention patent
Speech-to-analytics framework with support for large n-gram corpora [In force]

Publication number: US11217233B1

Publication date: 2022-01-04

Application number: US17370441

Filing date: 2021-07-08

Applicant: SAS Institute Inc.

Inventor: Xiaozhuo Cheng, Xu Yang, Xiaolong Li, Biljana Belamaric Wilsey, Haipeng Liu, Jared Peterson

IPC: G06N3/02, G06N7/00, G10L15/04, G10L15/16, G10L15/18, G10L15/197, G10L15/22, G10L15/30

Abstract: An apparatus includes processor(s) to: generate a set of candidate n-grams based on probability distributions from an acoustic model for candidate graphemes of a next word most likely spoken following at least one preceding word spoken within speech audio; provide the set of candidate n-grams to multiple devices; provide, to each node device, an indication of which candidate n-grams are to be searched for within the n-gram corpus by each node device to enable searches for multiple candidate n-grams to be performed, independently and at least partially in parallel, across the node devices; receive, from each node device, an indication of a probability of occurrence of at least one candidate n-gram within the speech audio; based on the received probabilities of occurrence, identify the next word most likely spoken within the speech audio; and add the next word most likely spoken to a transcript of the speech audio.
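The distribute-and-search step of this abstract can be illustrated with a minimal sketch. It is not the patented implementation: worker threads stand in for node devices, every worker is assumed to hold the same in-memory corpus, the round-robin assignment is an arbitrary choice, and the names `assign_round_robin`, `node_lookup`, and `next_word` are hypothetical. The acoustic-model step that proposes candidate words is omitted; the candidate list is simply passed in.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


def assign_round_robin(candidates: List[NGram], node_count: int) -> List[List[NGram]]:
    """Tell each (hypothetical) node which candidate n-grams it must search for."""
    return [candidates[i::node_count] for i in range(node_count)]


def node_lookup(corpus: Dict[NGram, float], assigned: List[NGram]) -> Dict[NGram, float]:
    """One node's work: report a probability for each assigned n-gram (0.0 if absent)."""
    return {ngram: corpus.get(ngram, 0.0) for ngram in assigned}


def next_word(preceding: NGram, words: List[str],
              corpus: Dict[NGram, float], node_count: int = 2) -> str:
    """Pick the candidate word whose n-gram has the highest reported probability."""
    candidates = [preceding + (word,) for word in words]
    shares = assign_round_robin(candidates, node_count)
    # Threads are an assumption standing in for separate node devices.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        results = list(pool.map(lambda share: node_lookup(corpus, share), shares))
    probabilities: Dict[NGram, float] = {}
    for result in results:
        probabilities.update(result)
    best = max(candidates, key=lambda ngram: probabilities[ngram])
    return best[-1]
```

With a toy corpus such as `{("the", "cat", "sat"): 0.6, ("the", "cat", "sang"): 0.3}`, calling `next_word(("the", "cat"), ["sang", "sat", "sank"], corpus)` selects the word belonging to the highest-probability n-gram.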

3.

Granted invention patent
Systems, methods, and graphical user interfaces for training a code generation model for low-resource languages [In force]

Publication number: US12277409B1

Publication date: 2025-04-15

Application number: US18895119

Filing date: 2024-09-24

Applicant: SAS INSTITUTE INC.

Inventor: Samuel Paul Leeman-Munk, Xiaozhuo Cheng, Xiaolong Li

IPC: G06F11/36, G06F8/35, G06F11/3604

Abstract: A system, method, and computer-program product includes identifying a plurality of code synthesis items for a target programming language, generating a code synthesis prompt based on a first sampling of the plurality of code synthesis items, synthesizing, via a large language model, a plurality of raw code segments using the code synthesis prompt, executing the plurality of raw code segments with a code interpreter associated with the target programming language, determining one or more valid code segments of the plurality of raw code segments that the code interpreter successfully executed, aggregating, via a second sampling, the one or more valid code segments into one or more validated code synthesis training samples, and training a code generation model using the one or more validated code synthesis training samples. User interfaces may be provided to allow target coding tasks to be specified via text or speech.
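The execute-and-filter step of this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: Python stands in for the target programming language, the current Python interpreter stands in for its code interpreter, "successfully executed" is taken to mean a zero exit code within a time limit, and the name `filter_valid_segments` is hypothetical. The prompt generation, sampling, and training steps are not shown.

```python
import subprocess
import sys
from typing import List


def filter_valid_segments(raw_segments: List[str], timeout_s: float = 5.0) -> List[str]:
    """Keep only the code segments that the interpreter runs without error.

    Assumption: success means exit code 0 before the timeout. Real use would
    also need sandboxing, because the segments are untrusted generated code.
    """
    valid: List[str] = []
    for code in raw_segments:
        try:
            done = subprocess.run(
                [sys.executable, "-c", code],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            continue  # treat a hung segment as invalid
        if done.returncode == 0:
            valid.append(code)
    return valid
```

Segments that raise an exception or fail to parse exit with a non-zero status and are dropped, so only executable samples would reach the training set.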

4.

Granted invention patent
Systems and methods for enhanced speaker diarization [In force]

Publication number: US12165650B2

Publication date: 2024-12-10

Application number: US18634155

Filing date: 2024-04-12

Applicant: SAS Institute Inc.

Inventor: Xiaolong Li, Xiaozhuo Cheng, Xu Yang

IPC: G10L15/22, G10L15/02, G10L15/04, G10L15/26, G10L25/30, G10L25/78

Abstract: A system, method, and computer-program product includes receiving speech audio of a multi-turn conversation, generating, via a speech-to-text process, a transcript of the speech audio, wherein the transcript of the speech audio textually segments speech spoken during the multi-turn conversation into a plurality of utterances, generating a speaker diarization prompt that includes contextual information about a plurality of speakers participating in the multi-turn conversation, inputting, to a large language model, the speaker diarization prompt and the transcript of the speech audio, and obtaining, from the large language model, an output comprising an enhanced transcript of the speech audio, wherein the enhanced transcript of the speech audio textually segments the speech spoken during the multi-turn conversation into a plurality of refined utterances and associates a speaker identification value with each of the plurality of refined utterances.

5.

Granted invention patent
Multithreaded speech data preprocessing [In force]

Publication number: US11862171B2

Publication date: 2024-01-02

Application number: US17993385

Filing date: 2022-11-23

Applicant: SAS Institute Inc.

Inventor: Xiaolong Li, Xiaozhuo Cheng, Samuel Norris Henderson, Xu Yang

IPC: G10L15/22, G10L15/26, G10L15/04, G10L25/78, G10L25/30, G10L15/02

CPC classification number: G10L15/26, G10L15/02, G10L15/04, G10L25/30, G10L25/78, G10L2025/783

Abstract: An apparatus includes a processor to: receive, from a requesting device, a request to perform speech-to-text conversion of a speech data set; within a first thread of a thread pool, perform a first pause detection technique to identify a first set of likely sentence pauses; within a second thread of the thread pool, perform a second pause detection technique to identify a second set of likely sentence pauses; perform a speaker diarization technique to identify a set of likely speaker changes; divide the speech data set into data segments representing speech segments based on a combination of at least the first set of likely sentence pauses, the second set of likely sentence pauses, and the set of likely speaker changes; use at least an acoustic model with each data segment to identify likely speech sounds; and generate a transcript based, at least in part, on the identified likely speech sounds.
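The thread-pool arrangement in this abstract can be sketched as below. It is a minimal illustration, not the patented implementation: the two pause detectors are passed in as plain callables, boundaries are expressed as times in seconds, and the combination rule (take the union and drop any boundary within `tolerance` seconds of the previous one) is an assumption, since the abstract does not say how the three sets are combined. The name `segment_boundaries` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Detector = Callable[[Sequence[float]], List[float]]


def segment_boundaries(audio: Sequence[float],
                       detect_a: Detector,
                       detect_b: Detector,
                       speaker_changes: List[float],
                       tolerance: float = 0.25) -> List[float]:
    """Run two pause detectors in separate pool threads, then merge all boundaries."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(detect_a, audio)  # first pause detection technique
        future_b = pool.submit(detect_b, audio)  # second pause detection technique
        pauses_a = future_a.result()
        pauses_b = future_b.result()
    merged: List[float] = []
    # Assumed rule: union of all boundaries, collapsing near-duplicates.
    for t in sorted([*pauses_a, *pauses_b, *speaker_changes]):
        if not merged or t - merged[-1] > tolerance:
            merged.append(t)
    return merged
```

The returned times would then serve as the cut points for dividing the speech data set into data segments.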

6.

Published invention application
SYSTEMS AND METHODS FOR CONFIGURING AND USING AN AUDIO TRANSCRIPT CORRECTION MACHINE LEARNING MODEL [Under examination, published]

Publication number: US20230360652A1

Publication date: 2023-11-09

Application number: US18214336

Filing date: 2023-06-26

Applicant: SAS INSTITUTE INC.

Inventor: Xiaolong Li, Xiaozhuo Cheng, Xu Yang

IPC: G10L15/26, G10L25/30, G10L15/04, G10L15/02, G10L25/78

CPC classification number: G10L15/26, G10L25/30, G10L15/04, G10L15/02, G10L25/78, G10L2025/783

Abstract: A system, method, and computer-program product includes constructing a transcript correction training data corpus that includes a plurality of labeled audio transcription training data samples, wherein each of the plurality of labeled audio transcription training data samples includes: an incorrect audio transcription of a target piece of audio data; a correct audio transcription of the target piece of audio data; and a transcript correction identifier that, when applied to a model input that includes a likely incorrect audio transcript, defines a text-to-text transformation objective causing an audio transcript correction machine learning model to predict a corrected audio transcript based on the likely incorrect audio transcript; configuring the audio transcript correction machine learning model based on a training of a machine learning text-to-text transformer model using the transcript correction training data corpus; and executing the audio transcript correction machine learning model within a speech-to-text post-processing sequence of a speech-to-text service.

7.

Invention application
SPEECH SEGMENTATION BASED ON COMBINATION OF PAUSE DETECTION AND SPEAKER DIARIZATION [In force]

Publication number: US20220335947A1

Publication date: 2022-10-20

Application number: US17851264

Filing date: 2022-06-28

Applicant: SAS Institute Inc.

Inventor: Xiaolong Li, Samuel Norris Henderson, Xiaozhuo Cheng, Xu Yang

IPC: G10L15/26, G10L15/04, G10L25/78, G06N3/08, G06N3/04, G10L25/30, G10L15/02

Abstract: An apparatus includes at least one processor to, in response to a request to perform speech-to-text conversion: perform a pause detection technique including analyzing speech audio to identify pauses, and analyzing lengths of the pauses to identify likely sentence pauses; perform a speaker diarization technique including dividing the speech audio into fragments, analyzing vocal characteristics of speech sounds of each fragment to identify a speaker of a set of speakers, and identifying instances of a change in speakers between each temporally consecutive pair of fragments to identify likely speaker changes; and perform speech-to-text operations including dividing the speech audio into segments based on at least the likely sentence pauses and likely speaker changes, using at least an acoustic model with each segment to identify likely speech sounds in the speech audio, and generating a transcript of the speech audio based at least on the likely speech sounds.
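The speaker-change step of this abstract, comparing each temporally consecutive pair of fragments, can be sketched as follows. The sketch assumes that a speaker label has already been assigned to every fragment and that all fragments have the same duration; both are simplifications, and the name `speaker_change_times` is hypothetical.

```python
from typing import List


def speaker_change_times(labels: List[str], fragment_s: float) -> List[float]:
    """Return the start time of every fragment whose speaker differs from the previous one."""
    return [
        i * fragment_s
        for i in range(1, len(labels))
        if labels[i] != labels[i - 1]
    ]
```

For example, with fragment labels `["A", "A", "B", "B", "A"]` and 2-second fragments, the likely speaker changes fall at the boundaries where the label switches.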

8.

Granted invention patent
Dual use of audio noise level in speech-to-text framework [In force]

Publication number: US11335350B2

Publication date: 2022-05-17

Application number: US17498966

Filing date: 2021-10-12

Applicant: SAS Institute Inc.

Inventor: Xiaolong Li, Xiaozhuo Cheng, Xu Yang

IPC: G10L15/26, G10L15/16, G10L15/04, G10L25/78, G06N3/08, G06N3/04, G10L25/30, G10L15/02

Abstract: An apparatus includes processor(s) to: perform pre-processing operations including derive an audio noise level of speech audio of a speech data set, derive a first relative weighting for first and second segmentation techniques for identifying likely sentence pauses in the speech audio based on the audio noise level, and select likely sentence pauses for a converged set of likely sentence pauses from likely sentence pauses identified by the first and/or second segmentation techniques based on the first relative weighting; and perform speech-to-text processing operations including divide the speech data set into data segments representing speech segments of the speech audio based on the converged set of likely sentence pauses, and derive a second relative weighting based on the audio noise level for selecting words indicated by an acoustic model or by a language model as being most likely spoken in the speech audio for inclusion in a transcript.

9.

Granted invention patent
Speech audio pre-processing segmentation [In force]

Publication number: US11138979B1

Publication date: 2021-10-05

Application number: US17138445

Filing date: 2020-12-30

Applicant: SAS Institute Inc.

Inventor: Xiaozhuo Cheng, Xu Yang, Xiaolong Li

IPC: G10L15/16, G10L15/02, G10L15/26, G10L15/04, G10L25/78, G06N3/08, G06N3/04, G10L25/30

Abstract: An apparatus includes processor(s) to: divide a speech data set into multiple data chunks that each represent a chunk of speech audio; derive a threshold amplitude based on at least one peak amplitude of the speech audio; designate each data chunk with a peak amplitude below the threshold amplitude as a pause data chunk; within a set of temporally consecutive data chunks of the multiple data chunks, identify a longest subset of temporally consecutive pause data chunks; within the set of temporally consecutive data chunks, designate the longest subset of temporally consecutive pause data chunks as a likely sentence pause of a candidate set of likely sentence pauses; based on at least the candidate set, divide the speech data set into multiple data segments that each represent a speech segment of the speech audio; and perform speech-to-text conversion to identify a sentence spoken in each speech segment.
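The amplitude-based pause detection in this abstract can be sketched as below. It is a minimal illustration under stated assumptions, not the patented implementation: the threshold is taken as a fixed fraction (`ratio`) of the highest chunk peak, the whole input is treated as a single set of temporally consecutive chunks, and the name `longest_pause_run` is hypothetical.

```python
from typing import List, Optional, Tuple


def longest_pause_run(samples: List[float], chunk_size: int,
                      ratio: float = 0.1) -> Optional[Tuple[int, int]]:
    """Return (first_chunk, end_chunk_exclusive) of the longest run of pause chunks.

    Returns None when no chunk falls below the threshold amplitude.
    """
    chunks = [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]
    peaks = [max(abs(s) for s in chunk) for chunk in chunks]
    if not peaks:
        return None
    threshold = ratio * max(peaks)  # assumed rule for deriving the threshold
    is_pause = [peak < threshold for peak in peaks]

    best: Optional[Tuple[int, int]] = None
    run_start: Optional[int] = None
    # A trailing False closes any run that reaches the end of the audio.
    for idx, pause in enumerate(is_pause + [False]):
        if pause and run_start is None:
            run_start = idx
        elif not pause and run_start is not None:
            if best is None or idx - run_start > best[1] - best[0]:
                best = (run_start, idx)
            run_start = None
    return best
```

The chunk range returned would be the likely sentence pause for that window; repeating the search over successive windows would build up the candidate set used to cut the audio into segments.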

10.

Published invention application
SYSTEMS AND METHODS FOR ENHANCED SPEAKER DIARIZATION [Under examination, published]

Publication number: US20240347064A1

Publication date: 2024-10-17

Application number: US18634155

Filing date: 2024-04-12

Applicant: SAS Institute Inc.

Inventor: Xiaolong Li, Xiaozhuo Cheng, Xu Yang

IPC: G10L15/26, G10L15/02, G10L15/04, G10L25/30, G10L25/78

CPC classification number: G10L15/26, G10L15/02, G10L15/04, G10L25/30, G10L25/78, G10L2025/783

Abstract: A system, method, and computer-program product includes receiving speech audio of a multi-turn conversation, generating, via a speech-to-text process, a transcript of the speech audio, wherein the transcript of the speech audio textually segments speech spoken during the multi-turn conversation into a plurality of utterances, generating a speaker diarization prompt that includes contextual information about a plurality of speakers participating in the multi-turn conversation, inputting, to a large language model, the speaker diarization prompt and the transcript of the speech audio, and obtaining, from the large language model, an output comprising an enhanced transcript of the speech audio, wherein the enhanced transcript of the speech audio textually segments the speech spoken during the multi-turn conversation into a plurality of refined utterances and associates a speaker identification value with each of the plurality of refined utterances.
