-
Publication No.: US20220335947A1
Publication Date: 2022-10-20
Application No.: US17851264
Filing Date: 2022-06-28
Applicant: SAS Institute Inc.
Inventor: Xiaolong Li, Samuel Norris Henderson, Xiaozhuo Cheng, Xu Yang
Abstract: An apparatus includes at least one processor to, in response to a request to perform speech-to-text conversion: perform a pause detection technique including analyzing speech audio to identify pauses, and analyzing lengths of the pauses to identify likely sentence pauses; perform a speaker diarization technique including dividing the speech audio into fragments, analyzing vocal characteristics of speech sounds of each fragment to identify a speaker of a set of speakers, and identifying instances of a change in speakers between each temporally consecutive pair of fragments to identify likely speaker changes; and perform speech-to-text operations including dividing the speech audio into segments based on at least the likely sentence pauses and likely speaker changes, using at least an acoustic model with each segment to identify likely speech sounds in the speech audio, and generating a transcript of the speech audio based at least on the likely speech sounds.
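The segmentation step in this abstract (cutting the audio at likely sentence pauses and likely speaker changes) can be illustrated with a minimal sketch. It is not the patented method: the function name `segment_boundaries`, the use of timestamps in seconds, and the `min_gap` rule for collapsing near-duplicate cues are all assumptions made for illustration.

```python
from typing import List, Tuple


def segment_boundaries(
    duration: float,
    sentence_pauses: List[float],
    speaker_changes: List[float],
    min_gap: float = 0.2,  # hypothetical: cues closer than this count as one cut
) -> List[Tuple[float, float]]:
    """Merge both cue lists into (start, end) segments covering the audio."""
    cuts: List[float] = []
    for t in sorted(set(sentence_pauses) | set(speaker_changes)):
        inside = 0.0 < t < duration
        far_enough = not cuts or t - cuts[-1] >= min_gap
        if inside and far_enough:
            cuts.append(t)
    edges = [0.0] + cuts + [duration]
    # Pair each edge with the next one to form consecutive segments.
    return list(zip(edges[:-1], edges[1:]))


if __name__ == "__main__":
    print(segment_boundaries(10.0, [2.0, 5.0], [2.1, 7.5]))
```

Each resulting segment would then be passed to the acoustic model separately; how the two cue types are actually weighted against each other is not stated in the abstract.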
-
Publication No.: US11335350B2
Publication Date: 2022-05-17
Application No.: US17498966
Filing Date: 2021-10-12
Applicant: SAS Institute Inc.
Inventor: Xiaolong Li, Xiaozhuo Cheng, Xu Yang
Abstract: An apparatus includes processor(s) to: perform pre-processing operations including derive an audio noise level of speech audio of a speech data set, derive a first relative weighting for first and second segmentation techniques for identifying likely sentence pauses in the speech audio based on the audio noise level, and select likely sentence pauses for a converged set of likely sentence pauses from likely sentence pauses identified by the first and/or second segmentation techniques based on the first relative weighting; and perform speech-to-text processing operations including divide the speech data set into data segments representing speech segments of the speech audio based on the converged set of likely sentence pauses, and derive a second relative weighting based on the audio noise level for selecting words indicated by an acoustic model or by a language model as being most likely spoken in the speech audio for inclusion in a transcript.
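The idea of weighting two segmentation techniques by noise level and converging on one set of pauses can be sketched as follows. The abstract does not say how the weighting is derived or which technique is favoured under which conditions, so the linear ramp in `technique_weight`, the `low`/`high` noise bounds, and the `keep_at` score cutoff are purely illustrative assumptions.

```python
from typing import Dict, List


def technique_weight(noise_level: float, low: float = 0.1, high: float = 0.5) -> float:
    """Weight in [0, 1] for the first technique (assumed to suit clean audio)."""
    if noise_level <= low:
        return 1.0
    if noise_level >= high:
        return 0.0
    return (high - noise_level) / (high - low)


def converge_pauses(
    first: List[float],
    second: List[float],
    noise_level: float,
    keep_at: float = 0.5,  # hypothetical score needed to keep a pause
) -> List[float]:
    """Score each candidate pause by the weights of the techniques that found it."""
    w = technique_weight(noise_level)
    scores: Dict[float, float] = {}
    for t in first:
        scores[t] = scores.get(t, 0.0) + w
    for t in second:
        scores[t] = scores.get(t, 0.0) + (1.0 - w)
    return sorted(t for t, s in scores.items() if s >= keep_at)


if __name__ == "__main__":
    print(converge_pauses([1.0, 3.0], [3.0, 4.0], noise_level=0.05))
```

Under these assumptions, a pause found by both techniques always scores 1.0 and is kept, while a pause found by only one technique survives only when that technique carries enough weight at the measured noise level.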
-
Publication No.: US11138979B1
Publication Date: 2021-10-05
Application No.: US17138445
Filing Date: 2020-12-30
Applicant: SAS Institute Inc.
Inventor: Xiaozhuo Cheng, Xu Yang, Xiaolong Li
Abstract: An apparatus includes processor(s) to: divide a speech data set into multiple data chunks that each represent a chunk of speech audio; derive a threshold amplitude based on at least one peak amplitude of the speech audio; designate each data chunk with a peak amplitude below the threshold amplitude a pause data chunk; within a set of temporally consecutive data chunks of the multiple data chunks, identify a longest subset of temporally consecutive pause data chunks; within the set of temporally consecutive data chunks, designate the longest subset of temporally consecutive pause data chunks as a likely sentence pause of a candidate set of likely sentence pauses; based on at least the candidate set, divide the speech data set into multiple data segments that each represent a speech segment of the speech audio; and perform speech-to-text conversion, to identify a sentence spoken in each speech segment.
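The chunk-based pause detection described in this abstract can be sketched in a few lines. This is only an illustration: the abstract does not give the chunk length, the window size, or how the threshold is derived from the peak amplitude, so `chunk_size`, `window`, and the `ratio` multiplier below are hypothetical parameters, and all function names are invented.

```python
from typing import List, Optional, Tuple


def chunk_peaks(samples: List[float], chunk_size: int) -> List[float]:
    """Peak absolute amplitude of each fixed-size chunk of samples."""
    return [
        max(abs(s) for s in samples[i:i + chunk_size])
        for i in range(0, len(samples), chunk_size)
    ]


def longest_pause_run(is_pause: List[bool]) -> Optional[Tuple[int, int]]:
    """Return (start, end_exclusive) of the longest run of True, or None."""
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    for i, flag in enumerate(is_pause + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def likely_sentence_pauses(
    samples: List[float],
    chunk_size: int = 4,   # hypothetical chunk length in samples
    window: int = 8,       # hypothetical number of chunks per window
    ratio: float = 0.1,    # assumption: threshold is a fraction of the global peak
) -> List[Tuple[int, int]]:
    """One candidate pause (as a chunk index range) per window of chunks."""
    peaks = chunk_peaks(samples, chunk_size)
    if not peaks:
        return []
    threshold = ratio * max(peaks)
    is_pause = [p < threshold for p in peaks]
    pauses = []
    for w in range(0, len(peaks), window):
        run = longest_pause_run(is_pause[w:w + window])
        if run is not None:
            pauses.append((w + run[0], w + run[1]))
    return pauses


if __name__ == "__main__":
    audio = [1.0] * 8 + [0.0] * 8 + [1.0] * 4 + [0.0] * 4 + [1.0] * 8
    print(likely_sentence_pauses(audio))
```

Picking only the longest quiet run in each window keeps short intra-sentence hesitations from being treated as sentence breaks, which matches the abstract's distinction between pause chunks and likely sentence pauses.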
-
Publication No.: US20200050672A1
Publication Date: 2020-02-13
Application No.: US16655615
Filing Date: 2019-10-17
Applicant: SAS Institute Inc.
Inventor: Teresa S. Jade, Wei-shan Chiang, Aaron Douglas Arthur, Seng Lee, Qin Yang, Xu Yang
IPC: G06F17/27
Abstract: A human language analyzer receives, at the human language analyzer, text data representing information in a human language. The human language analyzer receives a computer command for identifying a text data component of the text data. The computer command comprises at least two requirements for the text data component. The human language analyzer, responsive to identifying that the first requirement and the second requirement are met, locates the text data component from one of two clauses. A clause analyzer receives a clause request to locate clauses within text data representing information in a human language. The clause analyzer receives, responsive to a dependency request, token information in a token data set. The clause analyzer determines a location for each clause of the sentence portion in a hierarchy of clauses. The clause analyzer generates and outputs a new data set based on the token data set and the hierarchy of clauses.
-