Speech segmentation based on combination of pause detection and speaker diarization

Invention Grant

US11538481B2 Speech segmentation based on combination of pause detection and speaker diarization (Active)

Patent Title: Speech segmentation based on combination of pause detection and speaker diarization
Application No.: US17851264

Application Date: 2022-06-28
Publication No.: US11538481B2

Publication Date: 2022-12-27
Inventor: Xiaolong Li, Samuel Norris Henderson, Xiaozhuo Cheng, Xu Yang
Applicant: SAS Institute Inc.
Applicant Address: US NC Cary
Assignee: SAS Institute Inc.
Current Assignee: SAS Institute Inc.
Current Assignee Address: US NC Cary
Agency: KDB Firm PLLC
Main IPC: G10L15/04
IPC: G10L15/04; G10L15/16; G10L15/26; G10L25/78; G10L25/30; G10L15/02

Abstract:

An apparatus includes at least one processor to, in response to a request to perform speech-to-text conversion: perform a pause detection technique including analyzing speech audio to identify pauses, and analyzing lengths of the pauses to identify likely sentence pauses; perform a speaker diarization technique including dividing the speech audio into fragments, analyzing vocal characteristics of speech sounds of each fragment to identify a speaker of a set of speakers, and identifying instances of a change in speakers between each temporally consecutive pair of fragments to identify likely speaker changes; and perform speech-to-text operations including dividing the speech audio into segments based on at least the likely sentence pauses and likely speaker changes, using at least an acoustic model with each segment to identify likely speech sounds in the speech audio, and generating a transcript of the speech audio based at least on the likely speech sounds.
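
The segmentation flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes pauses are found by thresholding per-frame energy, that per-fragment speaker labels have already been produced by some diarization step, and that a boundary is placed at the midpoint of each sufficiently long pause. All function names, parameters, and threshold values are hypothetical, and the acoustic model and transcript generation steps are not shown.

```python
from typing import List, Tuple


def find_pauses(energies: List[float], frame_s: float,
                silence_thresh: float) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for each run of frames below silence_thresh.

    Assumption: a simple energy threshold stands in for pause detection.
    """
    pauses = []
    start = None
    for i, energy in enumerate(energies):
        if energy < silence_thresh:
            if start is None:
                start = i
        elif start is not None:
            pauses.append((start * frame_s, i * frame_s))
            start = None
    if start is not None:  # audio ends during a pause
        pauses.append((start * frame_s, len(energies) * frame_s))
    return pauses


def likely_sentence_pauses(pauses: List[Tuple[float, float]],
                           min_len_s: float) -> List[float]:
    """Keep pauses lasting at least min_len_s; return their midpoints."""
    return [(s + e) / 2 for s, e in pauses if e - s >= min_len_s]


def likely_speaker_changes(labels: List[str], fragment_s: float) -> List[float]:
    """Return the start time of every fragment whose speaker label differs
    from the label of the preceding fragment."""
    return [i * fragment_s
            for i in range(1, len(labels))
            if labels[i] != labels[i - 1]]


def segment_boundaries(sentence_pauses: List[float],
                       speaker_changes: List[float],
                       total_s: float,
                       merge_tol_s: float) -> List[Tuple[float, float]]:
    """Combine both cue types into (start_s, end_s) segments.

    Cut points closer than merge_tol_s to the previous cut are dropped.
    """
    cuts: List[float] = []
    for t in sorted(sentence_pauses + speaker_changes):
        if 0 < t < total_s and (not cuts or t - cuts[-1] > merge_tol_s):
            cuts.append(t)
    edges = [0.0] + cuts + [total_s]
    return list(zip(edges[:-1], edges[1:]))


# Toy input: 16 frames of 0.25 s each (4 s of audio), 1 s diarization fragments.
energies = [1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
labels = ["A", "B", "B", "B"]

pauses = find_pauses(energies, frame_s=0.25, silence_thresh=0.5)
sentence_cuts = likely_sentence_pauses(pauses, min_len_s=0.75)
speaker_cuts = likely_speaker_changes(labels, fragment_s=1.0)
segments = segment_boundaries(sentence_cuts, speaker_cuts,
                              total_s=4.0, merge_tol_s=0.25)
print(segments)
```

In this toy input the 0.5 s pause is treated as too short to mark a sentence break, while the 1 s pause and the speaker change from A to B each contribute a boundary. Each resulting segment would then be passed to the acoustic model.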

Public/Granted literature

US20220335947A1 SPEECH SEGMENTATION BASED ON COMBINATION OF PAUSE DETECTION AND SPEAKER DIARIZATION Public/Granted day: 2022-10-20

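IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/04	. Segmentation; word boundary detection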