Publication No.: US20240355346A1
Publication Date: 2024-10-24
Application No.: US18576127
Filing Date: 2022-07-14
Applicant: SRI International
Inventor: Jeffrey Lubin, Clay Spence
IPC: G10L21/013, G10L17/02, G10L17/04, G10L17/26
CPC classification number: G10L21/013, G10L17/02, G10L17/04, G10L17/26, G10L2021/0135
Abstract: A computing system receives an audio waveform representing speech from an individual and produces as output a modified version of the audio waveform that maintains the speaker's speech characteristics, as well as the prosody of specific utterances (e.g., voice timbre, intonation, timing, and intensity). The system uses a bottleneck-based autoencoder that takes speech spectrograms as input and produces speech spectrograms as output. To produce the output audio waveform, the system combines a reconstruction-error-based loss function with two additional loss functions. The second loss function is a speaker “real vs. fake” discriminator that penalizes output that does not sound like the speaker. The third loss function is a speech intelligibility scorer that penalizes output speech that is difficult for the target population to understand. The resulting modified audio waveform is enhanced speech output delivered in a target accent without sacrificing the speaker's personality.
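The abstract names three loss terms but does not give their formulas. The following Python snippet is a rough, non-authoritative sketch of how such terms might be summed into one training objective. Everything concrete in it is an assumption, not taken from the patent: the mean-squared reconstruction error, the GAN-style `-log(p_real)` speaker penalty, the `1 - score` intelligibility penalty, the weights `w_speaker` and `w_intel`, and all function names. The discriminator and intelligibility scorer appear only as stand-in callables.

```python
import math
from typing import Callable, List

# A spectrogram as a list of frames, each a list of frequency-bin magnitudes.
Spectrogram = List[List[float]]


def reconstruction_loss(target: Spectrogram, output: Spectrogram) -> float:
    """Mean squared error between input and reconstructed spectrograms (assumed form)."""
    if len(target) != len(output):
        raise ValueError("spectrograms must have the same number of frames")
    total, count = 0.0, 0
    for t_frame, o_frame in zip(target, output):
        if len(t_frame) != len(o_frame):
            raise ValueError("frames must have the same number of bins")
        for t, o in zip(t_frame, o_frame):
            total += (t - o) ** 2
            count += 1
    return total / count if count else 0.0


def speaker_loss(p_real: float, eps: float = 1e-7) -> float:
    """Hypothetical penalty that grows when the output does not sound like the speaker.

    p_real is the discriminator's probability that the output is genuine speech
    from the speaker. -log(p_real) is a common GAN generator loss (assumed here).
    """
    return -math.log(max(p_real, eps))


def intelligibility_loss(score: float) -> float:
    """Hypothetical penalty from an intelligibility score in [0, 1] (1 = easy to understand)."""
    return 1.0 - min(max(score, 0.0), 1.0)


def total_loss(
    target: Spectrogram,
    output: Spectrogram,
    discriminator: Callable[[Spectrogram], float],
    intelligibility_scorer: Callable[[Spectrogram], float],
    w_speaker: float = 1.0,  # assumed weight, not from the patent
    w_intel: float = 1.0,    # assumed weight, not from the patent
) -> float:
    """Weighted sum of the three loss terms named in the abstract."""
    return (
        reconstruction_loss(target, output)
        + w_speaker * speaker_loss(discriminator(output))
        + w_intel * intelligibility_loss(intelligibility_scorer(output))
    )


if __name__ == "__main__":
    # Toy 2-frame x 2-bin spectrograms; the stub models return fixed scores.
    spec_in = [[1.0, 2.0], [3.0, 4.0]]
    spec_out = [[1.0, 2.0], [3.0, 3.0]]
    loss = total_loss(spec_in, spec_out, lambda s: 0.5, lambda s: 0.8)
    print(f"total loss: {loss:.4f}")  # prints "total loss: 1.1431"
```

In a real system the autoencoder's weights would be updated to minimize this sum, while the discriminator would be trained separately to tell the speaker's real recordings from the autoencoder's output. The weights set the trade-off between staying close to the input, sounding like the speaker, and being intelligible to the target population.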