Publication No.: US20240304175A1
Publication Date: 2024-09-12
Application No.: US18599018
Filing Date: 2024-03-07
Applicant: SRI International
Inventor: Alexander Erdmann, Sarah Bakst, Harry Bratt, Dimitra Vergyri, Horacio Franco
IPC: G10L13/047, G10L15/16
CPC classification number: G10L13/047, G10L15/16
Abstract: Techniques are described for a machine learning system configured to: obtain a dataset of a plurality of sample speech clips; generate a plurality of sequence embeddings; initialize a plurality of speaker embeddings and a plurality of accent embeddings; update the plurality of speaker embeddings; update the plurality of accent embeddings; generate a plurality of augmented embeddings based on the plurality of sequence embeddings, the plurality of speaker embeddings, and the plurality of accent embeddings; and generate a plurality of synthetic speech clips based on the plurality of augmented embeddings. The machine learning system may further be configured to: obtain an audio waveform; decompose the audio waveform into first magnitude spectral slices and an original phase; process the first magnitude spectral slices to map them to second magnitude spectral slices; and generate a modified audio waveform in part by combining the second magnitude spectral slices and the original phase.
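
The second half of the abstract (decompose a waveform into magnitude spectral slices and phase, map the magnitudes, then recombine with the original phase) can be illustrated with a minimal, dependency-free sketch. This is not the patented implementation: the non-overlapping framing, the naive DFT, and the names `decompose`, `recombine`, `modify_waveform`, and `magnitude_map` are all assumptions made for illustration, and `magnitude_map` merely stands in for whatever learned mapping the system applies.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one frame (O(n^2), fine for a sketch)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(spectrum)) / n).real
        for t in range(n)
    ]


def decompose(waveform, frame_len):
    """Split into non-overlapping frames (an assumption; trailing samples that
    do not fill a frame are dropped) and return magnitudes and phases per frame."""
    magnitudes, phases = [], []
    for start in range(0, len(waveform) - frame_len + 1, frame_len):
        spectrum = dft(waveform[start:start + frame_len])
        magnitudes.append([abs(c) for c in spectrum])
        phases.append([cmath.phase(c) for c in spectrum])
    return magnitudes, phases


def recombine(magnitudes, phases):
    """Rebuild a waveform from per-frame magnitudes and phases."""
    waveform = []
    for mags, phs in zip(magnitudes, phases):
        spectrum = [cmath.rect(m, p) for m, p in zip(mags, phs)]
        waveform.extend(idft(spectrum))
    return waveform


def modify_waveform(waveform, magnitude_map, frame_len=8):
    """Map first magnitude slices to second magnitude slices with the
    hypothetical `magnitude_map`, then recombine with the original phase."""
    first_mags, original_phase = decompose(waveform, frame_len)
    second_mags = [magnitude_map(mag_slice) for mag_slice in first_mags]
    return recombine(second_mags, original_phase)
```

With an identity `magnitude_map` the sketch reproduces the input up to floating-point error, which is a quick sanity check that the phase is preserved through the round trip. A practical system would more likely use overlapping windowed frames and a fast transform; those choices are left out here to keep the example short.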