-
Publication No.: US20240304175A1
Publication date: 2024-09-12
Application No.: US18599018
Filing date: 2024-03-07
Applicant: SRI International
Inventor: Alexander Erdmann, Sarah Bakst, Harry Bratt, Dimitra Vergyri, Horacio Franco
IPC: G10L13/047, G10L15/16
CPC classification number: G10L13/047, G10L15/16
Abstract: Techniques for a machine learning system configured to obtain a dataset of a plurality of sample speech clips; generate a plurality of sequence embeddings; initialize a plurality of speaker embeddings and a plurality of accent embeddings; update the plurality of speaker embeddings; update the plurality of accent embeddings; generate a plurality of augmented embeddings based on the plurality of sequence embeddings, the plurality of speaker embeddings, and the plurality of accent embeddings; and generate a plurality of synthetic speech clips based on the plurality of augmented embeddings. The machine learning system may further be configured to obtain an audio waveform; decompose the audio waveform into first magnitude spectral slices and an original phase; process the first magnitude spectral slices to map the first magnitude spectral slices to second magnitude spectral slices; and generate a modified audio waveform in part by combining the second magnitude spectral slices and the original phase.
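Illustrative sketch (not part of the patent record): the second half of the abstract describes splitting a waveform into magnitude spectral slices and phase, mapping the magnitudes, and recombining them with the original phase. The Python below is a minimal sketch of that general idea only. Everything specific in it is an assumption made for illustration: the non-overlapping rectangular frames, the plain discrete Fourier transform, the frame length, and the hypothetical names `modify_waveform` and `map_magnitudes`. The abstract does not say how the slices are computed or what the learned mapping is, so a simple gain stands in for it here.

```python
import cmath
import math


def dft(frame):
    """Plain discrete Fourier transform of one frame (O(n^2), for clarity)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def modify_waveform(waveform, map_magnitudes, frame_len=8):
    """Map magnitude slices while reusing the original phase.

    Assumption: frames do not overlap and no window is applied, so an
    identity mapping reconstructs the input exactly. `map_magnitudes` is a
    hypothetical stand-in for whatever model maps first magnitudes to
    second magnitudes.
    """
    output = []
    for start in range(0, len(waveform), frame_len):
        frame = waveform[start:start + frame_len]
        spectrum = dft(frame)
        magnitudes = [abs(c) for c in spectrum]       # first magnitude slice
        phases = [cmath.phase(c) for c in spectrum]   # original phase
        new_magnitudes = map_magnitudes(magnitudes)   # second magnitude slice
        recombined = [cmath.rect(m, p) for m, p in zip(new_magnitudes, phases)]
        output.extend(idft(recombined))
    return output


# Usage: a short sine wave, passed through an identity mapping and a gain of 2.
wave = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]
unchanged = modify_waveform(wave, lambda mags: mags)
louder = modify_waveform(wave, lambda mags: [2.0 * m for m in mags])
```

A practical system would more likely use overlapping windowed frames with overlap-add resynthesis; the simpler framing is used here only so the round trip is easy to verify.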
-
Publication No.: US20230335114A1
Publication date: 2023-10-19
Application No.: US18301064
Filing date: 2023-04-14
Applicant: SRI International
Inventor: Sarah Bakst, Aaron Lawson, Christopher L. Cobo-Kroenke, Allen Stauffer
CPC classification number: G10L15/02, G10L15/063, G10L15/08, G10L15/28, G10L2015/081
Abstract: In some examples, a computing system includes a storage device configured to store a machine learning model trained with audio feature values to determine a reliability of an audio segment for performing speech processing; and processing circuitry. The processing circuitry is configured to: receive an audio dataset comprising a sequence of audio segments; extract, for each audio segment of the sequence of audio segments, a set of audio feature values corresponding to a set of audio features; execute the machine learning model to determine, for each audio segment of the sequence of audio segments, a reliability score based on the set of audio feature values corresponding to the respective audio segment, wherein the reliability score indicates a reliability of the audio segment for performing speech processing; and output an indication of the respective reliability scores determined for at least one audio segment of the sequence of audio segments.
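Illustrative sketch (not part of the patent record): the abstract describes extracting audio feature values for each segment, running a trained model to get a per-segment reliability score, and outputting an indication of those scores. The Python below sketches that flow under loud assumptions. The two features (RMS energy and zero-crossing rate), the logistic scoring function, the weights, the bias, and the threshold are all hypothetical choices made for illustration; the abstract does not name the features or the model type.

```python
import math


def extract_features(segment):
    """Return [rms_energy, zero_crossing_rate] for one audio segment.

    Both features are assumptions; the source does not list its features.
    """
    if not segment:
        raise ValueError("segment must contain at least one sample")
    rms = math.sqrt(sum(x * x for x in segment) / len(segment))
    crossings = sum(1 for a, b in zip(segment, segment[1:]) if (a < 0) != (b < 0))
    zcr = crossings / max(len(segment) - 1, 1)
    return [rms, zcr]


def reliability_score(features, weights, bias):
    """Hypothetical stand-in model: logistic regression giving a score in (0, 1)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def score_segments(segments, weights, bias, threshold=0.5):
    """Score every segment; return (index, score, is_reliable) tuples."""
    results = []
    for index, segment in enumerate(segments):
        score = reliability_score(extract_features(segment), weights, bias)
        results.append((index, score, score >= threshold))
    return results


# Usage: a clean tone versus a very quiet, rapidly alternating signal.
tone = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
hiss = [0.01 * (-1) ** t for t in range(64)]
# Made-up weights: reward energy, penalize a high zero-crossing rate.
results = score_segments([tone, hiss], weights=[4.0, -4.0], bias=-1.0)
for index, score, is_reliable in results:
    print(f"segment {index}: score={score:.3f} reliable={is_reliable}")
```

In a real system the weights would come from training on labeled audio rather than being set by hand, and the flag could be replaced by any other indication of the score, such as a per-segment label for downstream speech processing.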
-