Patent search ap:("SRI International") AND inv:"Sarah Bakst"

1.

Type: Invention publication
Title: SPEECH MODIFICATION USING ACCENT EMBEDDINGS
Status: Pending (published)

Publication No.: US20240304175A1

Publication date: 2024-09-12

Application No.: US18599018

Filing date: 2024-03-07

Applicant: SRI International

Inventor: Alexander Erdmann, Sarah Bakst, Harry Bratt, Dimitra Vergyri, Horacio Franco

IPC: G10L13/047, G10L15/16

CPC classification number: G10L13/047, G10L15/16

Abstract: Techniques for a machine learning system configured to obtain a dataset of a plurality of sample speech clips; generate a plurality of sequence embeddings; initialize a plurality of speaker embeddings and a plurality of accent embeddings; update the plurality of speaker embeddings; update the plurality of accent embeddings; generate a plurality of augmented embeddings based on the plurality of sequence embeddings, the plurality of speaker embeddings, and the plurality of accent embeddings; and generate a plurality of synthetic speech clips based on the plurality of augmented embeddings. The machine learning system may further be configured to obtain an audio waveform; decompose the audio waveform into first magnitude spectral slices and an original phase; process the first magnitude spectral slices to map the first magnitude spectral slices to second magnitude spectral slices; and generate a modified audio waveform in part by combining the second magnitude spectral slices and the original phase.
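
Illustrative sketch (not taken from the patent): the second half of the abstract describes splitting a waveform into magnitude spectral slices plus the original phase, mapping the magnitudes, and recombining them with that phase. The Python below shows only that signal flow under loud assumptions: it uses a naive DFT over non-overlapping frames rather than whatever transform the application uses, and the `mapping` argument is a hypothetical stand-in for the trained model. The names `decompose`, `recombine`, and `modify` are invented for this example.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one frame (O(n^2), for illustration)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns the real part, since the input frames were real."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def decompose(waveform, frame_len):
    """Split into non-overlapping frames and return (magnitude slices, phase slices).

    Trailing samples that do not fill a whole frame are dropped.
    """
    magnitudes, phases = [], []
    for start in range(0, len(waveform) - frame_len + 1, frame_len):
        spectrum = dft(waveform[start:start + frame_len])
        magnitudes.append([abs(c) for c in spectrum])
        phases.append([cmath.phase(c) for c in spectrum])
    return magnitudes, phases


def recombine(magnitudes, phases):
    """Rebuild a waveform from magnitude slices and the original phase."""
    out = []
    for mags, phs in zip(magnitudes, phases):
        spectrum = [cmath.rect(m, p) for m, p in zip(mags, phs)]
        out.extend(idft(spectrum))
    return out


def modify(waveform, frame_len, mapping):
    """Map first magnitude slices to second ones, keeping the original phase.

    `mapping` is a hypothetical placeholder for the learned magnitude model.
    """
    first_mags, original_phase = decompose(waveform, frame_len)
    second_mags = [mapping(slice_) for slice_ in first_mags]
    return recombine(second_mags, original_phase)
```

With an identity `mapping` the waveform is reconstructed up to floating-point error, which is a quick check that the phase is carried through unchanged; a real system would typically use overlapping windowed frames instead of the non-overlapping frames assumed here.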

2.

Type: Invention publication
Title: EVALUATING RELIABILITY OF AUDIO DATA FOR USE IN SPEAKER IDENTIFICATION
Status: Pending (published)

Publication No.: US20230335114A1

Publication date: 2023-10-19

Application No.: US18301064

Filing date: 2023-04-14

Applicant: SRI International

Inventor: Sarah Bakst, Aaron Lawson, Christopher L. Cobo-Kroenke, Allen Stauffer

IPC: G10L15/02, G10L15/06, G10L15/08, G10L15/28

CPC classification number: G10L15/02, G10L15/063, G10L15/08, G10L15/28, G10L2015/081

Abstract: In some examples, a computing system includes a storage device configured to store a machine learning model trained with audio feature values to determine a reliability of an audio segment for performing speech processing; and processing circuitry. The processing circuitry is configured to: receive an audio dataset comprising a sequence of audio segments; extract, for each audio segment of the sequence of audio segments, a set of audio feature values corresponding to a set of audio features; execute the machine learning model to determine, for each audio segment of the sequence of audio segments, a reliability score based on the set of audio feature values corresponding to the respective audio segment, wherein the reliability score indicates a reliability of the audio segment for performing speech processing; and output an indication of the respective reliability scores determined for at least one audio segment of the sequence of audio segments.
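
Illustrative sketch (not taken from the patent): the abstract describes a per-segment pipeline of feature extraction followed by a model that outputs a reliability score. The Python below mirrors that flow under loud assumptions. The two features (RMS energy and zero-crossing rate) are chosen only for illustration, since the abstract does not name the audio features, and the logistic scorer with hand-supplied `weights` and `bias` is a hypothetical stand-in for the trained machine learning model.

```python
import math


def extract_features(segment):
    """Return [rms_energy, zero_crossing_rate] for one segment of samples.

    Both features are assumptions made for this sketch; the segment must
    contain at least two samples.
    """
    n = len(segment)
    rms = math.sqrt(sum(x * x for x in segment) / n)
    crossings = sum(1 for a, b in zip(segment, segment[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (n - 1)
    return [rms, zcr]


def reliability_score(features, weights, bias):
    """Hypothetical stand-in model: a logistic function of a weighted feature sum."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def score_segments(segments, weights, bias):
    """Score every segment in sequence and return one reliability value each."""
    return [
        reliability_score(extract_features(segment), weights, bias)
        for segment in segments
    ]


if __name__ == "__main__":
    loud = [0.5, -0.5] * 4      # toy segment with high energy
    quiet = [0.01, -0.01] * 4   # toy segment with low energy
    # Weights and bias are made-up values, not trained parameters.
    for score in score_segments([loud, quiet], weights=[4.0, -1.0], bias=-1.0):
        print(round(score, 3))
```

The output step in the abstract ("output an indication of the respective reliability scores") corresponds here to the returned list; how scores are thresholded or displayed is left open, as the abstract does not say.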
