SPEECH MODIFICATION USING ACCENT EMBEDDINGS
    Patent application publication

    Publication number: US20240304175A1

    Publication date: 2024-09-12

    Application number: US18599018

    Filing date: 2024-03-07

    CPC classification number: G10L13/047 G10L15/16

    Abstract: Techniques for a machine learning system configured to obtain a dataset of a plurality of sample speech clips; generate a plurality of sequence embeddings; initialize a plurality of speaker embeddings and a plurality of accent embeddings; update the plurality of speaker embeddings; update the plurality of accent embeddings; generate a plurality of augmented embeddings based on the plurality of sequence embeddings, the plurality of speaker embeddings, and the plurality of accent embeddings; and generate a plurality of synthetic speech clips based on the plurality of augmented embeddings. The machine learning system may further be configured to obtain an audio waveform; decompose the audio waveform into first magnitude spectral slices and an original phase; process the first magnitude spectral slices to map them to second magnitude spectral slices; and generate a modified audio waveform in part by combining the second magnitude spectral slices and the original phase.
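The waveform-modification steps in the abstract (decompose into magnitude spectral slices and phase, map the magnitudes, recombine with the original phase) can be illustrated with a short-time Fourier transform. This is a minimal sketch under stated assumptions, not the patented method: it assumes an STFT with a periodic Hann window, a 512-sample frame, a 128-sample hop, and weighted overlap-add for resynthesis. `mapper` is a hypothetical stand-in for the learned magnitude-to-magnitude model, which the abstract does not specify.

```python
import numpy as np

FRAME_LEN = 512  # assumed analysis frame length (samples)
HOP = 128        # assumed hop size: 75% overlap between frames


def _hann(n):
    """Periodic Hann window of length n."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)


def decompose(waveform, frame_len=FRAME_LEN, hop=HOP):
    """Split a waveform into magnitude spectral slices and the original phase."""
    x = np.asarray(waveform, dtype=float)
    w = _hann(frame_len)
    # Zero-pad both ends so every original sample is covered by full overlap.
    padded = np.pad(x, (frame_len, frame_len))
    n_frames = 1 + (len(padded) - frame_len) // hop
    frames = np.stack(
        [padded[i * hop : i * hop + frame_len] * w for i in range(n_frames)]
    )
    spectrum = np.fft.rfft(frames, axis=1)  # one spectral slice per frame
    return np.abs(spectrum), np.angle(spectrum), len(x)


def recompose(magnitude, phase, length, frame_len=FRAME_LEN, hop=HOP):
    """Combine (possibly modified) magnitudes with a phase via weighted overlap-add."""
    w = _hann(frame_len)
    frames = np.fft.irfft(magnitude * np.exp(1j * phase), n=frame_len, axis=1)
    out_len = (frames.shape[0] - 1) * hop + frame_len
    out = np.zeros(out_len)
    norm = np.zeros(out_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame * w
        norm[i * hop : i * hop + frame_len] += w ** 2
    out /= np.maximum(norm, 1e-8)  # undo the analysis and synthesis windowing
    # Drop the padding added in decompose().
    return out[frame_len : frame_len + length]


def modify_waveform(waveform, mapper):
    """Map first magnitude slices to second ones, keeping the original phase."""
    magnitude, phase, length = decompose(waveform)
    new_magnitude = mapper(magnitude)  # hypothetical stand-in for the learned model
    return recompose(new_magnitude, phase, length)
```

With an identity mapper (`lambda m: m`), `modify_waveform` returns the input up to floating-point error, which is a quick check that decomposition and resynthesis are consistent before a real mapping model is plugged in. Reusing the original phase means only the magnitudes need to be predicted, which is the design choice the abstract describes.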

    EVALUATING RELIABILITY OF AUDIO DATA FOR USE IN SPEAKER IDENTIFICATION

    Publication number: US20230335114A1

    Publication date: 2023-10-19

    Application number: US18301064

    Filing date: 2023-04-14

    Abstract: In some examples, a computing system includes a storage device configured to store a machine learning model trained with audio feature values to determine a reliability of an audio segment for performing speech processing; and processing circuitry. The processing circuitry is configured to: receive an audio dataset comprising a sequence of audio segments; extract, for each audio segment of the sequence of audio segments, a set of audio feature values corresponding to a set of audio features; execute the machine learning model to determine, for each audio segment of the sequence of audio segments, a reliability score based on the set of audio feature values corresponding to the respective audio segment, wherein the reliability score indicates a reliability of the audio segment for performing speech processing; and output an indication of the respective reliability scores determined for at least one audio segment of the sequence of audio segments.
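To make the per-segment flow in this abstract concrete, here is a minimal sketch assuming fixed-length segmentation, three illustrative features (RMS energy, zero-crossing rate, and the fraction of near-full-scale samples as a clipping cue), and a logistic scoring function. The abstract does not say which features or model are used; `ReliabilityModel`, its weights, and `SEGMENT_LEN` are hypothetical placeholders, and the weights are hand-picked rather than trained.

```python
import numpy as np

SEGMENT_LEN = 8000  # hypothetical segment length: 0.5 s at 16 kHz


def split_segments(audio, seg_len=SEGMENT_LEN):
    """Cut the audio into consecutive fixed-length segments (remainder dropped)."""
    audio = np.asarray(audio, dtype=float)
    return [audio[i : i + seg_len] for i in range(0, len(audio) - seg_len + 1, seg_len)]


def extract_features(segment):
    """Return illustrative audio feature values for one segment."""
    rms = np.sqrt(np.mean(segment ** 2))               # loudness
    zcr = np.mean(np.diff(np.sign(segment)) != 0)      # zero-crossing rate
    clipping = np.mean(np.abs(segment) >= 0.99)        # near-full-scale fraction
    return np.array([rms, zcr, clipping])


class ReliabilityModel:
    """Hypothetical logistic scorer standing in for the trained model."""

    def __init__(self, weights, bias):
        self.weights = np.asarray(weights, dtype=float)
        self.bias = float(bias)

    def score(self, features):
        z = float(features @ self.weights) + self.bias
        return 1.0 / (1.0 + np.exp(-z))  # reliability score in (0, 1)


def score_segments(audio, model, seg_len=SEGMENT_LEN):
    """Compute one reliability score per segment of the audio dataset."""
    return [model.score(extract_features(seg)) for seg in split_segments(audio, seg_len)]


def report(scores, threshold=0.5):
    """Output an indication per segment as (index, score, reliable?)."""
    return [(i, s, s >= threshold) for i, s in enumerate(scores)]
```

In this sketch a steady tone scores high, while silence and a heavily clipped segment score low. A real system would learn the weights (or use a richer model) from labeled audio rather than fixing them by hand.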
