Low complexity detection of voiced speech and pitch estimation

    公开(公告)号:US11176957B2

    公开(公告)日:2021-11-16

    申请号:US16638866

    申请日:2017-08-17

    Abstract: A low-complexity method and apparatus for detection of voiced speech and pitch estimation is disclosed that is capable of dealing with special constraints given by applications where low latency is required, such as in-car communication (ICC) systems. An example embodiment employs very short frames that may capture only a single excitation impulse of voiced speech in an audio signal. A distance between multiple such impulses, corresponding to a pitch period, may be determined by evaluating phase differences between low-resolution spectra of the very short frames. An example embodiment may perform pitch estimation directly in a frequency domain based on the phase differences and reduce computational complexity by obviating transformation to a time domain to perform the pitch estimation. In an event the phase differences are determined to be substantially linear, an example embodiment enhances voice quality of the voiced speech by applying speech enhancement to the audio signal.

    LOW COMPLEXITY DETECTION OF VOICED SPEECH AND PITCH ESTIMATION

    公开(公告)号:US20210134311A1

    公开(公告)日:2021-05-06

    申请号:US16638866

    申请日:2017-08-17

    Abstract: A low-complexity method and apparatus for detection of voiced speech and pitch estimation is disclosed that is capable of dealing with special constraints given by applications where low latency is required, such as in-car communication (ICC) systems. An example embodiment employs very short frames that may capture only a single excitation impulse of voiced speech in an audio signal. A distance between multiple such impulses, corresponding to a pitch period, may be determined by evaluating phase differences between low-resolution spectra of the very short frames. An example embodiment may perform pitch estimation directly in a frequency domain based on the phase differences and reduce computational complexity by obviating transformation to a time domain to perform the pitch estimation. In an event the phase differences are determined to be substantially linear, an example embodiment enhances voice quality of the voiced speech by applying speech enhancement to the audio signal.

    ENHANCED DE-ESSER FOR IN-CAR COMMUNICATIONS SYSTEMS

    公开(公告)号:US20240062770A1

    公开(公告)日:2024-02-22

    申请号:US18386825

    申请日:2023-11-03

    CPC classification number: G10L21/0364 G10L25/18 H03G9/025

    Abstract: Methods and systems for deessing of speech signals are described. A deesser of a speech processing system includes an analyzer configured to receive a full spectral envelope for each time frame of a speech signal presented to the speech processing system, and to analyze the full spectral envelope to identify frequency content for deessing. The deesser also includes a compressor configured to receive results from the analyzer and to spectrally weight the speech signal as a function of results of the analyzer. The analyzer can be configured to calculate a psychoacoustic measure from the full spectral envelope, and may be further configured to detect sibilant sounds of the speech signal using the psychoacoustic measure. The psychoacoustic measure can include, for example, a measure of sharpness, and the analyzer may be further configured to calculate deesser weights based on the measure of sharpness. An example application includes in-car communications.

    Enhanced de-esser for in-car communication systems

    公开(公告)号:US11817115B2

    公开(公告)日:2023-11-14

    申请号:US16099941

    申请日:2016-09-01

    CPC classification number: G10L21/0364 G10L25/18 H03G9/025

    Abstract: Methods and systems for deessing of speech signals are described. A deesser of a speech processing system includes an analyzer configured to receive a full spectral envelope for each time frame of a speech signal presented to the speech processing system, and to analyze the full spectral envelope to identify frequency content for deessing. The deesser also includes a compressor configured to receive results from the analyzer and to spectrally weight the speech signal as a function of results of the analyzer. The analyzer can be configured to calculate a psychoacoustic measure from the full spectral envelope, and may be further configured to detect sibilant sounds of the speech signal using the psychoacoustic measure. The psychoacoustic measure can include, for example, a measure of sharpness, and the analyzer may be further configured to calculate deesser weights based on the measure of sharpness. An example application includes in-car communications.

    Techniques for wake-up word recognition and related systems and methods

    公开(公告)号:US11600269B2

    公开(公告)日:2023-03-07

    申请号:US16308849

    申请日:2016-06-15

    Abstract: A system for detection of at least one designated wake-up word for at least one speech-enabled application. The system comprises at least one microphone; and at least one computer hardware processor configured to perform: receiving an acoustic signal generated by the at least one microphone at least in part as a result of receiving an utterance spoken by a speaker; obtaining information indicative of the speaker's identity; interpreting the acoustic signal at least in part by determining, using the information indicative of the speaker's identity and automated speech recognition, whether the utterance spoken by the speaker includes the at least one designated wake-up word; and interacting with the speaker based, at least in part, on results of the interpreting.

    Babble noise suppression
    10.
    发明授权

    公开(公告)号:US10783899B2

    公开(公告)日:2020-09-22

    申请号:US16073740

    申请日:2016-11-18

    Abstract: Systems and methods are introduced to perform noise suppression of an audio signal. The audio signal includes foreground speech components and background noise. The foreground speech components correspond to speech from a user's speaking into an audio receiving device. The background noise includes babble noise that includes speech from one or more interfering speakers. A soft speech detector determines, dynamically, a speech detection result indicating a likelihood of a presence of the foreground speech components in the audio signal. The speech detection result is employed to control, dynamically, an amount of attenuation of the noise suppression to reduce the babble noise in the audio signal. Further processing achieves a more stationary background and reduction of musical tones in the audio signal.

Patent Agency Ranking