Systems and methods for identifying speech based on cepstral coefficients and support vector machines
Abstract:
Audio content may have a duration. The audio content may be segmented into audio segments. Individual audio segments may correspond to a portion of the duration. Mel frequency spectral power features, Mel frequency cepstral coefficient features, and energy features of the audio segments may be determined. Feature vectors of the audio segments may be determined based on the Mel frequency spectral power features, the Mel frequency cepstral coefficient features, and the energy features. The feature vectors may be processed through a support vector machine. The support vector machine may output predictions on whether the audio segments contain speech. One or more of the audio segments may be identified as containing speech based on filtering the predictions and comparing the filtered predictions to a threshold. Storage of the identification of the one or more of the audio segments as containing speech in one or more storage media may be effectuated.
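The final step described above (filtering the per-segment predictions and comparing the filtered values to a threshold) may be illustrated with a minimal sketch. The abstract does not specify the filter type, window length, or threshold value, so the median filter, `window`, `threshold`, and the function names below are all assumptions chosen for illustration. The input scores stand in for hypothetical support vector machine outputs, one per audio segment.

```python
from statistics import median

def smooth_predictions(scores, window=3):
    """Median-filter per-segment scores with a centered window.

    Assumption: a median filter is used; the source only says the
    predictions are "filtered". Edge segments use a shortened window.
    """
    half = window // 2
    return [
        median(scores[max(0, i - half): i + half + 1])
        for i in range(len(scores))
    ]

def speech_segments(scores, threshold=0.5, window=3):
    """Return indices of segments whose filtered score meets the threshold.

    Assumption: 0.5 is an illustrative threshold, not a value from the source.
    """
    smoothed = smooth_predictions(scores, window)
    return [i for i, s in enumerate(smoothed) if s >= threshold]

# Hypothetical per-segment SVM outputs (higher means more speech-like).
raw = [0.1, 0.2, 0.9, 0.1, 0.2, 0.8, 0.9, 0.3, 0.9, 0.8, 0.1, 0.2]

# The isolated spike at index 2 is suppressed and the dip at index 7 is filled.
print(speech_segments(raw, threshold=0.5, window=3))  # [5, 6, 7, 8, 9]
```

Filtering before thresholding may reduce spurious single-segment detections and short gaps inside a run of speech, at the cost of blurring segment boundaries by roughly half the window length.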